I have several folders which contain duplicate files that have slightly different names (e.g. file_abc.jpg, file_abc(1).jpg), i.e. a "(1)" suffix on the end. I am trying to develop a relatively simple method to search through a folder, identify the duplicates, and then delete them. The criterion for a duplicate is "(1)" at the end of the file name, so long as the original also exists.
I can identify the duplicates okay; however, I am having trouble creating the text string in the right format to delete them. It needs to be "C:\Data\temp\file_abc(1).jpg", but using the code below I end up with r"C:\Data\temp''file_abc(1).jpg".
I have looked at answers such as "Finding duplicate files and removing them", however that seems to be far more sophisticated than what I need.
If there are better (and simpler) ways to do this then let me know; however, I only have around 10,000 files in total across 50-odd folders, so not a great deal of data to crunch through.
My code so far is:
import os
file_path = r"C:\Data\temp"
file_list = os.listdir(file_path)
print (file_list)
for file in file_list:
    if ("(1)" in file):
        index_no = file_list.index(file)
        print("!! Duplicate file, number in list: "+str(file_list.index(file)))
        file_remove = ('r"%s' %file_path+"'\'"+file+'"')
        print ("The text string is: " + file_remove)
        os.remove(file_remove)
Comments:
Just use os.remove(os.path.join(file_path, file)) if the file name contains "(1)". Why the index and all that?
'r"%s' %file_path+"'\'"+file+'"' embeds the raw-string prefix in the filepath ...

Answer:
Get a list of all files (you can use os.walk() to get this), then sort all files by size, then walk linearly through this list, identify the doubles (which are neighbours in this list) and yield each such double-group (i.e. a small list of files, typically just two, which are identical). Of course you should check the contents of these few files then, to be sure that not just two of them are accidentally the same size without being identical. If you are sure you have a group of identical ones, remove the ones with the least simple names (e.g. with suffixes (1) etc.).
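For the question as asked, a minimal sketch of the direct fix suggested in the comments, assuming every duplicate looks like name(1).ext and sits in the same folder as its original (the variable names here are illustrative, not from the original code):

import os

file_path = r"C:\Data\temp"

for name in os.listdir(file_path):
    root, ext = os.path.splitext(name)
    # a duplicate looks like "file_abc(1).jpg"; its original would be "file_abc.jpg"
    if root.endswith("(1)"):
        original = root[:-3] + ext
        if os.path.exists(os.path.join(file_path, original)):
            duplicate = os.path.join(file_path, name)  # gives "C:\Data\temp\file_abc(1).jpg"
            print("Removing duplicate:", duplicate)
            os.remove(duplicate)

os.path.join builds the backslash-separated path for you, so there is no need to assemble the string by hand.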
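And a rough sketch of the answer's size-then-content approach across all folders. It groups files of equal size in a dict rather than sorting, which amounts to the same thing, and uses a hash of the contents as the identity check; the hashing step and the "shortest path wins" rule are my own assumptions, not part of the answer:

import os
import hashlib
from collections import defaultdict

def find_duplicate_groups(top):
    # collect every file under `top` together with its size
    sizes = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.getsize(full)].append(full)

    # equal size only makes files *candidate* duplicates; confirm by hashing the contents
    for candidates in sizes.values():
        if len(candidates) < 2:
            continue
        hashes = defaultdict(list)
        for full in candidates:
            with open(full, "rb") as fh:
                hashes[hashlib.md5(fh.read()).hexdigest()].append(full)
        for group in hashes.values():
            if len(group) > 1:
                yield group

for group in find_duplicate_groups(r"C:\Data"):
    # keep the simplest name (shortest path, as a rough proxy) and delete the rest
    group.sort(key=len)
    for extra in group[1:]:
        print("Removing:", extra)
        os.remove(extra)

With only around 10,000 files, reading each candidate fully into memory for hashing is perfectly affordable.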