
SOLUTION: see the EDIT at the bottom of this post.

PROBLEM: I have a directory with a heap of images, named something like below:

  • image001.nef
  • image002.nef
  • image003.nef
  • image003 - 20170609.jpg
  • image004.nef
  • image005.nef
  • image006 - 20170609.nef
  • image007.nef
  • image007 - 20170609.jpg
  • image008.jpg
  • image008 - 20170609.nef

I want to find all images that share a duplicate base name (like imageXXX) AND have the extension JPG.

So from the list above, only three items match the criteria for deletion: image003 - 20170609.jpg, image007 - 20170609.jpg, and image008.jpg.

I have 2,500 images, so a Pythonic way is preferable to going through them manually.

I am having a hard time finding an example script to use; all the ones I have found compare a hash or something similar, which I don't believe is useful here, as the images are similar but not identical.

Cheers

EDIT: thanks to dawg I was able to get the output I desire... here is the final code that worked for me:

import os

directory = r'C:\temp'
out_directory = r'C:\temp\temp_usa_photos'
fns = os.listdir(directory)

# My real file names are fixed-width, so the first 15 characters identify each image
ref_nef = {fn[0:15] for fn in fns if fn.upper().endswith('.NEF')}

print(ref_nef)

# Keep only the JPGs whose base name also appears among the NEF files
out_list = [fn for fn in fns if fn.upper().endswith('.JPG') and fn[0:15] in ref_nef]

print(out_list)

# Move each duplicate JPG into the holding directory
for f in out_list:
    input_file = os.path.join(directory, f)
    output_file = os.path.join(out_directory, f)
    os.rename(input_file, output_file)
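If the file names are not all fixed-width, slicing a fixed number of characters is fragile. A more robust sketch (my assumption: the date suffix, when present, is always separated by " - ") strips the extension with os.path.splitext and then drops the suffix:

```python
import os

def base_name(fn):
    # drop the extension, then anything after the " - " date suffix
    stem = os.path.splitext(fn)[0]
    return stem.split(' - ')[0]

# the sample listing from the question
fns = ['image001.nef', 'image002.nef', 'image003.nef',
       'image003 - 20170609.jpg', 'image004.nef', 'image005.nef',
       'image006 - 20170609.nef', 'image007.nef',
       'image007 - 20170609.jpg', 'image008.jpg',
       'image008 - 20170609.nef']

# base names that have a NEF version
ref_nef = {base_name(fn) for fn in fns if fn.upper().endswith('.NEF')}

# JPGs whose base name also exists as a NEF
jpg_dupes = [fn for fn in fns
             if fn.upper().endswith('.JPG') and base_name(fn) in ref_nef]
print(jpg_dupes)
# ['image003 - 20170609.jpg', 'image007 - 20170609.jpg', 'image008.jpg']
```

This works regardless of how long the imageXXX prefix is, as long as the suffix convention holds.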
  • What have you done so far? Commented Jun 9, 2017 at 6:13
  • You have to delete them only based on the filename? I don't exactly understand what prevents you from looping over all images, extracting base names, writing them to a dict/list and then removing all further duplicates encountered. Commented Jun 9, 2017 at 6:21
  • @moritzg I have just added the code to the original post. Commented Jun 11, 2017 at 4:56

1 Answer


Given:

>>> fns
['image001.nef', 'image002.nef', 'image003.nef', 'image003 - 20170609.jpg', 'image004.nef', 'image005.nef', 'image006 - 20170609.nef', 'image007.nef', 'image007 - 20170609.jpg', 'image008.jpg', 'image008 - 20170609.nef']

(I can use that list as a proxy for a listing of file names; just use glob or os.listdir for real files.)

If your file names are all of the form imageXXX, you can first create a set of the first 8 letters of the .nef file names:

>>> ref_nef={fn[0:8] for fn in fns if fn.upper().endswith('.NEF')}
>>> ref_nef
set(['image008', 'image005', 'image004', 'image007', 'image006', 'image001', 'image003', 'image002'])

Then use that to filter the .jpg files to delete:

>>> filter(lambda e: e[0:8] in ref_nef, [fn for fn in fns if fn.upper().endswith('.JPG')])
['image003 - 20170609.jpg', 'image007 - 20170609.jpg', 'image008.jpg']
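Note that the transcript above is a Python 2 session; in Python 3, filter returns a lazy iterator rather than a list, so wrap it in list() or use a comprehension. A Python 3 equivalent over the same sample list:

```python
# the sample listing from the question
fns = ['image001.nef', 'image002.nef', 'image003.nef',
       'image003 - 20170609.jpg', 'image004.nef', 'image005.nef',
       'image006 - 20170609.nef', 'image007.nef',
       'image007 - 20170609.jpg', 'image008.jpg',
       'image008 - 20170609.nef']

# first 8 letters of every NEF file name
ref_nef = {fn[0:8] for fn in fns if fn.upper().endswith('.NEF')}

# JPGs whose 8-letter prefix matches a NEF
to_delete = [fn for fn in fns
             if fn.upper().endswith('.JPG') and fn[0:8] in ref_nef]
print(to_delete)
# ['image003 - 20170609.jpg', 'image007 - 20170609.jpg', 'image008.jpg']
```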

2 Comments

I am wondering if there is a simple solution to my new issue. Your solution fixed 99% of my problem, but I just found out there are some rogue NEF files. In this screenshot you can see some duplicate NEF files are present, and I would like to rid my folder of all the NEWER NEF files. In this case the top one needs to go; it will have a longer name AND be newer. Can you help with this one? Thanks heaps for your assistance!
If this does 99%, then use this. Afterwards, you can use a duplicate-finding approach where you actually read the files and compare their contents; an MD5 hash is useful for this. Good luck, and ask a new question if you get stuck.
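The MD5 approach suggested above could look something like the following. This is a sketch, not the answerer's code; it assumes the rogue NEF files are byte-identical copies, and it keeps the oldest copy by sorting on modification time so newer duplicates are the ones flagged:

```python
import hashlib
import os

def file_md5(path, chunk=65536):
    # hash in chunks so large NEF files are never fully loaded into memory
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk), b''):
            h.update(block)
    return h.hexdigest()

def find_exact_duplicates(directory):
    seen = {}    # md5 digest -> first path seen with that content
    dupes = []   # later paths whose bytes match an earlier file
    # sort by modification time so the OLDEST copy is the one kept
    paths = sorted((os.path.join(directory, f) for f in os.listdir(directory)),
                   key=os.path.getmtime)
    for p in paths:
        digest = file_md5(p)
        if digest in seen:
            dupes.append(p)   # newer byte-identical copy
        else:
            seen[digest] = p
    return dupes
```

You could then move or delete each path in the returned list, exactly as in the script in the question's EDIT.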
