Suppose I have a program A. I run it, and it performs some operation on a file foo.txt. Then A terminates.

On the next run, A checks whether foo.txt has changed. If it has, A runs its operation again; otherwise, it quits.

Does a library function or external library for this exist?

Of course, it can be implemented with an MD5 hash plus a file or database that stores the hash, but I want to avoid reinventing the wheel.

4 Answers

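It's unlikely that anyone has written a library for something this simple. Here is a short solution:

    import pickle
    import hashlib

    try:
        with open("db", "rb") as f:  # pickle files must be opened in binary mode
            l = pickle.load(f)
    except IOError:  # no db yet (first run)
        l = []
    db = dict(l)
    path = "/etc/hosts"
    with open(path, "rb") as f:
        # hexdigest() returns a plain string that can be compared and pickled
        checksum = hashlib.md5(f.read()).hexdigest()
    if db.get(path, None) != checksum:
        print("file changed")
        db[path] = checksum
    with open("db", "wb") as f:
        pickle.dump(list(db.items()), f)  # a dict_items view itself can't be pickled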

4 Comments

It would probably be worthwhile first checking st_mtime and st_size: if they've changed, you don't need to checksum, saving time.
A number of things could be done to make this as configurable and one-size-fits-all as you'd like. My point is simply that it's an easy problem, and it will take longer to find and configure a general-purpose library than to roll your own.
There are many simple features in the standard library that could be written in a few lines of code, but there they are :) Thanks for the code!
Hi, I got TypeError: 'builtin_function_or_method' object is not iterable on the line db = dict(l). When I printed l, I got <built-in method items of dict object at 0x0000023BE0590240>. Any way to fix this?
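
A minimal sketch of the stat-first idea from the first comment, assuming a JSON state file (the file name state.json and the helper names are hypothetical). Note that it applies the idea in the common direction: when st_size and st_mtime are both unchanged it assumes the file is unchanged and skips hashing; when either differs, it hashes, so a rewritten file with identical content is not reported as changed. As a later comment points out, some files can change without their size or mtime changing, so the quick check trades some reliability for speed.

```python
import hashlib
import json
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def has_changed(path, state_file="state.json"):
    """Return True if `path` changed since the last call, and record its state.

    `state_file` is a hypothetical JSON file: {path: [size, mtime_ns, md5]}.
    """
    try:
        with open(state_file) as f:
            state = json.load(f)
    except (OSError, ValueError):
        state = {}  # first run, or unreadable state: everything counts as changed

    st = os.stat(path)
    old = state.get(path)
    if old is not None and old[0] == st.st_size and old[1] == st.st_mtime_ns:
        return False  # quick check: same size and mtime, so skip hashing

    # Metadata differs (or there is no record yet): let the content hash decide.
    digest = md5_of(path)
    changed = old is None or old[2] != digest
    state[path] = [st.st_size, st.st_mtime_ns, digest]
    with open(state_file, "w") as f:
        json.dump(state, f)
    return changed
```

Program A would call `has_changed("foo.txt")` at startup and quit when it returns False.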

FYI, for those using this example who get the error "TypeError: can't pickle HASH objects": simply modify the code as follows (and optionally switch from the md5 module to hashlib, since md5 is deprecated):

    import pickle
    import hashlib  # instead of the deprecated md5 module
    try:
        with open("db", "rb") as f:
            l = pickle.load(f)
    except IOError:
        l = []
    db = dict(l)
    path = "/etc/hosts"
    # hexdigest() converts the hash to text, which can be pickled
    with open(path, "rb") as f:
        checksum = hashlib.md5(f.read()).hexdigest()
    if db.get(path, None) != checksum:
        print("file changed")
        db[path] = checksum
    with open("db", "wb") as f:
        pickle.dump(list(db.items()), f)

So just change:

    checksum = hashlib.md5(open(path).read())

to

    checksum = hashlib.md5(open(path).read()).hexdigest()
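
To see why .hexdigest() fixes the error: a hashlib hash object is not a plain value. Two hash objects built from the same bytes compare unequal (they have no value-based equality, so == falls back to identity), and pickle refuses to serialize them. The hex digest is an ordinary str. A short Python 3 demonstration (the sample bytes are arbitrary):

```python
import hashlib
import pickle

data = b"127.0.0.1 localhost\n"  # arbitrary sample content

a = hashlib.md5(data)
b = hashlib.md5(data)
print(a == b)                          # False: hash objects compare by identity
print(a.hexdigest() == b.hexdigest())  # True: equal content, equal hex strings

try:
    pickle.dumps(a)
except (TypeError, pickle.PicklingError) as exc:
    # The Python 3 form of the "can't pickle HASH objects" error.
    print("cannot pickle a hash object:", exc)

# A hex digest is a plain str, so it survives a pickle round trip unchanged.
restored = pickle.loads(pickle.dumps(a.hexdigest()))
print(restored == a.hexdigest())       # True
```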

This is one of those things that is both so trivial to implement and so app-specific that a library wouldn't really make sense. Any library intended for this purpose would grow unwieldy trying to accommodate the many variations required, and learning and using it would take as much time as implementing it yourself.

Can't we just check the last-modified date? That is, after the first operation we store the file's last-modified date in the db, and before running again we compare the last-modified date of foo.txt with the stored value. If they differ, we perform the operation again.
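
A minimal sketch of this modification-time approach, assuming a hypothetical JSON file (mtimes.json) as the "db". It uses os.stat's st_mtime_ns, an integer in nanoseconds, which avoids comparing floats:

```python
import json
import os


def mtime_changed(path, db_file="mtimes.json"):
    """Return True if the modification time of `path` differs from the stored one.

    `db_file` is a hypothetical JSON file mapping path -> st_mtime_ns.
    The new mtime is stored so the next call compares against this run.
    """
    try:
        with open(db_file) as f:
            db = json.load(f)
    except (OSError, ValueError):
        db = {}  # first run or unreadable db: treat the file as changed

    mtime = os.stat(path).st_mtime_ns
    changed = db.get(path) != mtime
    db[path] = mtime
    with open(db_file, "w") as f:
        json.dump(db, f)
    return changed
```

Only file metadata is read, so this is cheap, but it reports a change whenever the file is rewritten, even if the content is identical.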

5 Comments

That's what make does, and I frankly prefer not to.
What is the problem using modification time?
Suppose the file is downloaded every hour from a remote website, or generated by some source that actually recreates the file and that is beyond my control. The modification time will change, but if the actual content is the same, there's no point in re-executing the task.
Of course, you can work around it (for example, write to a temporary file, and then overwrite only if changed, after an MD5 comparison of the two). I agree there are other solutions.
Some types of files can also change their contents without the size or last-modified date changing; this is the case in particular with TrueCrypt container files. SyncBack acknowledges this: you can opt to "check whether contents have modified by a more dependable (but slower) method..."
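
The workaround mentioned in the comments above (write to a temporary file, then overwrite only if the MD5 differs) can be sketched as follows, assuming the freshly downloaded or generated content is available as bytes; the function names are hypothetical. The target's mtime then changes only when its content does, so a plain mtime check (or make) stays meaningful:

```python
import hashlib
import os
import tempfile


def md5_file(path):
    """Return the hex MD5 digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def replace_if_changed(target, new_bytes):
    """Write `new_bytes` to `target` only if the content actually differs.

    Returns True if `target` was (re)written, False if it was left untouched.
    """
    # Create the temp file next to the target so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_bytes)
        if os.path.exists(target) and md5_file(target) == md5_file(tmp):
            return False  # same content: keep the old file and its mtime
        # Note: mkstemp creates the file with owner-only permissions (0600 on POSIX).
        os.replace(tmp, target)
        tmp = None  # renamed into place; nothing left to clean up
        return True
    finally:
        if tmp is not None and os.path.exists(tmp):
            os.remove(tmp)
```

An hourly download job would call `replace_if_changed("foo.txt", downloaded_bytes)`, where `downloaded_bytes` stands for whatever the (hypothetical) download step returns.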
