9

I want to hash a simple array of strings. The documentation says you can't simply feed a string into hashlib's update() function, so I tried a regular variable, but then I got the error "TypeError: object supporting the buffer API required".

Here's what I had so far

def generateHash(data):
    # Prepare the project id hash
    hashId = hashlib.md5()

    hashId.update(data)

    return hashId.hexdigest()
1
  • as far as I know, you should be able to feed a string into hashlib's update function, could you provide more info? Commented Jul 1, 2013 at 19:45

3 Answers 3

13

You can use the repr() function to get the (Unicode) string representation of the array (or of any object that implements conversion to a representation). Then encode that string to UTF-8 (the byte order is the same everywhere when using UTF-8). The resulting bytes can be hashed as you tried above:

#!python3
import hashlib

def hashFor(data):
    # Prepare the project id hash
    hashId = hashlib.md5()

    hashId.update(repr(data).encode('utf-8'))

    return hashId.hexdigest()


if __name__ == '__main__':
    data1 = ['abc', 'de']
    data2 = ['a', 'bcde']
    print(hashFor(data1) + ':', data1)
    print(hashFor(data2) + ':', data2)

It prints on my console:

c:\tmp\___python\skerit\so17412304>py a.py
d26d27d8cbb7c6fe50637155c21d5af6: ['abc', 'de']
dbd5ab5df464b8bcee61fe8357f07b6e: ['a', 'bcde']

5 Comments

There's no guarantee that an arbitrary object's __repr__ returns something that's a useful input for the hash function. hashlib objects themselves, for example, repr() to '<md5 HASH object @ 0x7fb503555a80>'. Even ignoring the nasty implications if this were used for some cryptographic operation, this isn't even deterministic! The same program run at different times won't return the same hash value.
@EricSeppanen: The answer is specific to an array of strings. You are right; one should not use a hammer for every job.
You should still try to use something like ",".join(data) as __repr__ isn't guaranteed to be consistent with future versions. Maybe Python 4 will return a slightly different string (e.g. s'abc instead of 'abc').
@JohannBauer: It is unlikely. Anyway, the question is rather old, and the situation may have changed. The ','.join(data) is buggy as ['a,', 'bb'] would produce the same result as ['a', ',bb']. But you are right. Any suitable function that captures the representation of the array of strings and returns it as bytes can be used for calculating the hash value.
@EricJin: Yes, but the goal was to have the string representation of a list of strings.
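The concern raised in these comments (a separator-based join is ambiguous, and repr() output is not a formal guarantee) could also be addressed by serializing the list to JSON before hashing. The following is only a sketch under that assumption; the function name hash_string_list_json is made up for illustration and does not come from the answer above.

```python
import hashlib
import json

def hash_string_list_json(strings):
    # Hypothetical helper: json.dumps quotes and escapes every element,
    # so the boundaries between elements survive in the serialized text.
    payload = json.dumps(list(strings))
    # With the default ensure_ascii=True the payload is pure ASCII,
    # so encoding it to UTF-8 cannot fail.
    return hashlib.md5(payload.encode('utf-8')).hexdigest()

# The two lists that collide under ','.join(...) hash differently here.
print(hash_string_list_json(['a,', 'bb']) != hash_string_list_json(['a', ',bb']))
```

Note that this produces different digests than the repr()-based function, so the two approaches are not interchangeable for already stored hashes.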
2

It depends on what you want: a single hash of all the strings concatenated, or a separate hash for each string. You can get the first by following Thomas's solution, since m.update(a); m.update(b) is equivalent to m.update(a+b). For the latter, use the solution below:

import hashlib

def generateHash(data):
    # Hash each string separately and return the list of hex digests
    return [hashlib.md5(i.encode('utf-8')).hexdigest() for i in data]

Note that it returns a list. Each element is the hash of the corresponding element in the given list of strings.
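The equivalence between repeated update() calls and a single call on the concatenation can be checked with a short sketch. The sample byte strings here are arbitrary values chosen for illustration.

```python
import hashlib

# Arbitrary sample inputs, already encoded to bytes
a = 'abc'.encode('utf-8')
b = 'de'.encode('utf-8')

# Feed the two pieces one after the other
incremental = hashlib.md5()
incremental.update(a)
incremental.update(b)

# Feed the concatenation in one go
one_shot = hashlib.md5(a + b)

print(incremental.hexdigest() == one_shot.hexdigest())  # True
```

This is also why hashing the pieces in sequence cannot distinguish where one string ends and the next begins.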


1

If you'd like to hash a list of strings, a naive solution could be:

import hashlib

def hash_string_list(string_list):
    h = hashlib.md5()
    for s in string_list: # Note that you could use ''.join(string_list) instead
        h.update(s)       # s.encode('utf-8') if you're using Python 3
    return h.hexdigest()

However, be wary that ['abc', 'efg'] and ['a', 'bcefg'] would hash to the same value.
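One possible way around that collision, sketched here for Python 3, is to feed the length of each string into the hash before the string itself, so the element boundaries become part of the hashed data. The function name hash_string_list_framed and the 8-byte length prefix are assumptions made for this illustration, not part of the original answer.

```python
import hashlib
import struct

def hash_string_list_framed(string_list):
    # Hypothetical variant of hash_string_list that records element boundaries
    h = hashlib.md5()
    for s in string_list:
        encoded = s.encode('utf-8')
        # 8-byte big-endian length prefix marks where each string ends
        h.update(struct.pack('>Q', len(encoded)))
        h.update(encoded)
    return h.hexdigest()

# The two lists from the caveat above no longer hash to the same value.
print(hash_string_list_framed(['abc', 'efg']) != hash_string_list_framed(['a', 'bcefg']))
```

The digests differ from those of the naive version, so this only helps if you control every place where the hash is computed.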

If you provide more context regarding your objective, other solutions might be more appropriate.

2 Comments

...except that update takes bytes, so if you have strings you need to encode them first.
@mata Oh, didn't realize that was a python3 question - sorry.
