
This example works fine:

import hashlib
m = hashlib.md5()
m.update(b"Nobody inspects")
r = m.digest()
print(r)

Now I want to do the same thing, but with a variable: var = "hash me this text, please". How can I do it following the same logic as the example?

  • Have you tried m.update(var)? Commented Jul 23, 2014 at 8:15
  • @tobias_k: that'll give an error; that's not a bytes value. Commented Jul 23, 2014 at 8:19
  • For future reference: not everyone knows that hash.update() needs bytes, and that your problem was therefore an exception raised when you tried to use a str value instead. Next time, include that exception in your question. Commented Jul 23, 2014 at 8:21
  • Thought so (sounded much too easy), but when I tried it, it worked and I got the same hashcode... tested on Python 2.7, though, not on 3. Commented Jul 23, 2014 at 8:21
  • @tobias_k: But that is a crucial difference; Python 3 is built on a clear distinction between Unicode and bytes from the ground up. Commented Jul 23, 2014 at 8:23

3 Answers


The hash.update() method requires bytes, always.

Encode Unicode text to bytes first; which encoding you use is an application decision, but if all you want to do is fingerprint text, then UTF-8 is a great choice:

m.update(var.encode('utf8')) 

The exception you get when you don't is quite clear however:

>>> import hashlib
>>> hashlib.md5().update('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
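
As a minimal sketch of the point above, the snippet below triggers the TypeError with a str and then hashes the same text after encoding it. The helper name md5_hex is made up for illustration and is not part of hashlib; UTF-8 is assumed as the encoding.

```python
import hashlib


def md5_hex(text, encoding="utf-8"):
    """Hypothetical helper: encode text to bytes, then return the MD5 hex digest."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


var = "hash me this text, please"

# Passing a str directly to update() raises TypeError on Python 3.
try:
    hashlib.md5().update(var)
    raised = False
except TypeError:
    raised = True

digest = md5_hex(var)
print(raised, digest)
```

The exact wording of the exception message may differ between Python 3 releases, so the sketch only checks the exception type.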

If you are getting the hash of a file, open the file in binary mode instead:

import hashlib
from functools import partial

hash = hashlib.md5()
with open(filename, 'rb') as binfile:
    # Read 2048-byte chunks until read() returns b'' at end of file.
    for chunk in iter(partial(binfile.read, 2048), b''):
        hash.update(chunk)
print(hash.hexdigest())

3 Comments

I followed the link you gave me and read about the digest() method: can it handle a very long input? I mean, if my variable above is the content of a text file (which is already the case), and the file contains a lot of text, will digest() accept that much content?
@begueradj: yes, it can take anything that fits in memory. If you are reading a text file, you can call .update() multiple times, each time with the next chunk. Loop over the file to get lines, pass each encoded line to .update(), and when the file is done, call .digest() to get the result.
@begueradj: or you can open the file in binary mode and you will not have to encode again.
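
To illustrate feeding data in pieces, here is a small sketch that assumes the lines come from a text file (a hard-coded list stands in for the file here). Calling update() once per encoded line should give the same digest as hashing the whole text in a single call.

```python
import hashlib

# Stand-in for lines read from a text file (assumption for illustration).
lines = ["first line\n", "second line\n", "third line\n"]

# Feed the hash one encoded line at a time.
incremental = hashlib.md5()
for line in lines:
    incremental.update(line.encode("utf-8"))

# Hash the whole text in one call for comparison.
one_shot = hashlib.md5("".join(lines).encode("utf-8"))

print(incremental.hexdigest() == one_shot.hexdigest())
```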

Try this. Hope it helps. The variable var has to be encoded to bytes, here with UTF-8. If you type in a string, e.g. "Donald Duck", the var variable will be b'Donald Duck'. You can then hash those bytes with hashlib.md5() and get the result as a hex string with hexdigest().

#!/usr/bin/python3
import hashlib
var = input('Input string: ').encode('utf-8')
hashed_var = hashlib.md5(var).hexdigest()
print(hashed_var)
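
For comparison with the question, which used digest(), this short sketch shows how digest() and hexdigest() relate: the first returns raw bytes, the second the same value as a hexadecimal string. The input text is just the example string from above.

```python
import hashlib

data = "Donald Duck".encode("utf-8")
h = hashlib.md5(data)

raw = h.digest()          # bytes object, 16 bytes long for MD5
hex_text = h.hexdigest()  # str of 32 hexadecimal characters

# The hex string is simply the raw digest rendered in hexadecimal.
print(len(raw), len(hex_text), raw.hex() == hex_text)
```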



I had the same issue as the OP. I couldn't get either of the previous answers to work for me for some reason, but a combination of both helped me arrive at this solution.

I was originally hashing a string literal like this:

import hashlib
h = hashlib.sha256(b'hash this text')
text_hashed = h.hexdigest()
print(text_hashed)

Result: d3dba6081b7f171ec5fa4687182b269c0b46e77a78611ad268182d8a8c245b40

My solution to hash a variable:

text = 'hash this text'
h = hashlib.sha256(text.encode('utf-8'))  # encode the str to bytes first
text_hashed = h.hexdigest()
print(text_hashed)

Result: d3dba6081b7f171ec5fa4687182b269c0b46e77a78611ad268182d8a8c245b40
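
Both snippets give the same result because, for pure-ASCII text, the bytes literal and the UTF-8 encoding of the str are the same bytes. The sketch below checks that, and also shows that the choice of encoding matters once the text contains non-ASCII characters. The sample word is an arbitrary choice for illustration.

```python
import hashlib

ascii_text = "hash this text"
# For pure-ASCII text, the bytes literal and the UTF-8 encoding are identical.
same_bytes = ascii_text.encode("utf-8") == b"hash this text"

non_ascii = "caf\u00e9"  # the word "café"
utf8_hash = hashlib.sha256(non_ascii.encode("utf-8")).hexdigest()
latin1_hash = hashlib.sha256(non_ascii.encode("latin-1")).hexdigest()

# Different encodings produce different bytes, hence different digests.
print(same_bytes, utf8_hash == latin1_hash)
```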
