6

In Python 2, one could hash a string by just running:

import hashlib
someText = "a"
hashlib.sha256(someText).hexdigest()

But in Python 3, it needs to be encoded:

someText = "a".encode("ascii")
hashlib.sha256(someText).hexdigest()

But when I try this with a file:

f = open(fin, "r")
sha = hashlib.sha256()
while True:
    data = f.read(2 ** 20).encode("ascii")
    if not data:
        break
    sha.update(data)
f.close()

I get this on many files:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 8: invalid continuation byte

I assume this is because it's a binary file, which likely can't be converted to ASCII.

How can I encode the file without this problem?

  • Try opening the file in binary mode with open(fin, "rb"). Commented Oct 13, 2013 at 5:09
  • @BrenBarn worked perfectly, you should answer with that. Commented Oct 13, 2013 at 5:12
  • Can you not do sha.update(open(filename, "rb").read())? And why do you use 2 ** 20? Commented Jun 23, 2023 at 23:20

3 Answers

6

On Unix systems, in Python 2 there was no distinction between binary- and text-mode files, so it didn't matter how you opened them.

But in Python 3 it matters on every platform. sha256() requires binary input, but you opened the file in text mode. That's why @BrenBarn suggested you open the file in binary mode.

Since you opened the file in text mode, Python 3 tries to decode the file's bytes into Unicode strings, and that fails on any byte sequence that is not valid UTF-8 (the codec named in your traceback). But you don't want decoding at all, right?

Then open the file in binary mode, and you'll read byte strings instead, which is what sha256() wants.
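As a minimal sketch of that fix, the loop from the question can be rewritten with the file opened in "rb" mode and the .encode() call dropped. The helper name sha256_of_file, its chunk_size parameter, and the temporary demo file are assumptions made for illustration, not part of the original code.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=2 ** 20):
    """Return the SHA-256 hex digest of the file at `path` (hypothetical helper)."""
    sha = hashlib.sha256()
    # "rb" yields bytes objects, so no decoding (and no .encode()) is involved.
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)  # at most chunk_size bytes per read
            if not data:               # b"" signals end of file
                break
            sha.update(data)
    return sha.hexdigest()


# Demo on a temporary file whose 0xe1 byte makes the content invalid UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header \xe1 binary payload")
    demo_path = tmp.name

try:
    digest = sha256_of_file(demo_path)
    print(digest)
finally:
    os.remove(demo_path)
```

Reading in chunks of 2 ** 20 bytes (1 MiB) keeps memory use bounded for large files. For small files, hashing the result of a single f.read() should give the same digest.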

By the way, your:

someText = "a".encode("ascii")
hashlib.sha256(someText).hexdigest()

can be written more simply:

hashlib.sha256(b"a").hexdigest()

That is, pass it the binary data directly, instead of bothering with encoding a Unicode string (which the literal "a" is).
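A quick check, offered as a sketch, suggests the two spellings hash identically. Bytes literals only accept ASCII characters, so non-ASCII text still needs an explicit encoding. The sample string and the choice of UTF-8 and Latin-1 below are assumptions for illustration.

```python
import hashlib

# The bytes literal and the ASCII-encoded str hold the same single byte.
from_literal = hashlib.sha256(b"a").hexdigest()
from_encode = hashlib.sha256("a".encode("ascii")).hexdigest()
print(from_literal == from_encode)

# Non-ASCII text cannot be written as a plain bytes literal, so an encoding
# has to be chosen; different encodings give different bytes and hashes.
text = "café"
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
latin1_digest = hashlib.sha256(text.encode("latin-1")).hexdigest()
print(utf8_digest != latin1_digest)
```

The hash depends only on the bytes, so whichever encoding is chosen has to be used consistently when digests are compared.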

6

Try opening the file in binary mode with open(fin, "rb").

0

I wrote a module that can hash large files with several algorithms.

pip3 install py_essentials

Use the module like this:

from py_essentials import hashing as hs
hash = hs.fileChecksum("path/to/the/file.txt", "sha256")

Take a look at the documentation.
