
I am trying to build an md5 cracker for practice. Before I go any further here is my code:

import binascii
import fileinput
import hashlib

def offline_wordlist_attack(list_path):
    with fileinput.input(files=(list_path,)) as wordlist:
        for word in wordlist:
            md5_hash_object = hashlib.md5()  # construct an md5 hash object
            md5_hash_object.update(binascii.a2b_uu(word))  # the line in question
            word_digest = md5_hash_object.digest()  # compute the md5 digest of the word
            print(word_digest)  # Debug

My issue is with md5_hash_object.update(binascii.a2b_uu(word)). The hashlib Python 3 documentation states that the string passed to update() should be in binary representation. The documentation uses m.update(b"Nobody inspects") as an example. In my code, I cannot simply put b in front of the variable word. So I tried the binascii library, but its documentation also has a note stating:

Note

Encoding and decoding functions do not accept Unicode strings. Only bytestring and bytearray objects can be processed.

Could somebody help me out with this? It is getting the better of me.

  • note: fileinput.input() might be too slow in your case. You could use md5(word).digest() without explicit update(). Commented Aug 29, 2012 at 10:22

2 Answers


You need to pass in a bytes object rather than a str. The typical way to go from str (a Unicode string in Python 3) to bytes is to call the .encode() method on the string and specify the encoding you wish to use.

my_bytes = my_string.encode('utf-8')
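
To connect this to hashing, here is a minimal sketch that encodes a str and feeds the resulting bytes to hashlib.md5(). The helper name md5_hex and the default of UTF-8 are assumptions made for illustration; use whatever encoding the original passwords were hashed with.

```python
import hashlib


def md5_hex(word: str, encoding: str = "utf-8") -> str:
    """Return the hexadecimal MD5 digest of a text string.

    md5_hex is a hypothetical helper name; the UTF-8 default is an assumption.
    """
    # hashlib only accepts bytes-like objects, so encode the str first.
    word_bytes = word.encode(encoding)
    # Passing the data to the constructor is equivalent to calling update() once.
    return hashlib.md5(word_bytes).hexdigest()


print(md5_hex("Nobody inspects"))
```

hexdigest() returns the digest as a hex string, which is usually easier to compare against a stored hash than the raw bytes from digest().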


Just call fileinput.input(...,mode='rb') to open the files in binary mode. Files opened this way yield bytes objects rather than the Unicode strings that text-mode files yield.

This skips the unnecessary (implicit) decoding of the bytes read from disk, which would otherwise be followed immediately by encoding them back to bytes with .encode() before passing them to md5().
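
A rough sketch of the binary-mode approach, assuming one candidate word per line. The function name iter_md5_digests is hypothetical. The sketch also strips the line ending from each line, since the newline would otherwise be hashed along with the word.

```python
import fileinput
import hashlib


def iter_md5_digests(list_path):
    """Yield (word_bytes, hex_digest) for each line of a wordlist file.

    iter_md5_digests is a hypothetical name used only for this sketch.
    """
    # mode="rb" makes fileinput yield bytes, so no .encode() step is needed.
    with fileinput.input(files=(list_path,), mode="rb") as wordlist:
        for line in wordlist:
            # Drop the trailing line ending; otherwise it becomes part of the hash.
            word = line.rstrip(b"\r\n")
            yield word, hashlib.md5(word).hexdigest()
```

Iterating over iter_md5_digests("wordlist.txt") (the file name is a placeholder) then gives each word together with its digest, ready to compare against the target hash.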

1 Comment

This would be another solution. In the more general case, though, it has the downside of not ensuring what encoding you're working with (since it depends on the input file encoding).
