
I'm trying to extract a string from a file using Python's re module, then compute the MD5 of that string with something like:

    #MD5er.py
    salt = extract_salt(file_foo)
    print 'salt: %s' % salt
    from md5 import md5
    print 'hash: %s' % md5(salt).hexdigest()

$ python MD5er.py

    salt: \0001\072\206\277\354\107\134\061\361\076\150\047\010\124\200\315\100
    hash: ce24166858853dfb12a86d7d602b0638

But when I do the same thing in IPython:

    In [40]: salt = '\0001\072\206\277\354\107\134\061\361\076\150\047\010\124\200\315\100'

    In [41]: salt
    Out[41]: "\x001:\x86\xbf\xecG\\1\xf1>h'\x08T\x80\xcd@"

    In [42]: print salt
    1:���G\1�>hT��@

    In [43]: from md5 import md5

    In [44]: md5(salt).hexdigest()
    Out[44]: 'ebae47a953591f7448ff7079837fb534'

Any clues why the MD5 differs between the two scenarios? And why, in IPython, does typing the variable name show the string in a different format from the original, while print shows yet a third format?

Hint:

    In [53]: import sys
    In [54]: sys.getdefaultencoding()
    Out[54]: 'ascii' 
  • Are the backslashes actually in the file? Commented Oct 22, 2011 at 5:21

1 Answer


In the first case, the string contains exactly the characters you saw printed, with literal backslashes followed by digits:

>>> salt = '\\0001\\072\\206\\277\\354\\107\\134\\061\\361\\076\\150\\047\\010\\124\\200\\315\\100'
>>> md5(salt).hexdigest()
'ce24166858853dfb12a86d7d602b0638'

Notice how I've escaped the backslashes to keep the digits from being interpreted as octal byte values.
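
To make the difference concrete, here is a small sketch. It is written for Python 3 with hashlib, which is an assumption on my part since the question uses Python 2's md5 module; the variable names are illustrative only. A raw literal keeps the backslashes as ordinary characters, which is what the script read from the file, while a normal literal lets Python turn each octal escape into a single byte, which is what happened in IPython.

```python
import hashlib

# Raw bytes literal: every backslash and digit stays a separate character,
# which is what the script got when it read the text from the file.
literal_text = rb'\0001\072\206\277\354\107\134\061\361\076\150\047\010\124\200\315\100'

# Ordinary bytes literal: Python interprets each \ooo (up to three octal
# digits) as one byte, which is what typing the string into IPython did.
interpreted = b'\0001\072\206\277\354\107\134\061\361\076\150\047\010\124\200\315\100'

print(len(literal_text), len(interpreted))   # the two inputs have different lengths
print(repr(interpreted))                     # repr shows non-printable bytes as \x.. escapes

# Different input bytes, so different digests. The question reports
# ce24... for the literal text and ebae... for the interpreted bytes.
print(hashlib.md5(literal_text).hexdigest())
print(hashlib.md5(interpreted).hexdigest())
```

This also accounts for the three formats in the question: the source literal uses octal escapes, typing the variable name shows its repr (with hex escapes for non-printable bytes), and Python 2's print writes the raw bytes to the terminal.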

Edit:

Assuming you want to create a byte string from the octal values in this list:

data = ['\\0001', '\\072', '\\206', '\\277', '\\354', '\\107', '\\134', 
        '\\061', '\\361', '\\076', '\\150', '\\047', '\\010', '\\124', 
        '\\200', '\\315', '\\100']

You can convert to an integer and then join the characters, but it's different from what you got in IPython. The first value is 4 digits instead of 3. Should it be treated as '\0' followed by an ASCII '1', or should it be treated as '\1'? The following does the latter:

salt = ''.join(chr(int(d[1:], 8)) for d in data)
print repr(salt)
print md5(salt).hexdigest()

Output:

"\x01:\x86\xbf\xecG\\1\xf1>h'\x08T\x80\xcd@"
d2092426d1bd5bec1579c8b7ed9c73c2
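
If you want the other reading instead, where '\\0001' means a NUL byte followed by an ASCII '1' (which is how Python itself parses the literal, since an octal escape takes at most three digits), a sketch along these lines might work. It assumes Python 3 and hashlib rather than the md5 module used above, and octal_token_to_bytes is a hypothetical helper name, not anything from a library.

```python
import hashlib

data = ['\\0001', '\\072', '\\206', '\\277', '\\354', '\\107', '\\134',
        '\\061', '\\361', '\\076', '\\150', '\\047', '\\010', '\\124',
        '\\200', '\\315', '\\100']

def octal_token_to_bytes(token):
    """Hypothetical helper: decode the first three octal digits as one byte
    and keep any remaining characters as plain ASCII."""
    digits = token[1:]                   # drop the leading backslash
    head, tail = digits[:3], digits[3:]  # '0001' -> '000' and '1'
    return bytes([int(head, 8)]) + tail.encode('ascii')

salt = b''.join(octal_token_to_bytes(token) for token in data)
print(repr(salt))
print(hashlib.md5(salt).hexdigest())
```

With this reading, the salt starts with a NUL byte followed by '1' and is 18 bytes long, the same byte sequence IPython displayed in the question.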

2 Comments

Thanks eryksun. The list I build the string from looks like this:

    l = ['\\0001', '\\072', '\\206', '\\277', '\\354', '\\107', '\\134', '\\061', '\\361', '\\076', '\\150', '\\047', '\\010', '\\124', '\\200', '\\315', '\\100']

I need to remove the escape character '\' and concatenate all the elements into one string, like the original one I pasted. So I tried:

    >>> l2 = [element[1:] for element in l]
    >>> n = ''
    >>> for el in l2: n += el

But I got a string with a different MD5:

    In [117]: n
    Out[117]: '0001072206277354107134061361076150047010124200315100'
For '\\0001', yes, I needed '\\000' and then injected the '1' after conversion. Thanks eryksun, it worked :)
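
For what it's worth, the "'\\000' then '1'" handling described in this comment is also what Python's own escape decoding does, so one possible shortcut is to let a codec interpret the escapes. This is only a sketch, assuming Python 3 and assuming every escape in the text is an octal value below 0o400; Python 2 had a similar 'string_escape' codec.

```python
import hashlib

l = ['\\0001', '\\072', '\\206', '\\277', '\\354', '\\107', '\\134',
     '\\061', '\\361', '\\076', '\\150', '\\047', '\\010', '\\124',
     '\\200', '\\315', '\\100']

# Join the tokens WITH their backslashes so the escapes are still intact.
escaped_text = ''.join(l)

# 'unicode_escape' interprets \ooo the way a Python literal would (at most
# three octal digits per escape); 'latin-1' then maps each code point
# 0..255 back to a single byte.
salt = escaped_text.encode('ascii').decode('unicode_escape').encode('latin-1')

print(repr(salt))
print(hashlib.md5(salt).hexdigest())
```

Stripping the backslashes before joining, as in the attempt above, throws away the information the decoder needs, which is why that string hashed differently.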
