I am accepting some binary data from a websocket.

I am trying to call json.loads(data), but it raises a ValueError.

When I print it, I get the following result (which is all valid JSON):

{"session":"SeFKQ0SfYZqhh6FTCcKZGw==","authenticate":1,"thread_id":1791}

However, when I inspected the string more closely, I found that print was turning this monstrosity into the JSON above:

'{\x00"\x00s\x00e\x00s\x00s\x00i\x00o\x00n\x00"\x00:\x00"\x00S\x00e
\x00F\x00K\x00Q\x000\x00S\x00f\x00Y\x00Z\x00q\x00h\x00h\x006\x00F
\x00T\x00C\x00c\x00K\x00Z\x00G\x00w\x00=\x00=\x00"\x00,\x00"\x00a
\x00u\x00t\x00h\x00e\x00n\x00t\x00i\x00c\x00a\x00t\x00e\x00"\x00:
\x001\x00,\x00"\x00t\x00h\x00r\x00e\x00a\x00d\x00_\x00i\x00d\x00"
\x00:\x001\x007\x009\x001\x00}\x00'

What is this data, and how can I do something meaningful with it (i.e., turn it into a native dictionary via json.loads)?

1 Answer

Your data appears to be UTF-16 encoded, little-endian with no BOM (byte-order mark).
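The byte-order diagnosis can be expressed as a small heuristic. This is a minimal sketch, assuming the payload is mostly ASCII text (as JSON keys, digits and punctuation usually are); `guess_utf16_codec` is a hypothetical helper, not part of any library. It checks for a BOM first, then looks at which half of each two-byte unit is always zero.

```python
import codecs
from typing import Optional


def guess_utf16_codec(raw: bytes) -> Optional[str]:
    """Guess a UTF-16 codec name for mostly-ASCII data (hypothetical helper)."""
    # A BOM states the byte order explicitly; the 'utf-16' codec reads and strips it.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    # Without a BOM, UTF-16 data must consist of whole two-byte code units.
    if len(raw) < 2 or len(raw) % 2:
        return None
    even, odd = raw[0::2], raw[1::2]
    # Little-endian ASCII: low byte first, so every second byte (odd index) is 0.
    if not any(odd) and any(even):
        return 'utf-16-le'
    # Big-endian ASCII: the zero high byte comes first (even index).
    if not any(even) and any(odd):
        return 'utf-16-be'
    return None


print(guess_utf16_codec('{"id":1}'.encode('utf-16-le')))  # utf-16-le
```

Because it only inspects byte patterns, this will misjudge non-ASCII-heavy text; knowing the encoding from the sender is always better than guessing.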

I would first try decoding it with the utf-16-le codec:

data = data.decode('utf-16-le')

Then parse it with json.loads(data).

import json

# Raw websocket payload: UTF-16-LE with no BOM
data = b'{\x00"\x00s\x00e\x00s\x00s\x00i\x00o\x00n\x00"\x00:\x00"\x00S\x00e\x00F\x00K\x00Q\x000\x00S\x00f\x00Y\x00Z\x00q\x00h\x00h\x006\x00F\x00T\x00C\x00c\x00K\x00Z\x00G\x00w\x00=\x00=\x00"\x00,\x00"\x00a\x00u\x00t\x00h\x00e\x00n\x00t\x00i\x00c\x00a\x00t\x00e\x00"\x00:\x001\x00,\x00"\x00t\x00h\x00r\x00e\x00a\x00d\x00_\x00i\x00d\x00"\x00:\x001\x007\x009\x001\x00}\x00'
data = data.decode('utf-16-le')
print(json.loads(data))

Output (Python 3.7+):

{'session': 'SeFKQ0SfYZqhh6FTCcKZGw==', 'authenticate': 1, 'thread_id': 1791}
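On Python 3.6 and later, the explicit decode step is optional: json.loads also accepts bytes and detects UTF-8, UTF-16 or UTF-32 from the leading bytes before parsing. A minimal sketch, rebuilding the question's payload by encoding its JSON text as UTF-16-LE:

```python
import json

# Stand-in for the websocket payload: the question's JSON encoded as
# UTF-16-LE without a BOM, which yields the same bytes shown in the question.
raw = '{"session":"SeFKQ0SfYZqhh6FTCcKZGw==","authenticate":1,"thread_id":1791}'.encode('utf-16-le')

# Python 3.6+: json.loads accepts bytes and detects the encoding itself.
parsed = json.loads(raw)
print(parsed['thread_id'])  # 1791
```

Passing the codec explicitly, as above, is still the safer choice when you know the byte order, since the automatic detection only looks at the first few bytes.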

5 Comments

How did you determine the encoding?
Please consider adding more detail in response to tipu's question.
@tipu Experience, mostly. I noticed that every other byte, starting with the second byte in the stream, was 00. That meant every character was encoded as two bytes, in little-endian (least significant byte first) order. I also noticed there was no BOM at the beginning. Then I consulted this answer to remind me which decoder was appropriate.
The better question is: where did your data come from? Did the sender have some way of indicating that it would be UTF-16 encoded?
@JonathonReinhart I should have looked at what the JavaScript WebSocket sends when a binary protocol is specified in its messaging.
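Once you know what the client sends, one option on the Python side is to normalize every incoming WebSocket message before parsing it. A minimal sketch, assuming your WebSocket library delivers binary frames as bytes and text frames as str (common, but check your library), and assuming the client keeps sending UTF-16-LE; `handle_message` and `BINARY_FRAME_CODEC` are hypothetical names.

```python
import json
from typing import Any, Union

# Assumption: the client encodes binary frames as UTF-16-LE without a BOM.
BINARY_FRAME_CODEC = 'utf-16-le'


def handle_message(message: Union[bytes, bytearray, str]) -> Any:
    """Parse one WebSocket message into Python objects (hypothetical handler)."""
    if isinstance(message, (bytes, bytearray)):
        # Binary frame: decode with the agreed-upon codec before parsing.
        message = message.decode(BINARY_FRAME_CODEC)
    # Text frames already arrive as str, so they can be parsed directly.
    return json.loads(message)


print(handle_message('{"id":1}'.encode('utf-16-le')))  # {'id': 1}
```

Keeping the codec in one named constant makes the agreement with the sender explicit, which is the point of asking where the data comes from.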
