I have a CSV file, apparently UTF-16, dumped from SQL Server. It contains properly encoded accented characters (Spanish), but some rows are encoded differently, like this:
0xd83d0xde1b0xd83d0xde1b0xd83d0xde1b
This seems to be a strange encoding for
\ud83d\ude1b\ud83d\ude1b\ud83d\ude1b
\ud83d\ude1b is the UTF-16 surrogate pair for a single emoji (U+1F61B), so the sample above is the same emoji repeated three times.
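To check that reading, here is a minimal sketch of how the two code units combine into one code point. The variable names are mine, and it only covers this one pair:

```python
# The two UTF-16 code units taken from the sample row.
high, low = 0xD83D, 0xDE1B

# Combine the surrogate pair into a single code point by hand.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
emoji = chr(code_point)

# Same result by packing the code units as big-endian UTF-16 bytes
# and letting the codec pair the surrogates.
decoded = (high.to_bytes(2, "big") + low.to_bytes(2, "big")).decode("utf-16-be")

print(hex(code_point), emoji == decoded)
```

Both routes give the same one-character string, which encodes to four bytes in UTF-8.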
I need to convert everything to a nice, neat UTF-8 file. I have tried endless combinations of bytearray(), encode(), decode(), and so on, without success.
How can I convert this file of mixed UTF-16 and escaped UTF-16 into proper Python 3 strings, and finally save them to a new UTF-8 file?
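For reference, this is a rough sketch of the kind of conversion I am after, not a tested solution for the real file. It assumes the file opens cleanly with the utf-16 codec (that is, it starts with a BOM), that the escaped rows always use runs of 0x followed by exactly four hex digits, and that no legitimate data looks like that. The names HEX_RUN, decode_hex_run, unescape_line and convert are hypothetical:

```python
import re

# One or more consecutive "0xXXXX" tokens (pattern guessed from the sample row).
HEX_RUN = re.compile(r"(?:0x[0-9a-fA-F]{4})+")


def decode_hex_run(match):
    """Turn a run like 0xd83d0xde1b into the characters it represents."""
    units = re.findall(r"0x([0-9a-fA-F]{4})", match.group(0))
    # Each token is one UTF-16 code unit; pack them big-endian and decode,
    # so that surrogate pairs are joined into a single character.
    raw = b"".join(bytes.fromhex(unit) for unit in units)
    # Unpaired surrogates become U+FFFD instead of raising an error.
    return raw.decode("utf-16-be", errors="replace")


def unescape_line(line):
    """Replace every escaped run in a line, leaving other text untouched."""
    return HEX_RUN.sub(decode_hex_run, line)


def convert(src_path, dst_path):
    """Read a UTF-16 file, unescape each line, and write it out as UTF-8."""
    with open(src_path, encoding="utf-16", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(unescape_line(line))
```

Calling unescape_line on a row such as "hola 0xd83d0xde1b" should leave the Spanish text alone and replace only the escaped run. The obvious weakness is that a genuine value like 0x1234 in the data would also be rewritten.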