C# Reading bytes to XML with custom code page

Question

I have a large file consisting of bytes encoded in code page 852. I need to read the bytes, convert them to strings to store in objects, and then serialize those objects to XML.

The mapping function for reading the bytes is:

private string Mapper(int start, int length)
{
    // baseFile holds the whole source file; localOffset is the start of the current record
    byte[] result = new byte[length];
    Array.Copy(baseFile, localOffset + start, result, 0, length);

    return Encoding.ASCII.GetString(result, 0, length);
}

Here, localOffset is simply the position of the current record in the database. I then use Mapper to fill in the string fields of my object instance and serialize it with this method:

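private string XMLify(Object node)
{
    // An empty prefix/namespace pair suppresses the default xmlns:xsi and xmlns:xsd attributes
    XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
    ns.Add("", "");

    var stringWriter = new StringWriter();
    var serializer = new XmlSerializer(node.GetType());

    serializer.Serialize(stringWriter, node, ns);
    String s = stringWriter.ToString();

    // Drop the first line (the <?xml ... ?> declaration), including the full line break
    return s.Substring(s.IndexOf(Environment.NewLine) + Environment.NewLine.Length);
}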
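As a side note, the XML declaration can also be suppressed at the source instead of being cut off with Substring. The sketch below is one possible variant, not the asker's code: the class name XmlHelper is hypothetical, and it uses XmlWriterSettings.OmitXmlDeclaration with an XmlWriter wrapped around the StringWriter.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper class; a variant of the question's XMLify method.
public static class XmlHelper
{
    public static string XMLify(object node)
    {
        // Empty namespace mapping: no xmlns:xsi / xmlns:xsd attributes on the root.
        var ns = new XmlSerializerNamespaces();
        ns.Add("", "");

        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true, // replaces the IndexOf/Substring trick
            Indent = true
        };

        var serializer = new XmlSerializer(node.GetType());
        using (var stringWriter = new StringWriter())
        {
            using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                serializer.Serialize(xmlWriter, node, ns);
            } // disposing the XmlWriter flushes it into stringWriter (which stays open)
            return stringWriter.ToString();
        }
    }
}
```

One behavioral difference worth knowing: XmlWriter.Create checks characters by default (XmlWriterSettings.CheckCharacters is true), so serializing a string that contains '\0' should fail with an exception here instead of silently producing `&#x0;`.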

However, when I serialize the object instance, the XML contains entities such as &#x0; among others. That particular one shows up as blank when I view the database in a hex editor, and it is emitted once for every blank character.
I know the source file is in code page 852. How do I convert it to code page 1250 to export it as XML?

Sefe · Accepted Answer · 2016-11-15 19:15:20Z

1

Don't use ASCII encoding to decode the data; use the encoding of the actual code page:

Encoding encoding = Encoding.GetEncoding(852);
return encoding.GetString(result, 0, length);
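For context, here is a sketch of the question's Mapper with that change applied. The class RecordReader and its constructor are hypothetical stand-ins for the asker's baseFile and localOffset fields. It assumes .NET Core 3.0 or later, where code pages such as 852 are only available after registering CodePagesEncodingProvider; on .NET Framework, Encoding.GetEncoding(852) works without that line.

```csharp
using System.Text;

// Hypothetical wrapper around the question's baseFile/localOffset fields.
public class RecordReader
{
    private readonly byte[] baseFile;   // the whole source file
    private readonly int localOffset;   // start of the current record in baseFile

    // Code page 852 (DOS Central European), resolved once.
    public static Encoding SourceEncoding { get; }

    static RecordReader()
    {
        // Needed on .NET Core 3.0+ / .NET 5+; .NET Framework supports 852 natively.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        SourceEncoding = Encoding.GetEncoding(852);
    }

    public RecordReader(byte[] baseFile, int localOffset)
    {
        this.baseFile = baseFile;
        this.localOffset = localOffset;
    }

    public string Mapper(int start, int length)
    {
        // Decode the slice with the file's real code page instead of ASCII.
        return SourceEncoding.GetString(baseFile, localOffset + start, length);
    }
}
```

Encoding.GetString(bytes, index, count) decodes a slice directly, so the intermediate Array.Copy is no longer needed.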

UPDATE:

To understand this issue, it helps to know what an encoding actually does: it controls the conversion between binary data and strings. Byte arrays sit on one side of that conversion and strings on the other. So once you have a byte array or a string, no encoding is involved: binary data is already encoded, and a .NET string is always Unicode (UTF-16).
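In other words, converting from code page 852 to 1250 means decoding with one encoding and encoding with the other, with the .NET string as the neutral step in between. A minimal sketch, assuming .NET Core 3.0+ (hence the provider registration); the class and method names are hypothetical:

```csharp
using System.Text;

// Hypothetical helper illustrating bytes(852) -> string -> bytes(1250).
public static class CodePageConverter
{
    static CodePageConverter()
    {
        // Makes code pages 852 and 1250 available on .NET Core 3.0+ / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static byte[] Convert852To1250(byte[] cp852Bytes)
    {
        Encoding source = Encoding.GetEncoding(852);
        Encoding target = Encoding.GetEncoding(1250);

        // Step 1: decode to a UTF-16 string (the only code-page-neutral form).
        string text = source.GetString(cp852Bytes);

        // Step 2: encode that string with the target code page.
        return target.GetBytes(text);
    }
}
```

The built-in Encoding.Convert(srcEncoding, dstEncoding, bytes) performs the same two steps in one call. Characters with no equivalent in 1250, such as the box-drawing characters of code page 852, cannot survive the trip and are handled by the target encoding's fallback.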

Your XMLify method returns a string (produced by a StringWriter), so encoding is not an issue there. Unless you convert that string to a byte array somewhere downstream, your problem is not the encoding.

Have you actually confirmed that the XML is incorrect? While XML requires much less escaping than, e.g., HTML, some characters still need to be escaped. So your &#x0; could be a valid representation of the input data. Unless you provide the object you serialize (including the data in its fields) and the XML it produces, it's impossible to tell whether there is an error. I assume that you deserialize the XML somewhere else; if that deserialization gives you the correct data back, you're probably fine.

edited Nov 15, 2016 at 19:15

answered Nov 15, 2016 at 12:28

Sefe



3 Comments

Sefe Over a year ago

I don't get it. You have provided the code that reads the data from the source. Does the string that the method returns contain the correct text? That is all this method is responsible for. If the output is unchanged, maybe it's worth showing us the method that writes the result.

Flopn Over a year ago

I've added the XML serialization code to the original post. Maybe something's wrong there?

Sefe Over a year ago

The return value of XMLify is still a string, so if you have a problem with the encoding, you will not see it there. You can use a StreamWriter instead of a StringWriter: your output then goes to the target stream (a MemoryStream if you need a byte array), and you can specify the target encoding. But as Charles Mager has pointed out, your result could contain XML escapes that come straight from serialization. That would be OK according to the XML standard.
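A sketch of that suggestion, assuming the goal is a byte array in code page 1250. XmlExport and SerializeToBytes are hypothetical names, and the provider registration again assumes .NET Core 3.0+:

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: serialize straight into bytes of a chosen code page.
public static class XmlExport
{
    static XmlExport()
    {
        // Makes code page 1250 available on .NET Core 3.0+ / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static byte[] SerializeToBytes(object node, int codePage)
    {
        var ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        var serializer = new XmlSerializer(node.GetType());

        using (var stream = new MemoryStream())
        {
            // The StreamWriter performs the string -> code page conversion.
            using (var writer = new StreamWriter(stream, Encoding.GetEncoding(codePage)))
            {
                serializer.Serialize(writer, node, ns);
            } // disposing the writer flushes it and also closes 'stream'

            // MemoryStream.ToArray still works after the stream is closed.
            return stream.ToArray();
        }
    }
}
```

Because the serializer writes through the StreamWriter, the XML declaration it emits should reflect the writer's encoding rather than utf-16.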

Charles Mager · Accepted Answer · 2016-11-15 13:13:59Z

0

&#x0; is the entitised form of the null character. It appears because this character is present in one of the strings the serialiser is asked to serialise.

See this fiddle for an example.
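The effect is easy to reproduce locally. The sketch below (class and method names are hypothetical) serializes a string containing '\0' and adds one possible clean-up step. The clean-up assumes the nulls are trailing padding in fixed-width fields, which the question suggests but does not confirm. Also worth knowing: XML 1.0 does not allow U+0000 even as a character reference, so a strict parser may reject a document containing &#x0;, which makes removing the nulls before serialization the safer option.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical demo of how '\0' reaches the XML output.
public static class NullPadding
{
    // Serializing through a TextWriter writes '\0' as a character reference
    // instead of raising an error.
    public static string Serialize(string value)
    {
        var serializer = new XmlSerializer(typeof(string));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Assumed clean-up: strip trailing NUL padding from a fixed-width field
    // before it is stored in the object that gets serialized.
    public static string TrimNullPadding(string field)
    {
        return field.TrimEnd('\0');
    }
}
```

If nulls can also appear in the middle of a field, field.Replace("\0", "") is the broader alternative.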

answered Nov 15, 2016 at 13:13

Charles Mager

