
I have a large file of bytes encoded in code page 852. I need to read the bytes, convert them to strings to put in objects, and then serialize those objects to XML.

The mapping function for reading the bytes is:

private string Mapper(int start, int length)
{
   byte[] result = new byte[length];
   Array.Copy(baseFile, localOffset + start, result, 0, length);

   return Encoding.ASCII.GetString(result, 0, length); 
}

Here, localOffset is just the position of the current record in the database. I then use the Mapper function to fill in the string fields of my object instance and serialize it. Here's the method for that:

private string XMLify(Object node)
{
    XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
    ns.Add("", "");

    var stringWriter = new StringWriter();
    var serializer = new XmlSerializer(node.GetType());

    serializer.Serialize(stringWriter, node, ns);
    String s = stringWriter.ToString();
    return s.Substring(s.IndexOf(Environment.NewLine) + 1);
}

However, when I serialize the object instance, the XML contains strings such as &#x0;, among others. That particular one corresponds to bytes that show up as blank when I view the database in a hex editor, and it appears once for every such blank byte.
I know the source file is in code page 852, how do I convert it to 1250 to export as XML?

2 Answers


Don't use ASCII encoding to parse the data; use the encoding of the actual code page:

Encoding encoding = Encoding.GetEncoding(852);
return encoding.GetString(result, 0, length); 
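A minimal sketch of the question's Mapper rewritten this way. It assumes the buffer and offset are passed in as parameters instead of being read from fields, and the names Cp852Mapper and Map are hypothetical. On .NET Core and .NET 5+, legacy code pages such as 852 are only available after registering CodePagesEncodingProvider. On .NET Framework, GetEncoding(852) works without that registration line.

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes slices of a buffer known to be in code page 852 (DOS Latin-2).
public static class Cp852Mapper
{
    private static readonly Encoding Cp852;

    static Cp852Mapper()
    {
        // Needed on .NET Core / .NET 5+, where legacy code pages are not registered by default.
        // Done in the static constructor so it runs before GetEncoding(852).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Cp852 = Encoding.GetEncoding(852);
    }

    // Same role as the question's Mapper, but the caller supplies buffer and absolute offset.
    public static string Map(byte[] buffer, int offset, int length)
    {
        // GetString can decode a slice directly, so no intermediate Array.Copy is needed.
        return Cp852.GetString(buffer, offset, length);
    }
}
```

Note that 0x00 bytes still decode to '\0' characters rather than being dropped. Switching from ASCII fixes the accented letters (ASCII turns every byte above 0x7F into '?'), but it does not by itself make the &#x0; entities go away.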

UPDATE:

For this issue, it is important to understand what an encoding actually does: it controls the transition between string and binary data, with byte arrays on one side and strings on the other. So once you already have a byte array or a string, no further encoding step is needed: binary data is already encoded, and a .NET string is always Unicode (UTF-16).
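If a downstream consumer really does need code page 1250 bytes, this means "converting 852 to 1250" is just decoding with 852 and encoding with 1250. The following is a minimal sketch using Encoding.Convert, which performs both steps. CodePageTranscoder and Cp852ToCp1250 are hypothetical names.

```csharp
using System;
using System.Text;

public static class CodePageTranscoder
{
    // Converts raw code page 852 bytes to code page 1250 bytes.
    // Internally this goes bytes (852) -> UTF-16 string -> bytes (1250).
    public static byte[] Cp852ToCp1250(byte[] cp852Bytes)
    {
        // Required on .NET Core / .NET 5+; registering the same provider again is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp852 = Encoding.GetEncoding(852);
        Encoding cp1250 = Encoding.GetEncoding(1250);

        // Same result as cp1250.GetBytes(cp852.GetString(cp852Bytes)).
        return Encoding.Convert(cp852, cp1250, cp852Bytes);
    }
}
```

For example, 'é' is byte 0x82 in code page 852 but 0xE9 in code page 1250, while the decoded string is identical in both cases.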

Your XMLify method returns a string (it writes to a StringWriter), so encoding is not an issue at that step. Unless you convert that string to a byte array somewhere downstream, your problem is not the encoding.

Have you actually confirmed that the XML is incorrect? While XML requires much less escaping than, e.g., HTML, some characters need to be escaped. So your &#x0; could be a valid representation of the input data. Unless you provide the object you serialize (including the data in its fields) and the XML it produces, it's impossible to tell whether there is an error. I assume you deserialize the XML somewhere else. If that deserialization is correct, you're probably fine.


3 Comments

I don't get it. You have provided the code that reads the data from the source. Does the string that method returns contain the correct text? That is all this method does. If the output is unchanged, it may be worth showing us the method that writes the result.
I've added the XML serialization code to the original post, maybe something's wrong there?
The return value of XMLify is still a string. If you have a problem with the encoding, you will not see it there. You could use a StreamWriter instead of a StringWriter: the output then goes to the target stream (a MemoryStream if you need a byte array), and you can specify the target encoding. But as Charles Mager has pointed out, your result could contain XML escapes, which can come straight from serialization. That would be OK according to the XML standard.
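A sketch of that StreamWriter variant, assuming a hypothetical ExportRecord type in place of the question's objects and code page 1250 as the target encoding. Unlike the StringWriter version, XmlSerializer should write an XML declaration naming the writer's encoding (windows-1250), and this sketch keeps that line instead of stripping it.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's objects; XmlSerializer needs a public type.
public class ExportRecord
{
    public string Name { get; set; }
}

public static class Cp1250Xml
{
    // Serializes an object to XML bytes encoded in code page 1250.
    public static byte[] Serialize(object node)
    {
        // Required on .NET Core / .NET 5+ for code page 1250.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1250 = Encoding.GetEncoding(1250);

        var ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        var serializer = new XmlSerializer(node.GetType());

        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, cp1250))
        {
            // The StreamWriter encodes the serializer's text output as code page 1250.
            serializer.Serialize(writer, node, ns);
        }

        // ToArray still works after the StreamWriter has closed the MemoryStream.
        return stream.ToArray();
    }
}
```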

&#x0; is the entitising of the null character. It appears because that character is present in one of the strings the serialiser is asked to serialise.

See this fiddle for an example.
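A minimal sketch reproducing this, plus one possible fix. It assumes the NUL characters are padding at the end of fixed-width fields, and PaddedRecord, NulDemo and StripPadding are hypothetical names. Trimming trailing '\0' before assigning the field keeps the entity out of the XML. If NULs can also occur in the middle of a field, they would need to be removed or replaced instead.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type with a fixed-width field that may be NUL-padded.
public class PaddedRecord
{
    public string Name { get; set; }
}

public static class NulDemo
{
    // Serializes the record to an XML string, as the question's XMLify does.
    public static string ToXml(PaddedRecord record)
    {
        var serializer = new XmlSerializer(typeof(PaddedRecord));
        var writer = new StringWriter();
        serializer.Serialize(writer, record);
        return writer.ToString();
    }

    // Assumes padding occurs only at the end of the field.
    public static string StripPadding(string field) => field.TrimEnd('\0');
}
```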
