I have a program that handles byte arrays in Java, and now I would like to write one of them to an XML file. However, I am unsure how to convert the byte array into a sensible String to write to the file. Assuming that it contained Unicode characters, I attempted the following code:

String temp = new String(encodedBytes, "UTF-8");

Only to have the debugger show that temp contains "\ufffd\ufffd ^\ufffd\ufffd-m\ufffd\ufffd\ufffd \ufffd\ufffdIA\ufffd\ufffd". The String should contain a hash in alphanumeric format.

How would I turn the above String into a sensible String for output?

  • "Understandable" or "sensible" to whom? Is the goal to output the values so that a human can understand them, or to output them in a format that can be read back in and transformed back into a byte array? Commented Apr 16, 2010 at 15:40
  • Would serialization (Arrays in Java are serializable) work? See e.g. rgagnon.com/javadetails/java-0470.html Commented Apr 16, 2010 at 16:25
  • @Bert F: For argument's sake, let's say that the human-readable output is "2jvjsgjlgj39hg9". I would like to convert the string in the question to that string so that it can be both read by humans and stored. Commented Apr 16, 2010 at 16:42
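If the goal is strictly alphanumeric text like the example above, hex encoding is one option: every byte becomes two characters from 0-9 and a-f, so the result is readable, storable, and reversible. Below is a minimal sketch; the SHA-256 digest of "hello" is only a hypothetical stand-in for encodedBytes, and the helper names toHex/fromHex are illustrative, not from the question.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class HexExample {

    // Converts each byte to two lowercase hex digits, so the result
    // contains only the characters 0-9 and a-f.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask with 0xff so negative bytes print as 80..ff, not as negative numbers.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Reverses toHex; assumes an even-length string of hex digits.
    public static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Hypothetical input: the question does not say which hash produced encodedBytes.
        byte[] encodedBytes = MessageDigest.getInstance("SHA-256")
                .digest("hello".getBytes(StandardCharsets.UTF_8));
        String hex = toHex(encodedBytes);
        System.out.println(hex);
        System.out.println(Arrays.equals(encodedBytes, fromHex(hex))); // true
    }
}
```

On Java 17 and later, java.util.HexFormat.of().formatHex(bytes) and parseHex(...) do the same job. Hex output is twice as long as the raw bytes; Base64 is shorter but also uses '+', '/' and '='.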

2 Answers

The byte array doesn't look like UTF-8. Note that \ufffd (named REPLACEMENT CHARACTER) is "used to replace an incoming character whose value is unknown or unrepresentable in Unicode."

Addendum: Here's a simple example of how this can happen. When cast to a byte, the code point for ñ (U+00F1, byte value -15) is neither valid UTF-8 nor valid US-ASCII, but it is valid ISO-8859-1. In effect, you have to know what the bytes represent before you can decode them into a String.

public class Hello {

    public static void main(String[] args)
            throws java.io.UnsupportedEncodingException {
        String s = "Hola, señor!";
        System.out.println(s);
        byte[] b = new byte[s.length()];
        for (int i = 0; i < b.length; i++) {
            int cp = s.codePointAt(i);
            b[i] = (byte) cp;
            System.out.print((byte) cp + " ");
        }
        System.out.println();
        System.out.println(new String(b, "UTF-8"));
        System.out.println(new String(b, "US-ASCII"));
        System.out.println(new String(b, "ISO-8859-1"));
    }
}

Output:

Hola, señor!
72 111 108 97 44 32 115 101 -15 111 114 33 
Hola, se�or!
Hola, se�or!
Hola, señor!
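To see why decoding a hash as UTF-8 is a dead end, here is a small sketch (class and method names are illustrative). new String(bytes, UTF_8) silently swaps invalid sequences for U+FFFD, so converting back does not return the original bytes. A CharsetDecoder configured with CodingErrorAction.REPORT throws instead, which is one way to check whether a byte array really is UTF-8. The two-byte array is a hypothetical example; 0xFF never occurs in well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeCheck {

    // Strict decode: throws CharacterCodingException instead of substituting U+FFFD.
    public static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // Hypothetical bytes: 0xFF is never valid in UTF-8, 0x41 is 'A'.
        byte[] raw = {(byte) 0xFF, 0x41};

        // Lenient decode, as in new String(bytes, "UTF-8"): 0xFF becomes U+FFFD.
        String lenient = new String(raw, StandardCharsets.UTF_8);
        byte[] back = lenient.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(raw, back)); // false: the original byte is lost

        try {
            decodeStrictUtf8(raw);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```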

If your string is the output of a password hashing scheme (which it looks like it might be), then I think you will need to Base64-encode the bytes in order to put them into plain text.

Standard procedure, if you have raw bytes you want to output to a text file, is to use Base64 encoding. The Commons Codec library provides a Base64 encoder/decoder for you to use.
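A minimal sketch of the Base64 round trip. Note that it uses java.util.Base64, which has shipped with the JDK since Java 8 (after this answer was written), in place of Commons Codec. The helper names encodeHash/decodeHash and the SHA-256 stand-in for the hash bytes are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

public class Base64Example {

    // Turns raw bytes into Base64 text (A-Z, a-z, 0-9, '+', '/', '=' padding).
    public static String encodeHash(byte[] hashBytes) {
        return Base64.getEncoder().encodeToString(hashBytes);
    }

    // Recovers exactly the original bytes from the Base64 text.
    public static byte[] decodeHash(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Hypothetical stand-in for encodedBytes: a SHA-256 digest of a sample string.
        byte[] encodedBytes = MessageDigest.getInstance("SHA-256")
                .digest("hello".getBytes(StandardCharsets.UTF_8));

        String text = encodeHash(encodedBytes);
        System.out.println(text);
        System.out.println(Arrays.equals(encodedBytes, decodeHash(text))); // true
    }
}
```

If '+' and '/' are a problem, Base64.getUrlEncoder() uses '-' and '_' instead, and .withoutPadding() drops the trailing '='.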

Hope this helps.

3 Comments

I'd recommend that the asker add an attribute to that element indicating the encoding (with a default value in the DTD or schema, so you don't necessarily have to specify it in the document).
Yep, it's a hash. I'll have a look at the Commons Codec stuff soon. I assume you just download the jars and add them to your project?
@Ender - yes that's right. There should be a user guide on the site to get you started.
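A rough sketch of the attribute suggestion using the JDK's DOM and Transformer APIs. The element name hash, the attribute encoding, and its value base64 are all hypothetical; since no DTD or schema is in play here, the reading side simply treats a missing attribute as the default.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class HashXml {

    // Builds <hash encoding="base64">...</hash> (hypothetical names).
    public static String toXml(byte[] hashBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element hash = doc.createElement("hash");
        // Record how the text content was produced so a reader knows how to decode it.
        hash.setAttribute("encoding", "base64");
        hash.setTextContent(Base64.getEncoder().encodeToString(hashBytes));
        doc.appendChild(hash);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parses the element back into bytes, honouring the encoding attribute.
    public static byte[] fromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element hash = doc.getDocumentElement();
        String encoding = hash.getAttribute("encoding");
        // getAttribute returns "" when absent; treat that as the default encoding.
        if (encoding.isEmpty()) {
            encoding = "base64";
        }
        if (!encoding.equals("base64")) {
            throw new IllegalArgumentException("unsupported encoding: " + encoding);
        }
        return Base64.getDecoder().decode(hash.getTextContent());
    }

    public static void main(String[] args) throws Exception {
        byte[] hashBytes = {(byte) 0xFF, 0x00, 0x41}; // hypothetical hash bytes
        String xml = toXml(hashBytes);
        System.out.println(xml);
        System.out.println(Arrays.equals(hashBytes, fromXml(xml))); // true
    }
}
```

To write to a file instead of a String, pass new StreamResult(new File(...)) to transform.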
