2

When I run the following program:

public static void main(String args[]) throws Exception
{
    byte str[] = {(byte)0xEC, (byte)0x96, (byte)0xB4};
    String s = new String(str, "UTF-8");
}

on Linux and inspect the value of s in jdb, I correctly get:

 s = "어"

On Windows, I incorrectly get:

s = "?"

My byte sequence is the valid UTF-8 encoding of a Korean character. Why would it produce two very different results?

4
  • How do you "get" the values? Do you print them to console? Commented Oct 2, 2012 at 21:18
  • The Windows command prompt cannot display UTF-8 characters unless you change the code page using chcp, and you also need a font that can display those characters. Commented Oct 2, 2012 at 21:21
  • Related stackoverflow.com/questions/8616915/… Commented Oct 2, 2012 at 21:23
  • See also here: stackoverflow.com/q/388490/330315 Commented Oct 2, 2012 at 21:24

4 Answers

3

It correctly prints "어" on my computer (Ubuntu Linux), as described in Code Table Korean Hangul. The Windows command prompt is known to have encoding issues, so this is not a problem with your program.

Your code is fine.

1 Comment

My mistake. The Korean characters displayed properly in my Emacs text buffer, so I assumed they would also display properly in the Emacs shell buffer. As folks pointed out, they do not.
1

It gives "어" for me. This means your console is probably not configured to display UTF-8, so it is a printing/display problem rather than a problem with how the string is represented.
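One way to tell a display problem from a representation problem is to look at the bytes Java actually writes, independent of what the terminal renders. The following is a minimal sketch, not part of the original answer: the class name `Utf8Output` and the helper `encodeViaPrintStream` are hypothetical names chosen for illustration. It sends the string through a `PrintStream` created with an explicit UTF-8 encoding and dumps the resulting bytes as hex.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Output {
    // Hypothetical helper: writes s through a UTF-8 PrintStream and returns the raw bytes.
    public static byte[] encodeViaPrintStream(String s) throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, "UTF-8");
        out.print(s);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] str = {(byte) 0xEC, (byte) 0x96, (byte) 0xB4};
        String s = new String(str, "UTF-8");

        // Dump the encoded bytes as hex; this output is plain ASCII.
        for (byte b : encodeViaPrintStream(s)) {
            System.out.printf("%02x ", b & 0xFF);
        }
        System.out.println();

        // Write to the real console with an explicit encoding. Whether the glyph
        // renders still depends on the terminal's code page and font.
        PrintStream console = new PrintStream(System.out, true, "UTF-8");
        console.println(s);
    }
}
```

If the hex dump shows the original three bytes but the last line still shows "?", the string is intact and only the console rendering is at fault.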

1

You get the correct string; it is the Windows console that does not display it correctly.

Here is a link to an article that discusses a way to make Java console produce correct Unicode output using JNI.

0

JDB is displaying the data incorrectly. The code works the same on both Windows and Linux. Try running this more definitive test:

public static void main(String[] args) throws Exception {
    byte[] str = {(byte)0xEC, (byte)0x96, (byte)0xB4};
    String s = new String(str, "UTF-8");
    for (int i = 0; i < s.length(); i++) {
        // Print each UTF-16 code unit as hex; the output is plain ASCII
        System.out.println(Integer.toHexString(s.charAt(i)));
    }
}

This prints out the hex value of every character in the string. This will correctly print out "c5b4" in both Windows and Linux.
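The same check can be made without relying on the console to render any non-ASCII text at all. The sketch below is an illustration under stated assumptions rather than part of the original answer: `Utf8RoundTrip` and `hexCodeUnits` are hypothetical names. It compares the decoded string against the escape `\uC5B4` and re-encodes it to confirm the bytes round-trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Hypothetical helper: hex value of each UTF-16 code unit, separated by spaces.
    public static String hexCodeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] str = {(byte) 0xEC, (byte) 0x96, (byte) 0xB4};
        String s = new String(str, StandardCharsets.UTF_8);

        // Code units of the decoded string.
        System.out.println(hexCodeUnits(s));
        // Compare against the expected character written as an escape.
        System.out.println(s.equals("\uC5B4"));
        // Re-encode and compare with the original bytes.
        System.out.println(Arrays.equals(str, s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Every line this prints is ASCII, so the result reads the same in jdb, the Windows command prompt, or an Emacs shell buffer regardless of code page.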
