I need to parse the contents of a String as a sequence of binary digits and convert it to the equivalent UTF-8 String.

For example, the UTF-8 binary encodings of B, A and R are as follows:
B = 01000010
A = 01000001
R = 01010010


Now I need to convert the string "010000100100000101010010" to the string "BAR".
That is, the 24-character input string is divided into three equal parts (8 characters each), and each part is translated to its UTF-8 equivalent to build the resulting String.

Sample Code:

public static void main(String args[]) {
    String B = "01000010";
    String A = "01000001";
    String R = "01010010";
    String BAR = "010000100100000101010010";

    String utfEquiv = toUTF8(BAR);//expecting to get "BAR"
    System.out.println(utfEquiv);
}

private static String toUTF8(String str) {
    // TODO 
    return "";
}

What should the implementation of toUTF8(String str) be?

  • This might help: joelonsoftware.com/articles/Unicode.html Commented Feb 8, 2016 at 8:29
  • @Yesyoor, updated, but you can propose an edit. Thanks in advance. Commented Feb 25, 2022 at 4:15

1 Answer

You should separate this into two problems:

  • Converting the string into a byte array by parsing the binary values
  • Converting the byte array back into a string using UTF-8

The latter is very straightforward, using new String(bytes, StandardCharsets.UTF_8).
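As a minimal sketch of that decoding step on its own: the class name Utf8DecodeDemo and the helper decode are made up for illustration, and the byte values are the hex forms of the three chunks from the question (0x42, 0x41, 0x52).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only the String constructor comes from the answer.
class Utf8DecodeDemo {
    // Decodes raw bytes as UTF-8 text.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 01000010, 01000001, 01010010 written as hex bytes.
        byte[] bar = {0x42, 0x41, 0x52};
        System.out.println(decode(bar)); // prints "BAR"

        // A two-byte sequence: 0xC3 0xA9 is the UTF-8 encoding of U+00E9.
        // Both bytes have the high bit set, hence the casts.
        byte[] eAcute = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(decode(eAcute).length()); // prints 1
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding.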

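For the first part, the tricky bit is that Byte.parseByte only accepts values in the signed range -128 to 127, so an 8-bit chunk with a leading 1 (for example "10000000", which is 128) throws a NumberFormatException. I'd probably parse each 8-bit string into a short and then cast to byte: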

public static byte[] binaryToBytes(String input) {
    // TODO: Argument validation (nullity, length)
    byte[] ret = new byte[input.length() / 8];
    for (int i = 0; i < ret.length; i++) {
        String chunk = input.substring(i * 8, i * 8 + 8);
        ret[i] = (byte) Short.parseShort(chunk, 2);
    }
    return ret;
}
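Putting the two steps together, the toUTF8 method from the question could look roughly like the sketch below. The class name BinaryText and the specific validation rule (non-null, length a multiple of 8) are assumptions filling in the TODO, not something the question specifies.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper class combining both steps of the answer.
class BinaryText {
    static byte[] binaryToBytes(String input) {
        // Assumed validation: reject null and lengths that are not whole bytes.
        if (input == null || input.length() % 8 != 0) {
            throw new IllegalArgumentException(
                    "Expected a non-null string whose length is a multiple of 8");
        }
        byte[] ret = new byte[input.length() / 8];
        for (int i = 0; i < ret.length; i++) {
            String chunk = input.substring(i * 8, i * 8 + 8);
            // parseShort handles 0..255; the cast keeps the low 8 bits.
            // Throws NumberFormatException if the chunk is not a binary number.
            ret[i] = (byte) Short.parseShort(chunk, 2);
        }
        return ret;
    }

    static String toUTF8(String str) {
        return new String(binaryToBytes(str), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toUTF8("010000100100000101010010")); // prints "BAR"
    }
}
```

Chunks with a leading 1 go through the same path: "1100001110101001" yields the bytes 0xC3 0xA9, which decode to the single character U+00E9.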

5 Comments

Is it input.length or input.length(), and Short.parse(chunk, 2) or Short.parseShort(chunk, 2)? Your code shows a compilation error.
@mmuzahid: Thanks for the substring fix, too. I always forget which way round Java is - .NET takes the length instead of the end index...
Could you tell me the purpose of the radix? I have never used it.
@mmuzahid: It's to say that it's binary ('1' and '0' characters). If your string contained hex instead, it would be 16 (and you'd use two characters per byte, of course).
Struck again by Jon Skeet!
