12

I have some strings that are roughly 10K characters each. There is plenty of repetition in them. They are serialized JSON objects. I'd like to easily compress them into a byte array, and uncompress them from a byte array.

How can I most easily do this? I'm looking for methods so I can do the following:

String original = "....long string here with 10K characters...";
byte[] compressed = StringCompressor.compress(original);
String decompressed = StringCompressor.decompress(compressed);
assert original.equals(decompressed);
4
  • 1
    I would use InflaterInputStream/DeflaterOutputStream with ByteArrayInput/OutputStream. Commented May 13, 2012 at 14:11
  • 2
    There's an easy-to-use 'zip' class out there... edit - it is here docs.oracle.com/javase/6/docs/api/java/util/zip/… and seems to use the classes @peter mentioned. Commented May 13, 2012 at 14:11
  • 2
    How about this? stackoverflow.com/questions/3649485/how-to-compress-a-string Commented May 13, 2012 at 14:13
  • Using just String and byte[], this can't be more than a 10-15 line method, assuming the JSON is all ASCII. If you have to handle UTF-8, add 10 more lines... Commented May 13, 2012 at 14:13

3 Answers 3

30

You can try

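import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

enum StringCompressor {
    ; // no constants: the enum only holds static methods and cannot be instantiated

    public static byte[] compress(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the DeflaterOutputStream finishes the compressed stream.
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Not expected for in-memory streams.
            throw new AssertionError(e);
        }
        return baos.toByteArray();
    }

    public static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buffer = new byte[8192];
            int len;
            // read() returns -1 at the end of the compressed stream
            while ((len = in.read(buffer)) != -1)
                baos.write(buffer, 0, len);
            return new String(baos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Thrown, for example, when the input is not valid compressed data.
            throw new AssertionError(e);
        }
    }
}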

7 Comments

Hello, why do you use enum instead of class here? Is it to prove a point?
Some people like to use enum classes as a way to implement singletons or static-only classes. Recommended by Joshua Bloch, writer of Effective Java.
It's to say that no instances of this class are allowed.
Note to myself (and some others who might have wanted to achieve the same): you can't use that method to compress a String into another String, as the charset will mess up the byte content (see stackoverflow.com/questions/2544965/…).
@Matthieu You can use ISO-8859-1 encoding to store bytes in a String without loss.
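To illustrate the ISO-8859-1 point from the last comment: that charset maps each of the 256 byte values to the char with the same numeric value, so arbitrary bytes survive a trip through a String. A minimal sketch follows; the class and method names (Latin1RoundTrip, bytesToString, stringToBytes) are made up for this example and do not come from the answer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Hypothetical helper: each byte becomes the char with the same value (0-255).
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: inverse of bytesToString for strings it produced.
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Every possible byte value, standing in for compressed output.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String asString = bytesToString(all);
        System.out.println(Arrays.equals(all, stringToBytes(asString))); // prints "true"
    }
}
```

The resulting String may contain control characters, so it is only safe where the String is stored or transmitted without further re-encoding.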
5

Peter Lawrey's answer can be improved a bit by using this simpler code for the body of the decompress function (it needs java.util.zip.InflaterOutputStream):

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try {
        OutputStream out = new InflaterOutputStream(baos);
        out.write(bytes);
        out.close();
        return new String(baos.toByteArray(), "UTF-8");
    } catch (IOException e) {
        throw new AssertionError(e);
    }
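For reference, here is a self-contained sketch that places the snippet above in a method and checks it against a DeflaterOutputStream-based compress. The class name InflaterRoundTrip is an assumption made for this example, and IOExceptions are wrapped in UncheckedIOException rather than AssertionError.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

class InflaterRoundTrip {
    static byte[] compress(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the stream flushes the remaining compressed data into baos.
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // InflaterOutputStream decompresses whatever is written to it,
        // so no read loop or buffer is needed.
        try (OutputStream out = new InflaterOutputStream(baos)) {
            out.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "{\"name\":\"value\",\"name2\":\"value\"}";
        String restored = decompress(compress(original));
        System.out.println(original.equals(restored)); // prints "true"
    }
}
```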


1

I made a library to solve the problem of compressing generic Strings (especially short ones). It tries to compress the String using various algorithms (plain UTF-8, 5-bit encoding for Latin letters, Huffman encoding, gzip for long Strings) and chooses the one with the shortest result (in the worst case it chooses the UTF-8 encoding, so you never risk losing space).

I hope it may be useful, here's the link https://github.com/lithedream/lithestring

EDIT: I realized that your Strings are always "long". My library defaults to gzip for those sizes, so I fear I cannot do better for you.
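For strings of this size, the gzip step can also be done with the standard API alone. The following is a minimal sketch under that assumption; GzipStringCompressor is a hypothetical name and is not part of lithestring.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipStringCompressor {
    static byte[] compress(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer.
        try (OutputStream out = new GZIPOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = in.read(buffer)) != -1) {
                baos.write(buffer, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a repetitive JSON-like string of roughly 10K characters.
        StringBuilder sb = new StringBuilder("[");
        while (sb.length() < 10_000) {
            sb.append("{\"id\":1,\"status\":\"active\"},");
        }
        sb.setLength(sb.length() - 1); // drop the trailing comma
        sb.append("]");
        String original = sb.toString();

        byte[] compressed = compress(original);
        System.out.println(original.length() + " chars -> " + compressed.length + " bytes");
        System.out.println(original.equals(decompress(compressed)));
    }
}
```

Compared with the raw deflate stream written by DeflaterOutputStream, gzip adds a small header and a CRC-32 trailer, which costs a few bytes but lets the output be read by standard gzip tools.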

1 Comment

Why a library if the standard API already solves the problem?
