1

This may sound foolish, but I'm wondering all the same...

Is it possible to take a string composed of a given character set and compress it by using a bigger character set, or by composing it into a number and then converting it back into one?

For example, if you had a string that you know will be composed of [a-z][A-Z][0-9]-_+=, could you turn that into a number, then swap it back using more characters in order to compress it?

This is an area I'm not familiar with. I still want to keep it as a string, just a shorter one (for displaying/echoing/etc., not memory).
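
To make the idea concrete, here is a minimal sketch of the number-based approach: read the string as one big number in base 66 (the size of the alphabet above), then write that number out again in a larger base. The class name `AlphabetPacker` and the target alphabet (4096 consecutive code points starting at U+4E00) are assumptions chosen for illustration. Any set of distinct, displayable characters would work.

```java
import java.math.BigInteger;

final class AlphabetPacker {
    // Source alphabet from the question: [a-z][A-Z][0-9]-_+= (66 symbols).
    static final String SOURCE =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_+=";
    // Assumed target alphabet: 4096 consecutive code points, U+4E00..U+5DFF.
    static final int TARGET_START = 0x4E00;
    static final int TARGET_SIZE = 4096;

    public static String pack(String s) {
        BigInteger srcBase = BigInteger.valueOf(SOURCE.length());
        // Start at 1 so that leading 'a' characters (digit 0) are not lost.
        BigInteger n = BigInteger.ONE;
        for (char c : s.toCharArray()) {
            int digit = SOURCE.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
            n = n.multiply(srcBase).add(BigInteger.valueOf(digit));
        }
        // Write the number out in the larger base, least significant digit first.
        BigInteger tgtBase = BigInteger.valueOf(TARGET_SIZE);
        StringBuilder out = new StringBuilder();
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(tgtBase);
            out.append((char) (TARGET_START + qr[1].intValue()));
            n = qr[0];
        }
        return out.reverse().toString();
    }

    public static String unpack(String packed) {
        BigInteger tgtBase = BigInteger.valueOf(TARGET_SIZE);
        BigInteger n = BigInteger.ZERO;
        for (char c : packed.toCharArray()) {
            n = n.multiply(tgtBase).add(BigInteger.valueOf(c - TARGET_START));
        }
        // Peel off base-66 digits until only the leading sentinel 1 remains.
        BigInteger srcBase = BigInteger.valueOf(SOURCE.length());
        StringBuilder out = new StringBuilder();
        while (n.compareTo(BigInteger.ONE) > 0) {
            BigInteger[] qr = n.divideAndRemainder(srcBase);
            out.append(SOURCE.charAt(qr[1].intValue()));
            n = qr[0];
        }
        return out.reverse().toString();
    }
}
```

Each source character carries about 6 bits and each target character carries 12, so the packed string has roughly half as many characters. The trade-off is that these target characters take 3 bytes each in UTF-8, so the byte size grows even though the character count shrinks.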

4
  • I'm not entirely clear what you mean by "compress" I suppose. Do you want it to take up less physical space in memory, or less visual space in display? If the former, any kind of compression library will work I imagine. If the latter, something like base64 encoding maybe? Commented Apr 22, 2011 at 17:20
  • Any decent compression algorithm does way more than that in a more efficient manner (for instance, Huffman coding assigns shorter codes to the more frequent characters). Many of them (again, Huffman coding is a good example) are relatively simple. But even those are almost never worth it. How much data are you dealing with? Commented Apr 22, 2011 at 17:22
  • In theory, a string shorter than 100 characters. Memory etc. isn't what I'm worried about, but the actual character length using the same encoding (UTF-8, or whatever the proper terminology is)... This is mostly academic at the moment, but I could see some practical uses if this works. I just don't know much about this subject. Commented Apr 22, 2011 at 17:25
  • 1
    (After a couple of years) Also have a look at JEP 254 at openjdk.java.net/jeps/254 Commented Nov 25, 2015 at 14:12

3 Answers

2

I wouldn't bother doing that unless the string is huge. In that case, you can try compressing it with commons-compress or java.util.zip.
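
As a rough sketch of the java.util.zip route, the following uses `Deflater` and `Inflater` directly. The class name `ZipStrings` and the buffer size are assumptions for illustration. Note that the result is a byte array, not a string, so it would need a further text encoding (Base64, for example) before it could be displayed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ZipStrings {
    public static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        // Drain the compressor until it reports that all output was produced.
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read means the input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For short inputs the deflate header and checksum overhead can make the output larger than the input, which is why this only pays off on long or repetitive strings.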


2

A String internally keeps an array of 16-bit characters, which is wasteful for Western European languages. Converting to UTF-8 should give you a reduction of up to 50% (exactly 50% for pure ASCII text):

 import java.nio.charset.StandardCharsets;

 String myString = .....
 // getBytes already returns the encoded bytes, so no intermediate stream is needed
 byte[] data = myString.getBytes(StandardCharsets.UTF_8);

and hold onto it as a byte array.

Of course, this is rather inconvenient if you actually want to use them as Strings, but if the point is long-term storage without much access, this would save you a bunch.

You would have to do the reverse to recreate a String.
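
A minimal sketch of both directions, assuming the bytes were produced with UTF-8 as above (the class and method names here are hypothetical):

```java
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {
    // Encode a String into its UTF-8 bytes for compact storage.
    public static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset that was used for encoding.
    public static String fromBytes(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}
```

ASCII characters take one byte each this way, while accented letters such as é take two, so the saving depends on the text.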


-1

String is a primitive type, you are unlikely to regain any space by converting unless you use Java's zip library, and even that will not yield the performance benefits you are presumably seeking.

2 Comments

I'm not actually trying to get a performance boost. To shorten a long story, this came up because I was trying to squeeze ~50 characters into the description value of an item in a game so that it would fit. The idea was to hide messages in a converted string, rather than creating a hash or something and hiding the real string as data on the item. This way players could copy the string down and share it with others (outside the game, or through other mediums).
String isn't a primitive type in Java.
