0

I need to compress a 20 to 40 digit decimal number into a 6 digit number. So far I have tried Huffman coding and some Zip algorithms, but none of them gives the desired outcome.

Can someone please suggest another algorithm or API for this in Java?

Example:

Input: 98765432101234567890
Desired Output: 123456

Please note: I don't mean that the output has to be exactly 123456 for the given input. I only mean that if I supply a 20 byte number, it should be compressed to a 6 byte number.

Usage: The compressed number will be fed to a device (which can only take up to 6 numeric chars). The device will decode it back to the original number.

Assumptions/Limits:

  1. If required, both client and device (server) can share some common properties needed for encoding/decoding the number.

  2. Only one request can be made to the device, i.e. all data must be sent in one request, not as a series of small packets.

Thanks.

7
  • 2
    You can do this if you represent the number in its binary form (raw data). But even then, you will need 9 bytes to store a 20 digit number. Commented Jun 25, 2012 at 10:35
  • 6
    What you ask for is not possible. Commented Jun 25, 2012 at 10:37
  • 1
    Do you need to actually compress the number? Or is what you need in fact a hash? Commented Jun 25, 2012 at 10:44
  • 1
    Then why don't you just divide the number into 6 byte chunks, perhaps with some sort of encoding that tells the device how many chunks there are? The only way to do what you want is if the numbers in question share some property. For example, if they were all divisible by 100, you could get away with not storing the last 2 digits. But for generic numbers, there is no way to compress an individual number into a smaller number of bytes, at least not unless you have a lot of these numbers that you want to store in a single blob. Commented Jun 25, 2012 at 10:55
  • 1
    You specify Java, and you say you have 6 chars. Since a Java char is 16 bits, that gives you 96 bits, enough to encode all numbers with not more than 28 decimal digits. If you need to handle larger numbers and don't have severe restrictions on what numbers are legitimate, it's simply not possible. Commented Jun 25, 2012 at 16:26
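
The 96-bit arithmetic in the last comment can be sketched as follows. This is only an illustration, assuming the six "chars" may hold arbitrary 16-bit values rather than decimal digits; the class and method names (SixCharPacking, pack, unpack) are made up for this example.

```java
import java.math.BigInteger;

class SixCharPacking {
    // Pack a non-negative number below 2^96 into exactly six 16-bit Java chars.
    static char[] pack(String digits) {
        BigInteger n = new BigInteger(digits);
        if (n.signum() < 0 || n.bitLength() > 96) {
            throw new IllegalArgumentException("does not fit in 96 bits");
        }
        char[] out = new char[6];
        for (int i = 5; i >= 0; i--) {
            out[i] = (char) n.intValue(); // keep the lowest 16 bits
            n = n.shiftRight(16);
        }
        return out;
    }

    // Rebuild the decimal string from the six chars (leading zeros are not preserved).
    static String unpack(char[] chars) {
        BigInteger n = BigInteger.ZERO;
        for (char c : chars) {
            n = n.shiftLeft(16).or(BigInteger.valueOf(c));
        }
        return n.toString();
    }

    public static void main(String[] args) {
        String s = "98765432101234567890";
        System.out.println(unpack(pack(s)).equals(s)); // true
    }
}
```

The resulting chars are raw 16-bit values, not digits, so this only helps if the receiving side accepts arbitrary char values. A device that takes only numeric characters would not.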

2 Answers

7

This will be the best you can get assuming that any combination of digits is a legal input:

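import java.math.BigInteger;

final String s = "98765432101234567890";
// toByteArray() returns the big-endian two's-complement bytes of the number
for (byte b : new BigInteger('0'+s).toByteArray())
  System.out.format("%02x ", b & 0xff);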

Prints

05 5a a5 4d 36 e2 0c 6a d2

Storing a number in binary form is theoretically the most efficient way since every combination of bits is a distinct legal value.
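
For completeness, here is a minimal round-trip sketch showing that the binary form can be decoded back to the original digits. The class and method names (BinaryRoundTrip, encode, decode) are hypothetical, and it assumes the input is a plain non-negative decimal string.

```java
import java.math.BigInteger;

class BinaryRoundTrip {
    // Encode a decimal digit string as big-endian two's-complement bytes.
    static byte[] encode(String digits) {
        return new BigInteger(digits).toByteArray();
    }

    // Decode the bytes back into a decimal string.
    static String decode(byte[] bytes) {
        return new BigInteger(bytes).toString();
    }

    public static void main(String[] args) {
        String s = "98765432101234567890";
        byte[] packed = encode(s);
        System.out.println(packed.length + " bytes");    // 9 bytes
        System.out.println(decode(packed).equals(s));    // true
    }
}
```

Note that leading zeros in the input are not preserved by this sketch ("00123" decodes as "123"), so a fixed digit count would have to be agreed on separately if they matter.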

You may have other options only if there is more redundancy in your input, that is, if there are some constraints on the legal digit combinations.


2 Comments

Marko: Thanks for your time but try printing the output without .length for the same input and you will notice a strange behaviour (the output for the same input is different all the time).
If by "output" you mean the default toString implementation of Java arrays (that returns its hashCode generated from the physical address in memory), then yes, that output is different every time -- and irrelevant to you.
2

The way you specify it, this is not possible. There simply are more 20 digit numbers than there are 6 digit numbers, so if you map 20 digits to only six digits, some 20 digit numbers will have to be mapped to the same six digit number. If you know that not all numbers are valid, or that they are not all equally likely, this can be exploited for compression. Otherwise it is impossible.
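
To put a number on that counting argument, here is a small sketch (the class and method names are made up for illustration, and leading zeros are counted as valid digits):

```java
import java.math.BigInteger;

class PigeonholeCount {
    // Number of distinct decimal strings with exactly `digits` digits.
    static BigInteger count(int digits) {
        return BigInteger.TEN.pow(digits);
    }

    // Average number of inputs that would have to share each output code.
    static BigInteger inputsPerOutput(int inputDigits, int outputDigits) {
        return count(inputDigits).divide(count(outputDigits));
    }

    public static void main(String[] args) {
        System.out.println(inputsPerOutput(20, 6)); // 100000000000000
    }
}
```

So on average 10^14 different 20 digit inputs would collide on every 6 digit output, and the device could not tell them apart.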

Although a reversible (bijective) mapping from 20 digit numbers to six digit numbers is impossible, it is still possible to map long numbers to shorter output. This works by relaxing the requirement that the output be a decimal number. The only important consideration is that the output sequence must have at least as many possible values as the input. Here is an example:

  • There are 10^20 possible 20 digit numbers.
  • If you use a sequence of 8-bit characters (256 possible values each) of length x, you will have 256^x possible outputs. Solving for x, you will notice that 256^8 < 10^20 < 256^9, so 9 such characters are enough (and 8 are not) to encode the 10^20 possible numerical inputs.
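
The same calculation for other input lengths can be sketched like this (MinBytes and minBytes are hypothetical names, and the sketch assumes every digit combination is a legal input):

```java
import java.math.BigInteger;

class MinBytes {
    // Smallest x such that 256^x >= 10^decimalDigits.
    static int minBytes(int decimalDigits) {
        // Largest value that must be representable: 10^decimalDigits - 1.
        BigInteger max = BigInteger.TEN.pow(decimalDigits).subtract(BigInteger.ONE);
        int bits = max.bitLength();
        return (bits + 7) / 8; // round up to whole bytes
    }

    public static void main(String[] args) {
        System.out.println(minBytes(20)); // 9
        System.out.println(minBytes(40)); // 17
        System.out.println(minBytes(6));  // 3
    }
}
```

In other words, six decimal digits carry only about 3 bytes of information, while the 20 to 40 digit inputs from the question need 9 to 17 bytes.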

Marko's answer to the same question shows how to convert a number to its byte representation, which may be used as input. But be aware that this input will not be numerical and may contain many strange symbols.

2 Comments

Notice that OP has tried Huffman and the like. That produces binary output.
@MarkoTopolnik: Yes, agreed. Most likely he wants some binary format, since otherwise this would be impossible. However, I also wanted to point out that this is not clear from the description of the requirements. I did not in any way mean to imply that your answer is wrong.
