0

I tried to compress the string "XZJ6RTNN4NNNNNNR8YWWX7ZGWO1XXQT6PSRT5281I0WQZM75K2P3SPH81XN4M3L1WF6Q" in C#, using the code from the answer marked as accepted at https://stackoverflow.com/questions/7343465/compression-decompression-string-with-c-sharp?rq=1. However, the compressed output is larger than the input, so that code does not work for my case. How can I reduce the size of this string?

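using System.IO;
using System.IO.Compression;
using System.Text;

// Copies src to dest in 4 KB chunks (Stream.CopyTo needs .NET 4.0 or later).
public static void CopyTo(Stream src, Stream dest) {
    byte[] bytes = new byte[4096];

    int cnt;

    while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0) {
        dest.Write(bytes, 0, cnt);
    }
}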

public static byte[] Zip(string str) 
{
    var bytes = Encoding.UTF8.GetBytes(str);

    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream()) {
        using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
            //msi.CopyTo(gs);
            CopyTo(msi, gs);
        }

        return mso.ToArray();
    }
}

public static string Unzip(byte[] bytes) {
    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream()) {
        using (var gs = new GZipStream(msi, CompressionMode.Decompress)) {
            //gs.CopyTo(mso);
            CopyTo(gs, mso);
        }

        return Encoding.UTF8.GetString(mso.ToArray());
    }
}

static void Main(string[] args) {
    byte[] r1 = Zip("StringStringStringStringStringStringStringStringStringStringStringStringStringString");
    string r2 = Unzip(r1);
}

2 Comments

  • Please show your work. Your explanation is not clear. Commented Dec 4, 2013 at 12:20
  • There is no answer marked as helpful in the question you linked. Commented Dec 4, 2013 at 12:20

3 Answers

1

Yes, short values with high entropy commonly get larger, not smaller, when "compressing" them. This is a simple feature of how compression works: the container format adds fixed overhead (a gzip header and trailer alone take 18 bytes), and near-random data leaves the compressor little redundancy to remove. Accordingly, many protocols include an "is this compressed" flag so that short or high-entropy payloads can be sent as-is. The sender decides either with an estimator (for example, don't even try if the payload is less than 100 bytes) or by trying the compression and then sending whichever is smaller.
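
The "try it and send whichever is smaller" approach could look roughly like the sketch below. The class name `FlaggedCompression`, the methods `Pack` and `Unpack`, and the one-byte flag values are all made up for illustration; they are not part of any standard format or of the code in the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class FlaggedCompression
{
    // Hypothetical one-byte markers, chosen for this sketch only.
    private const byte Raw = 0;
    private const byte GZipped = 1;

    public static byte[] Pack(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);

        byte[] zipped;
        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the final block and trailer.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            zipped = output.ToArray();
        }

        // Keep whichever payload is smaller and record the choice in byte 0.
        bool useZip = zipped.Length < raw.Length;
        byte[] payload = useZip ? zipped : raw;

        var packed = new byte[payload.Length + 1];
        packed[0] = useZip ? GZipped : Raw;
        Buffer.BlockCopy(payload, 0, packed, 1, payload.Length);
        return packed;
    }

    public static string Unpack(byte[] packed)
    {
        if (packed[0] == Raw)
            return Encoding.UTF8.GetString(packed, 1, packed.Length - 1);

        using (var input = new MemoryStream(packed, 1, packed.Length - 1))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

With this layout the result is never more than one byte longer than the UTF-8 input, which is the point of the flag. It does not make a short, high-entropy key any shorter.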


2 Comments

Is there any way to get a result that is smaller than the input I have given?
Actually, what I am trying to do is generate user details from this string. The user has to communicate this key offline, and a long string takes time, so I want to reduce its length. (MD5 is a one-way hash, so I cannot use it.)
0

I'm going to go with one of the comments on that thread:

"There is no reason to do this, and every reason not to do this. You will not save significant space, and you render your database unsearchable. Storage space is the cheapest commodity available to you. The savings for "thousands of strings" of "100 to 200 characters" is going to be insignificant, less than a megabyte. Don't do this, store your strings uncompressed."


0

It seems that your string may in fact be a base-64 encoded byte array.

If this is the case, then you can "compress" it by converting it back to a byte array:

string original = "XZJ6RTNN4NNNNNNR8YWWX7ZGWO1XXQT6PSRT5281I0WQZM75K2P3SPH81XN4M3L1WF6Q";
Console.WriteLine("Original #characters = " + original.Length + " characters, or byte count = " + 2*original.Length);
byte[] compressed = Convert.FromBase64String(original);
Console.WriteLine("Compressed length = " + compressed.Length);
string decompressed = Convert.ToBase64String(compressed);

if (decompressed == original)
    Console.WriteLine("Decompressed OK");
else
    Console.WriteLine("Failed to decompress!");

The output from this code is:

Original #characters = 68 characters, or byte count = 136
Compressed length = 51
Decompressed OK

So we have gone from 68 characters (or 136 bytes, if the characters are stored as UTF-16) down to 51 bytes.

Note that this isn't compressing the data at all. It's merely converting the base-64 ASCII representation back to its original format, ASSUMING that it REALLY is base-64 ASCII.

If it isn't, then clearly you can't convert it back to a byte array.

I posted this just to alert you to the fact that it may be base-64 ASCII encoded data that you are dealing with, and you should check if that is the case.
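
One way to run that check is sketched below. The class `Base64Check` and the method `TryDecodeExact` are hypothetical names invented for this example. Note that passing the check only shows the text is valid Base64, not that it was produced that way: the key in the question uses only uppercase letters and digits, and any such string whose length is a multiple of 4 will decode.

```csharp
using System;

public static class Base64Check
{
    // Returns true (and the decoded bytes) when the text decodes as Base64
    // and re-encodes to exactly the same text.
    public static bool TryDecodeExact(string text, out byte[] bytes)
    {
        try
        {
            bytes = Convert.FromBase64String(text);
        }
        catch (FormatException)
        {
            // Invalid characters, bad padding, or a length that is not a multiple of 4.
            bytes = new byte[0];
            return false;
        }

        // FromBase64String ignores whitespace, so compare the re-encoded
        // form to make sure the conversion is reversible.
        return Convert.ToBase64String(bytes) == text;
    }
}
```

If the check fails, the string cannot be converted back to a byte array this way and has to be kept as text.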

