Encoded string compression algorithm in C/C++

Question

I am searching for an algorithm to compare two strings efficiently without occupying much memory and in less time. So what I am currently doing is, compressing character string first and then comparing both compressed string(to avoid memory errors. as string could be very long here)

String contains characters from set [0-9],x,o,X.

Now compression rule is like only certain repetitive tokens need to be compressed. For example: 'o' is end of token and it comes always at end of sequence of one or more digits(0-9},'x' is to show multiplication etc.

Examples: 1. 8o8o80 should be compressed as 3x80 2. 8oXXXX should be compressed as 804xX 3. 64o8o8o16o16o should be 64o2x8o2x16o etc..

I wonder if is there any existing algorithms for such compression strings?

Will appreciate any kind of help to sort this out. Thanks!!

How can compressing and then comparing be faster than just comparing? — Jon
– Jon, Commented Mar 1, 2012 at 11:45
what is wrong with a naive compare? It doesn't use any more memory than that is already used by the two strings, and could be faster (potentially more accurate)... — Nim
– Nim, Commented Mar 1, 2012 at 11:46
@Jon: not faster but memory efficient as I was getting out of memory error for uncompressed string. — Raj
– Raj, Commented Mar 1, 2012 at 11:52
what if you just compare the two hashes of the strings? not faster than simply comparing the strings, though. — vulkanino
– vulkanino, Commented Mar 1, 2012 at 11:53
@Raj: The problem is that you are storing many strings in memory and then comparing. It sounds like you should not pre-process all of them and/or not buffer them in memory. If your code should do this but does not, compression will not fix your problem but merely postpone the day of reckoning. — Jon
– Jon, Commented Mar 1, 2012 at 12:04

stacker · Accepted Answer · 2012-03-01 11:56:43Z

1

You're looking for the run length encoding algorithm. You find some implementations here

answered Mar 1, 2012 at 11:56

stacker

69.1k29 gold badges143 silver badges214 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

vulkanino Over a year ago

yes, but he has a strange "terminator" character, so he might need to modify the algorithm.

Raj Over a year ago

Thanks stacker. But in default implementation of RLE, token is a char but for me its set of char. And also there is no notion of terminating char in that algo.

Raj Over a year ago

I am not able to find any suitable compression algorithm meeting my expectation. So I am going to implement one which works for me.

Collectives™ on Stack Overflow

Encoded string compression algorithm in C/C++

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related