0

I am searching for an algorithm to compare two strings efficiently without occupying much memory and in less time. So what I am currently doing is, compressing character string first and then comparing both compressed string(to avoid memory errors. as string could be very long here)

String contains characters from set [0-9],x,o,X.

Now compression rule is like only certain repetitive tokens need to be compressed. For example: 'o' is end of token and it comes always at end of sequence of one or more digits(0-9},'x' is to show multiplication etc.

Examples: 1. 8o8o80 should be compressed as 3x80 2. 8oXXXX should be compressed as 804xX 3. 64o8o8o16o16o should be 64o2x8o2x16o etc..

I wonder if is there any existing algorithms for such compression strings?

Will appreciate any kind of help to sort this out. Thanks!!

8
  • How can compressing and then comparing be faster than just comparing? Commented Mar 1, 2012 at 11:45
  • what is wrong with a naive compare? It doesn't use any more memory than that is already used by the two strings, and could be faster (potentially more accurate)... Commented Mar 1, 2012 at 11:46
  • @Jon: not faster but memory efficient as I was getting out of memory error for uncompressed string. Commented Mar 1, 2012 at 11:52
  • what if you just compare the two hashes of the strings? not faster than simply comparing the strings, though. Commented Mar 1, 2012 at 11:53
  • 1
    @Raj: The problem is that you are storing many strings in memory and then comparing. It sounds like you should not pre-process all of them and/or not buffer them in memory. If your code should do this but does not, compression will not fix your problem but merely postpone the day of reckoning. Commented Mar 1, 2012 at 12:04

1 Answer 1

1

You're looking for the run length encoding algorithm. You find some implementations here

Sign up to request clarification or add additional context in comments.

3 Comments

yes, but he has a strange "terminator" character, so he might need to modify the algorithm.
Thanks stacker. But in default implementation of RLE, token is a char but for me its set of char. And also there is no notion of terminating char in that algo.
I am not able to find any suitable compression algorithm meeting my expectation. So I am going to implement one which works for me.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.