3

What is a simple way to implement a find/replace algorithm on a string? I would like to transform a string using dictionary which defines replacement rules. The problem is that after each replacement, I must make sure that subsequent replacements operate on the original string. For example:

My string is: ABCABCDEFDEF

My rules are: ABC -> DEF and DEF -> XXX

So my result should be: DEFDEFXXXXXX and not XXXXXXXXXXXX (which would be the result if I were to first apply rule one and then apply rule two).

4
  • 1
    For each rule, find indices where it matches the text. Then apply the rule? (There must be some faster way I think) Commented Jun 30, 2014 at 7:19
  • Well, after each rule I apply, my indices are going to change, aren't they? Commented Jun 30, 2014 at 7:24
  • Traverse the entire string once and record the occurrence of the required substrings in a list or similar data structure. then iterate through the list and keep replacing! Commented Jun 30, 2014 at 7:25
  • Create a Finite State Machine. This can be done by hand (it needs only 5 states) or by using a generator such as (f)lex. Commented Jun 30, 2014 at 7:42

2 Answers 2

3

Simple way:

  • Starting at the first character, try each key if it occurs at that position.

  • If you find a match, replace and continue with the character after the replacement

  • Otherwise, continue at the next character

To kep in mind:

  • Ambiguities: If you have both "AB" and "ABC" as keys, you need to decide which should match "ABCD". Usually you want the longer string to match (otherwise, it would never match)

  • Unicode: Normalize keys and original string first.

That is certainly sufficient for a handful of keys. However, it's O(N*M) where N is the string length and M is the number of replacements.


Improvements:

  • do not search linearly for a match; instead use a sorted list of keys and do a binary search for the character in the original string, then the next etc. Indeed, it might be beneficial to only remember the position and key of found matches in the first pass, and do the replacements in a second pass

  • for large strings with many replacements, it is usually better to build a new string

  • Use Aho-Corasick for search. This exploits the limited search spacce (i.e knowledge derived from the list of keywords) to avoid probing every character of the source string

Sign up to request clarification or add additional context in comments.

Comments

0

Depending on the language you are using there might be so predesigned functions. String.Replace might be helpful if you are using C#. This would save you very much time. If you are still looking for algorithms which can find patterns within other strings, the Horspool-algorithm might be what you are looking for.

You would still have to implement the logic for subsequent replacements to operate on the original string. But this sounds like not a hard thing to do.

2 Comments

Yes, I can use functions like String.Replace, but I can't figure out how to apply each rule independently.
Well you could save the indices of the substrings you already changed. Next time you traverse the string for another pattern you could look up if the substring has already been changed.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.