Algorithm to find pattern based string match

Question

There are 2 input string r, s. The algorithm checks if there is a match between them: every char in r matches only one non - empty substring in s. And different chars in r match different substrings in s. For example if r = "ABA" and s = "hibiehi". there is a match A = "hi", B = "bie" But if r = "ABA" and s = "hibie" they don`t match.

I'm assuming that a letter in the pattern can't match an empty string, but can different letters match the same string? For example, does AAB match hihihi? — kcsquared
– kcsquared, Commented Feb 6, 2022 at 10:22
@kcsquared each latter matches non-empty string, different letters match different strings — YAKOVM
– YAKOVM, Commented Feb 6, 2022 at 10:39

kcsquared · Accepted Answer · 2022-02-06 11:15:09Z

You can do this with depth-first search, also called backtracking. We can greedily try to match each letter of the pattern to substrings of the word, and backtrack upon failure. Keep track of the previous matches (and inverse matches) using a hashmap.

The running time of this approach is exponential (in the pattern length, at least), although I haven't analyzed it exactly. You can speed this up with early stopping/pruning the search tree, but it would take a new idea to get a fully polynomial runtime.

Python implementation:

def word_pattern(pattern: str, word: str) -> bool:
    """Given a pattern string, check if 'word' can match pattern.
    'Match' here means a bijection between each character in pattern
    with a nonempty substring in word.
    """

    pattern_len = len(pattern)
    word_len = len(word)

    def dfs(biject_pat_to_word: Dict[str, str],
            already_used_strs: Set[str],
            pattern_index: int,
            word_index: int) -> bool:
        """Greedily try to match each pattern character to the shortest
        possible substring of word, consistent with current bijection.
        Return whether this was possible."""

        if pattern_index == pattern_len:  # Reached pattern end
            return word_index == word_len
        next_letter = pattern[pattern_index]

        # If we've already seen this pattern char, it must match again
        if next_letter in biject_pat_to_word:
            pat_match = biject_pat_to_word[next_letter]
            if word[word_index:word_index + len(pat_match)] != pat_match:
                return False

            word_index += len(pat_match)
            pattern_index += 1

            return dfs(biject_pat_to_word, already_used_strs,
                       pattern_index, word_index)

        curr_str_match = ''
        for amount_to_take in range(1, word_len - word_index + 1):
            curr_str_match += word[word_index + amount_to_take - 1]
            if curr_str_match in already_used_strs:
                continue

            biject_pat_to_word[next_letter] = curr_str_match
            already_used_strs.add(curr_str_match)

            # Try to use this pattern
            if dfs(biject_pat_to_word, already_used_strs,
                   pattern_index=pattern_index + 1,
                   word_index=word_index + amount_to_take):
                return True

            already_used_strs.discard(curr_str_match)

        # If we've set which string we match, unset it for future calls
        if next_letter in biject_pat_to_word:
            del biject_pat_to_word[next_letter]

        return False

    return dfs(biject_pat_to_word={},
               already_used_strs=set(),
               pattern_index=0, word_index=0)

Collectives™ on Stack Overflow

Algorithm to find pattern based string match

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related