3

I have read that :

Searching for a substring, pat[1..m], in txt[1..n], can be solved in O(m) time (after the suffix tree for txt has been built in O(n) time).

but at each point, we will have to choose which branch to take, so like in n-ary tree, at each node, we will have to compare with all max n pointers in that node to decide which branch to take. Will this not bring n factor in complexity of this algorithm, somehow in picture

Then how above it says that substring can be found in O(m)?

What am I missing here?

1
  • How is suffix tree represented in practice? could we represent it using adjacency list as in graph? Commented Jun 8, 2011 at 9:24

2 Answers 2

5

Then how above it says that substring can be found in O(m)?

By omission. You are correct that the runtime of searching in suffix trees is more complex than merely O(m).

However, it can indeed be sped up to O(m) if we trade off space requirements: we need to get the search at each node down to O(1) and we can do this by using an appropriate data structure (e.g. an array) which gives us the appropriate sub-tree for each letter in constant time.

For instance, assume that you’re using C++ for the implementation and your character (char) can contain 256 different values. Then the implementation of a node could look as follows:

struct node {
    char current_character;
    node* children[256];
};

Now, current_character refers to the character of the branch leading to the current node, and children is an array of child nodes. During the search, assume that you are currently at node u, and the next character in the input text is c. Then you will choose the next node as follows:

node* next = u->children[c];
if (next == 0) {
    // Child node does not exist => nothing found.
}
else {
    u = next;
    // Continue search with next …
}

Of course, this is only viable for very small alphabets (e.g. DNA for genomic sequences). In most common cases, the worst-case runtime of a suffix tree is indeed higher than O(m).

Sign up to request clarification or add additional context in comments.

1 Comment

ok, any pointers to actual implementation of suffix trees? Please also look at my comment to other answer above.
0

If pointers to childs are in array indexed by the letter, only constant time is needed for each pattern letter

node = tree root
FOR i in 1..m
   node = child[pat[i]]

so the complexity is O(m).

3 Comments

ok, I am not aware of how suffix trees are implemented in practice. but how do you actually keep an array indexible on characters? what is space complexity of this? there are 26 lower case alphabet characters. so with each node, do we keep an array of size 26*pointers. That sound too much wastage of space..?
In the other approach you can have a list of pointers in each node, so the time in one node is O(A) (A is alphabet size) and the time for finding substring is O(m * A). Assuming A as the constant we get O(m) again.
Is there a clear functional implementation, and description of this online. I'm surprised google searches have not been fruitful. Beyond the theory, how do you implement a suffix tree in javascript, and search for matching substrings, so that the closest result is returned? The purpose is an autocomplete field. C'mon gods of javascript.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.