Substring searching in a string using a suffix tree?

Question

I have read that:

Searching for a substring, pat[1..m], in txt[1..n], can be solved in O(m) time (after the suffix tree for txt has been built in O(n) time).

But at each node we have to choose which branch to take. As in an n-ary tree, we have to compare against up to n child pointers at that node to decide which branch to follow. Won't this bring a factor of n into the complexity of the algorithm somehow?

Then how can the quote above claim that a substring can be found in O(m) time?

What am I missing here?

How is a suffix tree represented in practice? Could we represent it using an adjacency list, as for a graph?
– xyz, Commented Jun 8, 2011 at 9:24

Konrad Rudolph · Accepted Answer · 2011-06-08 09:22:41Z

5

Then how above it says that substring can be found in O(m)?

By omission. You are correct that the runtime of searching in a suffix tree is more complex than merely O(m): it also depends on how long it takes to select the right child at each node.

However, it can indeed be sped up to O(m) if we trade space for time: we need to bring the work at each node down to O(1), and we can do this by using an appropriate data structure (e.g. an array) that gives us the sub-tree for each letter in constant time.

For instance, assume that you're using C++ for the implementation and that a character (char) can take 256 different values. Then a node could be implemented as follows:

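struct node {
    char current_character;       // character of the branch leading to this node
    node* children[256] = {};     // one slot per possible char value; nullptr if absent
};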

Now, current_character refers to the character of the branch leading to the current node, and children is an array of child nodes. During the search, assume that you are currently at node u, and the next character of the pattern is c. Then you will choose the next node as follows:

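// Cast to unsigned char: a plain char may be signed, and a negative index is out of bounds.
node* next = u->children[static_cast<unsigned char>(c)];
if (next == nullptr) {
    // Child node does not exist => pattern not found.
}
else {
    u = next;
    // Continue search with next …
}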
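The snippet above is a single step of the search. A self-contained sketch of the whole thing follows, assuming an uncompressed suffix trie (one character per edge, like the node above) rather than a compressed suffix tree, and a naive O(n²) construction instead of the O(n) one mentioned in the question. The names `TrieNode` and `SuffixTrie` are hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node of an uncompressed suffix trie: one child slot per byte value.
struct TrieNode {
    std::array<TrieNode*, 256> children{};  // value-initialised to nullptr
};

class SuffixTrie {
public:
    // Naive O(n^2) construction: insert every suffix of txt character by character.
    explicit SuffixTrie(const std::string& txt) {
        nodes_.push_back(std::make_unique<TrieNode>());
        for (std::size_t i = 0; i < txt.size(); ++i) {
            TrieNode* u = root();
            for (std::size_t j = i; j < txt.size(); ++j) {
                auto c = static_cast<unsigned char>(txt[j]);
                if (u->children[c] == nullptr) {
                    nodes_.push_back(std::make_unique<TrieNode>());
                    u->children[c] = nodes_.back().get();
                }
                u = u->children[c];
            }
        }
    }

    // O(m) search: exactly one array lookup per pattern character.
    bool contains(const std::string& pat) const {
        const TrieNode* u = root();
        for (char ch : pat) {
            const TrieNode* next = u->children[static_cast<unsigned char>(ch)];
            if (next == nullptr) return false;  // no branch for this character
            u = next;
        }
        return true;  // every pattern character matched a branch
    }

private:
    TrieNode* root() const { return nodes_.front().get(); }
    std::vector<std::unique_ptr<TrieNode>> nodes_;  // owns every node
};
```

`SuffixTrie("banana").contains("nan")` is true and `contains("nab")` is false. The lookup cost per character does not depend on how many children a node has; the price is 256 pointers (2048 bytes with 8-byte pointers) per node.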

Of course, this is only viable for very small alphabets (e.g. DNA for genomic sequences), since every node stores one pointer per letter of the alphabet. In most common cases, the worst-case runtime of a suffix tree search is indeed higher than O(m).
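To illustrate the small-alphabet case, a hypothetical DNA node needs only four child slots (32 bytes with 8-byte pointers, versus 2048 bytes for the 256-slot node). The mapping `dna_index` from nucleotide to slot is an assumption of this sketch; characters outside `ACGT` simply have no branch:

```cpp
#include <array>
#include <optional>

// Hypothetical node for a DNA suffix trie: four child slots instead of 256.
struct DnaNode {
    std::array<DnaNode*, 4> children{};  // indexed by dna_index(); nullptr if absent
};

// Map a nucleotide to a slot in [0, 4); any other character has no slot.
constexpr std::optional<int> dna_index(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return std::nullopt;
    }
}

// One O(1) search step: follow the child for character c, or return nullptr.
const DnaNode* step(const DnaNode* u, char c) {
    std::optional<int> i = dna_index(c);
    return i ? u->children[*i] : nullptr;
}
```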

edited Jun 8, 2011 at 9:22

answered Jun 8, 2011 at 9:14

Konrad Rudolph

1 Comment

xyz Over a year ago

OK, any pointers to an actual implementation of suffix trees? Please also see my comment on the other answer.

Michy · Accepted Answer · 2011-06-08 09:12:57Z

0

If the pointers to the children are stored in an array indexed by the letter, only constant time is needed for each pattern letter:

node = tree root
FOR i in 1..m
   node = node.child[pat[i]]
   IF node is NULL THEN pattern not found; stop

So the complexity is O(m).
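This loop treats every edge as a single letter. In an actual (compressed) suffix tree an edge is labelled with a substring of txt, stored as a pair of indices, but the argument is the same: one O(1) array lookup selects the edge, and after that each pattern letter is compared with exactly one edge letter, so the search is still O(m). The sketch below assumes 256-slot child arrays and a naive O(n²) construction, not the linear-time construction the question refers to; `CompressedSuffixTree` is a hypothetical name:

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical compressed suffix tree: the edge into each node is labelled txt[start, end).
class CompressedSuffixTree {
    struct Node {
        std::size_t start = 0, end = 0;                     // label of the edge into this node
        std::array<std::unique_ptr<Node>, 256> children{};  // keyed by the label's first byte
    };

    static unsigned char key(char c) { return static_cast<unsigned char>(c); }

public:
    // Naive O(n^2) construction: insert each suffix, splitting an edge where
    // the suffix diverges from its label.
    explicit CompressedSuffixTree(std::string txt)
        : txt_(std::move(txt)), root_(std::make_unique<Node>()) {
        const std::size_t n = txt_.size();
        for (std::size_t i = 0; i < n; ++i) {
            Node* u = root_.get();
            std::size_t pos = i;
            while (pos < n) {
                std::unique_ptr<Node>& slot = u->children[key(txt_[pos])];
                if (!slot) {  // no edge starts with this character: add a leaf
                    slot = std::make_unique<Node>();
                    slot->start = pos;
                    slot->end = n;
                    break;
                }
                Node* v = slot.get();
                std::size_t k = 0;  // common prefix length of edge label and suffix
                while (v->start + k < v->end && pos + k < n &&
                       txt_[v->start + k] == txt_[pos + k]) ++k;
                if (v->start + k == v->end) {  // whole edge matched: descend
                    u = v;
                    pos += k;
                    continue;
                }
                if (pos + k == n) break;  // suffix ends inside the edge: already present
                // Split the edge after k characters and hang a new leaf off the split.
                auto mid = std::make_unique<Node>();
                mid->start = v->start;
                mid->end = v->start + k;
                v->start += k;
                mid->children[key(txt_[v->start])] = std::move(slot);
                auto leaf = std::make_unique<Node>();
                leaf->start = pos + k;
                leaf->end = n;
                mid->children[key(txt_[pos + k])] = std::move(leaf);
                slot = std::move(mid);
                break;
            }
        }
    }

    // O(m) search: each pattern character is compared with at most one edge character.
    bool contains(const std::string& pat) const {
        const Node* u = root_.get();
        std::size_t i = 0;
        while (i < pat.size()) {
            const Node* v = u->children[key(pat[i])].get();
            if (v == nullptr) return false;
            for (std::size_t j = v->start; j < v->end && i < pat.size(); ++j, ++i) {
                if (txt_[j] != pat[i]) return false;
            }
            u = v;
        }
        return true;
    }

private:
    std::string txt_;
    std::unique_ptr<Node> root_;
};
```

For txt = "abac", the suffixes `abac` and `ac` share only their first letter, so the construction splits the `abac` edge into `a` followed by the edges `bac` and `c`.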

answered Jun 8, 2011 at 9:12

Michy

3 Comments

xyz Over a year ago

OK, I am not aware of how suffix trees are implemented in practice. But how do you actually keep an array indexable by characters? What is the space complexity of this? There are 26 lowercase letters, so do we keep an array of 26 pointers in each node? That sounds like too much wasted space.

Michy Over a year ago

Alternatively, you can keep a list of pointers in each node. Then the time spent in one node is O(A), where A is the alphabet size, and the time for finding a substring is O(m * A). Treating A as a constant, we get O(m) again.
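This list-of-pointers representation is essentially the adjacency list asked about in the comment on the question. A minimal sketch, assuming an uncompressed suffix trie built naively in O(n²); `ListNode`, `build` and `contains` are hypothetical names. Each step scans at most A (character, child) pairs, so the search is O(m * A):

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "adjacency list" node: only the children that exist are stored.
struct ListNode {
    std::vector<std::pair<char, ListNode*>> children;

    // Linear scan over at most A entries (A = alphabet size): O(A) per step.
    ListNode* child(char c) const {
        for (const auto& [label, node] : children) {
            if (label == c) return node;
        }
        return nullptr;
    }
};

// Naive O(n^2) suffix-trie construction. std::deque keeps element addresses
// stable when new nodes are appended, so the raw child pointers stay valid.
ListNode* build(const std::string& txt, std::deque<ListNode>& pool) {
    ListNode* root = &pool.emplace_back();
    for (std::size_t i = 0; i < txt.size(); ++i) {
        ListNode* u = root;
        for (std::size_t j = i; j < txt.size(); ++j) {
            ListNode* next = u->child(txt[j]);
            if (next == nullptr) {
                next = &pool.emplace_back();
                u->children.emplace_back(txt[j], next);
            }
            u = next;
        }
    }
    return root;
}

// Search: m steps of O(A) each, i.e. O(m * A) overall.
bool contains(const ListNode* root, const std::string& pat) {
    const ListNode* u = root;
    for (char c : pat) {
        u = u->child(c);
        if (u == nullptr) return false;
    }
    return true;
}
```

The design trade-off is the one described in the comment: memory per node grows with the number of children actually present rather than with the full alphabet size, at the cost of a scan per step.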

zero_cool Over a year ago

Is there a clear, working implementation and description of this online? I'm surprised Google searches have not been fruitful. Beyond the theory, how do you implement a suffix tree in JavaScript and search for matching substrings so that the closest result is returned? The purpose is an autocomplete field. C'mon, gods of JavaScript.
