8

I'm looking for a short, simple suffix tree building/usage algorithm in Java. The best I've found so far lies within the Semantic Discovery Toolkit, but the implementation is several thousand lines long and spans several classes. Ideally, the implementation would be as short as possible and span no more than a few hundred lines.

Does anyone have such an implementation?

2
  • No, but I wrote one in Ruby a while back. You should probably just write it yourself if you want a short implementation... char[] c = string.toCharArray(); for (int i = c.length - 1; i >= 0; i--) recurse(c[i])... Commented Jan 11, 2010 at 15:47
  • Post it as an answer so I can upvote it. I just need something that fits on a sheet of paper that I can reference easily. Shortly, I will need to be able to produce a number of algorithms with minimal documentation, so short implementations are good implementations. Commented Jan 11, 2010 at 22:36

3 Answers

5

I just finished a Java implementation of a suffix tree. In my blog entry you can find out more about suffix trees, see how to use my library, and download and build the library using Subversion and Maven. Yes, it's longer than just a few lines in a single class file, but it is thoroughly documented and intended for practical, real-world use. In addition, it uses Ukkonen's algorithm for linear-time construction. (Most of the implementations noted here have at least O(n^2) running time.)


1 Comment

+1 Although the OP did not specify scalability/performance as criteria, they nearly always are for me; therefore, it is important to get linear time, and thus Ukkonen's approach. When those criteria are included, this is a quality answer.
1

The article "Simple Linear Work Suffix Array Construction", by Kärkkäinen and Sanders, ends with 50 lines of C++. You will probably also want something to produce the LCP array. Googling for "Computing the LCP array in linear time, given S and the suffix array POS." should find you that.
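To give a rough idea of how a suffix array and an LCP array fit together, here is a minimal Java sketch. Note the assumptions: `suffixArray` is a naive comparison sort of the suffixes, not the linear-work construction from the Kärkkäinen and Sanders paper, and `lcpArray` uses the rank-based linear-time method usually attributed to Kasai et al., which may or may not be the exact formulation that search phrase leads to. The class and method names are made up for illustration.

```java
import java.util.Arrays;

final class SuffixArrayLcp {

    /**
     * Naive suffix array: sorts suffix start positions by comparing the
     * suffixes as strings. Simple but far from linear (assumed stand-in
     * for a real linear-time construction).
     */
    static int[] suffixArray(String s) {
        Integer[] order = new Integer[s.length()];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> s.substring(a).compareTo(s.substring(b)));
        int[] sa = new int[order.length];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = order[i];
        }
        return sa;
    }

    /**
     * LCP array in O(n) given the text and its suffix array.
     * lcp[i] is the length of the longest common prefix of the suffixes
     * starting at sa[i - 1] and sa[i]; lcp[0] is 0 by convention.
     */
    static int[] lcpArray(String s, int[] sa) {
        int n = s.length();
        int[] rank = new int[n];          // rank[p] = position of suffix p in sa
        for (int i = 0; i < n; i++) {
            rank[sa[i]] = i;
        }
        int[] lcp = new int[n];
        int h = 0;                        // current match length, reused across iterations
        for (int i = 0; i < n; i++) {
            if (rank[i] == 0) {
                h = 0;                    // smallest suffix has no predecessor
                continue;
            }
            int j = sa[rank[i] - 1];      // suffix just before suffix i in sorted order
            while (i + h < n && j + h < n && s.charAt(i + h) == s.charAt(j + h)) {
                h++;
            }
            lcp[rank[i]] = h;
            if (h > 0) {
                h--;                      // the next suffix shares at least h - 1 characters
            }
        }
        return lcp;
    }

    public static void main(String[] args) {
        int[] sa = suffixArray("banana");
        System.out.println(Arrays.toString(sa));
        System.out.println(Arrays.toString(lcpArray("banana", sa)));
    }
}
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so the suffix array is [5, 3, 1, 0, 4, 2] and the LCP array is [0, 1, 3, 0, 0, 2]. Swapping in a linear-time suffix array builder only requires replacing `suffixArray`; `lcpArray` stays as it is.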

0

You can also take mine, but it is not Ukkonen's algorithm: like all other simple approaches, it runs in quadratic time. I agree that a naive algorithm (which may work fine for shorter sequences) is easy to write in half a day at most.
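For reference, a naive quadratic-time suffix tree can be sketched in well under a hundred lines. The following is not the implementation mentioned above, just a minimal illustration under some assumptions: the class name `NaiveSuffixTree` is invented, the input is assumed not to contain the character `'\0'` (it is appended as an end marker), and only a substring test is provided.

```java
import java.util.HashMap;
import java.util.Map;

final class NaiveSuffixTree {

    private static final class Node {
        int start;                        // edge label is text[start, end)
        final int end;
        final Map<Character, Node> children = new HashMap<>();

        Node(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    private final String text;
    private final Node root = new Node(0, 0);

    /** Inserts every suffix one at a time: O(n^2) in the worst case. */
    NaiveSuffixTree(String s) {
        this.text = s + '\0';             // assumed sentinel, must not occur in s
        for (int i = 0; i < text.length(); i++) {
            insert(i);
        }
    }

    private void insert(int suffixStart) {
        Node node = root;
        int i = suffixStart;
        while (i < text.length()) {
            Node child = node.children.get(text.charAt(i));
            if (child == null) {          // no edge starts with this character: new leaf
                node.children.put(text.charAt(i), new Node(i, text.length()));
                return;
            }
            int k = child.start;          // walk along the edge while characters match
            while (k < child.end && i < text.length() && text.charAt(k) == text.charAt(i)) {
                k++;
                i++;
            }
            if (k < child.end) {          // mismatch inside the edge: split it at k
                Node mid = new Node(child.start, k);
                node.children.put(text.charAt(mid.start), mid);
                child.start = k;
                mid.children.put(text.charAt(k), child);
                if (i < text.length()) {
                    mid.children.put(text.charAt(i), new Node(i, text.length()));
                }
                return;
            }
            node = child;                 // whole edge matched: continue below it
        }
    }

    /** Returns true if pattern occurs somewhere in the original string. */
    boolean contains(String pattern) {
        Node node = root;
        int i = 0;
        while (i < pattern.length()) {
            Node child = node.children.get(pattern.charAt(i));
            if (child == null) {
                return false;
            }
            for (int k = child.start; k < child.end && i < pattern.length(); k++, i++) {
                if (text.charAt(k) != pattern.charAt(i)) {
                    return false;
                }
            }
            node = child;
        }
        return true;
    }

    public static void main(String[] args) {
        NaiveSuffixTree tree = new NaiveSuffixTree("banana");
        System.out.println(tree.contains("nan"));
        System.out.println(tree.contains("nab"));
    }
}
```

Edges store index ranges into the text rather than copied substrings, so the tree itself takes linear space; it is only the construction that is quadratic. That is usually acceptable for short inputs, which is the case the question is about.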
