Introduction
As the body of the function is O(1), or at least can be rewritten to be O(1), we only have to count how many times the function is called. The time complexity of the algorithm is therefore the number of calls made, as a function of the length of the input word and the length of the prefix being searched for.
n - length of input word
k - length of prefix being searched for
I have never done anything like this before, and the common methods I know for finding the time complexity of recursive methods don't work in this case. I started by counting how many calls to the function are made for different values of n and k, to see if I could spot any patterns that might help.
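For reference, the print method being analyzed is not shown in full here; based on the two recursive calls used in the graphviz generator further down, it has roughly this shape (a sketch only - the body from the original question may differ in details):

// prefix - characters kept so far, remaining - characters still available,
// k - how many more characters still have to be picked
public static void print(String prefix, String remaining, int k) {
    if (k == 0) {                  // picked k characters - a solution is found
        System.out.println(prefix);
        return;
    }
    if (remaining.isEmpty())       // ran out of characters before picking k
        return;
    // "left path": keep the first remaining character
    print(prefix + remaining.charAt(0), remaining.substring(1), k - 1);
    // "right path": skip the first remaining character
    print(prefix, remaining.substring(1), k);
}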
Gathering data
Using this code snippet (sorry for ugly code):
public static String word = "abcdefghij";
public static int wordLength = word.length();
public static int limit = 10;
public static int access = 0;   // counts how many times print() is called

public static void main(String[] args) {
    System.out.printf("Word length : %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n", 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    System.out.printf("-----------------------------------------------------------------------------------------------\n");
    for (int k = 0; k <= limit; k++) {
        System.out.printf("k : %2d - results :", k);
        for (int i = 0; i <= limit; i++) {
            print("", word.substring(0, i), k);
            System.out.printf(", %5d", access);
            access = 0;   // reset the counter for the next run
        }
        System.out.print("\n");
    }
}

public static void print(String prefix, String remaining, int k) {
    access++;
    // ... rest of code ...
}
From this I got :
Word length : 0 1 2 3 4 5 6 7 8 9 10
-----------------------------------------------------------------------------------------------
k : 0 - results :, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
k : 1 - results :, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21
k : 2 - results :, 1, 3, 7, 13, 21, 31, 43, 57, 73, 91, 111
k : 3 - results :, 1, 3, 7, 15, 29, 51, 83, 127, 185, 259, 351
k : 4 - results :, 1, 3, 7, 15, 31, 61, 113, 197, 325, 511, 771
k : 5 - results :, 1, 3, 7, 15, 31, 63, 125, 239, 437, 763, 1275
k : 6 - results :, 1, 3, 7, 15, 31, 63, 127, 253, 493, 931, 1695
k : 7 - results :, 1, 3, 7, 15, 31, 63, 127, 255, 509, 1003, 1935
k : 8 - results :, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1021, 2025
k : 9 - results :, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2045
k : 10 - results :, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047
At least one call is made in every case, as it is the initial call. From here we see that everywhere on and below the main diagonal the function is called 2^(min(n, k) + 1) - 1 times, as mentioned by others. That is the number of nodes in a full binary tree.
Each time the print method is called, it makes two further calls - forming a binary tree.
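As a quick sanity check of the on/below-diagonal counts, here is a minimal sketch that just evaluates 2^(min(n, k) + 1) - 1 (for k >= n we have min(n, k) = n):

// When k >= n the whole binary tree is explored, so the number of calls
// depends only on n and equals 2^(n+1) - 1.
for (int n = 0; n <= 10; n++) {
    long calls = (1L << (n + 1)) - 1;
    System.out.print(calls + " ");   // 1 3 7 15 ... 2047, the last row of the table
}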
Things get more confusing above the main diagonal, though, and I failed to see any common pattern there.
Graph representation of algorithm
To make it more visual I used graphviz (online version).
Here is a code snippet that generates the graphviz input for given n and k (green nodes are the ones where a solution was found):
public static String word = "abcde";
public static int wordLength = word.length();
public static int limit = 3;
public static void main(String[] args) {
String rootNode = "\"prefix|remaining|k\"";
StringBuilder graph = new StringBuilder("digraph G { \n node [style=filled];");
print("", word, limit, graph, rootNode);
graph.append("\n\"prefix|remaining|k\" [shape=Mdiamond];\n}");
System.out.println(graph);
}
public static void print(String prefix, String remaining, int k, StringBuilder sb, String parent) {
String currentNode = "\"" + prefix + "|" + (remaining.isEmpty() ? 0 : remaining) + "|" + k + "\"";
sb.append("\n " + parent + "->" + currentNode + ";");
if(k == 0) {
sb.append("\n " + currentNode + "[color=darkolivegreen3];");
return;
}
if (remaining.length() == 0) return;
print(prefix + remaining.charAt(0), remaining.substring(1), k - 1, sb, currentNode);
print(prefix, remaining.substring(1), k, sb, currentNode);
}
Example of graph (n=5, k=3):
digraph G {
node [style=filled];
"prefix|remaining|k"->"|abcde|3";
"|abcde|3"->"a|bcde|2";
"a|bcde|2"->"ab|cde|1";
"ab|cde|1"->"abc|de|0";
"abc|de|0"[color=darkolivegreen3];
"ab|cde|1"->"ab|de|1";
"ab|de|1"->"abd|e|0";
"abd|e|0"[color=darkolivegreen3];
"ab|de|1"->"ab|e|1";
"ab|e|1"->"abe|0|0";
"abe|0|0"[color=darkolivegreen3];
"ab|e|1"->"ab|0|1";
"a|bcde|2"->"a|cde|2";
"a|cde|2"->"ac|de|1";
"ac|de|1"->"acd|e|0";
"acd|e|0"[color=darkolivegreen3];
"ac|de|1"->"ac|e|1";
"ac|e|1"->"ace|0|0";
"ace|0|0"[color=darkolivegreen3];
"ac|e|1"->"ac|0|1";
"a|cde|2"->"a|de|2";
"a|de|2"->"ad|e|1";
"ad|e|1"->"ade|0|0";
"ade|0|0"[color=darkolivegreen3];
"ad|e|1"->"ad|0|1";
"a|de|2"->"a|e|2";
"a|e|2"->"ae|0|1";
"a|e|2"->"a|0|2";
"|abcde|3"->"|bcde|3";
"|bcde|3"->"b|cde|2";
"b|cde|2"->"bc|de|1";
"bc|de|1"->"bcd|e|0";
"bcd|e|0"[color=darkolivegreen3];
"bc|de|1"->"bc|e|1";
"bc|e|1"->"bce|0|0";
"bce|0|0"[color=darkolivegreen3];
"bc|e|1"->"bc|0|1";
"b|cde|2"->"b|de|2";
"b|de|2"->"bd|e|1";
"bd|e|1"->"bde|0|0";
"bde|0|0"[color=darkolivegreen3];
"bd|e|1"->"bd|0|1";
"b|de|2"->"b|e|2";
"b|e|2"->"be|0|1";
"b|e|2"->"b|0|2";
"|bcde|3"->"|cde|3";
"|cde|3"->"c|de|2";
"c|de|2"->"cd|e|1";
"cd|e|1"->"cde|0|0";
"cde|0|0"[color=darkolivegreen3];
"cd|e|1"->"cd|0|1";
"c|de|2"->"c|e|2";
"c|e|2"->"ce|0|1";
"c|e|2"->"c|0|2";
"|cde|3"->"|de|3";
"|de|3"->"d|e|2";
"d|e|2"->"de|0|1";
"d|e|2"->"d|0|2";
"|de|3"->"|e|3";
"|e|3"->"e|0|2";
"|e|3"->"|0|3";
"prefix|remaining|k" [shape=Mdiamond];
}
Number of nodes cut from binary tree
From the example where n = 5 and k = 3 we can see that one tree of height 3 and three trees of height 2 were cut off. As we still visit the root node of each of these trees, the number of nodes cut from the complete binary tree is 1*(2^3 - 2) + 3*(2^2 - 2) = 12.
If it were a full binary tree: 2^(5+1) - 1 = 63 nodes.
The number of nodes (calls made to the function "print") then comes to 63 - 12 = 51.
This matches the result we got by counting the calls made to the function when n = 5 and k = 3.
Now we have to find out how many parts of the tree are cut off, and how big they are, for every n and k.
From here on I will refer to the method call print(prefix + remaining.charAt(0), remaining.substring(1), k-1); as the left path or left node (as it appears in the graphviz graphs) and to print(prefix, remaining.substring(1), k); as the right path or right node.
We can see that the first, and biggest, tree is cut when we go left k times, and the height of that tree is n - k + 1 (+1 because we still visit the root of the tree we cut).
We can see that every time we take the left path k times we reach a result, no matter how many right paths we took before (or in what order) - unless the word runs out of letters before we take the k left paths. So we can take a maximum of n - k right turns.
Let's take a closer look at the example where n = 5 and k = 3:
L - left path
R - right path
For the first tree that was cut we took the path:
LLL
The next highest trees that will be cut are the ones where we take exactly one right path; the possible combinations are:
RLLL, LRLL, LLRL, LLLR -> three trees of height 2 cut
Here we must note that LLLR is already cut, as LLL gave a solution in the previous step.
To get the number of the next trees (height 1 -> 0 nodes cut) we calculate the possible combinations of two rights and three lefts, subtracting the already visited paths:
Combinations(5,3) - Combinations(4,3) = 10 - 4 = 6 trees of height 1
We can see the numbers match the green nodes on the example graph.
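The same counts can be computed directly from the binomial coefficients (a small sketch; the binomial helper is just the standard multiplicative formula):

// C(n, k) via the multiplicative formula; returns 0 when k > n
public static long binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    long result = 1;
    for (int i = 1; i <= k; i++) result = result * (n - k + i) / i;
    return result;
}

public static void main(String[] args) {
    int n = 5, k = 3;
    // term i: C(k+i, k) - C(k+i-1, k) subtrees of height n-k-i+1 are cut
    for (int i = 0; i <= n - k; i++) {
        long trees = binomial(k + i, k) - binomial(k + i - 1, k);
        System.out.println(trees + " subtree(s) of height " + (n - k - i + 1));
    }
    // prints 1 subtree of height 3, 3 of height 2 and 6 of height 1
}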
C(n,k) - number of combinations of k elements chosen from n
f(n,k) - number of binary tree nodes not visited by the algorithm
f(n,k) = (2^(n-k+1) - 2) + Σ_{i=1}^{n-k} (2^(n-k-i+1) - 2) * (C(k+i, k) - C(k+i-1, k))
Explanation:
- (2^(n-k+1) - 2) - the highest tree cut; it has to be pulled out of the summation, otherwise we would end up with factorials of negative numbers
- Σ_{i=1}^{n-k} - sum over all the cut trees, excluding the highest one as it has already been added (we add the larger trees first)
- (2^(n-k-i+1) - 2) - number of nodes cut per tree; n-k+1 is the height of the largest tree, and we work down from there to trees of height n-k-(n-k)+1 = 1
- (C(k+i, k) - C(k+i-1, k)) - how many trees of the given height there are; first find all possible paths (lefts and rights), then subtract the ones already visited in previous steps
It looks awful, but it can be simplified if we assume that k != 0 (without that assumption we would get factorials of negative numbers, which are undefined).
Simplified function:
f(n,k) = Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1)
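To check the simplified function against the call counts measured earlier, here is a small sketch (assuming k >= 1 and reusing the binomial helper from the sketch above); the exact number of calls is the size of the full tree, 2^(n+1) - 1, minus f(n,k):

// f(n, k): number of nodes of the full binary tree that are never visited
public static long f(int n, int k) {
    long cut = 0;
    for (int i = 0; i <= n - k; i++)
        cut += ((1L << (n - k - i + 1)) - 2) * binomial(k + i - 1, k - 1);
    return cut;
}

public static void main(String[] args) {
    System.out.println(((1L << 6)  - 1) - f(5, 3));   // 51   - matches the table for n=5,  k=3
    System.out.println(((1L << 11) - 1) - f(10, 5));  // 1275 - matches the table for n=10, k=5
}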
Evaluation of accurate time complexity
The time complexity of the function:
O(2^n - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1))
Now this looks awful and doesn't give much information. I don't know how to simplify it any further. I've asked about it here. No answer so far though.
But is it even worth considering the f(n,k) part? That probably depends on the particular application. We can see from the data table that it can considerably affect the number of calls the algorithm makes, depending on the choice of n and k.
To see more visually how much the extra part affects the complexity, I plotted the best-case complexity and the accurate complexity on a graph.
O(2^n - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1)) is the colorful surface.
B(2^min(n,k)) is the green surface.
We can see that B(2^min(n,k)) underestimates the actual cost by quite a lot (it suggests the function performs much better than it actually does). It is usually more useful to look at an algorithm's worst-case complexity, which here is W(2^max(n,k)).
O(2^n - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1)) is the colorful surface.
B(2^min(n,k)) is the green surface.
W(2^max(n,k)) is the yellow surface.
Conclusion
- Best case complexity: B(2^min(n,k))
- Accurate complexity: O(2^n - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1))
- Worst case complexity: W(2^max(n,k)) -> often written as O(2^max(n,k))
In my opinion the worst-case complexity should be used to evaluate the function, as the accurate one is too complex to interpret without further analysis. I wouldn't use the best-case complexity because it leaves too much to chance. Unfortunately I can't calculate the average complexity for this. Depending on the application of the algorithm, using the average might be better for evaluating it.
Notes: older versions of Java implemented substring() using index manipulation (O(1) time), whereas newer versions do it by copying (O(n) time). Computing n choose k has O(n^2) complexity (stackoverflow.com/a/19416046/216248), and because of that any lower-degree complexity is considered irrelevant here.