Introduction
As the body of the function is O(1), or at least can be rewritten to be O(1), we only have to count how many times the function is called. The time complexity of the algorithm is therefore the number of calls made, as a function of the length of the input word and the length of the prefix being searched for.
n - length of input word
k - length of prefix being searched for
I have never done anything like this before, and the common methods I know for finding the time complexity of recursive methods don't work in this case. I started by counting how many calls to the function are made for different values of n and k, to see if I could spot any patterns that might help.
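For reference, the print method being analyzed is not shown in full here; based on the two recursive calls used in the graphviz generator further down, it has roughly this shape (a sketch only - the body from the original question may differ in details):

// prefix - characters kept so far, remaining - characters still available,
// k - how many more characters still have to be picked
public static void print(String prefix, String remaining, int k) {
    if (k == 0) {                  // picked k characters - a solution is found
        System.out.println(prefix);
        return;
    }
    if (remaining.isEmpty())       // ran out of characters before picking k
        return;
    // "left path": keep the first remaining character
    print(prefix + remaining.charAt(0), remaining.substring(1), k - 1);
    // "right path": skip the first remaining character
    print(prefix, remaining.substring(1), k);
}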
Gathering data
Using this code snippet (sorry for ugly code):
public static String word = "abcdefghij";
public static int wordLength = word.length();
public static int limit = 10;
public static int access = 0;   // counts how many times print() is called

public static void main(String[] args) {
    System.out.printf("Word length : %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n", 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    System.out.printf("-----------------------------------------------------------------------------------------------\n");
    for (int k = 0; k <= limit; k++) {
        System.out.printf("k : %2d - results :", k);
        for (int i = 0; i <= limit; i++) {
            print("", word.substring(0, i), k);
            System.out.printf(", %5d", access);
            access = 0;   // reset the counter for the next run
        }
        System.out.print("\n");
    }
}

public static void print(String prefix, String remaining, int k) {
    access++;
    // ... rest of code ...
}
From this I got :
Word length : 0 1 2 3 4 5 6 7 8 9 10
-----------------------------------------------------------------------------------------------
k : 0 - results :, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
k : 1 - results :, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21
k : 2 - results :, 1, 3, 7, 13, 21, 31, 43, 57, 73, 91, 111
k : 3 - results :, 1, 3, 7, 15, 29, 51, 83, 127, 185, 259, 351
k : 4 - results :, 1, 3, 7, 15, 31, 61, 113, 197, 325, 511, 771
k : 5 - results :, 1, 3, 7, 15, 31, 63, 125, 239, 437, 763, 1275
k : 6 - results :, 1, 3, 7, 15, 31, 63, 127, 253, 493, 931, 1695
k : 7 - results :, 1, 3, 7, 15, 31, 63, 127, 255, 509, 1003, 1935
k : 8 - results :, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1021, 2025
k : 9 - results :, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2045
k : 10 - results :, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047
At least one call is made in every case, as it is the initial call. From here we see that everywhere on and below the main diagonal the function is called 2^(min(n, k) + 1) - 1 times, as mentioned by others. That is the number of nodes in a full binary tree.
Each time the print method is called, it makes two further calls - forming a binary tree.
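As a quick sanity check of the on/below-diagonal counts, here is a minimal sketch that just evaluates 2^(min(n, k) + 1) - 1 (for k >= n we have min(n, k) = n):

// When k >= n the whole binary tree is explored, so the number of calls
// depends only on n and equals 2^(n+1) - 1.
for (int n = 0; n <= 10; n++) {
    long calls = (1L << (n + 1)) - 1;
    System.out.print(calls + " ");   // 1 3 7 15 ... 2047, the last row of the table
}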
Things get more confusing above the main diagonal, though, and I failed to see any common pattern there.
Graph representation of algorithm
To make it more visual I used graphviz (online version).
Here is a code snippet that generates the graphviz input for given n and k (green nodes are the ones where a solution was found):
public static String word = "abcde";
public static int wordLength = word.length();
public static int limit = 3;
public static void main(String[] args) {
String rootNode = "\"prefix|remaining|k\"";
StringBuilder graph = new StringBuilder("digraph G { \n node [style=filled];");
print("", word, limit, graph, rootNode);
graph.append("\n\"prefix|remaining|k\" [shape=Mdiamond];\n}");
System.out.println(graph);
}
public static void print(String prefix, String remaining, int k, StringBuilder sb, String parent) {
String currentNode = "\"" + prefix + "|" + (remaining.isEmpty() ? 0 : remaining) + "|" + k + "\"";
sb.append("\n " + parent + "->" + currentNode + ";");
if(k == 0) {
sb.append("\n " + currentNode + "[color=darkolivegreen3];");
return;
}
if (remaining.length() == 0) return;
print(prefix + remaining.charAt(0), remaining.substring(1), k - 1, sb, currentNode);
print(prefix, remaining.substring(1), k, sb, currentNode);
}
Example of graph (n=5, k=3):
digraph G {
node [style=filled];
"prefix|remaining|k"->"|abcde|3";
"|abcde|3"->"a|bcde|2";
"a|bcde|2"->"ab|cde|1";
"ab|cde|1"->"abc|de|0";
"abc|de|0"[color=darkolivegreen3];
"ab|cde|1"->"ab|de|1";
"ab|de|1"->"abd|e|0";
"abd|e|0"[color=darkolivegreen3];
"ab|de|1"->"ab|e|1";
"ab|e|1"->"abe|0|0";
"abe|0|0"[color=darkolivegreen3];
"ab|e|1"->"ab|0|1";
"a|bcde|2"->"a|cde|2";
"a|cde|2"->"ac|de|1";
"ac|de|1"->"acd|e|0";
"acd|e|0"[color=darkolivegreen3];
"ac|de|1"->"ac|e|1";
"ac|e|1"->"ace|0|0";
"ace|0|0"[color=darkolivegreen3];
"ac|e|1"->"ac|0|1";
"a|cde|2"->"a|de|2";
"a|de|2"->"ad|e|1";
"ad|e|1"->"ade|0|0";
"ade|0|0"[color=darkolivegreen3];
"ad|e|1"->"ad|0|1";
"a|de|2"->"a|e|2";
"a|e|2"->"ae|0|1";
"a|e|2"->"a|0|2";
"|abcde|3"->"|bcde|3";
"|bcde|3"->"b|cde|2";
"b|cde|2"->"bc|de|1";
"bc|de|1"->"bcd|e|0";
"bcd|e|0"[color=darkolivegreen3];
"bc|de|1"->"bc|e|1";
"bc|e|1"->"bce|0|0";
"bce|0|0"[color=darkolivegreen3];
"bc|e|1"->"bc|0|1";
"b|cde|2"->"b|de|2";
"b|de|2"->"bd|e|1";
"bd|e|1"->"bde|0|0";
"bde|0|0"[color=darkolivegreen3];
"bd|e|1"->"bd|0|1";
"b|de|2"->"b|e|2";
"b|e|2"->"be|0|1";
"b|e|2"->"b|0|2";
"|bcde|3"->"|cde|3";
"|cde|3"->"c|de|2";
"c|de|2"->"cd|e|1";
"cd|e|1"->"cde|0|0";
"cde|0|0"[color=darkolivegreen3];
"cd|e|1"->"cd|0|1";
"c|de|2"->"c|e|2";
"c|e|2"->"ce|0|1";
"c|e|2"->"c|0|2";
"|cde|3"->"|de|3";
"|de|3"->"d|e|2";
"d|e|2"->"de|0|1";
"d|e|2"->"d|0|2";
"|de|3"->"|e|3";
"|e|3"->"e|0|2";
"|e|3"->"|0|3";
"prefix|remaining|k" [shape=Mdiamond];
}
Number of nodes cut from binary tree
From the example where n = 5 and k = 3 we can see that one tree of height 3 and three trees of height 2 were cut off. As we still visit the root node of each of these trees, the number of nodes cut from the complete binary tree is 1*(2^3 - 2) + 3*(2^2 - 2) = 12.
If it were a full binary tree: 2^(5+1) - 1 = 63 nodes.
The number of nodes (calls made to the function "print") then comes to 63 - 12 = 51.
This matches the result we got by counting the calls made to the function when n = 5 and k = 3.
Now we have to find out how many parts of the tree are cut off, and how big they are, for every n and k.
From here on I will refer to the method call print(prefix + remaining.charAt(0), remaining.substring(1), k-1); as the left path or left node (as it appears in the graphviz graphs) and to print(prefix, remaining.substring(1), k); as the right path or right node.
We can see that the first, and biggest, tree is cut when we go left k times, and the height of that tree is n - k + 1 (+1 because we still visit the root of the tree we cut).
We can see that every time we take the left path k times we reach a result, no matter how many right paths we took before (or in what order) - unless the word runs out of letters before we take the k left paths. So we can take a maximum of n - k right turns.
Let's take a closer look at the example where n = 5 and k = 3:
L - left path
R - right path
For the first tree that was cut we took the path:
LLL
The next highest trees that will be cut are the ones where we take exactly one right path; the possible combinations are:
RLLL, LRLL, LLRL, LLLR -> three trees of height 2 cut
Here we must note that LLLR is already cut, as LLL gave a solution in the previous step.
To get the number of the next trees (height 1 -> 0 nodes cut) we calculate the possible combinations of two rights and three lefts, subtracting the already visited paths:
Combinations(5,3) - Combinations(4,3) = 10 - 4 = 6 trees of height 1
We can see the numbers match the green nodes on the example graph.
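The same counts can be computed directly from the binomial coefficients (a small sketch; the binomial helper is just the standard multiplicative formula):

// C(n, k) via the multiplicative formula; returns 0 when k > n
public static long binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    long result = 1;
    for (int i = 1; i <= k; i++) result = result * (n - k + i) / i;
    return result;
}

public static void main(String[] args) {
    int n = 5, k = 3;
    // term i: C(k+i, k) - C(k+i-1, k) subtrees of height n-k-i+1 are cut
    for (int i = 0; i <= n - k; i++) {
        long trees = binomial(k + i, k) - binomial(k + i - 1, k);
        System.out.println(trees + " subtree(s) of height " + (n - k - i + 1));
    }
    // prints 1 subtree of height 3, 3 of height 2 and 6 of height 1
}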
C(n,k) - number of combinations of k elements chosen from n
f(n,k) - number of binary tree nodes not visited by the algorithm
f(n,k) = (2^(n-k+1) - 2) + Σ_{i=1}^{n-k} (2^(n-k-i+1) - 2) * (C(k+i, k) - C(k+i-1, k))
Explanation:
- (2^(n-k+1) - 2) - the highest tree cut; it has to be pulled out of the summation, otherwise we would end up with factorials of negative numbers
- Σ_{i=1}^{n-k} - sum over all the cut trees, excluding the highest one as it has already been added (we add the larger trees first)
- (2^(n-k-i+1) - 2) - number of nodes cut per tree; n-k+1 is the height of the largest tree, and we work down from there to trees of height n-k-(n-k)+1 = 1
- (C(k+i, k) - C(k+i-1, k)) - how many trees of the given height there are; first find all possible paths (lefts and rights), then subtract the ones already visited in previous steps
It looks awful, but it can be simplified if we assume that k != 0 (without that assumption we would get factorials of negative numbers, which are undefined).
Simplified function:
f(n,k) = Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1)
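To check the simplified function against the call counts measured earlier, here is a small sketch (assuming k >= 1 and reusing the binomial helper from the sketch above); the exact number of calls is the size of the full tree, 2^(n+1) - 1, minus f(n,k):

// f(n, k): number of nodes of the full binary tree that are never visited
public static long f(int n, int k) {
    long cut = 0;
    for (int i = 0; i <= n - k; i++)
        cut += ((1L << (n - k - i + 1)) - 2) * binomial(k + i - 1, k - 1);
    return cut;
}

public static void main(String[] args) {
    System.out.println(((1L << 6)  - 1) - f(5, 3));   // 51   - matches the table for n=5,  k=3
    System.out.println(((1L << 11) - 1) - f(10, 5));  // 1275 - matches the table for n=10, k=5
}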
Evaluation of accurate time complexity
The time complexity of the function:
O(2^n - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1))
Now this looks awful and doesn't give much information. I don't know how to simplify it any further. I've asked about it here. No answer so far though.
But is it even worth considering the f(n,k) part? That probably depends on the particular application. We can see from the data table that it can considerably affect the number of calls the algorithm makes, depending on the choice of n and k.
To see more visually how much the extra part affects the complexity, I plotted the best-case complexity and the accurate complexity on a graph.
O(2^n - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1)) is the colorful surface.
B(2^min(n,k)) is the green surface.
We can see that B(2^min(n,k)) underestimates the actual cost by quite a lot (it suggests the function performs much better than it actually does). It is usually more useful to look at an algorithm's worst-case complexity, which here is W(2^max(n,k)).
O(2^n - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1)) is the colorful surface.
B(2^min(n,k)) is the green surface.
W(2^max(n,k)) is the yellow surface.
Conclusion
- Best case complexity: B(2^min(n,k))
- Accurate complexity: O(2^n - Σ_{i=0}^{n-k} (2^(n-k-i+1) - 2) * C(k+i-1, k-1))
- Worst case complexity: W(2^max(n,k)) -> often written as O(2^max(n,k))
In my opinion the worst-case complexity should be used to evaluate the function, as the accurate one is too complex to interpret without further analysis. I wouldn't use the best-case complexity because it leaves too much to chance. Unfortunately I can't calculate the average complexity for this. Depending on the application of the algorithm, using the average might be better for evaluating it.
Notes: older versions of Java implemented substring() using index manipulation (O(1) time), whereas newer versions do it by copying (O(n) time). Computing n choose k has O(n^2) complexity (stackoverflow.com/a/19416046/216248), and because of that any lower-degree complexity is considered irrelevant here.