I've built a decision tree for multi-class classification (MNIST). My question is: when I want to predict the label of a test pattern and I traverse the tree according to the values of the test pattern's attributes, what happens if there is no left or right branch for a given attribute value? In the tree-building process I add branches on the left or right depending on how the training pattern's attribute value compares to a threshold.
1 Answer
What happens if there is not a left or right branch for a given test pattern attribute value?
This can only happen when the instance arrives at a leaf node of the tree. In that case no further traversal is necessary: we classify the instance based on the class-conditional probability estimates given by the class proportions in that leaf.
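As a minimal sketch of that leaf-classification step (the `Leaf` class and the counts are made up for illustration, not taken from any particular library): a leaf stores how many training instances of each class reached it, and prediction just reads off the resulting proportions.

```python
from collections import Counter

class Leaf:
    """A leaf node holding the per-class counts of training instances that reached it."""
    def __init__(self, class_counts):
        self.class_counts = Counter(class_counts)  # e.g. {3: 40, 5: 8, 8: 2}

    def class_proportions(self):
        # class proportions serve as estimates of the class-conditional probabilities
        total = sum(self.class_counts.values())
        return {c: n / total for c, n in self.class_counts.items()}

    def predict(self):
        # majority vote: the class with the highest estimated probability
        return max(self.class_counts, key=self.class_counts.get)

leaf = Leaf({3: 40, 5: 8, 8: 2})
print(leaf.class_proportions())  # {3: 0.8, 5: 0.16, 8: 0.04}
print(leaf.predict())            # 3
```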
If the current node is not a leaf, the split test is satisfied for the given attribute value on exactly one side, left or right, since only threshold comparisons such as $\geq$ (greater-or-equal) and $\leq$ (less-or-equal) are used.
Consider the example from page 306 of (1): a two-dimensional data set is iteratively split by thresholds, starting at attribute $X_1$, where the first threshold $t_1$ is defined. In the corresponding tree structure, the first split is represented by the top node, $X_1 \leq t_1$. Any given $x_1 \in X_1$ is either less than or equal to $t_1$, in which case it continues on the left, or greater than $t_1$, in which case it continues on the right. There is no middle ground.
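That traversal logic can be sketched as follows (the dict-based node layout is a hypothetical representation, chosen only to keep the example self-contained): because `<=` and its complement `>` are exhaustive, every attribute value routes left or right, and no test instance can ever "fall off" the tree.

```python
def traverse(node, x):
    """Route an instance x (a list of attribute values) down to a leaf label."""
    while not node["is_leaf"]:
        # the threshold comparison is exhaustive: x either goes left (<= t) or right (> t)
        if x[node["attribute"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]

# one split on attribute 0 at threshold 0.5, mirroring the X_1 <= t_1 node above
tree = {
    "is_leaf": False, "attribute": 0, "threshold": 0.5,
    "left":  {"is_leaf": True, "label": "A"},
    "right": {"is_leaf": True, "label": "B"},
}
print(traverse(tree, [0.3]))  # A
print(traverse(tree, [0.7]))  # B
```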
(1) Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. New York: Springer, 2001.
- I found the algorithm for binary classification. Can you point me to the multi-class case? – filippo portera, Oct 24, 2019 at 12:12
- What is unclear to you about the multiclass case? In a given leaf node, one uses the proportions of instances per class that arrive there as estimates of the class-conditional probability; how one decides based on these is another matter entirely. For simple cases it might be enough to assign the class with the highest probability, while for more complex problems this might be suboptimal. – deemel, Oct 24, 2019 at 12:19
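To illustrate the last point in that comment (with made-up numbers, not from any data set): when misclassification costs are asymmetric, picking the highest-probability class can be suboptimal; a minimum-expected-cost rule uses the same leaf proportions but decides differently.

```python
def expected_costs(probs, cost):
    """cost[pred][true] = cost of predicting `pred` when the true class is `true`."""
    return {pred: sum(cost[pred][true] * p for true, p in probs.items())
            for pred in cost}

probs = {"7": 0.6, "1": 0.4}          # class proportions in the leaf
cost = {"7": {"7": 0.0, "1": 10.0},   # predicting 7 when the truth is 1 is very costly
        "1": {"7": 1.0, "1": 0.0}}

costs = expected_costs(probs, cost)
argmax_class = max(probs, key=probs.get)    # highest-probability rule picks "7"
min_cost_class = min(costs, key=costs.get)  # expected-cost rule picks "1"
print(costs)                                # {'7': 4.0, '1': 0.6}
print(argmax_class, min_cost_class)         # 7 1
```

The argmax rule and the cost-sensitive rule disagree here even though both read the same class proportions out of the leaf, which is exactly why the estimation step and the decision step are separate matters.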
