Goodness of fit test/index for a regression tree

Question

I have fitted a regression tree to my data and would like to demonstrate that it is a good model. Is there a standard goodness-of-fit test or index for a regression tree?

I understand that I can calculate the confusion matrix, Gini index, etc. to assess the performance of each node, but is there any way to assess the fit of the whole tree?

Welcome to Cross Validated! Don’t you have the overall predictions from the whole tree that you can stick into the various calculations of interest like a confusion matrix or Gini index? // Are your observed outcomes discrete categories?
– Dave, Commented May 17, 2023 at 22:44
You mean comparing the model output and actual values? // The output is a continuous variable.
– Santanu, Commented May 17, 2023 at 22:46

Dave · Accepted Answer · 2023-05-30 23:40:22Z

The standard way to proceed is to make predictions from the entire tree (or whatever model you are using). You then compare the predicted values with the true values using some statistic(s) of interest, such as mean squared error for the regression tree you are using.

This statistic of interest is then your measure of performance for the entire tree, not just of an individual node.
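The whole-tree evaluation described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the data are made up, `y_pred` stands in for whatever your fitted tree predicts for each observation, and R² is included only as one further example of a "statistic of interest" alongside mean squared error.

```python
from math import sqrt


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of outcome variance captured by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed outcomes and whole-tree predictions (e.g. leaf means).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [4.0, 4.0, 8.0, 8.0]

mse = mean_squared_error(y_true, y_pred)  # 1.0
rmse = sqrt(mse)                          # 1.0, in the units of the outcome
r2 = r_squared(y_true, y_pred)            # 1 - 4/20 = 0.8
print(mse, rmse, r2)
```

Computed on the same data used to grow the tree, these numbers are typically optimistic, which is the downstream concern raised next.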

There are many concerns downstream of this (discussed in the comments), and this is why methods like cross validation and bootstrap validation exist. However, all of them will begin with sending your features down the decision tree to get a prediction for the entire tree, just as you would do for any other supervised learning model.
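As a rough sketch of the cross-validation idea, the snippet below holds out each fold in turn, fits on the rest, and pools the squared errors of the held-out predictions. To stay self-contained it uses a hypothetical one-split "stump" (`fit_stump`) as a stand-in for a real regression tree, and the data are invented; in practice you would substitute your own tree-fitting routine.

```python
import random


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature; return a predict function."""
    overall_mean = sum(ys) / len(ys)
    best = None
    # Candidate splits "x <= threshold"; dropping the largest x keeps both sides non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: predict the overall mean
        return lambda x: overall_mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def cross_validated_mse(xs, ys, k=5, seed=0):
    """Pool squared errors of out-of-fold predictions over k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        predict = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        # The features of held-out rows are sent down a tree that never saw them.
        squared_errors.extend((ys[i] - predict(xs[i])) ** 2 for i in held_out)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data: a step function in a single feature.
xs = list(range(20))
ys = [1.0 if x < 10 else 5.0 for x in xs]

predict = fit_stump(xs, ys)
in_sample_mse = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
cv_mse = cross_validated_mse(xs, ys, k=5)
print(in_sample_mse, cv_mse)
```

Comparing `in_sample_mse` with `cv_mse` gives a sense of how much the in-sample figure flatters the model.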

edited May 30, 2023 at 23:40

answered May 30, 2023 at 10:47

Dave


Regression trees are notorious for non-reproducibility, in many cases unless n > 100,000 subjects. The easiest way to invalidate a tree is to bootstrap your dataset and compare tree structures over multiple repeats of the tree-building process. You'll see a disappointing amount of instability in the trees, which makes their use questionable, not to mention the unreliability of the predicted values. For more, see hbiostat.org/rmsc/genreg.html

– Frank Harrell
Commented May 30, 2023 at 13:24
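The bootstrap stability check suggested in this comment could look roughly like the following. It is only a sketch under simplifying assumptions: the "tree structure" is reduced to the threshold of a single best split on one feature (the hypothetical `best_split` helper), and the noisy data are simulated. With a real tree you would compare the splitting variables and cut points across resamples instead.

```python
import random


def sum_squared_error(values):
    """Sum of squared deviations from the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Threshold of the single split minimising squared error, or None if no split exists."""
    best_sse, best_threshold = None, None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if best_sse is None or sse < best_sse:
            best_sse, best_threshold = sse, threshold
    return best_threshold


def bootstrap_split_thresholds(xs, ys, repeats=200, seed=0):
    """Refit the split on bootstrap resamples and collect the chosen thresholds."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(repeats):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        thresholds.append(best_split([xs[i] for i in sample],
                                     [ys[i] for i in sample]))
    return thresholds


# Simulated data: a linear trend plus noise, so no single split is clearly "right".
noise = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [x + noise.gauss(0, 1) for x in xs]

thresholds = bootstrap_split_thresholds(xs, ys)
distinct = sorted({t for t in thresholds if t is not None})
print(len(distinct), "distinct split points across", len(thresholds), "resamples")
```

If the resamples keep choosing very different split points, that is the kind of structural instability the comment warns about.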
