$\begingroup$

I have fitted a regression tree to my data and would like to demonstrate that it is a good model. Is there a standard goodness-of-fit test or index for a regression tree?

I understand that I can calculate a confusion matrix, the Gini index, etc. to assess the performance of each node, but is there a way to assess the fit of the whole tree?

$\endgroup$
  • $\begingroup$ Welcome to Cross Validated! Don’t you have the overall predictions from the whole tree that you can stick into the various calculations of interest like a confusion matrix or Gini index? // Are your observed outcomes discrete categories? $\endgroup$ Commented May 17, 2023 at 22:44
  • $\begingroup$ You mean comparing the model output with the actual values? // The output is a continuous variable. $\endgroup$ Commented May 17, 2023 at 22:46
  • $\begingroup$ That’s the standard way to do it. $\endgroup$ Commented May 17, 2023 at 22:47

1 Answer

$\begingroup$

The standard way to proceed is to make predictions from the entire tree (or whatever model you are using). You then compare the predicted values with the observed values using some statistic(s) of interest, such as the mean squared error for the regression tree you are using.

This statistic of interest is then your measure of performance for the entire tree, not just of an individual node.
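
As a minimal sketch of that calculation, assuming you already have the observed outcomes and the tree's predictions as two equal-length lists: the function name `regression_metrics` and the four example values are hypothetical, and the choice of MSE, RMSE, MAE and $R^2$ is mine, not a prescribed set.

```python
import math

def regression_metrics(y_true, y_pred):
    """Summarise whole-model error from observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)           # residual sum of squares
    mse = sse / n
    mae = sum(abs(r) for r in residuals) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    # R^2 is undefined when the observed outcome has no variation.
    r2 = 1.0 - sse / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

# Hypothetical values; in practice y_pred comes from sending every
# observation's features down the fitted tree.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Nothing here depends on the model being a tree, which is the point: the same summary applies to predictions from any supervised learner.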

There are many concerns downstream of this (discussed in the comments), such as error measured on the training data being optimistic, and this is why methods like cross validation and bootstrap validation exist. However, all of them will begin with sending your features down the decision tree to get a prediction for the entire tree, just as you would do for any other supervised learning model.
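
A rough sketch of k-fold cross validation along these lines, using only the standard library. The one-split "stump" is a hypothetical stand-in for your real tree-fitting routine, and the simulated data, fold count and seeds are assumptions for illustration only.

```python
import random

def fit_stump(x, y):
    """Hypothetical stand-in for a tree: one split on a single feature."""
    best = None
    for threshold in sorted(set(x))[:-1]:          # candidate rule: x <= threshold
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:                               # no split possible: predict the mean
        m = sum(y) / len(y)
        return (float("inf"), m, m)
    return best[1:]

def predict_stump(model, x):
    threshold, left_mean, right_mean = model
    return [left_mean if xi <= threshold else right_mean for xi in x]

def k_fold_mse(x, y, k=5, seed=0):
    """Refit on k-1 folds, score on the held-out fold, average the fold MSEs."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_mse = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_stump([x[i] for i in train], [y[i] for i in train])
        preds = predict_stump(model, [x[i] for i in held_out])
        errors = [(y[i] - p) ** 2 for i, p in zip(held_out, preds)]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k, fold_mse

# Simulated data (assumption): a step at x = 5 plus unit-variance noise.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [(2.0 if xi < 5 else 8.0) + rng.gauss(0, 1) for xi in x]
cv_mse, per_fold = k_fold_mse(x, y)
print(cv_mse, per_fold)
```

The key design point is that the model is refitted inside every fold, so each held-out prediction comes from a tree that never saw that observation.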

$\endgroup$
  • $\begingroup$ Regression trees are notorious for non-reproducibility unless n > 100,000 subjects in many cases. The easiest way to invalidate a tree is to bootstrap your dataset and compare tree structures over multiple repeats of the tree-building process. You'll see a disappointing amount of instability in the trees, which makes their use questionable, not to mention the unreliability of predicted values. For more see hbiostat.org/rmsc/genreg.html $\endgroup$ Commented May 30, 2023 at 13:24
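
The bootstrap check described in that comment might be sketched as follows. To stay short it compares only the root split (which feature and which threshold are chosen) rather than whole tree structures; the helper names, the simulated correlated features and the number of resamples are all assumptions for illustration.

```python
import random
from collections import Counter

def best_root_split(rows, y):
    """Return (feature_index, threshold) minimising the children's sum of squares."""
    best_score, best_split = -1.0, None
    n, total = len(y), sum(y)
    for j in range(len(rows[0])):
        pairs = sorted((row[j], yi) for row, yi in zip(rows, y))
        left_sum = 0.0
        for i in range(n - 1):
            left_sum += pairs[i][1]
            if pairs[i][0] == pairs[i + 1][0]:
                continue                      # cannot split between tied values
            n_left = i + 1
            # Maximising this quantity is equivalent to minimising the SSE.
            score = left_sum ** 2 / n_left + (total - left_sum) ** 2 / (n - n_left)
            if score > best_score:
                best_score, best_split = score, (j, pairs[i][0])
    return best_split

def bootstrap_root_splits(rows, y, n_boot=200, seed=0):
    """Refit the root split on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    features, thresholds = Counter(), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        feature, threshold = best_root_split([rows[i] for i in idx],
                                             [y[i] for i in idx])
        features[feature] += 1
        thresholds.append(threshold)
    return features, thresholds

# Simulated data (assumption): two correlated features, both related to y.
rng = random.Random(42)
rows = []
for _ in range(100):
    x1 = rng.gauss(0, 1)
    rows.append((x1, x1 + rng.gauss(0, 0.5)))
y = [a + b + rng.gauss(0, 1) for a, b in rows]

feature_counts, thresholds = bootstrap_root_splits(rows, y)
print(feature_counts)                         # how often each feature is the root
print(min(thresholds), max(thresholds))       # spread of the chosen cut point
```

If the chosen feature or cut point moves around a lot across resamples, that is the instability being warned about; a full check would compare deeper splits as well.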
