How to calculate the statistic for the ctree function?

Question

library(party)
irisct <- ctree(Species ~ ., data = iris)
print(irisct)
tb <- with(iris, table(cut(Petal.Length, c(0, 1.9, Inf)), Species))
print(tb)
print(chisq.test(tb))

How does the package arrive at "statistic = 140.264"? chisq.test gives "X-squared = 150", which is not the same. What does "criterion" mean?

I am not familiar with the method, but I do not see why you would expect this to match a $\chi^2$ statistic. Page 4 of this vignette explains how the ctree statistic is calculated (the default is "quad").
– PBulls, Commented Apr 6, 2024 at 19:59
I cannot work out from page 4 of this vignette how the ctree statistic (the default, "quad") is calculated. Could someone please give a detailed calculation example?
– Apai, Commented Apr 11, 2024 at 12:20

Marjolein Fokkema · Accepted Answer · 2024-06-25 22:22:12Z

Function ctree does not use $\chi^2$ tests; it uses conditional inference tests. These are implemented in package libcoin, in functions LinStatExpCov and doTest. For more details, you can consult their help files or Strasser and Weber (1999). As mentioned above, you can also consult the ctree vignette or read Hothorn et al. (2006).

If you use ctree from package partykit instead of party, you can easily recover the test statistics computed for all potential partitioning variables in each node where tests were carried out. For example, for the root node (node 1):

irisct <- partykit::ctree(Species ~ ., data = iris)
irisct[[1L]]$node$info$criterion
##           Sepal.Length   Sepal.Width  Petal.Length   Petal.Width
## statistic  9.218715e+01  5.971664e+01  1.402644e+02  1.384036e+02
## p.value    3.835958e-20  4.312762e-13  1.393271e-30  3.532723e-30
## criterion -3.835958e-20 -4.312762e-13 -1.393271e-30 -3.532723e-30
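The same information can be collected for every node in one pass. The following is a minimal sketch, assuming partykit's nodeapply(), nodeids() and info_node() helpers behave as described in their help pages; the object name crit is only illustrative.

```r
library(partykit)

irisct <- ctree(Species ~ ., data = iris)

# Apply a function to every node and pull out the stored test results;
# nodes in which no test was carried out may return NULL
crit <- nodeapply(irisct, ids = nodeids(irisct),
                  FUN = function(node) info_node(node)$criterion)

# Keep only the nodes that actually carry a criterion matrix
crit <- Filter(Negate(is.null), crit)

names(crit)   # ids of the nodes with stored tests
crit[[1]]     # root node: same matrix as irisct[[1L]]$node$info$criterion
```

Each list element is a matrix with rows statistic, p.value and criterion and one column per candidate variable, so the statistic the question asks about is crit[[1]]["statistic", "Petal.Length"].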

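What ctree does internally to compute these statistics, for example for Petal.Length in the first node, is as follows:

# Influence function of the response: dummy coding of Species
Y <- model.matrix(~ Species, data = iris)
# Covariate under test
X <- iris$Petal.Length
# Linear statistic with its conditional expectation and covariance
lev <- libcoin::LinStatExpCov(X = X, Y = Y)
# Quadratic-form test statistic and p-value
tst <- libcoin::doTest(lev, teststat = "quadratic", pvalue = TRUE,
                       lower = TRUE, ordered = FALSE, log = TRUE,
                       minbucket = 7)
tst$TestStatistic
## [1] 140.2644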
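The Petal.Length statistic of 140.2644 in the criterion table can also be reproduced by hand. This is a minimal sketch, assuming the formulas on page 4 of the vignette (following Strasser and Weber, 1999) with the identity transformation for a numeric covariate and no case weights: the linear statistic is $t = \sum_i x_i h(Y_i)$, with conditional expectation $\mu$ and covariance $\Sigma$ under permutation of the responses, and the quadratic statistic is $(t - \mu)^\top \Sigma^+ (t - \mu)$. Variable names are illustrative, and MASS::ginv is used here only as a convenient Moore-Penrose inverse; it is not claimed to be what libcoin calls internally.

```r
# Hand computation of the quadratic-form statistic for one numeric
# covariate (Petal.Length) in the root node of the iris tree
x <- iris$Petal.Length                          # g(X): identity transformation
Y <- model.matrix(~ Species - 1, data = iris)   # h(Y): one indicator column per class
n <- nrow(Y)

# Linear statistic t = sum_i x_i * h(Y_i)
t_obs <- drop(crossprod(Y, x))

# Conditional expectation and covariance of t
Eh <- colMeans(Y)                               # E(h)
Yc <- sweep(Y, 2, Eh)                           # centred influence functions
Vh <- crossprod(Yc) / n                         # V(h)
mu <- sum(x) * Eh
Sigma <- (n / (n - 1)) * Vh * sum(x^2) - (1 / (n - 1)) * Vh * sum(x)^2

# Quadratic form with the Moore-Penrose inverse of Sigma
d <- t_obs - mu
c_quad <- drop(t(d) %*% MASS::ginv(Sigma) %*% d)
c_quad                                          # approximately 140.2644

# Asymptotic chi-squared p-value; Sigma has rank ncol(Y) - 1 because
# the indicator columns sum to one
p_unadj <- pchisq(c_quad, df = ncol(Y) - 1, lower.tail = FALSE)

# Assumption: the p.value row in the table is adjusted for the four
# candidate covariates (Bonferroni-type), so compare against 4 * p_unadj
4 * p_unadj
```

The unadjusted p-value comes out at about a quarter of the 1.393271e-30 shown in the table, which would be consistent with a Bonferroni-type adjustment over the four covariates. This also shows why the result differs from chisq.test in the question: the statistic is built from the raw Petal.Length values, not from a dichotomised contingency table.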

In the output printed by party::ctree, criterion is 1 minus the p-value of the test in each splitting node; see also ?party::ctree_control. With partykit::ctree, the printed tree shows the misclassification error in each terminal node.
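The criterion row in the partykit table above looks like the negative p-value, not 1 minus the p-value. A plausible reading, and it is an assumption based on the lower = TRUE, log = TRUE arguments passed to doTest, is that partykit stores the criterion on the log scale, log(1 - p), which is practically equal to -p for p-values this small. The sketch below checks this against the numbers copied from the table; the vector name p is illustrative.

```r
# Adjusted p-values copied from the p.value row of the criterion table
p <- c(Sepal.Length = 3.835958e-20, Sepal.Width = 4.312762e-13,
       Petal.Length = 1.393271e-30, Petal.Width = 3.532723e-30)

# party-style criterion, 1 - p: too close to 1 to tell the variables apart
1 - p

# Log-scale criterion, log(1 - p), computed accurately with log1p();
# assumption: this is what the criterion row of the partykit table holds
log1p(-p)
```

On either scale the variable with the largest criterion, here Petal.Length, is the one selected for splitting, but the log scale keeps the differences between very small p-values visible.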

This probably does not answer all your questions, but hopefully provides some pointers.

References

Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.

Strasser, H., & Weber, C. (1999). On the asymptotic theory of permutation statistics. Mathematical Methods of Statistics, 8(2), 220–250.
