$\begingroup$
library(party)
irisct <- ctree(Species ~ ., data = iris)
print(irisct)
tb <- with(iris, table(cut(Petal.Length, c(0, 1.9, Inf)), Species))
print(tb)
print(chisq.test(tb))

How does the package arrive at "statistic = 140.264"? chisq.test gives "X-squared = 150", which is not the same. And what does "criterion" mean?

$\endgroup$
  • $\begingroup$ I am not familiar with the method, but why would you expect this to match a $\chi^2$ statistic? Page 4 of this vignette explains how the ctree statistic is calculated (the default is "quad"). $\endgroup$ Commented Apr 6, 2024 at 19:59
  • $\begingroup$ I cannot work out from page 4 of this vignette how the ctree statistic is calculated (the default is "quad"). Could someone please give a detailed calculation example? $\endgroup$ Commented Apr 11, 2024 at 12:20

1 Answer

$\begingroup$

Function ctree does not use $\chi^2$ tests of a contingency table; it uses conditional inference (permutation) tests. These are implemented in package libcoin, in the functions LinStatExpCov and doTest. For more details, you can consult their help files, or Strasser and Weber (1999). As mentioned above, you can also consult the ctree vignette, or you can read Hothorn et al. (2006).
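As a rough summary of what those references define (a transcription of the notation in Hothorn et al. (2006), simplified here to unit case weights, so it is worth checking against the paper): for a candidate variable $X$ with transformation $g$, a response $Y$ with influence function $h$, and the $n$ observations in the node,

$$
\begin{aligned}
T &= \operatorname{vec}\Big(\sum_{i=1}^{n} g(X_i)\, h(Y_i)^\top\Big), \\
E(h) &= \frac{1}{n}\sum_{i=1}^{n} h(Y_i), \qquad
V(h) = \frac{1}{n}\sum_{i=1}^{n} \big(h(Y_i) - E(h)\big)\big(h(Y_i) - E(h)\big)^\top, \\
\mu &= \operatorname{vec}\Big(\Big(\sum_{i=1}^{n} g(X_i)\Big) E(h)^\top\Big), \\
\Sigma &= \frac{n}{n-1}\, V(h) \otimes \Big(\sum_{i=1}^{n} g(X_i)\, g(X_i)^\top\Big)
 - \frac{1}{n-1}\, V(h) \otimes \Big(\sum_{i=1}^{n} g(X_i)\Big)\Big(\sum_{i=1}^{n} g(X_i)\Big)^\top, \\
c_{\text{quad}} &= (T - \mu)^\top\, \Sigma^{+}\, (T - \mu).
\end{aligned}
$$

Here $\Sigma^{+}$ is the Moore-Penrose inverse of $\Sigma$, and $c_{\text{quad}}$ is compared with a $\chi^2$ distribution whose degrees of freedom equal the rank of $\Sigma$. The statistic is computed from the candidate variable as a whole, before any split point is chosen, which is why it need not agree with a $\chi^2$ test on the table produced by one particular cut.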

If you use ctree from package partykit instead of party, you can easily recover the test statistics computed for all potential partitioning variables in every node. For example, for the root node (node 1):

irisct <- partykit::ctree(Species ~ ., data = iris)
irisct[[1L]]$node$info$criterion
##           Sepal.Length   Sepal.Width  Petal.Length   Petal.Width
## statistic  9.218715e+01  5.971664e+01  1.402644e+02  1.384036e+02
## p.value    3.835958e-20  4.312762e-13  1.393271e-30  3.532723e-30
## criterion -3.835958e-20 -4.312762e-13 -1.393271e-30 -3.532723e-30

What ctree does internally to compute these statistics, for example for Petal.Length in the first node, is essentially as follows:

# Response: dummy-coded Species
Y <- model.matrix( ~ Species, data = iris)
# Candidate partitioning variable
X <- iris$Petal.Length
# Linear statistic with its conditional expectation and covariance
lev <- libcoin::LinStatExpCov(X = X, Y = Y)
# Quadratic-form test statistic
tst <- libcoin::doTest(lev, teststat = "quadratic", pvalue = TRUE,
                       lower = TRUE, ordered = FALSE, log = TRUE,
                       minbucket = 7)
tst$TestStatistic
## [1] 140.2644
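For readers who want to see the arithmetic, here is a minimal base-R sketch that rebuilds the quadratic statistic by hand for Petal.Length at the root node. It assumes unit case weights, the identity transformation for the numeric predictor, and class indicators as the influence function, following the formulas in Hothorn et al. (2006). It is an illustration of those formulas, not the package's actual code path, and all variable names are made up for the example.

```r
# Hand computation of the quadratic conditional-inference statistic
# (assumptions: unit weights, g(x) = x, h(y) = class indicator vector).
x <- iris$Petal.Length
h <- model.matrix(~ Species - 1, data = iris)   # n x 3 indicator matrix
n <- nrow(h)

# Linear statistic: sum of x within each species
lin <- drop(crossprod(h, x))

# Conditional expectation and covariance under permutation of the response
Eh    <- colMeans(h)                            # class proportions
Vh    <- crossprod(sweep(h, 2, Eh)) / n         # covariance of the indicators
mu    <- sum(x) * Eh
Sigma <- Vh * (n / (n - 1) * sum(x^2) - sum(x)^2 / (n - 1))

# Quadratic form with the Moore-Penrose inverse (Sigma has rank 2 here)
d    <- lin - mu
stat <- drop(d %*% MASS::ginv(Sigma) %*% d)
stat   # should reproduce the Petal.Length statistic, 140.2644

# Cross-check: for a numeric predictor and a factor response this equals
# (n - 1) * R^2 from a one-way ANOVA of x on Species
r2 <- summary(lm(Petal.Length ~ Species, data = iris))$r.squared
(n - 1) * r2

# Unadjusted chi-squared p-value (df = rank of Sigma = 2), then a
# Bonferroni-type adjustment over the 4 candidate variables, which is
# assumed here to be what produces the p.value row of the table
p_raw  <- pchisq(stat, df = 2, lower.tail = FALSE)
p_bonf <- -expm1(4 * log1p(-p_raw))
p_bonf
```

The cross-check line gives a quick way to see why the value differs from chisq.test: the statistic measures how much of the variation in Petal.Length is explained by Species, scaled by n - 1, rather than the association in a 2 x 3 table formed after cutting at 1.9.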

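In the output printed by party, criterion is 1 minus the p-value at each splitting node; see also the mincriterion argument of party::ctree_control. In the partykit table above, the criterion row is on the log scale, log(1 − p-value), which for such small p-values is approximately the negative p-value. When using partykit::ctree, the misclassification error in each terminal node is printed as well.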

This probably does not answer all your questions, but hopefully provides some pointers.

References

Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.

Strasser, H., & Weber, C. (1999). On the asymptotic theory of permutation statistics. Mathematical Methods of Statistics, 8(2), 220–250.

$\endgroup$
