Questions tagged [cart]
'Classification And Regression Trees', also sometimes called 'decision trees'. CART is a popular machine learning technique, and it forms the basis for techniques like random forests and common implementations of gradient boosting machines.
1,275 questions
0 votes · 2 answers · 52 views
How to investigate if my poor classification is because of bad data or some other reason [duplicate]
I currently have a RandomForestClassifier that is classifying workload based on fNIRS data. Our classification accuracy is about 49%. I want to investigate why our classification accuracy is so bad and ...
0 votes · 0 answers · 54 views
MSE Loss: Which target representation allows better focus on minority class learning?
Given these two target representations for the same underlying data:
Target A : Minority class samples (Cluster 5) isolated in distribution tail, majority class samples (Clusters 3+6) shifted toward ...
0 votes · 0 answers · 59 views
feature weights vs leaf weights in xgboost (L1 and L2 regularization)
This article mentions "feature weights" several times: https://xgboosting.com/xgboost-regularization-techniques/
However, it's not clear to me how a tree can have feature weights? It's not ...
0 votes · 0 answers · 83 views
Log-rank test statistic for multi-way splitting in survival trees
In survival trees, the log-rank test statistic is most often used for split selection. This involves selecting the binary split that produces the largest value of the log-rank test statistic.
This ...
1 vote · 1 answer · 427 views
rpart() decision tree fails to generate splits (decision tree with only one node (the root node))
I'm trying to create a decision tree to predict whether a given loan applicant would default or repay their debt.
I'm using the following dataset
...
2 votes · 1 answer · 223 views
On the History of Gradient Boosting
I have recently done some work altering popular gradient boosted decision trees (GBDTs) for regression, and I was just working on establishing a theoretic basis for the modern algorithm. There is a ...
2 votes · 2 answers · 167 views
Is feature importance given by decision tree universal?
I'm wondering: if a set of features has relatively low feature importance on a fitted classification decision tree, would these features also be negligible when fitted ...
1 vote · 0 answers · 74 views
XGB predict_proba estimates don't match sum of leaves [closed]
When using an XGB model in the context of binary classification, I observed that the test estimates given by predict_proba were close but not equal to the results I ...
0 votes · 1 answer · 66 views
3 votes · 1 answer · 329 views
Can you explain this description of tree pruning in Intro to Statistical Learning?
The underlined sentences below from p. 331 in An Introduction to Statistical Learning have me scratching my head: Given that the splitting algorithm always finds the best next split in terms of error ...
1 vote · 1 answer · 594 views
How does Cross Validation work in decision trees (or tree ensembles)
I've been working with tree-based models for a long time and I never really asked myself how cross-validation would work when building a tree.
For the sake of this question, suppose I've split my ...
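A minimal sketch of the standard procedure, assuming scikit-learn: each cross-validation fold grows its own tree from scratch on that fold's training portion, so no single tree is shared across folds.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each of the 5 folds fits a fresh tree on 4/5 of the data and
# scores it on the held-out 1/5 -- five independent trees in total.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())
```

The cross-validated score estimates how the *tree-growing procedure* (with these hyperparameters) generalises; the final model is then refit once on all the data.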
1 vote · 0 answers · 192 views
Why does feature importance decrease for highly correlated variables?
I am investigating the relationship between correlation among features and its impact on their feature importances, using sklearn's DecisionTreeClassifier algorithm.
I manipulated the correlation of ...
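A hedged demonstration on synthetic data of the usual explanation: when a feature is duplicated (perfect correlation), the splitter can use either copy, so impurity-based importance is assigned to one copy or shared between them rather than doubled; averaged over many trees in a forest, each copy's importance therefore drops.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)  # only column 0 is informative

# Baseline: one informative feature, one noise feature.
base = DecisionTreeClassifier(random_state=0).fit(X, y)

# Append an exact duplicate of the informative column: the splitter
# can now use column 0 or column 2 interchangeably, so the total
# importance is divided between the two correlated copies.
X_dup = np.column_stack([X, X[:, 0]])
dup = DecisionTreeClassifier(random_state=0).fit(X_dup, y)

print(base.feature_importances_)
print(dup.feature_importances_)
```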
2 votes · 2 answers · 170 views
Why does my neural network consider different features important compared to my decision tree?
I built a neural network and a decision tree using very similar data sets (the only difference was the randomness of selecting the training vs testing set). The variables with the highest shapley ...
0 votes · 2 answers · 167 views
Classification and regression tree splitting depth - how does it work?
I am trying to understand how a CART tree grows, so I am growing a tree step by step, and I am finding some strange (?) behavior. Let me show this by means of an example: I will use the Titanic data set ...
2 votes · 1 answer · 609 views
Gini impurity greedily optimises a loss function in decision trees
I am trying to understand how the Gini criterion for decision tree construction actually greedily optimises a loss function.
The Gini impurity, sometimes also called Gini index, for a region (...
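The excerpt truncates before the definition, but the standard form can be sketched: for a region with class proportions p_k, the Gini impurity is G = 1 - Σ_k p_k², and the greedy splitter picks the split minimising the size-weighted impurity of the two children. A small illustration:

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 for the labels in one region."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_split_impurity(left, right):
    """The quantity the greedy splitter minimises at each node:
    size-weighted average impurity of the two child regions."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(gini(["a", "a", "b", "b"]))                       # 0.5 (50/50 region)
print(weighted_split_impurity(["a", "a"], ["b", "b"]))  # 0.0 (a pure split)
```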
1 vote · 0 answers · 734 views
Monotone constraints in decision tree regressor or random forest regression
After I've spent several weeks trying to fit a regression model to my flood damage data (x1=water height, x2=adaptation height, x3=(x1-x2), y=damage), it is now time for my very first question on ...
1 vote · 0 answers · 170 views
What is the splitting criterion in regression trees (DecisionTreeRegressor, sklearn) in the multi-output case?
I am using DecisionTreeRegressor and RandomForestRegressor from sklearn in a case where I have multiple outputs, but I did not find a reference article for the regression case (which is used by sklearn)...
0 votes · 2 answers · 2k views
Difference between max_depth and max_leaf_nodes in decision tree classifier (sklearn)
What is the difference between the max_depth and max_leaf_nodes parameters in a decision tree classifier?
If depth is 4, then the number of leaf nodes will be at most 2^4 = 16.
So providing max_depth = 4 or ...
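A minimal sketch of the distinction, assuming scikit-learn: max_depth caps the number of levels (so at most 2^depth leaves, usually fewer, since the greedy splitter stops early on pure or small nodes), while max_leaf_nodes caps the leaf count directly and grows the tree best-first.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# max_depth=4 bounds the leaves by 2**4 = 16, but the greedy
# splitter typically stops earlier, so the count is often lower.
by_depth = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
print(by_depth.get_depth(), by_depth.get_n_leaves())

# max_leaf_nodes=8 caps the leaves directly (best-first growth);
# the depth ends up being whatever those 8 leaves require.
by_leaves = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y)
print(by_leaves.get_depth(), by_leaves.get_n_leaves())
```

So the two constraints are not interchangeable: max_depth=4 does not guarantee 16 leaves, and max_leaf_nodes=16 does not guarantee depth 4.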
3 votes · 1 answer · 268 views
Why cannot a single decision tree represent an entire Random Forest?
I was intrigued by the reply from @JohnRos to the post Making a single decision tree from a random forest.
They say "<...> a random forest prediction cannot be represented by a single tree....
0 votes · 0 answers · 61 views
2 or more continuous features in tree classification
If a training set has a continuous feature, some texts recommend that the dataset first be sorted on that feature, and that split points then be chosen. What I am not sure about is how ...
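The sort-then-split recipe the excerpt describes can be sketched as follows: for each continuous feature independently, the candidate thresholds are the midpoints between consecutive distinct sorted values; with two or more continuous features, the splitter enumerates candidates per feature and keeps the best (feature, threshold) pair overall.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values of one feature."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

# One feature's candidates; with several features, this is repeated
# per feature and the best-scoring (feature, threshold) pair wins.
print(candidate_thresholds([2.0, 1.0, 1.0, 3.0]))  # [1.5, 2.5]
```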
4 votes · 0 answers · 53 views
Is there a decision tree impurity metric based on maximum of probabilities?
I'm trying to understand impurity metrics in decision tree learning, in particular the Gini impurity. Questioning one of the assumptions of Gini impurity has led me to another impurity measure which ...
1 vote · 1 answer · 244 views
Goodness of fit test/index for a regression tree
I have fitted a regression tree on my data and would like to demonstrate that it is a good model. Are there any standard goodness-of-fit tests or indices for a regression tree?
I understand that I can ...
2 votes · 1 answer · 8k views
How to use categorical features in lightGBM? [closed]
I am working on an attrition dataset which has a large number of categorical parameters. Each categorical parameter has a high cardinality, so one-hot encoding them is out of the question. I was looking ...
2 votes · 0 answers · 61 views
Can you use regression trees for classification tasks in random forest?
I've been playing around with the random forest algorithm to classify a binary Y vector using classification and regression trees. Classification trees output class probabilities and regression trees an ...
2 votes · 1 answer · 104 views
How to find AUC from Binary Classification Decision Tree?
Decision Tree
I have found Misclassification rates for all the leaf nodes.
samples = 3635 + 1101 = 4736, class = Cash, misclassification rate = 1101 / 4736 = 0.232.
samples = 47436 + 44556 = 91992, ...
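The usual route, assuming scikit-learn: per-leaf misclassification rates are not needed. Each leaf yields a class-1 fraction via predict_proba, and ranking the test points by that score is exactly what the ROC curve (and hence its AUC) is computed over.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# predict_proba returns each test point's leaf-level class-1 fraction;
# roc_auc_score sweeps a threshold over these scores to build the ROC.
scores = tree.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, scores)
print(auc)
```

Note that a shallow tree produces only a handful of distinct scores (one per leaf), so the ROC curve has few points, but the AUC is still well defined.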