Questions tagged [cart]
'Classification And Regression Trees', also sometimes called 'decision trees'. CART is a popular machine learning technique, and it forms the basis for techniques like random forests and common implementations of gradient boosting machines.
1,275 questions
0 votes · 2 answers · 52 views
How to investigate if my poor classification is because of bad data or some other reason [duplicate]
I currently have a RandomForestClassifier that is classifying workload based on fNIRS data. Our classification accuracy is about 49%. I want to investigate why our classification accuracy is so bad and ...
0 votes · 0 answers · 54 views
MSE Loss: Which target representation allows better focus on minority class learning?
Given these two target representations for the same underlying data:
Target A : Minority class samples (Cluster 5) isolated in distribution tail, majority class samples (Clusters 3+6) shifted toward ...
0 votes · 0 answers · 59 views
feature weights vs leaf weights in xgboost (L1 and L2 regularization)
This article mentions "feature weights" several times: https://xgboosting.com/xgboost-regularization-techniques/
However, it's not clear to me how a tree can have feature weights? It's not ...
0 votes · 0 answers · 83 views
Log-rank test statistic for multi-way splitting in survival trees
In survival trees, the log-rank test statistic is most often used for split selection. This involves selecting the binary split that produces the largest value of the log-rank test statistic.
This ...
1 vote · 1 answer · 427 views
rpart() decision tree fails to generate splits (decision tree with only one node (the root node))
I'm trying to create a decision tree to predict whether a given loan applicant would default or repay their debt.
I'm using the following dataset
...
2 votes · 1 answer · 223 views
On the History of Gradient Boosting
I have recently done some work altering popular gradient boosted decision trees (GBDTs) for regression, and I was just working on establishing a theoretic basis for the modern algorithm. There is a ...
2 votes · 2 answers · 167 views
Is feature importance given by decision tree universal?
I'm wondering: if a set of features has relatively low feature importance on a fitted classification decision tree, would these features also be negligible when fitted ...
1 vote · 0 answers · 74 views
XGB predict_proba estimates don't match sum of leaves [closed]
When using an XGB model in the context of binary classification, I observed that the test estimates given by predict_proba were close but not equal to the results I ...
0 votes · 1 answer · 66 views
3 votes · 1 answer · 329 views
Can you explain this description of tree pruning in Intro to Statistical Learning?
The underlined sentences below from p. 331 in An Introduction to Statistical Learning have me scratching my head: Given that the splitting algorithm always finds the best next split in terms of error ...
1 vote · 1 answer · 594 views
How does Cross Validation work in decision trees (or tree ensembles)
I've been working with tree-based models for a long time and I never really asked myself how cross-validation would work when building a tree.
For the sake of this question, suppose I've split my ...
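A minimal sketch of the standard procedure, assuming scikit-learn: each cross-validation fold grows its own tree from scratch on that fold's training portion, so no single tree is shared across folds.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each of the 5 folds fits a fresh tree on 4/5 of the data and
# scores it on the held-out 1/5 -- five independent trees in total.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())
```

The cross-validated score estimates how the *tree-growing procedure* (with these hyperparameters) generalises; the final model is then refit once on all the data.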
1 vote · 0 answers · 192 views
Why does feature importance decrease for highly correlated variables?
I am investigating the relationship between correlation among features and its impact on their feature importances, using sklearn's DecisionTreeClassifier algorithm.
I manipulated the correlation of ...
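A hedged demonstration on synthetic data of the usual explanation: when a feature is duplicated (perfect correlation), the splitter can use either copy, so impurity-based importance is assigned to one copy or shared between them rather than doubled; averaged over many trees in a forest, each copy's importance therefore drops.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)  # only column 0 is informative

# Baseline: one informative feature, one noise feature.
base = DecisionTreeClassifier(random_state=0).fit(X, y)

# Append an exact duplicate of the informative column: the splitter
# can now use column 0 or column 2 interchangeably, so the total
# importance is divided between the two correlated copies.
X_dup = np.column_stack([X, X[:, 0]])
dup = DecisionTreeClassifier(random_state=0).fit(X_dup, y)

print(base.feature_importances_)
print(dup.feature_importances_)
```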
2 votes · 2 answers · 170 views
Why does my neural network consider different features important compared to my decision tree?
I built a neural network and a decision tree using very similar data sets (the only difference was the randomness of selecting the training vs testing set). The variables with the highest shapley ...
0 votes · 2 answers · 167 views
Classification and regression tree splitting depth - how does it work?
I am trying to understand how a CART tree grows, so I am growing a tree step by step, and I am finding some strange (?) behavior. Let me show this by means of an example: I will use the Titanic data set ...
2 votes · 1 answer · 609 views
Gini impurity greedily optimises a loss function in decision trees
I am trying to understand how the Gini criterion for decision tree construction actually greedily optimises a loss function.
The Gini impurity, sometimes also called Gini index, for a region (...
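The excerpt truncates before the definition, but the standard form can be sketched: for a region with class proportions p_k, the Gini impurity is G = 1 - Σ_k p_k², and the greedy splitter picks the split minimising the size-weighted impurity of the two children. A small illustration:

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 for the labels in one region."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_split_impurity(left, right):
    """The quantity the greedy splitter minimises at each node:
    size-weighted average impurity of the two child regions."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(gini(["a", "a", "b", "b"]))                       # 0.5 (50/50 region)
print(weighted_split_impurity(["a", "a"], ["b", "b"]))  # 0.0 (a pure split)
```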
1 vote · 0 answers · 734 views
Monotone constraints in decision tree regressor or random forest regression
After I've spent several weeks trying to fit a regression model to my flood damage data (x1=water height, x2=adaptation height, x3=(x1-x2), y=damage), it is now time for my very first question on ...
1 vote · 0 answers · 170 views
What is the splitting criterion in regression trees (DecisionTreeRegressor, sklearn) in the multi-output case?
I am using DecisionTreeRegressor and RandomForestRegressor from sklearn in a case where I have multiple outputs, but I did not find a reference article for the regression case (which is used by sklearn)...
0 votes · 2 answers · 2k views
Difference between max_depth and max_leaf_nodes in decision tree classifier (sklearn)
What is the difference between the max_depth and max_leaf_nodes parameters in a decision tree classifier?
If depth is 4, then the number of leaf nodes will be at most 2^4 = 16.
So providing max_depth = 4 or ...
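A minimal sketch of the distinction, assuming scikit-learn: max_depth caps the number of levels (so at most 2^depth leaves, usually fewer, since the greedy splitter stops early on pure or small nodes), while max_leaf_nodes caps the leaf count directly and grows the tree best-first.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# max_depth=4 bounds the leaves by 2**4 = 16, but the greedy
# splitter typically stops earlier, so the count is often lower.
by_depth = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
print(by_depth.get_depth(), by_depth.get_n_leaves())

# max_leaf_nodes=8 caps the leaves directly (best-first growth);
# the depth ends up being whatever those 8 leaves require.
by_leaves = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y)
print(by_leaves.get_depth(), by_leaves.get_n_leaves())
```

So the two constraints are not interchangeable: max_depth=4 does not guarantee 16 leaves, and max_leaf_nodes=16 does not guarantee depth 4.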
3 votes · 1 answer · 268 views
Why cannot a single decision tree represent an entire Random Forest?
I was intrigued by the reply from @JohnRos to the post Making a single decision tree from a random forest.
They say "<...> a random forest prediction cannot be represented by a single tree....
0 votes · 0 answers · 61 views
2 or more continuous features in tree classification
If a training set has a continuous feature, some texts recommend that the dataset first be sorted on that feature, and that split points then be chosen. What I am not sure about is how ...
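The sort-then-split recipe the excerpt describes can be sketched as follows: for each continuous feature independently, the candidate thresholds are the midpoints between consecutive distinct sorted values; with two or more continuous features, the splitter enumerates candidates per feature and keeps the best (feature, threshold) pair overall.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values of one feature."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

# One feature's candidates; with several features, this is repeated
# per feature and the best-scoring (feature, threshold) pair wins.
print(candidate_thresholds([2.0, 1.0, 1.0, 3.0]))  # [1.5, 2.5]
```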
4 votes · 0 answers · 53 views
Is there a decision tree impurity metric based on maximum of probabilities?
I'm trying to understand impurity metrics in decision tree learning, in particular the Gini impurity. Questioning one of the assumptions of Gini impurity has led me to another impurity measure which ...
1 vote · 1 answer · 244 views
Goodness of fit test/index for a regression tree
I have fitted a regression tree on my data and would like to demonstrate that it is a good model. Are there any standard goodness-of-fit tests or indices for a regression tree?
I understand that I can ...
2 votes · 1 answer · 8k views
How to use categorical features in lightGBM? [closed]
I am working on an attrition dataset which has a large number of categorical parameters. Each categorical parameter has a high cardinality, so one-hot encoding them is out of the question. I was looking ...
2 votes · 0 answers · 61 views
Can you use regression trees for classification tasks in random forest?
I've been playing around with the random forest algorithm to classify a binary Y vector using classification and regression trees. Classification trees output class probabilities and regression trees an ...
2 votes · 1 answer · 104 views
How to find AUC from Binary Classification Decision Tree?
Decision Tree
I have found Misclassification rates for all the leaf nodes.
samples = 3635 + 1101 = 4736, class = Cash, misclassification rate = 1101 / 4736 = 0.232.
samples = 47436 + 44556 = 91992, ...
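The usual route, assuming scikit-learn: per-leaf misclassification rates are not needed. Each leaf yields a class-1 fraction via predict_proba, and ranking the test points by that score is exactly what the ROC curve (and hence its AUC) is computed over.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# predict_proba returns each test point's leaf-level class-1 fraction;
# roc_auc_score sweeps a threshold over these scores to build the ROC.
scores = tree.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, scores)
print(auc)
```

Note that a shallow tree produces only a handful of distinct scores (one per leaf), so the ROC curve has few points, but the AUC is still well defined.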