
Questions tagged [cart]

CART ('Classification And Regression Trees'), also sometimes called 'decision trees', is a popular machine learning technique that forms the basis for methods such as random forests and common implementations of gradient boosting machines.
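For readers new to the tag, a minimal sketch of fitting a CART-style tree, using scikit-learn's DecisionTreeClassifier (which implements an optimised version of CART) on its bundled iris data:

```python
# Minimal sketch: fit a shallow CART-style decision tree with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Limit depth so the tree stays interpretable.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

print(clf.get_depth())   # depth is capped at 3 by construction
print(clf.score(X, y))   # training accuracy
```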

0 votes · 2 answers · 52 views

I currently have a RandomForestClassifier that is classifying workload based on fNIRS data. Our classification accuracy is about 49%. I want to investigate why our classification accuracy is so bad and ...
asked by Maddie Brower
0 votes · 0 answers · 54 views

Given these two target representations for the same underlying data: Target A : Minority class samples (Cluster 5) isolated in distribution tail, majority class samples (Clusters 3+6) shifted toward ...
asked by n0rdp0l
0 votes · 0 answers · 59 views

This article mentions "feature weights" several times: https://xgboosting.com/xgboost-regularization-techniques/ However, it's not clear to me how a tree can have feature weights? It's not ...
asked by Baron Yugovich
0 votes · 0 answers · 83 views

In survival trees, the log-rank test statistic is most often used for split selection. This involves selecting the binary split that produces the largest value of the log-rank test statistic. This ...
asked by user3298179
1 vote · 1 answer · 427 views

I'm trying to create a decision tree to predict whether a given loan applicant would default or repay their debt. I'm using the following dataset ...
asked by OzkanGelincik
2 votes · 1 answer · 223 views

I have recently done some work altering popular gradient boosted decision trees (GBDTs) for regression, and I was just working on establishing a theoretical basis for the modern algorithm. There is a ...
asked by jeffery_the_wind
2 votes · 2 answers · 167 views

I'm wondering: if a set of features has relatively low feature importance on a fitted classification decision tree, would it mean that these features would also be negligible when fitted ...
asked by bachts
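One quick empirical probe of the question above is to compare impurity-based importances between a single tree and a forest. A sketch, assuming scikit-learn and toy data from make_classification:

```python
# Sketch: compare impurity-based feature importances of a single decision
# tree and a random forest fitted on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1 within each model; a feature negligible in the
# single tree is not guaranteed to stay negligible in the ensemble.
print(tree.feature_importances_.round(3))
print(forest.feature_importances_.round(3))
```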
1 vote · 0 answers · 74 views

When using an XGB model in the context of binary classification, I observed that the test estimates given by predict_proba were close but not equal to the results I ...
asked by Juan Felipe Salamanca Lozano
0 votes · 1 answer · 66 views

...
asked by Apai
3 votes · 1 answer · 329 views

The underlined sentences below from p. 331 in An Introduction to Statistical Learning have me scratching my head: Given that the splitting algorithm always finds the best next split in terms of error ...
asked by Jon
1 vote · 1 answer · 594 views

I've been working with tree-based models for a long time and I never really asked myself how cross-validation would work when building a tree. For the sake of this question, suppose I've split my ...
asked by Arturo Sbr
1 vote · 0 answers · 192 views

I am investigating the impact of correlation between features on their feature importances using sklearn's DecisionTreeClassifier algorithm. I manipulated the correlation of ...
asked by AvanishM
2 votes · 2 answers · 170 views

I built a neural network and a decision tree using very similar data sets (the only difference was the randomness of selecting the training vs testing set). The variables with the highest Shapley ...
asked by Jay
0 votes · 2 answers · 167 views

I am trying to understand how a CART tree grows, so I am growing a tree step by step, and I am finding a strange (?) behavior. Let me show this by means of an example: I will use the Titanic data set ...
asked by Nicolas Molano
2 votes · 1 answer · 609 views

I am trying to understand how the Gini criterion for decision tree construction actually greedily optimises a loss function. The Gini impurity, sometimes also called the Gini index, for a region (...
asked by ngmir
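The greedy objective behind the Gini criterion can be made concrete with a small sketch: at each node, CART scores a candidate binary split by the sample-weighted Gini impurity of the two children and picks the split with the largest impurity decrease. A NumPy sketch on toy labels:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 for class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(left, right):
    """Sample-weighted impurity of a candidate binary split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

y = np.array([0, 0, 0, 1, 1, 1])
parent = gini(y)                     # 0.5 for a balanced binary node
pure = split_impurity(y[:3], y[3:])  # a perfect split has impurity 0.0
print(parent - pure)                 # the impurity decrease the split is scored by
```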
1 vote · 0 answers · 734 views

After I've spent several weeks trying to fit a regression model to my flood damage data (x1 = water height, x2 = adaptation height, x3 = (x1 - x2), y = damage), it is now time for my very first question on ...
asked by Sjafnargata
1 vote · 0 answers · 170 views

I am using DecisionTreeRegressor and RandomForestRegressor from sklearn in a case where I have multiple outputs, but I did not find a reference article for the regression case (which is used by sklearn)...
asked by Rayane Elimam
0 votes · 2 answers · 2k views

What is the difference between the max_depth and max_leaf_nodes parameters in a decision tree classifier? If depth is 4, then the number of leaf nodes will be 2^4 = 16. So providing max_depth = 4 or ...
asked by amtn
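The two parameters are related but not interchangeable: max_depth = 4 only bounds the leaf count by 2^4 = 16 (the fitted tree may have fewer leaves), while max_leaf_nodes = 16 bounds the leaf count directly and permits depth greater than 4. A sketch with scikit-learn on its bundled breast-cancer data:

```python
# Sketch: the same data fitted under a depth cap vs a leaf-count cap.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

by_depth = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
by_leaves = DecisionTreeClassifier(max_leaf_nodes=16, random_state=0).fit(X, y)

print(by_depth.get_depth(), by_depth.get_n_leaves())    # depth <= 4, hence leaves <= 16
print(by_leaves.get_depth(), by_leaves.get_n_leaves())  # leaves <= 16, but depth may exceed 4
```

Note also that setting max_leaf_nodes switches scikit-learn to best-first tree growth, so the two trees can differ even when both constraints happen to be satisfied.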
3 votes · 1 answer · 268 views

I was intrigued by the reply from @JohnRos to the post Making a single decision tree from a random forest. They say "<...> a random forest prediction cannot be represented by a single tree....
asked by Smerdjakov
0 votes · 0 answers · 61 views

If a training set has a continuous feature, some texts recommend that the dataset first be sorted based on the continuous feature, and then split points be chosen. What I am not sure about is how ...
asked by Karl 17302
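A common textbook recipe for the question above (not any single library's exact rule) is to sort the unique values of the continuous feature and take the midpoints of adjacent values as candidate thresholds:

```python
import numpy as np

# Toy continuous feature with a duplicated value.
x = np.array([2.7, 1.3, 3.1, 1.3, 4.8])

values = np.unique(x)                        # sorted unique values
candidates = (values[:-1] + values[1:]) / 2  # midpoints of adjacent values

print(candidates)  # candidate thresholds 2.0, 2.9, 3.95
```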
4 votes · 0 answers · 53 views

I'm trying to understand impurity metrics in decision tree learning, in particular the Gini impurity. Questioning one of the assumptions of Gini impurity has led me to another impurity measure which ...
asked by A Kubiesa
1 vote · 1 answer · 244 views

I have fitted a regression tree on my data and would like to demonstrate that it is a good model. Are there any standard goodness-of-fit tests or indices for a regression tree? I understand that I can ...
asked by Santanu
2 votes · 1 answer · 8k views

I am working on an attrition dataset which has a large number of categorical parameters. Each categorical parameter has a high cardinality, so one-hot encoding them is out of the question. I was looking ...
asked by Ashish Samant
2 votes · 0 answers · 61 views

I've been playing around with the random forest algorithm to classify a binary Y vector using classification and regression trees. Classification trees output class probabilities and regression trees an ...
asked by Mirko Pavicic
2 votes · 1 answer · 104 views

I have found misclassification rates for all the leaf nodes of a decision tree. samples = 3635 + 1101 = 4736, class = Cash, misclassification rate = 1101 / 4736 = 0.232. samples = 47436 + 44556 = 91992, ...
asked by Aman Rangapur
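The arithmetic in the question above can be checked directly: a leaf's misclassification rate is the share of its samples outside the majority class. A sketch using the counts quoted in the excerpt (the second leaf's rate is computed the same way):

```python
# Verify the leaf misclassification rates quoted in the question.
def leaf_misclassification(majority, minority):
    """Fraction of the leaf's samples not in the majority class."""
    total = majority + minority
    return minority / total

print(round(leaf_misclassification(3635, 1101), 3))    # first leaf: 0.232
print(round(leaf_misclassification(47436, 44556), 3))  # second leaf
```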
