Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorithms | Edureka
Pruning is the process of removing branches or nodes from a decision tree to simplify it and reduce overfitting. Some key points about pruning:
- Pruning reduces the complexity of the decision tree to avoid overfitting to the training data.
- It is done to improve the accuracy of the model on new unseen data by removing noisy or unstable parts of the tree.
- Common pruning techniques include pre-pruning (early stopping), cost-complexity pruning, and reduced error pruning.
- The goal of pruning is to find a tree with optimal complexity that balances bias and variance for best generalization on new data.
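As a concrete illustration of the pruning idea, here is a minimal sketch of cost-complexity pruning using scikit-learn (the iris dataset and the choice of `ccp_alpha` below are illustrative assumptions, not part of the original material):

```python
# Sketch: cost-complexity pruning with scikit-learn (illustrative dataset/alpha).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grow a full tree, then inspect the effective alphas along its pruning path.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit with a non-zero ccp_alpha: a larger alpha prunes more aggressively.
pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2], random_state=42)
pruned.fit(X_train, y_train)
print(full_tree.tree_.node_count, pruned.tree_.node_count)
```

The pruned tree has fewer nodes than the fully grown one; in practice the alpha would be chosen by cross-validating accuracy on held-out data.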
What is
Classification?
"Classification is the process of dividing a dataset
into different categories or groups by adding labels"
▪ Note: It assigns a data point to a particular
labelled group on the basis of some condition
Types of
Classification
▪ Decision Tree
▪ Random Forest
▪ Naïve Bayes
▪ KNN
Decision Tree
▪ Graphical representation of all the possible solutions to a decision
▪ Decisions are based on some conditions
▪ Decisions made can be easily explained
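The "easily explained" point can be seen directly in code. A hedged sketch, assuming scikit-learn and the iris dataset (both illustrative choices): the learned tree's conditions can be printed as readable rules.

```python
# Sketch: a fitted decision tree's conditions printed as readable rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each line is a condition on a feature -- the decision path in text form.
print(export_text(tree, feature_names=list(iris.feature_names)))
```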
Random Forest
▪ Builds multiple decision trees and merges them together
▪ More accurate and stable prediction
▪ Random decision forests correct for decision trees' habit
of overfitting to their training set
▪ Trained with the “bagging” method
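The bullets above can be sketched with scikit-learn's `RandomForestClassifier` (the dataset and `n_estimators=50` are illustrative assumptions): bagging means each tree is trained on a bootstrap sample, and the forest merges their votes.

```python
# Sketch: a random forest builds many trees on bootstrap samples ("bagging")
# and merges their votes; n_estimators=50 is an arbitrary illustrative choice.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, bootstrap=True, random_state=0)
forest.fit(X, y)

# The ensemble holds 50 individual decision trees; prediction merges their votes.
print(len(forest.estimators_))
```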
Naïve Bayes
▪ Classification technique based on Bayes' Theorem
▪ Assumes that the presence of a particular feature in a class is
unrelated to the presence of any other feature
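A minimal sketch of the idea, assuming Gaussian Naive Bayes from scikit-learn on the iris dataset (illustrative choices): the model applies Bayes' Theorem per class under the "naive" feature-independence assumption, and its class posteriors for any sample sum to 1.

```python
# Sketch: Gaussian Naive Bayes applies Bayes' Theorem with the "naive"
# assumption that features are independent within each class.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)

# Per-class posterior probabilities for one sample; they sum to 1.
print(nb.predict_proba(X[:1]))
```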
K-Nearest Neighbors
▪ Stores all the available cases and classifies new cases
based on a similarity measure
▪ The "K" in the KNN algorithm is the number of nearest
neighbors we wish to take a vote from
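The two bullets map directly onto scikit-learn's `KNeighborsClassifier` (dataset and K=3 are illustrative assumptions): fitting stores the cases, and prediction is a majority vote among the K most similar stored cases.

```python
# Sketch: KNN stores the training cases and classifies a new case by a
# majority vote among its K nearest neighbors (K=3 is an arbitrary choice).
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# The 3 nearest stored neighbors of the first sample vote on its label.
print(knn.predict(X[:1]))
```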
What is
Entropy?
▪ Defines the randomness in the data
▪ Entropy is a metric which measures the impurity or
randomness of a sample
▪ Calculating entropy is the first step in solving a decision tree
Entropy(S) = − P(yes) log2 P(yes) − P(no) log2 P(no)
Where,
▪ S is the total sample space
▪ P(yes) is the probability of yes
▪ P(no) is the probability of no
If number of yes = number of no, i.e. P(yes) = P(no) = 0.5,
Entropy(S) = 1
If the sample contains all yes or all no, i.e. P(yes) = 1 or 0,
Entropy(S) = 0
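The binary entropy formula above can be written as a small helper in pure Python (the function name and the `0 · log2(0) = 0` convention are my additions for illustration):

```python
# The binary entropy formula as a small helper (pure Python).
from math import log2

def entropy(p_yes: float) -> float:
    """Entropy(S) = -P(yes)*log2 P(yes) - P(no)*log2 P(no)."""
    p_no = 1.0 - p_yes
    total = 0.0
    for p in (p_yes, p_no):
        if p > 0:                      # convention: 0 * log2(0) = 0
            total -= p * log2(p)
    return total

print(entropy(0.5))   # equal yes/no -> 1.0
print(entropy(1.0))   # all yes      -> 0.0
```

This reproduces both boundary cases stated above: maximal impurity at a 50/50 split, zero impurity for a pure sample.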
E(S) = − P(Yes) log2 P(Yes) − P(No) log2 P(No)
When P(Yes) = P(No) = 0.5, i.e. YES + NO = Total Sample (S):
E(S) = − 0.5 log2 0.5 − 0.5 log2 0.5
E(S) = − 0.5 (−1) − 0.5 (−1)
E(S) = 1
When P(Yes) = 1, i.e. YES = Total Sample (S):
E(S) = − P(Yes) log2 P(Yes)
E(S) = − 1 log2 1 = 0
Similarly, when P(No) = 1, i.e. NO = Total Sample (S):
E(S) = − P(No) log2 P(No)
E(S) = − 1 log2 1 = 0
What is
Information
Gain?
▪ Measures the reduction in entropy
▪ Decides which attribute should be selected as the
decision node
If S is our total collection,
Information Gain = Entropy(S) – [(Weighted Avg) x Entropy(each feature)]
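The information-gain formula above can be sketched in pure Python for a binary-labelled sample split by one categorical feature (the helper names and the weather-style toy data are illustrative assumptions):

```python
# Information Gain = Entropy(S) - [(Weighted Avg) x Entropy(each subset)].
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature_values):
        subset = [l for l, f in zip(labels, feature_values) if f == v]
        gain -= (len(subset) / n) * entropy(subset)   # weighted-average term
    return gain

# A feature that perfectly separates yes from no has gain equal to Entropy(S).
labels  = ["yes", "yes", "no", "no"]
feature = ["sunny", "sunny", "rainy", "rainy"]
print(information_gain(labels, feature))   # -> 1.0
```

The attribute with the highest information gain is the one chosen as the decision node.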