A **Decision Tree** is a supervised machine learning algorithm used for classification and regression tasks. It models decisions as a tree-like structure in which each internal node represents a test on a feature, each branch an outcome of that test, and each leaf node a final prediction. Decision Trees are widely used for their simplicity, interpretability, and ability to handle both numerical and categorical data.
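To make that structure concrete, here is a minimal sketch that fits a shallow tree and prints it; scikit-learn and its bundled Iris dataset are assumptions chosen for illustration, not part of the discussion above:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Keep the tree shallow so the printed structure stays readable.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Each indented "<=" line is an internal node (a feature test),
# and each "class:" line is a leaf holding the final prediction.
print(export_text(clf, feature_names=iris.feature_names))
```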
The algorithm works by recursively splitting the dataset on feature values so that the resulting subsets are as pure as possible. Common splitting criteria include **Gini Impurity** and **Entropy (Information Gain)** for classification, and **Mean Squared Error (MSE)** for regression. The recursion continues until a stopping condition is met, such as reaching a maximum depth, having too few samples in a node, or producing a node that contains only one class.
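To ground these criteria, the sketch below computes them by hand with NumPy; the helper names (`gini`, `entropy`, `information_gain`) are our own for illustration, not taken from any library:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - child

labels = np.array([0, 0, 0, 1, 1, 1])
print(gini(labels))                                      # 0.5: maximally mixed node
print(information_gain(labels, labels[:3], labels[3:]))  # 1.0 bit: a perfect split
```

At each node, the split is chosen to maximize this gain, or equivalently to minimize the weighted impurity of the children.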
One of the key advantages of Decision Trees is their transparency: they mimic human decision-making and are easy to interpret. They also require minimal data preprocessing, since threshold-based splits need no feature scaling and trees are fairly robust to irrelevant features; some implementations (such as CART with surrogate splits) can also handle missing values. However, they are prone to overfitting, especially when the tree grows too deep. To mitigate this, **pruning** techniques (pre-pruning, which limits growth up front, and post-pruning, which trims a fully grown tree) are used to reduce complexity and improve generalization.
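Both pruning styles can be demonstrated in a few lines; the sketch below assumes scikit-learn, and the hyperparameter values are illustrative rather than tuned:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early by capping depth and leaf size.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow fully, then apply cost-complexity pruning (ccp_alpha > 0
# removes branches whose complexity is not justified by their impurity reduction).
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post.fit(X_train, y_train)

print("pre-pruned accuracy: ", pre.score(X_test, y_test))
print("post-pruned accuracy:", post.score(X_test, y_test))
```

In practice, `ccp_alpha` would be chosen by cross-validation, for example over the candidate values returned by the tree's `cost_complexity_pruning_path` method.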
Decision Trees form the foundation of powerful ensemble methods like **Random Forest** and **Gradient Boosted Trees**, which enhance predictive accuracy and robustness. They are widely applied in fields like medical diagnosis, customer segmentation, fraud detection, and recommendation systems. Despite their limitations, Decision Trees remain a fundamental tool in machine learning due to their efficiency, ease of understanding, and adaptability to various types of data.
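As a rough comparison of a single tree against these ensembles, the following sketch (same assumptions: scikit-learn and an illustrative dataset) cross-validates all three:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree":       DecisionTreeClassifier(random_state=0),
    "random forest":     RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Ensembles typically trade some interpretability for accuracy and robustness.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```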