DECISION TREE
Content
 What is a decision tree?
 Example
 How to select the deciding node?
 Entropy
 Information gain
 Gini Impurity
 Steps for making a decision tree
 Pros.
 Cons.
What is a decision tree?
 A decision tree is one of the most powerful and popular tools for classification and prediction. It is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
 It is a type of supervised learning algorithm.
 It is one of the most widely used and practical methods for inductive inference.
Example
 Let's assume we want to play badminton on a particular day, say Saturday. How will you decide whether to play or not? Let's say you go out and check whether it's hot or cold, check the speed of the wind and the humidity, and see what the weather is like, i.e. is it sunny, cloudy, or rainy. You take all these factors into account to decide whether you want to play or not.
So, you record all these factors for the last ten days
and form a lookup table like the one below.
Day   Weather   Temperature   Humidity   Wind
1     Sunny     Hot           High       Weak
2     Cloudy    Hot           High       Weak
3     Sunny     Mild          Normal     Strong
4     Cloudy    Mild          High       Strong
5     Rainy     Mild          High       Strong
6     Rainy     Cool          Normal     Strong
7     Rainy     Mild          High       Weak
8     Sunny     Hot           High       Strong
9     Cloudy    Hot           Normal     Weak
10    Rainy     Mild          High       Strong
 A decision tree would be a great way to represent
data like this because it takes into account all the
possible paths that can lead to the final decision by
following a tree-like structure.
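For reference in the later formulas, the same lookup table can be written down as a small Python structure. This is a plain transcription of the table above; the Play outcome column is not shown on the slide, so no target labels are included here.

```python
# The ten-day lookup table from above, one record per day.
rows = [
    {"Weather": "Sunny",  "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak"},    # Day 1
    {"Weather": "Cloudy", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak"},    # Day 2
    {"Weather": "Sunny",  "Temperature": "Mild", "Humidity": "Normal", "Wind": "Strong"},  # Day 3
    {"Weather": "Cloudy", "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong"},  # Day 4
    {"Weather": "Rainy",  "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong"},  # Day 5
    {"Weather": "Rainy",  "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong"},  # Day 6
    {"Weather": "Rainy",  "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak"},    # Day 7
    {"Weather": "Sunny",  "Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong"},  # Day 8
    {"Weather": "Cloudy", "Temperature": "Hot",  "Humidity": "Normal", "Wind": "Weak"},    # Day 9
    {"Weather": "Rainy",  "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong"},  # Day 10
]
```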
How to select the deciding
node?
Which is the best Classifier?
So, we can conclude:
 A less impure node requires less information to describe it.
 A more impure node requires more information.
 Information theory provides a measure of this degree of disorganization in a system, known as entropy.
Entropy - measuring
homogeneity of a learning set
 Entropy is a measure of the uncertainty about a source of messages.
 Given a collection S containing positive and negative examples of some target concept, the entropy of S relative to this classification is

Entropy(S) = -Σi pi log2(pi)

where pi is the proportion of S belonging to class i.
 Entropy is 0 if all the members of S belong to the same class.
 Entropy is 1 when the collection contains an equal number of +ve and -ve examples.
 Entropy is between 0 and 1 if the collection contains unequal numbers of +ve and -ve examples.
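As a quick illustration (a minimal sketch, not from the slides), entropy can be computed directly from the class counts:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

# The three cases above:
print(entropy(["+", "+", "+", "+"]))    # 0.0   (all members in one class)
print(entropy(["+", "+", "-", "-"]))    # 1.0   (equal number of +ve and -ve)
print(entropy(["+", "+", "+", "-"]))    # ~0.81 (unequal mixture)
```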
Information gain
 Decides which attribute goes into a
decision node.
 To minimize the decision tree depth,
the attribute with the most entropy
reduction is the best choice!
 Attributes fall into 2 types:
1. High Information Gain
2. Low Information Gain
 The information gain, Gain(S, A), of an attribute A relative to a collection S is

Gain(S, A) = Entropy(S) - Σv∈Values(A) (|Sv| / |S|) × Entropy(Sv)

where:
 Values(A) = the set of all possible values of attribute A
 Sv = the subset of S for which attribute A has value v
 |Sv| = number of elements in Sv
 |S| = number of elements in S
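A sketch of this definition in code, reusing the entropy helper and the rows table from the earlier sketches. The labels argument stands in for the target class of each row; the slide's table does not show a target column, so the labels in the usage line are purely hypothetical placeholders.

```python
def information_gain(rows, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over v of (|Sv| / |S|) * Entropy(Sv)."""
    n = len(labels)
    remainder = 0.0
    for v in {row[attribute] for row in rows}:
        # Sv: the labels of the rows where the attribute has value v.
        sv = [lab for row, lab in zip(rows, labels) if row[attribute] == v]
        remainder += (len(sv) / n) * entropy(sv)
    return entropy(labels) - remainder

# Hypothetical target labels, one per day (not from the slide):
labels = ["No", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes"]
print(information_gain(rows, labels, "Weather"))
```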
High Information Gain
 An attribute with high information gain splits the
data into groups with an uneven number of
positives and negatives and as a result helps in
separating the two from each other.
Low Information Gain
 An attribute with low information gain
splits the data relatively evenly and as
a result doesn’t bring us any closer to
a decision.
Gini Impurity
 Gini impurity is a measure of the likelihood of incorrectly classifying a new instance of a random variable, if that new instance were randomly classified according to the distribution of class labels in the data set.
 If our dataset is pure, then the likelihood of incorrect classification is 0. If our sample is a mixture of different classes, then the likelihood of incorrect classification will be high.
 A sample can be of 2 types:
 Pure means that, in a selected sample of the dataset, all the data belongs to the same class.
 Impure means the data is a mixture of different classes.
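The slide does not print the formula, but the standard definition is Gini(S) = 1 - Σi pi², with pi the proportion of class i. A minimal sketch:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini(S) = 1 - sum_i p_i^2: the probability of misclassifying a
    randomly drawn instance if it is labeled at random according to
    the class distribution of the sample."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["Yes"] * 5))                  # 0.0 -> pure sample
print(gini_impurity(["Yes", "No", "Yes", "No"]))   # 0.5 -> maximally impure (2 classes)
```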
Steps for making a decision tree
 Get the list of rows (the dataset) to be considered for making the decision tree (recursively, at each node).
 Calculate the uncertainty of our dataset, i.e. its Gini impurity, or how mixed up the data is.
 Generate a list of all the questions that need to be asked at that node.
 Partition the rows into True rows and False rows based on each question asked.
 Calculate the information gain based on the Gini impurity and the partition of the data from the previous step.
 Update the highest information gain seen for each question asked.
 Update the best question based on the information gain (the higher the gain, the better).
 Divide the node on the best question, and repeat from step 1 until we get a pure node (a leaf node); a sketch of these steps follows below.
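A minimal sketch of these steps in Python, reusing gini_impurity from above. The (attribute, value) question format and the dict-based tree are illustrative choices, not the slides' implementation:

```python
from collections import Counter

def best_question(rows, labels):
    """Steps 2-7: try every (attribute, value) question, partition the
    rows into True/False groups, and keep the question with the highest
    information gain (measured here as reduction in Gini impurity)."""
    best_gain, best_q = 0.0, None
    current, n = gini_impurity(labels), len(labels)
    for attr in rows[0]:
        for value in {r[attr] for r in rows}:
            true_l = [l for r, l in zip(rows, labels) if r[attr] == value]
            false_l = [l for r, l in zip(rows, labels) if r[attr] != value]
            if not true_l or not false_l:
                continue  # this question does not actually split the node
            gain = current - (len(true_l) / n * gini_impurity(true_l)
                              + len(false_l) / n * gini_impurity(false_l))
            if gain > best_gain:
                best_gain, best_q = gain, (attr, value)
    return best_gain, best_q

def build_tree(rows, labels):
    """Step 8: divide the node on the best question and recurse until
    the node is pure (no question yields any gain)."""
    gain, q = best_question(rows, labels)
    if q is None:
        return Counter(labels)  # leaf node: class counts at this node
    attr, value = q
    t = [i for i, r in enumerate(rows) if r[attr] == value]
    f = [i for i, r in enumerate(rows) if r[attr] != value]
    return {
        "question": q,
        "true": build_tree([rows[i] for i in t], [labels[i] for i in t]),
        "false": build_tree([rows[i] for i in f], [labels[i] for i in f]),
    }
```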
Pros.
 Easy to use and understand.
 Can handle both categorical and
numerical data.
 Resistant to outliers, and hence require little data preprocessing.
Cons.
 Prone to overfitting.
 Require some measure of how well they are doing.
 Need to be careful with parameter
tuning.
 Can create biased learned trees if
some classes dominate.