Classifier Accuracy Measures
Evaluating Classification Algorithms
Utkarsh Sharma
Asst. Prof. CSE Dept.
JUET(M.P.) India
Contents
• Why do we need evaluation
• Metrics for evaluation
• Confusion Matrix
Why Evaluation?
• How to evaluate the performance of a model?
• How to obtain reliable estimates?
• How to compare the relative performance among competing models?
• Given two equally performing models,
which one should we prefer?
Need
• Evaluating the quality of our machine learning model is extremely
important for continuing to improve the model until it performs as well
as it can.
• For classification problems, evaluation metrics compare the
expected class label to the predicted class label or interpret the
predicted probabilities for the class labels.
• For example, suppose you used data from previous sales to train a
classifier to predict customer purchasing behavior. You would like an
estimate of how accurately the classifier can predict the purchasing
behavior of future customers, that is, future customer data on which
the classifier has not been trained.
Metrics for Performance Evaluation
1. Confusion Matrix
2. Precision
3. Recall
4. Accuracy
5. Specificity
6. F1-Score
Confusion Matrix
• The confusion matrix is a useful tool for analyzing how well your classifier
can recognize tuples of different classes.
• For binary classification, the confusion matrix is a 2 x 2 table covering the
four combinations of predicted and actual values.
• Given m classes, a confusion matrix is a table of at least size m by m. An
entry CM(i, j) in the first m rows and m columns indicates the number of
tuples of class i that were labeled by the classifier as class j.
• For a classifier to have good accuracy, ideally most of the tuples would be
represented along the diagonal of the confusion matrix, from entry CM(1, 1)
to entry CM(m, m), with the rest of the entries being close to zero.
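• A minimal Python sketch of this counting rule (the labels below are hypothetical and only illustrate how each entry CM(i, j) is tallied):

    # Build an m-by-m confusion matrix: cm[i][j] counts tuples of
    # actual class i that the classifier labeled as class j.
    actual    = [0, 0, 1, 1, 2, 2, 2]   # hypothetical true labels
    predicted = [0, 1, 1, 1, 2, 0, 2]   # hypothetical classifier output
    m = 3                               # number of classes
    cm = [[0] * m for _ in range(m)]
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    for row in cm:
        print(row)   # diagonal entries are the correctly classified tuples

  Libraries such as scikit-learn produce the same table via sklearn.metrics.confusion_matrix(actual, predicted).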
Confusion Matrix Example (Binary Classification)
True Positives (TP): The number of times our
model predicted YES and the actual output
was also YES.
True Negatives (TN): The number of times
our model predicted NO and the actual
output was NO.
False Positives (FP): The number of times our
model predicted YES and the actual output
was NO. This is known as a Type 1 Error.
False Negatives (FN): The number of times
our model predicted NO and the actual
output was YES. This is known as a Type 2
Error.
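• These four counts can be tallied directly from paired actual/predicted labels. A minimal sketch, assuming YES is encoded as 1 and NO as 0 (the labels and encoding are illustrative, not from the slides):

    actual    = [1, 1, 0, 0, 1, 0]   # hypothetical ground truth (1 = YES, 0 = NO)
    predicted = [1, 0, 0, 1, 1, 0]   # hypothetical model output
    TP = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    TN = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    FP = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # Type 1 error
    FN = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # Type 2 error
    print(TP, TN, FP, FN)   # -> 2 2 1 1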
Example: Reading a Confusion Matrix
TP = 100
TN = 50
FP = 10
FN = 5
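Placed in the usual binary layout, these counts read as:

                  Predicted YES   Predicted NO
    Actual YES    TP = 100        FN = 5
    Actual NO     FP = 10         TN = 50

So 150 of the 165 tuples fall on the diagonal (correct predictions) and 15 fall off it.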
Accuracy
• Accuracy answers the question: out of all the classifications, how many did
we classify correctly? This can be represented mathematically as:
      Accuracy = (Number of Correct Predictions) / (Total Number of Predictions)
• Using our confusion matrix terms, this equation is written as:
      Accuracy = (TP + TN) / (TP + TN + FP + FN)
• We want the accuracy score to be as high as possible. It is important to note that accuracy may
not always be the best metric to use, especially for a class-imbalanced data set, i.e., when the
data is not distributed equally across the classes.
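• Worked example with the counts from the earlier slide (TP = 100, TN = 50, FP = 10, FN = 5):
      Accuracy = (100 + 50) / (100 + 50 + 10 + 5) = 150 / 165 ≈ 0.91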
Sometimes Accuracy is not Enough
• Consider a 2-class problem
• Number of Class 0 examples = 9990
• Number of Class 1 examples = 10
• If the model predicts everything to be class 0, its accuracy is 9990/10000 = 99.9 %
• Accuracy is misleading here because the model does not detect a single class 1 example
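• A minimal sketch of this degenerate case, treating class 1 as the positive class; the point is that 99.9 % accuracy hides a 0 % detection rate for class 1:

    TP, TN, FP, FN = 0, 9990, 0, 10   # "always predict class 0" on the data above
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    class1_detection = TP / (TP + FN)   # recall for class 1, defined on a later slide
    print(accuracy, class1_detection)   # -> 0.999 0.0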
Precision
• Precision is the fraction of relevant instances among the retrieved
instances. It answers the question "What proportion of positive
identifications was actually correct?" The formula is as follows:
      Precision = (True Positives) / (Total Predicted Positives)
• In terms of our confusion matrix, the equation can be represented as:
      Precision = TP / (TP + FP)
• Precision expresses the proportion of the data points our model labeled as relevant that actually
were relevant.
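• Worked example with the counts from the earlier slide:
      Precision = 100 / (100 + 10) = 100 / 110 ≈ 0.91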
Recall
• Recall, also known as sensitivity, answers the question "What proportion
of actual positives was classified correctly?" It can be represented by the
following equation:
      Recall = (True Positives) / (Total Actual Positives)
• In our confusion matrix, it is represented by:
      Recall = TP / (TP + FN)
• Recall expresses how many of the relevant instances in the data set the model actually finds. It is
important to examine both precision and recall when evaluating a model because they often have
an inverse relationship: when precision increases, recall tends to decrease, and vice versa.
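• Worked example with the counts from the earlier slide:
      Recall = 100 / (100 + 5) = 100 / 105 ≈ 0.95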
Specificity
• Specificity (SP) is calculated as the number of correct negative
predictions divided by the total number of negatives. It is also called the
true negative rate (TNR). The best specificity is 1.0, the worst 0.0.
      Specificity = TN / (TN + FP)
• Specificity is the negative-class counterpart of recall: recall (sensitivity)
measures the true positive rate, while specificity measures the true negative rate.
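• Worked example with the counts from the earlier slide:
      Specificity = 50 / (50 + 10) = 50 / 60 ≈ 0.83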
F1-Score
• The F1 score is a function of precision and recall: their harmonic mean. It
is used to find the right balance between the two metrics and indicates how
many instances the model classifies correctly without missing a significant
number of instances. The score is given by the following equation:
      F1 = 2 × (Precision × Recall) / (Precision + Recall)
• An imbalance between precision and recall, such as high precision with low
recall, can give you a model that looks extremely accurate yet misclassifies
the difficult instances. We want the F1 score to be as high as possible for
the best performance of our model.
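• With the example counts from the earlier slide, Precision ≈ 0.91 and Recall ≈ 0.95, so F1 = 2 × (0.91 × 0.95) / (0.91 + 0.95) ≈ 0.93. A minimal Python sketch computing all five metrics from those counts (variable names are illustrative):

    TP, TN, FP, FN = 100, 50, 10, 5

    accuracy    = (TP + TN) / (TP + TN + FP + FN)                 # ≈ 0.91
    precision   = TP / (TP + FP)                                  # ≈ 0.91
    recall      = TP / (TP + FN)                                  # ≈ 0.95 (sensitivity)
    specificity = TN / (TN + FP)                                  # ≈ 0.83 (true negative rate)
    f1          = 2 * precision * recall / (precision + recall)   # ≈ 0.93

    print(f"Accuracy={accuracy:.2f}  Precision={precision:.2f}  Recall={recall:.2f}  "
          f"Specificity={specificity:.2f}  F1={f1:.2f}")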