MACHINE LEARNING (INTEGRATED)
(21ISE62)
Dr. Shivashankar
Professor
Department of Information Science & Engineering
GLOBAL ACADEMY OF TECHNOLOGY-Bengaluru
GLOBAL ACADEMY OF TECHNOLOGY
Ideal Homes Township, Rajarajeshwari Nagar, Bengaluru – 560 098
Department of Information Science & Engineering
Course Outcomes
After Completion of the course, student will be able to:
 Illustrate Regression Techniques and Decision Tree Learning
Algorithm.
 Apply SVM, ANN and KNN algorithm to solve appropriate problems.
 Apply Bayesian Techniques and derive effective learning rules.
 Illustrate performance of AI and ML algorithms using evaluation
techniques.
 Understand reinforcement learning and its application in real world
problems.
Text Book:
1. Tom M. Mitchell, Machine Learning, McGraw Hill Education, India Edition 2013.
2. Ethem Alpaydın, Introduction to Machine Learning, MIT Press, Second Edition.
3. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining,
Pearson, First Impression, 2014.
MODULE-2
SUPPORT VECTOR MACHINE
• Support Vector Machine (SVM) is one of the most popular supervised learning algorithms. It is used for both classification and regression, and it applies statistical learning theory to maximize predictive accuracy while automatically avoiding over-fitting to the data.
• SVM can be defined as a system that uses a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm.
• The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data
point in the correct category in the future.
• This best decision boundary is called a hyperplane.
• SVM became famous when, using pixel maps as input, it gave accuracy comparable to the best classifiers.
• The modern SVM was developed by Vladimir Vapnik and colleagues; its foundations date back to the 1960s, and the soft-margin and kernel formulations were introduced in the 1990s.
• The SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane.
• The SVM algorithm finds the data points from both classes that lie closest to the decision boundary.
• These points are called support vectors.
• The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.
• The hyperplane with maximum margin is called the optimal hyperplane.
Cont…
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM:
Linear SVM: used for linearly separable data. If a dataset can be separated into two classes by a single straight line, the data is termed linearly separable and the classifier used is called a Linear SVM classifier.
Non-linear SVM: used for non-linearly separable data. If a dataset cannot be separated by a straight line, the data is termed non-linear and the classifier used is called a Non-linear SVM classifier.
Fig. 2.1: Concept of the SVM technique
Fig. 3: Examples of bad decision boundaries
Linearly Separable Case
If a dataset can be classified into two classes by a single straight line, the data is termed linearly separable, the classifier used is a Linear SVM classifier, and the classification problem is a binary (two-class) classification problem.
Binary classification can be viewed as the task of separating classes in feature space:
Hyperplane: f(x) = wᵀx + b
where
– w : weight vector
– x : input vector
– b : bias or offset value
Fig 2.2: Linearly separable classification
Cont..
Define the hyperplanes H such that:
w·xi + b ≥ +1 when yi = +1
w·xi + b ≤ −1 when yi = −1
H1 and H2 are the margins:
H1: w•xi+b = +1
H2: w•xi+b = –1
The points on the margins H1 and H2 are the tips of the Support Vectors.
The plane H0 is the median in between, where w•xi+b =0
d+ = the shortest distance to the closest positive point.
d- = the shortest distance to the closest negative point.
The margin (gutter) of a separating hyperplane is d+ + d–.
Maximizing the margin
We want a classifier with as large a margin as possible.
Recall that the distance from a point (x0, y0) to the line Ax + By + c = 0 is |Ax0 + By0 + c| / sqrt(A² + B²).
The distance from H1 (or H2) to the median plane H0 is |w·x + b| / ||w|| = 1 / ||w||, so the distance between H1 and H2 is 2 / ||w||.
In order to maximize the margin, we therefore need to minimize ||w||, with the condition that there are no data points between H1 and H2:
xi·w + b ≥ +1 when yi = +1
xi·w + b ≤ −1 when yi = −1
These can be combined into yi(xi·w + b) ≥ 1.
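As a quick numerical illustration (a minimal scikit-learn sketch on made-up, linearly separable data; a very large C is used so the soft-margin solver approximates the hard margin described here), w and b can be read off a fitted linear SVM and the margin computed as 2/||w||:

import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable 2-D data: class -1 on the left, class +1 on the right.
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0], [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # very large C: behaves like a hard-margin SVM
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)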
Constrained optimization problem
• The problem of finding the optimal hyperplane is an optimization problem
and can be solved by optimization techniques.
• It can be solved by the Lagrange multiplier method, which leads to the weight vector
w = Σ_{i=1}^{m} αi yi xi
where αi is the Lagrange multiplier (one αi is needed for each constraint), the xi are the support vectors and the yi are their class labels (±1).
Cont…
Problems:
1. Draw the hyperplane for the given data points (1,1), (2,1), (1,−1), (2,−1), (4,0), (5,1), (5,−1), (6,0) using SVM and classify the new data point (2,−2).
Solution:
1. Plot the graph.
Select the support vectors: S1 = (2, 1), S2 = (2, −1), S3 = (4, 0).
S1, S2 and S3 are the support vectors because they are the data points closest to the boundary region between the two groups (around x = 3).
2. For the vector representation we add a bias entry to every support vector. Assuming bias = 1, the augmented support vectors become:
S̄1 = (2, 1, 1), S̄2 = (2, −1, 1), S̄3 = (4, 0, 1)
Cont…
3. Consider one group of support vectors as positive and the other as negative. Here, S1 and S2 are negative and S3 is positive.
4. Our objective is to find the optimal hyperplane, i.e. the values of w and b in f(x) = w·x + b = 0.
5. To find the optimal hyperplane we use the Lagrange multiplier (α) method. According to the Lagrange formulation,
w = Σ_{i=1}^{m} αi yi S̄i
where the S̄i are the (augmented) support vectors S̄1, S̄2 and S̄3 and the yi are their labels.
Substituting the support vectors into the margin conditions gives:
α1 S̄1·S̄1 + α2 S̄1·S̄2 + α3 S̄1·S̄3 = −1
α1 S̄2·S̄1 + α2 S̄2·S̄2 + α3 S̄2·S̄3 = −1
α1 S̄3·S̄1 + α2 S̄3·S̄2 + α3 S̄3·S̄3 = +1
Cont…
Let us substitute the values of S̄1, S̄2 and S̄3:
α1 (2,1,1)·(2,1,1) + α2 (2,1,1)·(2,−1,1) + α3 (2,1,1)·(4,0,1) = −1
α1 (2,−1,1)·(2,1,1) + α2 (2,−1,1)·(2,−1,1) + α3 (2,−1,1)·(4,0,1) = −1
α1 (4,0,1)·(2,1,1) + α2 (4,0,1)·(2,−1,1) + α3 (4,0,1)·(4,0,1) = +1
Therefore,
6α1 + 4α2 + 9α3 = −1
4α1 + 6α2 + 9α3 = −1
9α1 + 9α2 + 17α3 = +1
After solving the above equations, we get α1 = −3.25, α2 = −3.25, α3 = 3.5.
Cont…
Now let us find w:
w = Σ αi S̄i = −3.25 (2, 1, 1) − 3.25 (2, −1, 1) + 3.5 (4, 0, 1) = (1, 0, −3)
Therefore the hyperplane equation is f(x) = w·x + b, with w = (1, 0) and offset (bias) b = −3.
6. Plot the hyperplane.
Cont…
Since b = −3 and w = (1, 0), the hyperplane is the vertical line x = 3 (drawn 3 units along the positive x-axis, parallel to the y-axis).
Now let us classify the new data point (2, −2). We know that
w·x + b ≥ 0  →  belongs to class +1
w·x + b < 0  →  belongs to class −1
Substituting the values: y = w·x + b = (1, 0)·(2, −2) − 3 = 2 + 0 − 3 = −1
Therefore the new data point (2, −2) belongs to class −1.
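The worked result above can be checked numerically (a minimal scikit-learn sketch; a very large C approximates the hard-margin solution, so the learned w and b should come out close to (1, 0) and −3):

import numpy as np
from sklearn.svm import SVC

# Data points from Problem 1: the left group is class -1, the right group is class +1.
X = np.array([[1, 1], [2, 1], [1, -1], [2, -1], [4, 0], [5, 1], [5, -1], [6, 0]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("w =", clf.coef_[0], " b =", clf.intercept_[0])   # expected roughly w = (1, 0), b = -3
print("class of (2,-2):", clf.predict([[2, -2]])[0])    # expected -1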
Cont…
Problem 2:
Draw the hyperplane using SVM for the positively labelled data points (3,1), (3,−1), (5,1), (5,−1) and the negatively labelled data points (1,0), (0,1), (0,−1), (−1,0).
Solution:
Select the support vectors: S1 = (1, 0), S2 = (3, 1), S3 = (3, −1).
Each vector is augmented with a bias entry of 1, so the augmented support vectors become:
S̄1 = (1, 0, 1), S̄2 = (3, 1, 1), S̄3 = (3, −1, 1)
Setting up and solving the Lagrange equations (S1 is negative, S2 and S3 are positive) gives
α1 = −3.5, α2 = 0.75, α3 = 0.75
w = −3.5 (1, 0, 1) + 0.75 (3, 1, 1) + 0.75 (3, −1, 1) = (1, 0, −2)
So w = (1, 0) and the offset or bias b = −2.
Non-Linear SVM or Nonlinear Separable Case
• If data is linearly arranged, then we can separate it by using a straight line, but for non-
linear data, we cannot draw a single straight line.
• So to separate these data points, we need to add one more dimension. For linear
data, we have used two dimensions x and y, so for non-linear data, we will add a
third dimension z. It can be calculated as:
z = x² + y² -------(1)
• We must use a nonlinear SVM, i.e. we need to map the data from one feature space to another. For the nonlinearly separable case, a mapping such as the following can be used:
Φ1(x1, x2) = (4 − x2 + |x1 − x2|,  4 − x1 + |x1 − x2|)   if x1² + x2² > 2
Φ1(x1, x2) = (x1, x2)   otherwise
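A minimal Python sketch of this idea (assuming the mapping Φ1 written above; the data are the positively and negatively labelled points of the worked example that follows, and a linear SVM is fitted in the mapped feature space):

import numpy as np
from sklearn.svm import SVC

def phi(p):
    # Feature map from the slide: points far from the origin are remapped.
    x1, x2 = p
    if x1**2 + x2**2 > 2:
        return [4 - x2 + abs(x1 - x2), 4 - x1 + abs(x1 - x2)]
    return [x1, x2]

pos = [(2, 2), (2, -2), (-2, -2), (-2, 2)]      # positively labelled points
neg = [(1, 1), (1, -1), (-1, -1), (-1, 1)]      # negatively labelled points
X = np.array([phi(p) for p in pos + neg])
y = np.array([1] * 4 + [-1] * 4)

clf = SVC(kernel="linear", C=1e6).fit(X, y)     # linear SVM in the mapped space
print("w =", clf.coef_[0], " b =", clf.intercept_[0])   # expected roughly w = (1, 1), b = -3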
Fig. 11: Nonlinear data points.  Fig. 12: After adding the 3rd axis.  Fig. 13: Best hyperplane for the nonlinear SVM after adding the 3rd axis.
Conti…
Problem 1: Draw the hyperplane using a nonlinear SVM for the positively labelled data points (2,2), (2,−2), (−2,−2), (−2,2) and the negatively labelled data points (1,1), (1,−1), (−1,−1), (−1,1).
Solution:
1. Plot the graph
2. Nonlinear separable case:
• From the plotted graph, no separating hyperplane exists in the input space.
• We must use a nonlinear SVM, i.e. we need to map the data from one feature space to another. For the nonlinearly separable case, the following mapping is used:
Φ1(x1, x2) = (4 − x2 + |x1 − x2|,  4 − x1 + |x1 − x2|)   if x1² + x2² > 2
Φ1(x1, x2) = (x1, x2)   otherwise
Conti…
By applying the nonlinear mapping, convert the given data points into the new feature space.
So the positive examples (2,2), (2,−2), (−2,−2), (−2,2) map to (2,2), (10,6), (6,6), (6,10),
and the negative examples (1,1), (1,−1), (−1,−1), (−1,1) map to (1,1), (1,−1), (−1,−1), (−1,1) (unchanged, since x1² + x2² ≤ 2 for these points).
3. Now plot the graph for the obtained new data points.
We can now easily identify the support vectors: S1 = (1, 1), S2 = (2, 2).
Each vector is augmented with 1 as a bias entry:
S̄1 = (1, 1, 1) and S̄2 = (2, 2, 1)
Conti..
According to the Lagrange formulation,
w = Σ_{i=1}^{m} αi yi S̄i
where the S̄i are the augmented support vectors S̄1 and S̄2. Substituting the support vectors into the margin conditions:
α1 S̄1·S̄1 + α2 S̄1·S̄2 = −1
α1 S̄1·S̄2 + α2 S̄2·S̄2 = +1
After substituting S̄1 and S̄2 and simplifying:
3α1 + 5α2 = −1
5α1 + 9α2 = +1
Therefore α1 = −7 and α2 = 4.
w = Σ αi S̄i = −7 (1, 1, 1) + 4 (2, 2, 1) = (1, 1, −3)
Therefore the hyperplane is y = w·x + b with w = (1, 1) and bias b = −3.
Support Vector Machine Terminology
Hyperplane: the decision boundary, chosen so that the margin between the closest points of different classes is as large as possible. In the case of linear classification it is the linear equation w·x + b = 0.
Support Vectors: the data points closest to the hyperplane, which play a critical role in deciding the hyperplane and margin.
Margin: the distance between the support vectors and the hyperplane. The main objective of the SVM algorithm is to maximize the margin; a wider margin indicates better classification performance.
Kernel: is the mathematical function, which is used in SVM to map the original input
data points into high-dimensional feature spaces. Some of the common kernel
functions are linear, polynomial and radial basis function(RBF).
Hard Margin: Also called as the maximum-margin hyperplane is a hyperplane that
properly separates the data points of different categories without any
misclassifications.
Soft Margin: When the data is not perfectly separable or contains outliers, SVM
permits a soft margin technique. It discovers a compromise between increasing the
margin and reducing violations.
Hinge Loss: A typical loss function in SVMs is hinge loss. It punishes incorrect
classifications or margin violations.
How Does Support Vector Machine Algorithm Work?
• The best way to understand the SVM algorithm is through the SVM classifier.
• The hyperplane is chosen on the basis of the margin: the hyperplane providing the maximum margin between the two classes is selected.
• These margins are calculated using data points known as support vectors. Support vectors are the data points that lie nearest to the hyperplane and help in positioning it.
Cont…
If the functioning of the SVM classifier is to be understood mathematically, it can be described in the following steps:
Step 1: The SVM algorithm predicts the classes. One of the classes is identified as +1 while the other is identified as −1.
Step 2: As with most machine learning algorithms, the business problem is converted into a mathematical equation involving unknowns. These unknowns are then found by converting the problem into an optimization problem.
Step 3: A loss function (also called a cost function) is defined whose value is 0 when no class is incorrectly predicted; otherwise the error/loss is calculated.
Step 4: As with most optimization problems, the weights are optimized by calculating gradients using calculus, in particular partial derivatives.
Step 5: When there is no classification error, the gradients are updated using only the regularization parameter; when misclassification happens, the loss function is used as well.
Important Concepts in SVM
• Support vectors are the data points on whose basis the margins are calculated and maximized.
• The number of support vectors, or the strength of their influence, is one of the hyper-parameters.
Fig. 2: Presents Support vectors, margin and Classes
Cont…
Hard Margin:
• Hard Margin refers to that kind of decision boundary that makes sure that all
the data points are classified correctly.
• While this means the SVM classifier makes no training errors, it can also cause the margins to shrink, defeating the purpose of maximizing the margin.
Soft Margin:
• Soft Margin SVM introduces flexibility by allowing some margin violations
(misclassifications) to handle cases where the data is not perfectly separable.
SVM Implementation in Python
In Python, an SVM classifier can be developed using the sklearn library.
Step 1: Load the important libraries
>> import pandas as pd
>> import numpy as np
>> import sklearn
>> from sklearn import svm
>> from sklearn.model_selection import train_test_split
>> from sklearn import metrics
Step 2: Import dataset and extract the X variables and Y separately.
>> df = pd.read_csv('mydataset.csv')
>> X = df.loc[:, ['Var_X1', 'Var_X2', 'Var_X3', 'Var_X4']]
>> Y = df['Var_Y']
Step 3: Divide the dataset into train and test
>> X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3,
random_state=123)
Step 4: Initializing the SVM classifier model
>> svm_clf = svm.SVC(kernel='linear')
Cont…
Step 5: Fitting the SVM classifier model
>> svm_clf.fit(X_train, y_train)
Step 6: Coming up with predictions
>> y_pred_test = svm_clf.predict(X_test)
Step 7: Evaluating model’s performance
>> metrics.accuracy_score(y_test, y_pred_test)
>> metrics.precision_score(y_test, y_pred_test)
>> metrics.recall_score(y_test, y_pred_test)
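Putting the steps together, here is a minimal self-contained variant that can be run as-is; it substitutes scikit-learn's built-in iris dataset for the hypothetical mydataset.csv and uses macro averaging because iris has three classes:

from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

X, y = datasets.load_iris(return_X_y=True)       # stand-in for mydataset.csv

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)

svm_clf = svm.SVC(kernel='linear')
svm_clf.fit(X_train, y_train)

y_pred_test = svm_clf.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred_test))
print("precision:", precision_score(y_test, y_pred_test, average='macro'))
print("recall   :", recall_score(y_test, y_pred_test, average='macro'))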
Advantages & Disadvantages of SVM
Advantages
• It is one of the most accurate machine learning algorithms.
• It is a dynamic algorithm and can solve a range of problems, including linear and
non-linear problems, binary, binomial, and multi-class classification problems,
along with regression problems.
• SVM uses the concept of margins and tries to maximize the differentiation
between two classes; it reduces the chances of model overfitting, making the
model highly stable.
• SVM is known for its computation speed and memory management. It uses less memory, especially when compared to the deep learning algorithms with which it often competes.
Disadvantages:
• While SVM is fast and can work in high dimensions, it is outperformed by Naïve Bayes, which provides faster predictions in high dimensions. SVM also takes a relatively long time during the training phase.
• Compared to other linear algorithms such as Linear Regression, SVM is not highly
interpretable, especially when using kernels that make SVM non-linear. Thus, it
isn’t easy to assess how the independent variables affect the target variable.
Cont…
Applications of SVM:
• Text categorization
• Semantic role labeling (predicate, agent, ..)
• Image classification
• Image segmentation
• Hand-written recognition
Characteristics of SVM
• Based on supervised learning methods
• Using for classification or regression analysis
• A non-probabilistic binary linear classifier
• Representation of the examples as points in space
• Examples of the separate categories are divided by a clear gap that is as
wide as possible.
• New examples are then mapped into that same space and predicted to
belong to a category based on the side of the gap on which they fall
• Performing linear classification.
K-Nearest Neighbour
• The k-Nearest Neighbors (KNN) algorithm is a non-parametric, supervised learning classifier,
which uses proximity to make classifications or predictions about the grouping of an individual
data point.
• It is one of the popular and simplest classification and regression classifiers used in machine
learning today.
• The nearest neighbors of an instance are defined in terms of the standard Euclidean Distance.
More precisely, let an arbitrary instance x be described by the feature vector (a1(x), a2(x), ..., an(x)).
The distance between two instances xi and xj is defined to be d(xi, xj), where
d(xi, xj) = sqrt( Σ_{r=1}^{n} ( ar(xi) − ar(xj) )² )
The distance-weighted k-NN real-valued target function can be defined as:
f(x) = ( Σ_{i=1}^{k} wi f(xi) ) / ( Σ_{i=1}^{k} wi ),  where wi = 1 / d(xq, xi)²
Fig 2.1: K-NN example
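A minimal NumPy sketch of these two formulas, i.e. the Euclidean distance and the distance-weighted real-valued prediction (the small dataset here is made up purely for illustration):

import numpy as np

def euclidean(a, b):
    # Standard Euclidean distance between two feature vectors.
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X, y, xq, k=3):
    # Distance-weighted k-NN prediction of a real-valued target.
    # (Assumes xq does not coincide exactly with a training point.)
    d = np.array([euclidean(x, xq) for x in X])
    idx = np.argsort(d)[:k]                 # indices of the k nearest neighbours
    w = 1.0 / d[idx] ** 2                   # weights w_i = 1 / d(xq, xi)^2
    return np.sum(w * y[idx]) / np.sum(w)   # weighted average of the neighbour targets

# Made-up 2-D instances with a real-valued target.
X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 3.0], [5.0, 4.0]])
y = np.array([1.0, 1.2, 2.0, 2.2])
print(knn_predict(X, y, np.array([1.5, 1.0]), k=3))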
Cont…
Euclidean distance between A and B = sqrt( (X2 − X1)² + (Y2 − Y1)² )
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
• K-NN is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.
• K-NN algorithm assumes the similarity between the new case/data and
available cases and put the new case into the category that is most similar to
the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears it can be easily classified into a well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
• It is also called a lazy learner algorithm because it does not learn from the
training set immediately instead it stores the dataset and at the time of
classification, it performs an action on the dataset.
• KNN algorithm at the training phase just stores the dataset and when it gets
new data, then it classifies that data into a category that is much similar to the
new data.
How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:
Step-1: Select the number K of the neighbors
Step-2: Calculate the Euclidean distance of K number of neighbors
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
Step-6: Our model is ready.
Suppose we have a new data point and we need to put it in the required category.
Consider the below image:
Fig. 2.13: K-NN for the best classifier
Cont…
• Firstly, we will choose the number of neighbors, so we will choose the k value.
• Next, we will calculate the Euclidean distance between the data points. The
Euclidean distance is the distance between two points, which we have already
studied in geometry. It can be calculated as:
• Euc-dist[(x1, y1); (x2, y2)] = sqrt( (x2 − x1)² + (y2 − y1)² )
• By calculating the Euclidean distance we got the nearest neighbors, as three
nearest neighbors in category A and two nearest neighbors in category B.
Consider the below graph:
• As we can see the 3 nearest neighbors are from category A, hence this new
data point must belong to category A.
Why do we need a K-NN Algorithm?
• Suppose there are two categories, Category A and Category B, and we have a new data point x1: in which of these categories does it lie?
• To solve this type of problem, we need a K-NN algorithm. With the help of K-
NN, we can easily identify the category or class of a particular dataset.
Fig. 11. Presents the importance of KNN
Advantages and Disadvantages of KNN Algorithm
Advantages:
• It is simple to implement.
• It is robust to the noisy training data
• It can be more effective if the training data is large.
Disadvantages:
• The value of K always needs to be determined, which may sometimes be complex.
• The computation cost is high because of calculating the distance
between the data points for all the training samples.
Applications of K-nearest Neighbor
1. Credit score
The KNN algorithm compares an individual's credit rating to others with comparable characteristics to help
calculate their credit rating.
2. Approval of the loan
The k-nearest neighbor technique, similar to credit scoring, is useful in detecting people who are more likely
to default on loans by comparing their attributes to those of similar people.
3. Preprocessing of data
Many missing values can be found in datasets. Missing data imputation is a procedure that uses the KNN
algorithm to estimate missing values.
4. Healthcare:
KNN has also had application within the healthcare industry, making predictions on the risk of heart attacks
and prostate cancer. The algorithm works by calculating the most likely gene expressions..
5. Prediction of stock prices
The KNN algorithm is useful in estimating the future value of stocks based on previous data since it has a
knack for anticipating the prices of unknown entities.
6. Recommendation systems
KNN can be used in recommendation systems since it can help locate people with comparable traits. It can
be used in an online video streaming platform, for example, to propose content that a user is more likely to
view based on what other users watch.
7. Computer Vision
For picture classification, the KNN algorithm is used. It's important in a variety of computer vision
applications since it can group comparable data points together, such as cats and dogs in separate classes.
8. Easy to implement:
Given the algorithm’s simplicity and accuracy, it is one of the first classifiers that a new data scientist will
learn.
Conti..
Problem 1: From the given dataset, determine whether (x, y) = (170, 57) belongs to the Underweight or Normal class. Assume K = 3.
Solution:
Find the Euclidean distance d = sqrt( (x2 − x1)² + (y2 − y1)² ):
d1 = sqrt( (170 − 167)² + (57 − 51)² ) = sqrt(3² + 6²) = sqrt(45) = 6.70
d2 = sqrt( (182 − 170)² + (62 − 57)² ) = sqrt(12² + 5²) = sqrt(169) = 13
and so on.
Height (cm) Weight (kg) Class
167 51 Underweight
182 62 Normal
176 69 Normal
173 64 Normal
172 65 Normal
174 56 Underweight
169 58 Normal
173 57 Normal
170 55 Normal
170 57 ?
Conti..
Since K = 3, take the 3 smallest distances (ranks 1 to 3):
• (169, 58) at distance 1.414: Normal
• (170, 55) at distance 2: Normal
• (173, 57) at distance 3: Normal
All 3 nearest neighbours are Normal, so (170, 57) belongs to the Normal class.
Height (cm) Weight (kg) Class Distance
167 51 Underweight 6.7
182 62 Normal 13
176 69 Normal 13.4
173 64 Normal 7.6
172 65 Normal 8.2
174 56 Underweight 4.1
169 58 Normal 1.414-1(R)
173 57 Normal 3-3(R)
170 55 Normal 2-2(R)
170 57 Normal 3
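The same answer can be reproduced with scikit-learn's KNeighborsClassifier (a minimal sketch, encoding Underweight as 0 and Normal as 1):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Height/weight training data from Problem 1 (0 = Underweight, 1 = Normal).
X = np.array([[167, 51], [182, 62], [176, 69], [173, 64], [172, 65],
              [174, 56], [169, 58], [173, 57], [170, 55]])
y = np.array([0, 1, 1, 1, 1, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)   # plain (unweighted) 3-NN
knn.fit(X, y)
print(knn.predict([[170, 57]]))             # expected: [1], i.e. Normal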
Conti..
Problem 2: From the given dataset, determine whether (x, y) = (157, 54) belongs to the Medium or Longer class. Assume K = 3.
Solution:
Find the Euclidean distance d = sqrt( (x2 − x1)² + (y2 − y1)² )
Sl. No. Height Weight Target
1 150 50 Medium
2 155 55 Medium
3 160 60 Longer
4 161 59 Longer
5 158 65 Longer
6 157 54 ?
Sl. No. Height Weight Target Distance
1 150 50 Medium 8.06
2 155 55 Medium 2.24 (1)
3 160 60 Longer 6.71(3)
4 161 59 Longer 6.40(2)
5 158 65 Longer 11.05
6 157 54 ?
Conti..
From the table, with K = 3 we take the 3 smallest distances: 2.24 (Medium), 6.40 (Longer) and 6.71 (Longer).
The (unweighted) k-NN discrete-valued target function is
f(xq) = argmax_{v ∈ V} Σ_{i=1}^{k} δ(v, f(xi)),  where δ(a, b) = 1 if a = b and δ(a, b) = 0 if a ≠ b.
Comparing Medium with the neighbours 2.24 (M), 6.40 (L) and 6.71 (L): δ(M,M) + δ(M,L) + δ(M,L) = 1 + 0 + 0 = 1
Comparing Longer with the neighbours: δ(L,M) + δ(L,L) + δ(L,L) = 0 + 1 + 1 = 2
Since Longer receives 2 votes, (157, 54) belongs to the Longer class.
Note, however, that the single closest neighbour (distance 2.24) is Medium, so a distance-weighted vote can favour Medium, as shown next.
Distance weighted NN:
1. Discrete valued target function
2. Real valued target function
Conti..
Discrete-valued target function (distance-weighted):
f(xq) = argmax_{v ∈ V} Σ_{i=1}^{k} wi δ(v, f(xi)),  where wi = 1 / d(xq, xi)²
The weights of the three neighbours are 1/2.24² = 0.199, 1/6.71² = 0.022 and 1/6.40² = 0.024.
W.r.t. Medium:  f(xq) = 0.199·δ(m,m) + 0.022·δ(m,l) + 0.024·δ(m,l) = 0.199·1 + 0.022·0 + 0.024·0 = 0.199
W.r.t. Longer:  f(xq) = 0.199·δ(l,m) + 0.022·δ(l,l) + 0.024·δ(l,l) = 0.199·0 + 0.022·1 + 0.024·1 = 0.046
Since 0.199 > 0.046, the new instance is classified as Medium.
Sl. No.  Height  Weight  Target   Distance    1/distance²
1        150     50      Medium   8.06
2        155     55      Medium   2.24 (1)    0.199
3        160     60      Longer   6.71 (3)    0.022
4        161     59      Longer   6.40 (2)    0.024
5        158     65      Longer   11.05
6        157     54      Medium
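A small Python sketch of this distance-weighted vote (1/d² weights, exactly as in the formula above):

# Three nearest neighbours of (157, 54): (distance, class label).
neighbours = [(2.24, "Medium"), (6.71, "Longer"), (6.40, "Longer")]

votes = {}
for d, label in neighbours:
    votes[label] = votes.get(label, 0.0) + 1.0 / d**2   # w_i = 1 / d^2

print(votes)                       # roughly {'Medium': 0.199, 'Longer': 0.046}
print(max(votes, key=votes.get))   # 'Medium'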
Conti..
Real-valued target function (distance-weighted):
f(x) = ( Σ_{i=1}^{k} wi f(xi) ) / ( Σ_{i=1}^{k} wi ),  where wi = 1 / d(xq, xi)²
Assume the neighbours carry the (illustrative, randomly chosen) real-valued targets 1.2, 1.8 and 2.1:
f(xq) = (0.199·1.2 + 0.022·1.8 + 0.024·2.1) / (0.199 + 0.022 + 0.024) ≈ 0.329 / 0.245 ≈ 1.34

Sl. No.  Height  Weight  Target   Distance    1/distance²
1        150     50      1.5      8.06
2        155     55      1.2      2.24 (1)    0.199
3        160     60      1.8      6.71 (3)    0.022
4        161     59      2.1      6.40 (2)    0.024
5        158     65      1.7      11.05
6        157     54      ≈1.34
Conti..
Problem 3: Build a centroid classifier for the given data and, for the test instance (6, 5), predict its class.
Solution:
• Step1: Compute the mean/centroid of each class.
• There are 2 classes, A & B.
• Centroid of class A=(3+5+4,1+2+3)/3=(12,6)/3=(4,2)
• Centroid of class B=(7+6+8,6+7+5)/3=(21,18)/3=(7,6)
• Step 2: calculate the Euclidean distance between test instance (6,5) and each of the
centroid.
X Y Class
3 1 A
5 2 A
4 3 A
7 6 B
6 7 B
8 5 B
Conti..
Euc-dist[(x1, y1); (x2, y2)] = sqrt( (x2 − x1)² + (y2 − y1)² )
Class A: d[(6,5); (4,2)] = sqrt( (4 − 6)² + (2 − 5)² ) = 3.6
Class B: d[(6,5); (7,6)] = sqrt( (7 − 6)² + (6 − 5)² ) = 1.414
The test instance has the smaller distance to class B; hence its class is predicted as B.
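scikit-learn's NearestCentroid implements exactly this rule; a minimal sketch with the data above:

import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[3, 1], [5, 2], [4, 3], [7, 6], [6, 7], [8, 5]])
y = np.array(["A", "A", "A", "B", "B", "B"])

clf = NearestCentroid()          # classifies by Euclidean distance to each class centroid
clf.fit(X, y)
print(clf.centroids_)            # expected: [[4, 2], [7, 6]]
print(clf.predict([[6, 5]]))     # expected: ['B']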
Problem 4. Given the following training instances in the table, each having two attributes
(x1 and x2). Compute the class label for test instance 𝑡1 = 3,7 , using 3 nearest neighbors
(k=3).
Training
Instances
𝑥1 𝑥2 Output
𝐼1 7 7 0
𝐼2 7 4 0
𝐼3 3 4 1
𝐼4 1 4 1
Conti..
Euclidean distance d = sqrt( (x1 − 3)² + (x2 − 7)² ), with neighbour ranks:
d1 = sqrt( (7 − 3)² + (7 − 7)² ) = 4    (rank 3)
d2 = sqrt( (7 − 3)² + (4 − 7)² ) = 5    (rank 4)
d3 = sqrt( (3 − 3)² + (4 − 7)² ) = 3    (rank 1)
d4 = sqrt( (1 − 3)² + (4 − 7)² ) = 3.6  (rank 2)
For K = 3 we consider I3 (rank 1), I4 (rank 2) and I1 (rank 3), whose outputs are 1, 1 and 0.
The majority output is 1, so for the test instance t1 = (3, 7) the predicted class is 1. Using distance-weighted votes (1/d², table below), the highest single vote (0.11) also belongs to output 1.
d     d²      Vote = 1/d²       Rank
4     16      1/16 = 0.06       3
5     25      1/25 = 0.04       4
3     9       1/9 = 0.11        1
3.6   12.96   1/12.96 = 0.08    2
Conti..
Problem 5: Apply the KNN classifier to predict diabetes from the given features BMI and Age. Assume K = 3. Test example: BMI = 43.6, Age = 40, Sugar = ?
BMI Age Sugar
33.6 50 1
26.6 30 0
23.4 40 0
43.1 67 0
35.3 23 1
35.9 67 1
36.7 45 1
25.7 46 0
23.3 29 0
31 56 1
Conti..
Solution:
First calculate the distance between the test instance and each training instance. Test example: BMI = 43.6, Age = 40, Sugar = ?
Euc-dist d = sqrt( (x2 − x1)² + (y2 − y1)² );  d1 = sqrt( (43.6 − 33.6)² + (40 − 50)² ) = 14.14, and so on.
Therefore, for the test example BMI = 43.6, Age = 40: the three nearest neighbours (ranks 1 to 3) have Sugar = 1, 1 and 0, so by majority vote Sugar = 1.
BMI Age Sugar Distance to new Rank
33.6 50 1 14.14 2
26.6 30 0 19.72 5
23.4 40 0 20.20 6
43.1 67 0 27.00 9
35.3 23 1 18.92 4
35.9 67 1 28.08 10
36.7 45 1* 8.52 1
25.7 46 0 18.88 3
23.3 29 0 23.09 8
31 56 1 20.37 7
Cont…
Problem 6: given the training data, predict the class of the following new examples using KNN for K=5,
age<=30, income = medium, student=yes, credit rating=fair.
Age      Income   Student   Credit rating   Buys computers
<=30 High No Fair No
<=30 High No Excellent No
30..40 High No Fair Yes
>40 Medium No Fair Yes
>40 Low Yes Fair Yes
>40 Low Yes Excellent No
31..40 Low Yes Excellent Yes
<=30 Medium No Fair no
<=30 Low Yes Fair Yes
>40 Medium Yes Fair Yes
<=30 Medium Yes Excellent Yes
31..40 Medium No Excellent Yes
31..40 High Yes Fair Yes
>40 Medium no Excellent No
Cont…
Solution:
• For the similarity measure, use the attribute-value match:
similarity = ( Σ_{i=1}^{4} wi · ∂(ai, bi) ) / 4,  where ∂(ai, bi) = 1 if ai = bi and 0 otherwise.
• Here ai and bi are the values of age, income, student or credit rating; the weights are all 1 except for income, which is 2.
• The new example is: age <= 30, income = medium, student = yes, credit rating = fair.
• For RID = 1 (class = No), age and credit rating match the new example while income and student do not, so the distance (similarity) to the new example is (1·1 + 2·0 + 1·0 + 1·1)/4 = 0.5.
Cont…
Age      Income   Student   Credit rating   Buys computers   RID   Class   Distance
<=30 High No Fair No 1 No 0.5
<=30 High No Excellent No 2 No 0.25
30..40 High No Fair Yes 3 Yes 0.25
>40 Medium No Fair Yes* 4 Yes 0.75
>40 Low Yes Fair Yes 5 Yes 0.5
>40 Low Yes Excellent No 6 No 0.25
31..40 Low Yes Excellent Yes 7 Yes 0.25
<=30 Medium No Fair No 8 No 1
<=30 Low Yes Fair Yes* 9 Yes 0.75
>40 Medium Yes Fair Yes* 10 Yes 1
<=30 Medium Yes Excellent Yes* 11 Yes 1
31..40 Medium No Excellent Yes 12 Yes 0.5
31..40 High Yes Fair Yes 13 Yes 0.5
>40 Medium no Excellent No 14 No 0.5
Cont…
• Therefore, among the five nearest neighbors (RID and similarity values: 4: 0.75, 8: 1, 9: 0.75, 10: 1, 11: 1), four are from class Yes and one is from class No.
• Hence, the KNN classifier predicts buys computers = Yes.
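A small Python sketch of this weighted attribute-match similarity and the K = 5 majority vote (the rows are transcribed from the table above; the income weight of 2 is the assumption stated in the solution):

# (age, income, student, credit rating, buys computers) for the 14 training rows.
data = [
    ("<=30", "high", "no", "fair", "no"),          ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),       (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),          (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),  ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),         (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),      (">40", "medium", "no", "excellent", "no"),
]
weights = [1, 2, 1, 1]                                   # income counts double
query = ("<=30", "medium", "yes", "fair")

def similarity(row):
    return sum(w * (a == b) for w, a, b in zip(weights, row[:4], query)) / 4

top5 = sorted(data, key=similarity, reverse=True)[:5]    # the 5 most similar rows
votes = [row[4] for row in top5]
print(votes, "->", max(set(votes), key=votes.count))     # expected: buys computers = 'yes'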
Clustering K-means
• The task of grouping data points based on their similarity with each other is
called Clustering or Cluster Analysis.
• This method is defined under the branch of Unsupervised Learning, which
aims at gaining insights from unlabelled data point
• Cluster analysis divides the data into groups (clusters) that are meaningful,
useful, or both.
• For instance, clustering can be regarded as a form of classification in that it
creates a labeling of objects with class (cluster) labels.
K-means
• K-Means Clustering is an unsupervised learning algorithm that is used to solve the clustering
problems in machine learning or data science.
• K-means clustering assigns data points to one of K clusters depending on their distance from the centers of the clusters.
• It starts by randomly placing the cluster centroids in the space.
• Then each data point is assigned to one of the clusters based on its distance from the cluster centroids.
• After assigning each point to one of the clusters, new cluster centroids are computed.
• This process runs iteratively until it finds good clusters.
• Here, K defines the number of pre-defined clusters that need to be created in the process,
as if K=2, there will be two clusters, and for K=3, there will be three clusters, and so on.
• Hence, each cluster has data points with some commonalities/similarities, and it is away
from other clusters.
The Basic K-means Algorithm
• First, we randomly initialize k points, called means or cluster centroids.
• We categorize each item to its closest mean, and we update the mean’s coordinates,
which are the averages of the items categorized in that cluster so far.
• We repeat the process for a given number of iterations and at the end, we have our
clusters.
Basic K-means algorithm
Step-1: Select the number K to decide how many clusters are to be formed.
Step-2: Select K random points as centroids (they need not come from the input dataset).
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Calculate the variance and place a new centroid in each cluster.
Step-5: Repeat the third step, i.e. reassign each data point to the new closest centroid of its cluster.
Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
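A minimal NumPy implementation of these steps (random initial centroids, assign, recompute, repeat until nothing changes; it assumes no cluster ever becomes empty):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    # Basic K-means: returns (centroids, labels).
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # Step 2: random initial centroids
    for _ in range(n_iter):
        # Step 3/5: assign each point to its closest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):               # Step 6: stop when nothing changes
            break
        centroids = new_centroids
    return centroids, labels

# Example: the 1-D dataset used in Problem 1 below, reshaped into a column.
S = np.array([[2], [3], [4], [10], [11], [12], [20], [25], [30]], dtype=float)
print(kmeans(S, k=2))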
Strengths and Weaknesses
Strength
• K-means is simple and can be used for a wide variety of data types.
• It is also quite efficient, even though multiple runs are often performed.
• This algorithm is very easy to understand and implement.
• This algorithm is efficient, Robust, and Flexible
• If the clusters in the data are distinct and spherical, it gives the best results.
Weaknesses
• This algorithm needs prior specification for the number of cluster centers that is the
value of K.
• It cannot handle outliers and noisy data, as the centroids get deflected
• It does not work well with a very large set of datasets as it takes huge computational
time.
Cont…
Problem 1: Divide the given sample data into two clusters using the K-means algorithm: S = {2, 3, 4, 10, 11, 12, 20, 25, 30}. Given K = 2, identify the cluster to which the new data point 15 belongs.
Solution:
1. Choose 2 random clusters from the given data sets C1=4, C2=12.
2. Find the distance between given samples and centroids, put the sample in the nearest
cluster.
3. Repeat the same for all data points.
Cluster K1 = {2, 3, 4}   (|2−4| = 2, |3−4| = 1, |4−4| = 0, |10−4| = 6, ...;  |2−12| = 10, |3−12| = 9, |4−12| = 8, |10−12| = 2, ...)
so 2, 3 and 4 are placed in cluster 1 because their distance to C1 = 4 is smallest, and
Cluster K2 = {10, 11, 12, 20, 25, 30}.
4. Compute new centroids
K1 = {2, 3, 4}: C1 = (2 + 3 + 4)/3 = 3
K2 = {10, 11, 12, 20, 25, 30}: C2 = (10 + 11 + 12 + 20 + 25 + 30)/6 = 18
So C1 = 3 and C2 = 18.
Cont…
5. Find the new clustering with C1 = 3 and C2 = 18:
K1 = {2, 3, 4, 10}, C1 = (2 + 3 + 4 + 10)/4 = 4.75
K2 = {11, 12, 20, 25, 30}, C2 = (11 + 12 + 20 + 25 + 30)/5 = 19.6
6. Find the new clustering with C1 = 4.75 and C2 = 19.6:
K1 = {2, 3, 4, 10, 11, 12}, C1 = (2 + 3 + 4 + 10 + 11 + 12)/6 = 7
K2 = {20, 25, 30}, C2 = (20 + 25 + 30)/3 = 25
7. Find the new clustering with C1 = 7 and C2 = 25:
K1 = {2, 3, 4, 10, 11, 12}, K2 = {20, 25, 30}
Since the clusters and centroid values remain the same, the given dataset is divided into the two clusters
K1 = {2, 3, 4, 10, 11, 12} and K2 = {20, 25, 30}, with centroids C1 = 7 and C2 = 25.
8. Identify the cluster for the new data point 15:
Distance between 15 and C1: |15 − 7| = 8
Distance between 15 and C2: |15 − 25| = 10
Since the distance to C1 is smaller, the new data point 15 belongs to the cluster of C1 (= 7).
Cont…
Problem 2: Divide the following data points into two clusters using K-mean and identify (5,4) belongs
to which cluster.
Solution:
Step 1: Choosing randomly 2 clusters centers
C1=(2,1) and C2=(2,3)
Step 2: Finding distance between two clusters centers and each data point (Apply Euclidean distance)
For the data point (1,1) and C1(2,1): d = sqrt( (1 − 2)² + (1 − 1)² ) = 1
(2,1) and (2,1): d = sqrt( (2 − 2)² + (1 − 1)² ) = 0
(2,3) and (2,1): d = sqrt( (2 − 2)² + (3 − 1)² ) = 2, and so on.
X 1 2 2 3 4 5
Y 1 1 3 2 3 5
Data points Distance from C1
(2,1)
Distance from C2(2,3) New clusters
(1,1) 1 2.24 C1
(2,1) 0 2 C1
(2,3) 2 0 C2
(3,2) 1.41 1.41 C1
(4,3) 2.83 2 C2
(5,5) 5 3.61 C2
Cont…
Step 3: cluster 1 of C1 = {(1,1), (2,1), (3,2)}
cluster 2 of C2 = {(2,3), (4,3), (5,5)}
Step 4: Recalculate the cluster centers
C1 = (1/3)[(1,1) + (2,1) + (3,2)] = (1/3)(6, 4) = (2, 1.33)
C2 = (1/3)[(2,3) + (4,3) + (5,5)] = (1/3)(11, 11) = (3.67, 3.67)
Step 5: Repeat the step 2 until we get same cluster center or same cluster elements
Data points   Distance from C1(2,1.33)   Distance from C2(3.67,3.67)   New cluster
(1,1) 1.05 3.78 C1
(2,1) 0.33 3.15 C1
(2,3) 1.67 1.8 C1
(3,2) 1.204 1.8 C1
(4,3) 2.605 0.75 C2
(5,5) 4.74 1.88 C2
Cont…
cluster 1 of C1 = {(1,1), (2,1), (2,3), (3,2)}
cluster 2 of C2 = {(4,3), (5,5)}
Step 6: Recalculate the cluster centers
C1 = (1/4)[(1,1) + (2,1) + (2,3) + (3,2)] = (1/4)(8, 7) = (2, 1.75)
C2 = (1/2)[(4,3) + (5,5)] = (1/2)(9, 8) = (4.5, 4)
Step 7: Repeat the step 2 until we get same cluster center or same cluster elements
Step 8: cluster 1 of C1 = {(1,1), (2,1), (2,3), (3,2)}; cluster 2 of C2 = {(4,3), (5,5)}.
Since the cluster elements are the same as in the previous iteration, stop.
For the new point (5,4): its distance to C1 = (2, 1.75) is about 3.75 and to C2 = (4.5, 4) is 0.5, so (5,4) belongs to cluster C2.
Data points Distance from C1(2,1.75) Distance from C2(4.5,4) New clusters
(1,1) 1.25 4.61 C1
(2,1) 0.75 3.9 C1
(2,3) 1.25 2.69 C1
(3,2) 1.03 2.5 C1
(4,3) 2.36 1.12 C2
(5,5) 4.42 1.12 C2
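The same clustering can be reproduced with scikit-learn (a minimal sketch; note that KMeans may report the two clusters in either order):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [2, 1], [2, 3], [3, 2], [4, 3], [5, 5]], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)     # roughly (2, 1.75) and (4.5, 4), in some order
print(km.labels_)              # cluster membership of each training point
print(km.predict([[5, 4]]))    # (5, 4) falls in the cluster whose centre is (4.5, 4)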
Cont…
Problem 3 Use K-means clustering to cluster the following data into two groups. Data
points {2,4,10,12,3,20,30,11,25}, initial cluster centroids are M1=4 and M2=11.
Solution: Initial centroids: M1=4, M2=11.
The distance is calculated as d(x2, x1) = |x2 − x1|.
Therefore, C1={2,4,3}
M1=(2+4+3)/3=3
C2={10,12,20,30,11,25}
M2=(10+12+20+30+11+25)/6=18
so new centroids: M1=3, M2=18
Data points   Distance to M1(4)   Distance to M2(11)   Cluster
2 2 9 C1
4 0 7 C1
10 6 1 C2
12 8 1 C2
3 1 8 C1
20 16 9 C2
30 26 19 C2
11 7 0 C2
25 21 14 C2
Cont…
Current centroids: M1 = 3, M2 = 18
Therefore, C1 = {2, 4, 10, 3}, M1 = (2 + 4 + 10 + 3)/4 = 4.75
C2 = {12, 20, 30, 11, 25}, M2 = (12 + 20 + 30 + 11 + 25)/5 = 19.6
So the new centroids are M1 = 4.75, M2 = 19.6.
Data points   Distance to M1   Distance to M2   Cluster   New cluster
2 1 16 C1 C1
4 1 14 C1 C1
10 7 8 C2 C1
12 9 6 C2 C2
3 0 15 C1 C1
20 17 2 C2 C2
30 27 12 C2 C2
11 8 7 C2 C2
25 22 7 C2 C2
Cont…
Current centroids: M1=4.75, M2=19.6
Therefore, C1={2,4,10,11,12,3}
C2={20,30,25}
So,
New centroids: M1=7
M2=25
Data points   Distance to M1   Distance to M2   Cluster   New cluster
2 2.75 17.6 C1 C1
4 0.75 15.6 C1 C1
10 5.25 9.6 C1 C1
12 7.25 7.6 C2 C1
3 1.75 16.6 C1 C1
20 15.25 0.4 C2 C2
30 25.25 10.4 C2 C2
11 6.25 8.6 C2 C1
25 20.25 5.4 C2 C2
Cont…
Current centroids: M1=7, M2=25
Therefore, the final clusters are
• C1 = {2, 4, 10, 11, 12, 3}
• C2 = {20, 30, 25}
Data points   Distance to M1   Distance to M2   Cluster   New cluster
2 5 23 C1 C1
4 3 21 C1 C1
10 3 15 C1 C1
12 5 13 C1 C1
3 4 22 C1 C1
20 13 5 C2 C2
30 23 5 C2 C2
11 4 14 C1 C1
25 18 0 C2 C2
Cont…
Problem 4: Suppose the data-mining task is to cluster the following points into 3 clusters: A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9). Suppose we initially assign A1, B1 and C1 as the centers of the three clusters respectively.
Solution: Initial centroids: A1 = (2, 10), B1 = (5, 8), C1 = (1, 2)
Distances are calculated by d(P1, P2) = sqrt( (x2 − x1)² + (y2 − y1)² ).
Therefore, cluster 1 = {(2,10)}, cluster 2 = {(8,4), (5,8), (7,5), (6,4), (4,9)} and cluster 3 = {(2,5), (1,2)}.
So the new centroids are A1 = (2, 10), B1 = (6, 6) and C1 = (1.5, 3.5).
Point   x   y    d to (2,10)   d to (5,8)   d to (1,2)   Cluster
A1 2 10 0 3.61 8.06 1
A2 2 5 5 4.24 3.16 3
A3 8 4 8.49 5 7.28 2
B1 5 8 3.61 0 7.21 2
B2 7 5 7.07 3.61 6.71 2
B3 6 4 7.21 4.12 5.39 2
C1 1 2 8 7.21 0 3
C2 4 9 2.24 1.41 7.62 2
Cont…
Current centroids: A1 = (2, 10), B1 = (6, 6), C1 = (1.5, 3.5)
Therefore, cluster 1 = {(2,10), (4,9)}, cluster 2 = {(8,4), (5,8), (7,5), (6,4)} and cluster 3 = {(2,5), (1,2)}.
So the new centroids are A1 = (3, 9.5), B1 = (6.5, 5.25) and C1 = (1.5, 3.5).
Point   x   y    d to (2,10)   d to (6,6)   d to (1.5,3.5)   Cluster   New cluster
A1 2 10 0 5.66 6.52 1 1
A2 2 5 5 4.12 1.58 3 3
A3 8 4 8.49 2.83 6.52 2 2
B1 5 8 3.61 2.24 5.7 2 2
B2 7 5 7.07 1.41 5.7 2 2
B3 6 4 7.21 2.00 4.53 2 2
C1 1 2 8.06 6.46 1.58 3 3
C2 4 9 2.24 3.61 6.04 2 1
Cont…
Current centroids: A1 = (3, 9.5), B1 = (6.5, 5.25), C1 = (1.5, 3.5)
From the table below, cluster 1 becomes {(2,10), (5,8), (4,9)}, so the new centroids are:
A1 = (3.67, 9), B1 = (7, 4.33) and C1 = (1.5, 3.5)
Point   x   y    d to (3,9.5)   d to (6.5,5.25)   d to (1.5,3.5)   Cluster   New cluster
A1 2 10 1.12 6.54 6.52 1 1
A2 2 5 4.61 4.51 1.58 3 3
A3 8 4 7.43 1.95 6.52 2 2
B1 5 8 2.5 3.13 5.7 2 1
B2 7 5 6.02 0.56 5.7 2 2
B3 6 4 6.26 1.35 4.53 2 2
C1 1 2 7.76 6.39 1.58 3 3
C2 4 9 1.12 4.51 6.04 1 1
Cont…
Current centroids: A1 = (3.67, 9), B1 = (7, 4.33), C1 = (1.5, 3.5)
Since no point changes cluster, the final clusters are:
Cluster 1 = {(2,10), (5,8), (4,9)}
Cluster 2 = {(8,4), (7,5), (6,4)}
Cluster 3 = {(2,5), (1,2)}
Point   x   y    d to (3.67,9)   d to (7,4.33)   d to (1.5,3.5)   Cluster   New cluster
A1 2 10 1.94 7.56 6.52 1 1
A2 2 5 4.33 5.04 1.58 3 3
A3 8 4 6.62 1.05 6.52 2 2
B1 5 8 1.67 4.18 5.70 1 1
B2 7 5 5.21 0.67 5.70 2 2
B3 6 4 5.52 1.05 4.53 2 2
C1 1 2 7.49 6.44 1.58 3 3
C2 4 9 0.33 5.55 6.04 1 1
Hierarchical Clustering
• Hierarchical clustering is another unsupervised machine learning algorithm,
which is used to group the unlabeled datasets into a cluster.
• It is a connectivity-based clustering model that groups the data points
together that are close to each other based on the measure of similarity or
distance.
• The assumption is that data points that are close to each other are more
similar or related than data points that are farther apart.
• It is based on the idea of creating a hierarchy of clusters, where each cluster
is made up of smaller clusters that can be further divided into even smaller
clusters.
• This hierarchical structure makes it easy to visualize the data and identify
patterns within the data.
Hierarchical clustering is of two types.
Agglomerative clustering
Divisive clustering
Agglomerative Clustering
• Agglomerative clustering is a type of data clustering method used in
unsupervised learning.
• It begins with N groups, each containing initially one entity, and then the two
most similar groups merge at each stage until there is a single group
containing all the data.
• It is an iterative process that groups similar objects into clusters based on
some measure of similarity.
• It uses a bottom-up approach for dividing data points into clusters.
• The algorithm begins by assigning each object to its own cluster.
• It then uses a distance metric to determine the similarity between objects and
clusters.
• If two clusters have similar elements, they are merged together into a larger
cluster.
• This continues until all objects are grouped into one final cluster.
Agglomerative Hierarchical Clustering Algorithm
• Step 1: Consider each dataset as a single cluster and calculate the distance of
one cluster from all the other clusters.
• Step 2: In the second step, comparable clusters are merged together to form
a single cluster. Let’s say cluster (B) and cluster (C) are very similar to each
other, therefore we merge them in the second step similarly to cluster (D)
and (E) and at last, we get the clusters [(A), (BC), (DE), (F)]
• Step 3: We recalculate the proximity according to the algorithm and merge
the two nearest clusters([(DE), (F)]) together to form new clusters as [(A),
(BC), (DEF)]
• Step 4: Repeating the same process, the clusters DEF and BC are comparable and are merged together to form a new cluster. We are now left with the clusters [(A), (BCDEF)].
• Step 5: At last, the two remaining clusters are merged together to form a single cluster [(ABCDEF)].
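These merge steps can be carried out with SciPy (a minimal sketch on a small made-up one-dimensional dataset, similar to the exercises later in this module; linkage performs the successive merges and a dendrogram of the tree can also be drawn from Z):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster  # dendrogram is also available here

# Small made-up 1-D dataset.
X = np.array([[18], [22], [25], [27], [42], [43]], dtype=float)

Z = linkage(X, method="single", metric="euclidean")   # bottom-up merges, nearest pair first
print(Z)                                              # each row: the two clusters merged and their distance
print(fcluster(Z, t=5, criterion="distance"))         # flat clusters obtained by cutting at distance 5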
Cont…
The average linkage clustering uses the average formula, i.e. the distance between two clusters A and B is
d(A, B) = avg{ d(x, y) : x ∈ A, y ∈ B } = ( Σ d(x, y), x ∈ A, y ∈ B ) / ( |A| · |B| )
Fig.9. Concept of Agglomerative Clustering
Key Issues in Hierarchical Clustering
Lack of a Global Objective Function:
• Agglomerative hierarchical clustering techniques use various criteria to decide locally,
at each step, which clusters should be merged (or split for divisive approaches). This
approach yields clustering algorithms that avoid the difficulty of attempting to solve a
hard combinatorial optimization problem.
• Do not have problems with local minima or difficulties in choosing initial points.
Ability to Handle Different Cluster Sizes:
• There are two approaches: weighted, which treats all clusters equally, and unweighted,
which takes the number of points in each cluster into account.
• Treating clusters of unequal size equally gives different weights to the points in
different clusters, while taking the cluster size into account gives points in different
clusters the same weight.
Merging Decisions are Final:
• Agglomerative hierarchical clustering algorithms tend to make good local decisions about combining two clusters since they can use information about the pairwise similarity of all points. However, once a decision is made to merge two clusters, it cannot be undone later.
• This approach prevents a local optimization criterion from becoming a global optimization criterion.
Advantage and disadvantages of Agglomerative Hierarchical
Clustering Algorithm
Advantages
1. Performance: it adapts to the observed shape of the data and returns accurate results.
2. Easy: It is easy to use and provides better user guidance with good community support. So much
content and good documentation are available for a better user experience.
3. More Approaches: Two approaches are there using which datasets can be trained and tested,
agglomerative and divisive.
4. Performance on Small Datasets: The hierarchical clustering algorithms are effective on small
datasets and return accurate and reliable results with lower training and testing time.
Disadvantages
1. Time Complexity: As many iterations and calculations are associated, the time complexity of
hierarchical clustering is high. In some cases, it is one of the main reasons for preferring K-Means
clustering.
2. Space Complexity: As many calculations of errors with losses are associated with every epoch, the
space complexity of the algorithm is very high. Due to this, while implementing the hierarchical
clustering, the space of the model is considered. In such cases, we prefer K-Means clustering.
3. Poor performance on Large Datasets: When training a hierarchical clustering algorithm for large
datasets, the training process takes so much time with space which results in poor performance of the
algorithms.
Exercise problems
Problem 1: Consider the following set of 6 one-dimensional data points: 18, 22, 25, 27, 42, 43. Merge the clusters using the minimum (single-link) distance and update the proximity matrix accordingly. Show the proximity matrix at each iteration.
Solution:
Since the minimum distance is 1, between 42 and 43, merge 42 and 43.
In the updated matrix the minimum distance is 2, so merge 25 and 27.
18 22 25 27 42 43
18 0 4 7 9 24 25
22 4 0 3 5 20 21
25 7 3 0 2 17 18
27 9 5 2 0 15 16
42 24 20 17 15 0 1
43 25 21 18 16 1 0
18 22 25 27 42,43
18 0 4 7 9 24
22 4 0 3 5 20
25 7 3 0 2 17
27 9 5 2 0 15
42,43 24 20 17 15 0
Exercise problems
Since the minimum distance is now 3, merge 22 with (25, 27): {22, (25, 27)}.
Since the minimum distance is then 4, merge 18 with {22, (25, 27)}: [18, {22, (25, 27)}].
Draw the dendrogram for the merged data points.
18 22 25,27 42,43
18 0 4 7 24
22 4 0 3 20
25,27 7 3 0 15
42,43 24 20 15 0
18 22,25,27 42,43
18 0 4 24
22,25,27 4 0 15
42,43 24 15 0
Problems
Problem 2: For the given dataset, find the clusters using a single link technique. Use
Euclidean distance and draw the dendrogram.
Solution:
Step 1: Compute the distance matrix using Euclidean distance.
Let A(x1, y1) and B(x2, y2). Then the Euclidean distance between the two points is
d(A, B) = sqrt( (x2 − x1)² + (y2 − y1)² )
Sample No X Y
P1 0.40 0.53
P2 0.22 0.38
P3 0.35 0.32
P4 0.26 0.19
P5 0.08 0.41
P6 0.45 0.30
Conti..
d(P1, P2) = sqrt( (0.22 − 0.40)² + (0.38 − 0.53)² ) = 0.23
d(P1, P3) = sqrt( (0.35 − 0.40)² + (0.32 − 0.53)² ) = 0.22
d(P2, P3) = sqrt( (0.35 − 0.22)² + (0.32 − 0.38)² ) = 0.14, and so on
Step 2: Merge the two closest members.
Here the minimum value is 0.10, and hence we combine P3 and P6 (0.10 appears in the P6 row and P3 column).
Now form the cluster corresponding to this minimum value and update the distance matrix.
P1 P2 P3 P4 P5 P6
P1 0
P2 0.23 0
P3 0.22 0.14 0
P4 0.37 0.19 0.13 0
P5 0.34 0.14 0.28 0.23 0
P6 0.24 0.24 0.10 0.22 0.39 0
Conti..
Cluster formed: (P3, P6).
Merge the two closest members of the clusters: the minimum value is now 0.13, hence we combine (P3, P6) with P4:
{(P3, P6), P4}
P1 P2 P3 P4 P5 P6
P1 0
P2 0.23 0
P3 0.22 0.14 0
P4 0.37 0.19 0.13 0
P5 0.34 0.14 0.28 0.23 0
P6 0.24 0.24 0.10 0.22 0.39 0
P1 P2 P3,P6 P4 P5
P1 0
P2 0.23 0
P3,P6 0.22 0.14 0
P4 0.37 0.19 0.13 0
P5 0.34 0.14 0.28 0.23 0
P1 P2 P3,P6,P4 P5
P1 0
P2 0.23 0
P3,P6,P4 0.22 0.14 0
P5 0.34 0.14 0.28 0
Conti..
Now combined P2 and P5
[{(P3, P6), P4},(P2,P5)]
Now update the matrix and merge P2,P5,P3,P6 and P4
([{(P3, P6), P4},(P2,P5)], P1)
Now we have reached to the solution.
7/19/2024 80
Dr. Shivashankar, ISE, GAT
P1 P2 P3,P6,P4 P5
P1 0
P2 0.23 0
P3,P6,P4 0.22 0.14 0
P5 0.34 0.14 0.28 0
P1 P2,P5 P3,P6,P4
P1 0
P2,P5 0.23 0
P3,P6,P4 0.22 0.14 0
P1 P2,P5,P3,P6,P
4
P1 0
P2,P5,P3,P6,P4 0.22 0
Conti
The dendrogram for this solution has its leaves in the order P3, P6, P4, P2, P5, P1.
Dendrogram of the clusters formed for the group P1, P2, P3, P4, P5 and P6.
Conti..
Problem 3: Given the one-dimensional dataset {1, 5, 8, 10, 2}, use the agglomerative clustering algorithm with complete link and Euclidean distance to establish a hierarchical grouping. Using a cutting threshold of 5, how many clusters are there, and what is the membership of each group?
Solution:
Euclidean distance = sqrt( (x2 − x1)² + (y2 − y1)² ); for one-dimensional data, Euc-dist = sqrt( (x2 − x1)² ) = |x2 − x1|.
Apply the 1-D Euclidean distance to calculate the matrix (below, the second matrix relabels the data values 1, 5, 8, 10, 2 as points 1 to 5).
1 5 8 10 2
1 0 4 7 9 1
5 4 0 3 5 3
8 7 3 0 2 6
10 9 5 2 0 8
2 1 3 6 8 0
1 2 3 4 5
1 0 4 7 9 1
2 4 0 3 5 3
3 7 3 0 2 6
4 9 5 2 0 8
5 1 3 6 8 0
Conti..
From the distance matrix, the smallest distance is 1, between points 1 and 5 (the data values 1 and 2). So merge {1, 5}.
Now recalculate the distances (complete link uses the maximum):
d(2, {1,5}) = max{d(2,1), d(2,5)} = max(4, 3) = 4
d(3, {1,5}) = max{d(3,1), d(3,5)} = max(7, 6) = 7
d(4, {1,5}) = max{d(4,1), d(4,5)} = max(9, 8) = 9
From the matrix, the distance between points 3 and 4 is now the smallest, i.e. 2, hence they merge to form the cluster {3, 4}.
Using the complete link, the distances between the remaining points/clusters are:
d({1,5}, {3,4}) = max{d({1,5},3), d({1,5},4)} = max(7, 9) = 9
d(2, {3,4}) = max{d(2,3), d(2,4)} = max(3, 5) = 5
Thus we can update the distance matrix, where row 2 corresponds to point 2 and rows 1 and 3 correspond to the clusters {1,5} and {3,4}, as follows.
1,5 2 3 4
1,5 0 4 7 9
2 4 0 3 5
3 7 3 0 2
4 9 5 2 0
1,5 2 3,4
1,5 0 4 9
2 4 0 5
3,4 9 5 0
Conti..
Following the same procedure, the smallest remaining distance is 4, so we merge point 2 with the cluster {1, 5} to form {1, 2, 5} and update the distance matrix as follows.
With the cutting threshold of 5, the final merge at distance 9 is not performed, so there are two clusters: {1, 5, 2} (data values 1, 2, 5) and {3, 4} (data values 8, 10). Only after increasing the distance threshold to 9 would all clusters merge.
Fig 12: Dendrogram for the given dataset
[1,5],2 [3,4]
[1,5],2 0 9
[3,4] 9 0
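This complete-link hierarchy and the threshold-5 cut can be checked with SciPy (a minimal sketch):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1], [5], [8], [10], [2]], dtype=float)   # the 1-D dataset of this problem

Z = linkage(X, method="complete", metric="euclidean")   # complete (maximum) linkage
print(Z)                                                # merge heights: 1, 2, 4, 9
print(fcluster(Z, t=5, criterion="distance"))           # cutting at 5 gives the two clusters {1, 5, 2} and {8, 10}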
Conti..
Problem 4: Given the dataset {a, b, c, d, e} and the following distance matrix, construct a dendrogram by average-linkage hierarchical clustering using the agglomerative method.
The average linkage clustering uses the average formula, i.e. the distance between two clusters A and B is
d(A, B) = avg{ d(x, y) : x ∈ A, y ∈ B } = ( Σ d(x, y), x ∈ A, y ∈ B ) / ( |A| · |B| )
a b c d e
a 0 9 3 6 11
b 9 0 7 5 10
c 3 7 0 9 2
d 6 5 9 0 8
e 11 10 2 8 0
Conti..
Dataset: {a, b, c, d, e}
Initial clustering (singleton sets): C1 = {a}, {b}, {c}, {d}, {e}
From the table, the minimum distance is between the clusters {c} and {e}: d({c}, {e}) = 2.
We merge {c} and {e} to form the cluster {c, e}.
The new set of clusters is C2 = {a}, {b}, {d}, {c, e}.
a b c d e
a 0 9 3 6 11
b 9 0 7 5 10
c 3 7 0 9 2
d 6 5 9 0 8
e 11 10 2 8 0
a b c,e d
a 0 9 ? 6
b 9 0 ? 5
c,e ? ? 0 ?
d 6 5 ? 0
Conti..
Let us compute the distance of {c, e} from the other clusters:
d({c,e}, {a}) = avg{d(c,a), d(e,a)} = (3 + 11)/2 = 7
d({c,e}, {b}) = avg{d(c,b), d(e,b)} = (7 + 10)/2 = 8.5
d({c,e}, {d}) = avg{d(c,d), d(e,d)} = (9 + 8)/2 = 8.5
Now update the table.
From the C2 table, the minimum distance is between the clusters {b} and {d}: d({b}, {d}) = 5.
We merge {b} and {d} to form the cluster {b, d}.
The new set of clusters is C3: {a}, {c, e}, {b, d}.
a b c,e d
a 0 9 7 6
b 9 0 8.5 5
c,e 7 8.5 0 8.5
d 6 5 8.5 0
Conti..
Let us compute the distance of {b, d} from the other clusters:
d({b,d}, {a}) = avg{d(b,a), d(d,a)} = (9 + 6)/2 = 7.5
d({b,d}, {c,e}) = avg{d(b,c), d(b,e), d(d,c), d(d,e)} = (7 + 10 + 9 + 8)/(2 · 2) = 8.5
a b c,e d
a 0 9 7 6
b 9 0 8.5 5
c,e 7 8.5 0 8.5
d 6 5 8.5 0
a b,d c,e
A 0 ? 7
b,d ? 0 ?
c,e 7 ? 0
a b,d c,e
a 0 7.5 7
b,d 7.5 0 8.5
c,e 7 8.5 0
Conti..
From the table, the minimum distance is between the clusters {a} and {c, e}: d({a}, {c,e}) = 7.
We merge {a} and {c, e} to form the cluster {a, c, e}.
The new set of clusters is C4: {a, c, e}, {b, d}.
Let us compute the distance of {a, c, e} from the other cluster:
d({a,c,e}, {b,d}) = avg{d(a,b), d(a,d), d(c,b), d(c,d), d(e,b), d(e,d)} = (9 + 6 + 7 + 9 + 10 + 8)/(3 · 2) ≈ 8.16
Fig 11: Dendrogram for the dataset {a, b, c, d, e}.
a,c,e b,d
a,c,e 0 ?
b,d ? 0
a,c,e b,d
a,c,e 0 8.16
b,d 8.16 0
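SciPy can reproduce this average-linkage hierarchy directly from the distance matrix (a minimal sketch; squareform converts the square matrix into the condensed form that linkage expects):

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Distance matrix for the points a, b, c, d, e (in that order).
D = np.array([[0, 9, 3, 6, 11],
              [9, 0, 7, 5, 10],
              [3, 7, 0, 9, 2],
              [6, 5, 9, 0, 8],
              [11, 10, 2, 8, 0]], dtype=float)

Z = linkage(squareform(D), method="average")   # average linkage (UPGMA)
print(Z)   # merges at heights 2 ({c,e}), 5 ({b,d}), 7 ({a,c,e}), about 8.17 (final)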
Divisive Clustering
• Divisive clustering is also a type of hierarchical clustering that is used to create
clusters of data points.
• It is an unsupervised learning algorithm that begins by placing all the data
points in a single cluster and then progressively splits the clusters until each
data point is in its own cluster.
• It is useful for analyzing datasets that may have complex structures or
patterns, as it can help identify clusters that may not be obvious at first
glance.
• Divisive clustering works by first assigning all the data points to one cluster.
• Then, it looks for ways to split this cluster into two or more smaller clusters.
• This process continues until each data point is in its own cluster.
Cont…
Steps to Divisive Hierarchical Clustering
The algorithm for divisive hierarchical clustering involves several steps.
Step 1: Consider all objects a part of one big cluster.
Step 2: Split the big cluster into smaller clusters using any flat-clustering method, e.g. k-means.
Step 3: Selects an object or subgroup to split into two smaller sub-clusters based on some
distance metric such as Euclidean distance or correlation coefficients.
Step 4: The process continues recursively until each object forms its own cluster.
Fig. 12: Concept of Divisive Hierarchical
Clustering
Cont…
Fig.13. Presents the differences between Agglomerative and Divisive
algorithms.
Conti..
1. k-NN algorithm does more computation on test time rather than train time.
A)TRUE
B) FALSE
2. Which of the following distance metric can not be used in k-NN?
A) Manhattan
B) Minkowski
C) Tanimoto
D) Jaccard
E) Mahalanobis
F) All can be used
3) Which of the following option is true about k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
4) Which of the following machine learning algorithm can be used for imputing missing values of both categorical and continuous
variables?
A) K-NN
B) Linear Regression
C) Logistic Regression
5) Which of the following will be Euclidean Distance between the two data point A(1,3) and B(2,3)?
A) 1
B) 2
C) 4
D) 8
A. K-Means Clustering comes under
1.Supervised learning Algorithm
2. Unsupervised Learning Algorithm
3. Reinforcement Learning
4. None of the above
B. Which of the following is true for clustering
1. Clustering is a technique used to group similar objects into clusters.
2. partition data into groups
3. dividing entire data, based on patterns in data
4. All of the above
C. Which of the following is true for K-Means Clustering
1. All data points in a cluster should be similar to each other.
2. The data points from different clusters should be as different as possible.
3. Both 1 and 2
4. Only 1
5. Only 2
D. Which of the following applications comes under clustering
1. Customer Segmentation
2. Targeted Marketing
3. Recommendation Engines
4. Predicting the temperature
5. Only 1,2,3,4
6. All the above
E. What is intra cluster distance
1. distance between points in the cluster to its centroid
2. distance between each point in the cluster
3. sum of squares of distances between points
4. None of the above
7/19/2024 94
Dr. Shivashankar, ISE, GAT
Conti..
Q1. Movie recommendation systems are an example of:
1. Classification
2. Clustering
3. Reinforcement Learning
4. Regression
Options:
A. 2 Only
B. 1 and 2
C. 1 and 3
D. 2 and 3
E. 1, 2, and 3
F. 1, 2, 3, and 4
Q2. Sentiment Analysis is an example of:
1. Regression
2. Classification
3. Clustering
4. Reinforcement Learning
Options:
A. 1 Only
B. 1 and 2
C. 1 and 3
D. 1, 2 and 3
E. 1, 2 and 4
F. 1, 2, 3 and 4
7/19/2024 95
Dr. Shivashankar, ISE, GAT
Conti..
Q3. Can decision trees be used for performing clustering?
A. True
B. False
Q4. What is the minimum no. of variables/ features required to perform clustering?
Options:
A. 0
B. 1
C. 2
D. 3
Q5. For two runs of K-Mean clustering, is it expected to get the same clustering results?
A. Yes
B. No
Q6. Which of the following clustering algorithms suffers from the problem of convergence at local optima?
1. K-Means clustering algorithm
2. Agglomerative clustering algorithm
3. Expectation-Maximization clustering algorithm
4. Diverse clustering algorithm
Options:
A. 1 only
B. 2 and 3
C. 2 and 4
D. 1 and 3
E. 1,2 and 4
F. All of the above
7/19/2024 96
Dr. Shivashankar, ISE, GAT
Machine Learning_SVM_KNN_K-MEANSModule 2.pdf

  • 1.
    MACHINE LEARNING (INTEGRATED) (21ISE62) Dr.Shivashankar Professor Department of Information Science & Engineering GLOBAL ACADEMY OF TECHNOLOGY-Bengaluru 7/19/2024 1 Dr. Shivashankar, ISE, GAT GLOBAL ACADEMY OF TECHNOLOGY Ideal Homes Township, Rajarajeshwari Nagar, Bengaluru – 560 098 Department of Information Science & Engineering
  • 2.
    Course Outcomes After Completionof the course, student will be able to:  Illustrate Regression Techniques and Decision Tree Learning Algorithm.  Apply SVM, ANN and KNN algorithm to solve appropriate problems.  Apply Bayesian Techniques and derive effective learning rules.  Illustrate performance of AI and ML algorithms using evaluation techniques.  Understand reinforcement learning and its application in real world problems. Text Book: 1. Tom M. Mitchell, Machine Learning, McGraw Hill Education, India Edition 2013. 2. EthemAlpaydın, Introduction to machine learning, MIT press, Second edition. 3. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Pearson, First Impression, 2014. 7/19/2024 2 Dr. Shivashankar, ISE, GAT
  • 3.
    MODULE-2 SUPPORT VECTOR MACHINE •Support Vector Machine called as SVM is one of the most popular Supervised Learning algorithms, which is used for Classification as well as Regression prediction tool that uses machine learning theory to maximize predictive accuracy while automatically avoiding over-fit to the data. • SVM can be defined as systems which use hypothesis space of a linear functions in a high dimensional feature space, trained with a learning algorithm. • The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put the new data point in the correct category in the future. • This best decision boundary is called a hyperplane. • SVM becomes famous when, using pixel maps as input; it gives best accuracy. • SVM was developed by Vladimir Vapnik in the 1970s. • SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called as a hyperplane. • SVM algorithm finds the closest data points of the lines from both the classes. • These points are called support vectors. • The distance between the vectors and the hyperplane is called as margin and the goal of SVM is to maximize this margin. • The hyperplane with maximum margin is called the optimal hyperplane. 7/19/2024 3 Dr. Shivashankar, ISE, GAT
  • 4.
    Cont… SVM algorithm canbe used for Face detection, image classification, text categorization, etc. Types of SVM: Linear SVM: Used for linearly separable data, if a dataset can be classified into two classes by using a single straight line, then such data is termed as linearly separable data, and classifier is used called as Linear SVM classifier. Non-linear SVM: Used for non-linearly separated data, if a dataset cannot be classified by using a straight line, then such data is termed as non-linear data and classifier used is called as Non-linear SVM classifier. 7/19/2024 4 Dr. Shivashankar, ISE, GAT Fig. 2.1. Concept of SVM Technique
  • 5.
    Examples of BadDecision Boundaries Class 1 Class 2 Class 1 Class 2 Fig. 3: Examples of Bad Decision Boundaries
  • 6.
    Linearly Separable Case Ifa dataset can be classified into two classes by using a single straight line, then such data is termed as linearly separable data, and classifier is used called as Linear SVM classifier and classification problem is Binary classification or two class classification. Binary classification can be viewed as the task of separating classes in feature space: Hyperplane: Where 7/19/2024 6 Dr. Shivashankar, ISE, GAT f(x) = (wTx + b) – w : weight vector – x : input vector – b : bias or offset value Fig 2.2: Linearly Separable classification
  • 7.
    Cont.. Define the hyperplanesH such that w•xi+b ≥1, when yi =+1 w•xi+b < -1, when yi =–1 H1 and H2 are the margins: H1: w•xi+b = +1 H2: w•xi+b = –1 The points on the margins H1 and H2 are the tips of the Support Vectors. The plane H0 is the median in between, where w•xi+b =0 d+ = the shortest distance to the closest positive point. d- = the shortest distance to the closest negative point. The margin (gutter) of a separating hyperplane is d+ + d–. 7/19/2024 7 Dr. Shivashankar, ISE, GAT
  • 8.
    Maximizing the margin Wewant a classifier with as big margin as possible Recall the distance from a point (x0,y0) to a line: Ax+By+c = 0 is |A x0 +B y0 +c|/sqrt(A2+B2) The distance between H1 and H2 is: |w•x+b|/||w||=1/||w|| The distance between H1 and H2 is: 2/||w|| In order to maximize the margin, we need to minimize ||w||. With the condition that there are no datapoints between H1 and H2 : xi•w+b  +1 when yi =+1 xi•w+b  -1 when yi = -1 Can be combined into yi(xi•w)  1 7/19/2024 8 Dr. Shivashankar, ISE, GAT
  • 9.
    Constrained optimization problem •The problem of finding the optimal hyperplane is an optimization problem and can be solved by optimization techniques. • It can be solved by the Lagrangian Multipler method (αi), Which can be formulated as: 𝑤 = ෍ 𝑖=1 𝑚 𝛼𝑖𝑥𝑖𝑦𝑖 𝛼𝑖: the Lagrange multiplier, we need a Lagrange multiplier 𝛼i for each of the constraints 𝑥𝑖 𝑎𝑛𝑑 𝑦𝑖 are called as the support vectors. 7/19/2024 9 Dr. Shivashankar, ISE, GAT
  • 10.
    Cont… Problems: 1. Draw thehyperplane for the given data points (1,1) (2,1) (1,-1) (2,-1) (4,0) (5,1) (5,-1) (6,0) using SVM and classifying new data points (2,-2). Solution: 1. Plot the graph: 𝑆𝑒𝑙𝑒𝑐𝑡𝑠 𝑡ℎ𝑒 𝑣𝑒𝑐𝑡𝑜𝑟 𝑠𝑢𝑝𝑝𝑜𝑟𝑡𝑠: 𝑆1 = 2 1 𝑆2 = 2 −1 𝑆3 = 4 0 𝑆1 𝑆2 𝑆3--Support Vector because these are closest data points to the centroid (3-x-axis) 2. To provide vector representation, we need to add bias on all support vectors. Here we assume bias=1. So, our support vector now become: ҧ 𝑆1 2 1 1 ҧ 𝑆2 2 −1 1 ҧ 𝑆3 4 0 1 7/19/2024 10 Dr. Shivashankar, ISE, GAT
  • 11.
    Cont… 3. Consider onepart of the support vector as +ve and other as –ve. Here, 𝑆1 and 𝑆2 𝑎𝑟𝑒 − 𝑣𝑒 𝑎𝑛𝑑 𝑆3 𝑖𝑠 + 𝑣𝑒. 4. Our objective is to find an optimal hyperplane which means, we need to find the values of w and b of the optimal hyperplane. f 𝒙 = w.x +b=0 5. To find the optimal hyperplane, we use Lagrange (α) Multiplier method. Now let us complete w and b which determine the Optimal hyperplane. According to Lagrange equation, 𝑤 = ෍ 𝑖=1 𝑚 𝛼𝑖𝑥𝑖𝑦𝑖 Here, 𝑥𝑖 𝑎𝑛𝑑 𝑦𝑖 𝑎𝑟𝑒 𝑡ℎ𝑒 support vectors, 𝑆1 , 𝑆2 𝑎𝑛𝑑 𝑆3 Let us substitute support vectors in above equations ∝1 ഥ 𝑆1 ഥ 𝑆1+ ∝2 ഥ 𝑆1 𝑆2+∝3 ഥ 𝑆1 𝑆3 = −1 ∝1 𝑆2𝑆1+ ∝2 𝑆2 𝑆2+∝3 𝑆2 𝑆3 = −1 ∝1 𝑆3𝑆1+ ∝2 𝑆3 𝑆2+∝3 𝑆3 𝑆3 = 1 7/19/2024 11 Dr. Shivashankar, ISE, GAT
  • 12.
    Cont… Let us substitutevalues of 𝑆1 , 𝑆2 𝑎𝑛𝑑 𝑆3 ∝1 2 1 1 2 1 1 + ∝2 2 1 1 2 −1 1 + ∝3 2 1 1 4 0 1 = −1 ∝1 2 −1 1 2 1 1 + ∝2 2 −1 1 2 −1 1 + ∝3 2 −1 1 4 0 1 = −1 ∝1 4 0 1 2 1 1 + ∝2 4 0 1 2 −1 1 + ∝3 4 0 1 4 0 1 = 1 Therefore, 6 ∝1+4 ∝2+9 ∝3=-1 4 ∝1+6 ∝2+9 ∝3=-1 9 ∝1+9 ∝2+17 ∝3=1 After solving the above equations, we get ∝1=-3.25 ∝2= -3.25 ∝3= 3.5 7/19/2024 12 Dr. Shivashankar, ISE, GAT
  • 13.
    Cont… Now let usfind w, i.e. 𝒘 = ෍ ∝𝒊 ഥ 𝑺𝒊 𝑤 = −3.25 2 1 1 −3.25 2 −1 1 + 3.5 4 0 1 W= 1 0 −3 Therefore, hyperplane equation, f(x)=w.x+b So, w= 1 0 and offset or bias, b=-3 5. Plot hyperplane 7/19/2024 13 Dr. Shivashankar, ISE, GAT
  • 14.
    Cont… Since b=-3, ahyperplane is drawn +3 to the positive side and w is 1 0 , the hyperplane is drawn parallel to y – axis. Now let us clarify the new data points 2 −2 We know that w.x+b ≥ 𝟎 −− −𝒃𝒆𝒍𝒐𝒏𝒈𝒔 𝒕𝒐 𝒄𝒍𝒂𝒔𝒔 + 𝟏 w.x+b < 𝟎 −− −𝒃𝒆𝒍𝒐𝒏𝒈𝒔 𝒕𝒐 𝒄𝒍𝒂𝒔𝒔 − 𝟏 Let us substitute the values in the above equation Y= w.x+b Y= 1 0 2 −2 − 𝟑 Y=2-0-3 =-1 Therefore, new data point 2 −2 belongs to class -1 7/19/2024 14 Dr. Shivashankar, ISE, GAT
  • 15.
    Cont… Proble-2: Draw the hyperplanefor the given data points Positively labelled data points (3,1)(3,-1)(5,1)(5,-1) and Negatively labelled data points (1,0)(0,1)(0,-1)(-1,0) using SVM and classifying the solution. Solution: 𝑆𝑒𝑙𝑒𝑐𝑡𝑠 𝑡ℎ𝑒 𝑣𝑒𝑐𝑡𝑜𝑟 𝑠𝑢𝑝𝑝𝑜𝑟𝑡𝑠: 𝑆1 = 1 0 𝑆2 = 3 1 𝑆3 = 3 −1 Each vector is augmented with bias 1 So, 2. To provide vector representation, we need to add bias on all support vectors. Here we assume bias=1. So, our support vector now become: ҧ 𝑆1 1 0 1 ҧ 𝑆2 3 1 1 ҧ 𝑆3 3 −1 1 ∝1=-3.5 ∝2= 0.75 ∝3= 0.75 W= 1 0 −3 , So, w= 1 0 and offset or bias, b=-2 7/19/2024 15 Dr. Shivashankar, ISE, GAT
  • 16.
    Non-Linear SVM orNonlinear Separable Case • If data is linearly arranged, then we can separate it by using a straight line, but for non- linear data, we cannot draw a single straight line. • So to separate these data points, we need to add one more dimension. For linear data, we have used two dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated as: z=x2 +y2 -------(1) • We must use a nonlinear SVM (i.e. we need to convert data from one feature space to another). For nonlinear separable case: • Φ1 𝑥1 𝑥2 = 4 − 𝑥2 + 𝑥1 − 𝑥2 4 − 𝑥1 + 𝑥1 − 𝑥2 𝑖𝑓 𝑥1 2 + 𝑥2 2 > 2 𝑥1 𝑥2 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 7/19/2024 16 Dr. Shivashankar, ISE, GAT Fig. 11: Nonlinear data points Fig. 12: Added 3rd axis Fig. 11: After added 3rd axis, best hyperplane for nonlinear SVM
  • 17.
    Conti… Problem 1: Drawthe hyperplane for the given data points Positively labelled data points (2,2)(2,-2)(-2,-2)(-2,2) and Negatively labelled data points (1,1)(1,-1)(-1,-1)(-1,1) using nonlinear SVM and classifying the solution. Solution: 1. Plot the graph 2. Nonlinear separable case: • From the plotted graph, there is no hyperplane exists in the input space. • We must use a nonlinear SVM (i.e. we need to convert data from one feature space to another). For nonlinear separable case: • Φ1 𝑥1 𝑥2 = 4 − 𝑥2 + 𝑥1 − 𝑥2 4 − 𝑥1 + 𝑥1 − 𝑥2 𝑖𝑓 𝑥1 2 + 𝑥2 2 > 2 𝑥1 𝑥2 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 7/19/2024 17 Dr. Shivashankar, ISE, GAT
  • 18.
    Conti… By applying nonlinearequation, convert the given data pints into other features. So, positive examples are 𝟐 𝟐 , 𝟐 −𝟐 , −𝟐 −𝟐 , −𝟐 𝟐 ----- 𝟐 𝟐 , 𝟏𝟎 𝟔 , 𝟔 𝟔 , 𝟔 𝟏𝟎 And negative examples are 𝟏 𝟏 , 𝟏 −𝟏 , −𝟏 −𝟏 , −𝟏 𝟏 ----- 𝟏 𝟏 , 𝟏 −𝟏 , −𝟏 −𝟏 , −𝟏 𝟏 3. Now plot the graph for obtained new data points Now we can classify easily identify the Support vectors 𝑺𝟏 = 𝟏 𝟏 , 𝑺𝟐 = 𝟐 𝟐 Each vector is augmented with 1 as bias input ҧ 𝑆1 1 1 1 𝑎𝑛𝑑 ҧ 𝑆2 2 2 1 7/19/2024 18 Dr. Shivashankar, ISE, GAT
  • 19.
    Conti.. According to Lagrangeequation, 𝑤 = ෍ 𝑖=1 𝑚 𝛼𝑖𝑥𝑖𝑦𝑖 Here, 𝑥𝑖 𝑎𝑛𝑑 𝑦𝑖 𝑎𝑟𝑒 𝑡ℎ𝑒 support vectors, 𝑆1 𝑎𝑛𝑑 𝑆2 Let us substitute support vectors in above equations ∝1 ഥ 𝑆1 ഥ 𝑆1+ ∝2 ഥ 𝑆1 𝑆2 = −1 ∝1 ഥ 𝑆1 𝑆2+ ∝2 𝑆2 𝑆2 = 1 After substitute 𝑆1 and 𝑆2 values and simplified the above equations, 3∝1 +5 ∝2= −1 5∝1 +9 ∝2= −1 Therefore ,∝1= −7 𝑎𝑛𝑑 ∝2= 4 𝒘 = ෍ ∝𝒊 ഥ 𝑺𝒊 𝑤 = −7 1 1 1 +4 2 2 1 = 1 1 −3 . Therefore, hyperplane y=wx+b, with w= 𝟏 𝟏 and bias =-3 7/19/2024 19 Dr. Shivashankar, ISE, GAT
  • 20.
    Support Vector MachineTerminology Hyperplane: The hyperplane tries that the margin between the closest points of different classes should be as maximum as possible. In the case of linear classifications, it will be a linear equation i.e. wx+b = 0. Support Vectors: The closest data points to the hyperplane, which makes a critical role in deciding the hyperplane and margin. Margin: is the distance between the support vector and hyperplane. The main objective of the SVM algorithm is to maximize the margin. The wider margin indicates better classification performance. Kernel: is the mathematical function, which is used in SVM to map the original input data points into high-dimensional feature spaces. Some of the common kernel functions are linear, polynomial and radial basis function(RBF). Hard Margin: Also called as the maximum-margin hyperplane is a hyperplane that properly separates the data points of different categories without any misclassifications. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft margin technique. It discovers a compromise between increasing the margin and reducing violations. Hinge Loss: A typical loss function in SVMs is hinge loss. It punishes incorrect classifications or margin violations. 7/19/2024 20 Dr. Shivashankar, ISE, GAT
  • 21.
    How Does SupportVector Machine Algorithm Work? • The best way to understand the SVM algorithm is by the SVM classifier. • This hyper-pane is chosen based on margin as the hyperplane providing the maximum margin between the two classes is considered. • These margins are calculated using data points known as Support Vectors. Support Vectors are those data points that are near to the hyper-plane and help in positioning data points it. 7/19/2024 21 Dr. Shivashankar, ISE, GAT
  • 22.
    Cont… The functioning ofSVM classifier is to be understood mathematically then it can be understood in the following ways- Step 1: SVM algorithm predicts the classes. One of the classes is identified as 1 while the other is identified as -1. Step 2: As all machine learning algorithms convert the business problem into a mathematical equation involving unknowns. These unknowns are then found by converting the problem into an optimization problem. Step 3: This loss function can also be called a cost function whose cost is 0 when no class is incorrectly predicted. If this is not the case, then error/loss is calculated. Step 4: As is the case with most optimization problems, weights are optimized by calculating the gradients using advanced mathematical concepts of calculus viz. partial derivatives. Step 5: The gradients are updated only by using the regularization parameter when there is no error in the classification while the loss function is also used when misclassification happens. 7/19/2024 22 Dr. Shivashankar, ISE, GAT
  • 23.
    Important Concepts inSVM • Support vectors are those data points whose basis the margins are calculated and maximized. • The number of support vectors or the strength of their influence is one of the hyper-parameters. 7/19/2024 23 Dr. Shivashankar, ISE, GAT Fig. 2: Presents Support vectors, margin and Classes
  • 24.
    Cont… Hard Margin: • HardMargin refers to that kind of decision boundary that makes sure that all the data points are classified correctly. • While this leads to the SVM classifier not causing any error, it can also cause the margins to shrink thus making the whole purpose of running an SVM algorithm without results. Soft Margin: • Soft Margin SVM introduces flexibility by allowing some margin violations (misclassifications) to handle cases where the data is not perfectly separable. 7/19/2024 24 Dr. Shivashankar, ISE, GAT
  • 25.
    SVM Implementation inPython In Python, an SVM classifier can be developed using the sklearn library. Step 1: Load the important libraries >> import pandas as pd >> import numpy as np >> import sklearn >> from sklearn import svm >> from sklearn.model_selection import train_test_split >> from sklearn import metrics Step 2: Import dataset and extract the X variables and Y separately. >> df = pd.read_csv(“mydataset.csv”) >> X = df.loc[:,[‘Var_X1’,’Var_X2’,’Var_X3’,’Var_X4’]] >> Y = df[[‘Var_Y’]] Step 3: Divide the dataset into train and test >> X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3, random_state=123) Step 4: Initializing the SVM classifier mode >> svm_clf = svm.SVC(kernel = ‘linear’) 7/19/2024 25 Dr. Shivashankar, ISE, GAT
  • 26.
    Cont… Step 5: Fittingthe SVM classifier model >> svm_clf.fit(X_train, y_train) Step 6: Coming up with predictions >> y_pred_test = svm_clf.predict(X_test) Step 7: Evaluating model’s performance >> metrics.accuracy(y_test, y_pred_test) >> metrics.precision(y_test, y_pred_test) >> metrics.recall(y_test, y_pred_test) 7/19/2024 26 Dr. Shivashankar, ISE, GAT
  • 27.
    Advantages & Disadvantagesof SVM Advantages • It is one of the most accurate machine learning algorithms. • It is a dynamic algorithm and can solve a range of problems, including linear and non-linear problems, binary, binomial, and multi-class classification problems, along with regression problems. • SVM uses the concept of margins and tries to maximize the differentiation between two classes; it reduces the chances of model overfitting, making the model highly stable. • SVM is known for its computation speed and memory management. It uses less memory, especially when compared to machine vs deep learning algorithms with whom SVM often competes. Disadvantages: • While SVM is fast and can work in high dimensions, it still fails in front of Naïve Bayes, providing faster predictions in high dimensions. Also, it takes a relatively long time during the training phase. • Compared to other linear algorithms such as Linear Regression, SVM is not highly interpretable, especially when using kernels that make SVM non-linear. Thus, it isn’t easy to assess how the independent variables affect the target variable. 7/19/2024 27 Dr. Shivashankar, ISE, GAT
  • 28.
    Cont… Applications of SVM: •Text categorization • Semantic role labeling (predicate, agent, ..) • Image classification • Image segmentation • Hand-written recognition Characteristics of SVM • Based on supervised learning methods • Using for classification or regression analysis • A non-probabilistic binary linear classifier • Representation of the examples as points in space • Examples of the separate categories are divided by a clear gap that is as wide as possible. • New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall • Performing linear classification. 7/19/2024 28 Dr. Shivashankar, ISE, GAT
  • 29.
    K-Nearest Neighbour • Thek-Nearest Neighbors (KNN) algorithm is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point. • It is one of the popular and simplest classification and regression classifiers used in machine learning today. • The nearest neighbors of an instance are defined in terms of the standard Euclidean Distance. More precisely, let an arbitrary instance x be described by the feature vector (𝑎1 𝑥 , 𝑎2 𝑥 , … … . , 𝑎𝑛(𝑥)) Distance between two instances 𝑥𝑖 and 𝑥𝑗 is defined to be d(𝒙𝒊, 𝒙𝒋 ), where, 𝑑 𝑥𝑖, 𝑥𝑗 ≡ ෍ 𝑟=1 𝑛 𝑎𝑟 𝑥𝑖 − 𝑎𝑟 𝑥𝑗 2 The K-NN Real valued target function can be defined as: f(x)= σ𝒊=𝟏 𝒌 𝒘𝒊𝒇(𝒙𝒊) σ𝒊=𝟏 𝒌 𝒘𝒊 Where, 𝒘𝒊 = 𝟏 𝒅 𝒙𝒒,𝒙𝒊 𝟐 7/19/2024 29 Dr. Shivashankar, ISE, GAT Fig 2.1: K-NN example
  • 30.
    Cont… 7/19/2024 30 Dr. Shivashankar,ISE, GAT 𝑬𝒄𝒍𝒊𝒅𝒆𝒂𝒏 𝑫𝒊𝒔𝒕𝒂𝒏𝒄𝒆 𝒃𝒆𝒕𝒘𝒆𝒆𝒏 𝑨 𝒂𝒏 𝑩 = 𝑿𝟐 − 𝑿𝟏 𝟐 + 𝒀𝟐 − 𝒀𝟏 𝟐
  • 31.
    K-Nearest Neighbor (KNN)Algorithm for Machine Learning • K-NN is one of the simplest Machine Learning algorithms based on Supervised Learning technique. • K-NN algorithm assumes the similarity between the new case/data and available cases and put the new case into the category that is most similar to the available categories. • K-NN algorithm stores all the available data and classifies a new data point based on the similarity. This means when new data appears then it can be easily classified into a well suite category by using K- NN algorithm. • K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the Classification problems. • K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying data. • It is also called a lazy learner algorithm because it does not learn from the training set immediately instead it stores the dataset and at the time of classification, it performs an action on the dataset. • KNN algorithm at the training phase just stores the dataset and when it gets new data, then it classifies that data into a category that is much similar to the new data. 7/19/2024 31 Dr. Shivashankar, ISE, GAT
  • 32.
    How does K-NNwork? The K-NN working can be explained on the basis of the below algorithm: Step-1: Select the number K of the neighbors Step-2: Calculate the Euclidean distance of K number of neighbors Step-3: Take the K nearest neighbors as per the calculated Euclidean distance. Step-4: Among these k neighbors, count the number of the data points in each category. Step-5: Assign the new data points to that category, number of the neighbor is maximum. Step-6: Our model is ready. Suppose we have a new data point and we need to put it in the required category. Consider the below image: 7/19/2024 32 Dr. Shivashankar, ISE, GAT Fig. 2.13: K-NN for best classifier
  • 33.
    Cont… • Firstly, wewill choose the number of neighbors, so we will choose the k value. • Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as: • Euc-dist[(𝑥1, 𝑦1); (𝑥2, 𝑦2)= 𝑥2 − 𝑥1 2 + 𝑦2 − 𝑦1 2 • By calculating the Euclidean distance we got the nearest neighbors, as three nearest neighbors in category A and two nearest neighbors in category B. Consider the below graph: • As we can see the 3 nearest neighbors are from category A, hence this new data point must belong to category A. 7/19/2024 33 Dr. Shivashankar, ISE, GAT
  • 34.
    Why do weneed a K-NN Algorithm? • Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1, so this data point will lie in which of these categories. • To solve this type of problem, we need a K-NN algorithm. With the help of K- NN, we can easily identify the category or class of a particular dataset. 7/19/2024 34 Dr. Shivashankar, ISE, GAT Fig. 11. Presents the importance of KNN
  • 35.
    Advantages and Disadvantagesof KNN Algorithm Advantages: • It is simple to implement. • It is robust to the noisy training data • It can be more effective if the training data is large. Disadvantages: • Always needs to determine the value of K which may be complex some time. • The computation cost is high because of calculating the distance between the data points for all the training samples. 7/19/2024 35 Dr. Shivashankar, ISE, GAT
  • 36.
    Applications of K-nearestNeighbor 1. Credit score The KNN algorithm compares an individual's credit rating to others with comparable characteristics to help calculate their credit rating. 2. Approval of the loan The k-nearest neighbor technique, similar to credit scoring, is useful in detecting people who are more likely to default on loans by comparing their attributes to those of similar people. 3. Preprocessing of data Many missing values can be found in datasets. Missing data imputation is a procedure that uses the KNN algorithm to estimate missing values. 4. Healthcare: KNN has also had application within the healthcare industry, making predictions on the risk of heart attacks and prostate cancer. The algorithm works by calculating the most likely gene expressions.. 5. Prediction of stock prices The KNN algorithm is useful in estimating the future value of stocks based on previous data since it has a knack for anticipating the prices of unknown entities. 6. Recommendation systems KNN can be used in recommendation systems since it can help locate people with comparable traits. It can be used in an online video streaming platform, for example, to propose content that a user is more likely to view based on what other users watch. 7. Computer Vision For picture classification, the KNN algorithm is used. It's important in a variety of computer vision applications since it can group comparable data points together, such as cats and dogs in separate classes. 8. Easy to implement: Given the algorithm’s simplicity and accuracy, it is one of the first classifiers that a new data scientist will learn. 7/19/2024 36 Dr. Shivashankar, ISE, GAT
  • 37.
    Conti.. Problem 1: Fromthe given dataset, find (x,y)= (170, 57) whether belongs to under or normal weight. Assume K=3. Solution: Find the Euc-dist:d= 𝑥2 − 𝑥1 2 + 𝑦2 − 𝑦1 2 d1= 170 − 167 2 + 57 − 51 2 = 32 + 62 = 45 = 6.70 d2= 122 + 52 = 169 =13 And so on 7/19/2024 37 Dr. Shivashankar, ISE, GAT Height (cm) Weight (kg) Class 167 51 Underweight 182 62 Normal 176 69 Normal 173 64 Normal 172 65 Normal 174 56 Underweight 169 58 Normal 173 57 Normal 170 55 Normal 170 57 ?
  • 38.
    Conti.. Since K=3, withmaximum 3 ranks with distances. The smallest distance is • (169,58)-1.414: Normal • (170,55)-2: Normal • (173,57)-3:Normal Hence all 3 points, so (170,57)belongs to normal class, 7/19/2024 38 Dr. Shivashankar, ISE, GAT Height (cm) Weight (kg) Class Distance 167 51 Underweight 6.7 182 62 Normal 13 176 69 Normal 13.4 173 64 Normal 7.6 172 65 Normal 8.2 174 56 Underweight 4.1 169 58 Normal 1.414-1(R) 173 57 Normal 3-3(R) 170 55 Normal 2-2(R) 170 57 Normal 3
  • 39.
    Conti.. Problem 2: Fromthe given dataset, find (x,y)= (157, 54) whether belongs to medium or longer. Assume K=3. Solution: Find the Euc-dist:d= 𝑥2 − 𝑥1 2 + 𝑦2 − 𝑦1 2 7/19/2024 39 Dr. Shivashankar, ISE, GAT Sl. No. Height Weight Target 1 150 50 Medium 2 155 55 Medium 3 160 60 Longer 4 161 59 Longer 5 158 65 Longer 6 157 54 ? Sl. No. Height Weight Target Distance 1 150 50 Medium 8.06 2 155 55 Medium 2.24 (1) 3 160 60 Longer 6.71(3) 4 161 59 Longer 6.40(2) 5 158 65 Longer 11.05 6 157 54 ?
  • 40.
    Conti.. From the tableand K=3, with maximum 3 ranks with distances We have 2.24 (medium), 6.40(Longer) and 6.71(Longer) f(𝑥𝑣) = f(𝑥𝑣) = angmax 𝑣𝜖𝑉 ෍ 𝑖=1 𝑘 𝛿 𝑣, 𝑓(𝑥𝑣 ) −− −𝛿 𝑎, 𝑏 = 1 𝑖𝑓 𝑎 == 𝑏 𝛿 𝑎, 𝑏 = 0 𝑖𝑓 𝑎 ≠ 𝑏 Compare medium with 2.24(m), 6.40(L) and 6.71(L) ==𝛿 𝑀, 𝑀 + 𝛿 𝑀, 𝐿 + 𝛿 𝑀, 𝐿 1+0+0=1 Compare longer with 2.24(m), 6.40(L) and 6.71(L) ==𝛿 𝐿, 𝑀 + 𝛿 𝐿, 𝐿 + 𝛿 𝐿, 𝐿 0+1+1=2 Since 2 is longer, (157,54)belong to longer If we consider the distance 2.24, 6.71 and 6.40, -----2.24 is smaller, hence medium could be consider. Distance weighted NN: 1. Discrete valued target function 2. Real valued target function 7/19/2024 40 Dr. Shivashankar, ISE, GAT
  • 41.
    Conti.. Discrete valued function: f(𝑥𝑣)= angmax 𝑣𝜖𝑉 ෍ 𝑖=1 𝑘 𝑤𝑖𝛿 𝑣, 𝑓(𝑥𝑖 ) Where, 𝑤𝑖 = 1 𝑑 𝑥𝑞,𝑥𝑖 2 W.r.t. medium: f(𝑥𝑞) =0.199*𝛿 𝑚, 𝑚 + 0.022∗𝛿 𝑚, 𝑙 + 0.024*𝛿 𝑚, 𝑙 =0.199*1 + 0.022∗0+0.024∗0= 0.199 W.r.t. Longer: f(𝑥𝑞) =0.199*𝛿 𝑙, 𝑚 + 0.022∗𝛿 𝑙, 𝑙 + 0.024*𝛿 𝑙, 𝑙 = 0.199*0 + 0.022∗1+0.024∗1= 0.046 Since 0.199 > 0.046—new instance is Classified to medium. 7/19/2024 41 Dr. Shivashankar, ISE, GAT Sl. No. Height Weight Target Distance 1 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 2 1 150 50 Mediu m 8.06 2 155 55 Mediu m 2.24 (1) 0.199 3 160 60 Longer 6.71(3) 0.022 4 161 59 Longer 6.40(2) 0.024 5 158 65 Longer 11.05 6 157 54 Mediu m
  • 42.
    Conti.. Real valued targetfunction: f(x)= σ𝑖=1 𝑘 𝑤𝑖𝑓(𝑥𝑖) σ𝑖=1 𝑘 𝑤𝑖 Where, 𝑤𝑖 = 1 𝑑 𝑥𝑞,𝑥𝑖 2 weighted vectors-randomly we will consider f(𝑥𝑞) = (0.199∗1.2+0.022∗1.8+0.024∗2.1) 0.45+0.15+0.16 =1.51 7/19/2024 42 Dr. Shivashankar, ISE, GAT Sl. No. Height Weight Target Distance 1 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 2 1 150 50 1.5 8.06 2 155 55 1.2 2.24 (1) 0.199 3 160 60 1.8 6.71(3) 0.022 4 161 59 2.1 6.40(2) 0.024 5 158 65 1.7 11.05 6 157 54 1.5
  • 43.
    Conti.. Problem 3: Calculatethe centroid classifier for the give data and the given a test instance (6,5), predict the class. Solution: • Step1: Compute the mean/centroid of each class. • There are 2 classes, A & B. • Centroid of class A=(3+5+4,1+2+3)/3=(12,6)/3=(4,2) • Centroid of class B=(7+6+8,6+7+5)/3=(21,18)/3=(7,6) • Step 2: calculate the Euclidean distance between test instance (6,5) and each of the centroid. 7/19/2024 43 Dr. Shivashankar, ISE, GAT X Y Class 3 1 A 5 2 A 4 3 A 7 6 B 6 7 B 8 5 B
  • 44.
    Conti.. Euc-dist[(𝑥1, 𝑦1); (𝑥2,𝑦2)= 𝑥2 − 𝑥1 2 + 𝑦2 − 𝑦1 2 Class A : [(6,5);(4,2)] = 4 − 6 2 + 2 − 5 2 = 3.6 Class B: [(6,5);(7,6)] = 7 − 6 2 + 6 − 5 2 =1.414 The test instance has smaller distance to class B Hence, the class of this test instance is predicted as B. Problem 4. Given the following training instances in the table, each having two attributes (x1 and x2). Compute the class label for test instance 𝑡1 = 3,7 , using 3 nearest neighbors (k=3). 7/19/2024 44 Dr. Shivashankar, ISE, GAT Training Instances 𝑥1 𝑥2 Output 𝐼1 7 7 0 𝐼2 7 4 0 𝐼3 3 4 1 𝐼4 1 4 1
  • 45.
    Conti.. Euc-dist[(𝑥1, 𝑦1); (𝑥2,𝑦2)=d= 𝑥1 − 𝑦1 2 + 𝑥2 − 𝑦2 2 d= 𝑥1 − 𝑦1 2 + 𝑥2 − 𝑦2 2 Neighbor rank 𝑑1 = 7 − 3 2 + 7 − 7 2 =4 3 𝑑2 = 7 − 3 2 + 4 − 7 2 = 5 4 𝑑3 = 3 − 3 2 + 4 − 7 2 = 3 1 𝑑4 = 1 − 3 2 + 4 − 7 2 = 3.6 2 For K=3, we will consider 𝐼1 = 3, 𝐼3 = 1,and 𝐼4 = 2 So K=3, 𝑡2=(3,7) -----output is 1 Highest vote=0.11, so output =1 7/19/2024 45 Dr. Shivashankar, ISE, GAT d 𝑑2 Vote =1/𝑑2 Rank 4 16 1/16=0.06 3 5 25 1/25=0.04 4 3 9 1/9=0.11 1 3.6 12.96 1/12.96=0 .08 2
  • 46.
    Conti.. Problem 5: ApplyKNN classifier to predict the diabetic patience with the given features BMI, Age. If the training examples are: Assume K=3, Test example: BMI=43.6, Age=40, Sugar=? 7/19/2024 46 Dr. Shivashankar, ISE, GAT BMI Age Sugar 33.6 50 1 26.6 30 0 23.4 40 0 43.1 67 0 35.3 23 1 35.9 67 1 36.7 45 1 25.7 46 0 23.3 29 0 31 56 1
  • 47.
    Conti.. Solution: First calculate thedistance between the test instances and training instance: Test examples: BMI=43.6. Age=40, sugar=? Euc-dist=d= 𝑥2 − 𝑥1 2 + 𝑦2 − 𝑦1 2 , 𝒅𝟏 = 43.6 − 33.6 2 + 40 − 50 2 = 14.14 Therefore, for test examples: BMI=43.6, Age=40, sugar=1, because in the rank 1, sugar=1 7/19/2024 47 Dr. Shivashankar, ISE, GAT BMI Age Sugar Distance to new Rank 33.6 50 1 14.14 2 26.6 30 0 19.72 5 23.4 40 0 20.20 6 43.1 67 0 27.00 9 35.3 23 1 18.92 4 35.9 67 1 28.08 10 36.7 45 1* 8.52 1 25.7 46 0 18.88 3 23.3 29 0 23.09 8 31 56 1 20.37 7
  • 48.
    Cont… Problem 6: giventhe training data, predict the class of the following new examples using KNN for K=5, age<=30, income = medium, student=yes, credit rating=fair. 7/19/2024 48 Dr. Shivashankar, ISE, GAT Age Income Student Credit rating Buys computers <=30 High No Fair No <=30 High No Excellent No 30..40 High No Fair Yes >40 Medium No Fair Yes >40 Low Yes Fair Yes >40 Low Yes Excellent No 31..40 Low Yes Excellent Yes <=30 Medium No Fair no <=30 Low Yes Fair Yes >40 Medium Yes Fair Yes <=30 Medium Yes Excellent Yes 31..40 Medium No Excellent Yes 31..40 High Yes Fair Yes >40 Medium no Excellent No
  • 49.
    Cont… Solution: • For similaritymeasures, use a single match of attribute values: • σ𝑖=1 4 𝑤𝑖 ∗ 𝜕 𝑎𝑖,𝑏𝑖 4 • Where, 𝜕 𝑎𝑖, 𝑏𝑖 =1 if 𝑎𝑖 = 𝑏𝑖 and • =0 otherwise. • 𝑎𝑖𝑎𝑛𝑑 𝑏𝑖 are either age, income, stude or credit rating • Weight are all 1 except for income it is 2. • Now, new examples using KNN for K=5, age<=30, income = medium, student=yes, credit rating=fair. • For RID=1 class=no, distance to new: (1*1+2*0+1*0+1*1)/4=0.5 7/19/2024 49 Dr. Shivashankar, ISE, GAT Age<=30 from the table Age<=30 from the given new examples Income-high from the table Income-medium Student-no from the table Student-yes Credit rating-fair from the table Credit rating-fair from new example
  • 50.
    Cont… 7/19/2024 50 Dr. Shivashankar,ISE, GAT Age Income Student Credit rating Buys computers RID class distance <=30 High No Fair No 1 No 0.5 <=30 High No Excellent No 2 No 0.25 30..40 High No Fair Yes 3 Yes 0.25 >40 Medium No Fair Yes* 4 Yes 0.75 >40 Low Yes Fair Yes 5 Yes 0.5 >40 Low Yes Excellent No 6 No 0.25 31..40 Low Yes Excellent Yes 7 Yes 0.25 <=30 Medium No Fair No 8 No 1 <=30 Low Yes Fair Yes* 9 Yes 0.75 >40 Medium Yes Fair Yes* 10 Yes 1 <=30 Medium Yes Excellent Yes* 11 Yes 1 31..40 Medium No Excellent Yes 12 Yes 0.5 31..40 High Yes Fair Yes 13 Yes 0.5 >40 Medium no Excellent No 14 No 0.5
  • 51.
    Cont… • Therefore, amongthe five nearest neighbors (RID and distance values: 4-0.75,8-1,9—0.75,10-1,11-1), four are from class Yes and one from class No. • Hence, the KNN-classifier, buy computers=yes. 7/19/2024 51 Dr. Shivashankar, ISE, GAT
  • 52.
    Clustering K-means • Thetask of grouping data points based on their similarity with each other is called Clustering or Cluster Analysis. • This method is defined under the branch of Unsupervised Learning, which aims at gaining insights from unlabelled data point • Cluster analysis divides the data into groups (clusters) that are meaningful, useful, or both. • For instance, clustering can be regarded as a form of classification in that it creates a labeling of objects with class (cluster) labels. 7/19/2024 52 Dr. Shivashankar, ISE, GAT
  • 53.
    K-means • K-Means Clusteringis an unsupervised learning algorithm that is used to solve the clustering problems in machine learning or data science. • K means clustering, assigns data points to one of the K clusters depending on their distance from the center of the clusters. • It starts by randomly assigning the clusters centroid in the space. • Then each data point assign to one of the cluster based on its distance from centroid of the cluster. • After assigning each point to one of the cluster, new cluster centroids are assigned. • This process runs iteratively until it finds good cluster. • Here, K defines the number of pre-defined clusters that need to be created in the process, as if K=2, there will be two clusters, and for K=3, there will be three clusters, and so on. • Hence, each cluster has data points with some commonalities/similarities, and it is away from other clusters. 7/19/2024 53 Dr. Shivashankar, ISE, GAT
  • 54.
    The Basic K-meansAlgorithm • First, we randomly initialize k points, called means or cluster centroids. • We categorize each item to its closest mean, and we update the mean’s coordinates, which are the averages of the items categorized in that cluster so far. • We repeat the process for a given number of iterations and at the end, we have our clusters. Basic K-means algorithm Step-1: Select the number K (clusters) randomly to decide the number of clusters. Step-2: Select random K points or centroids. (It can be other from the input dataset). Step-3: Assign each data point to their closest centroid, which will form the predefined K clusters. Step-4: Calculate the variance and place a new centroid of each cluster. Step-5: Repeat the third steps, which means reassign each datapoint to the new closest centroid of each cluster. Step-6: If any reassignment occurs, then go to step-4 else go to FINISH. Step-7: The model is ready. 7/19/2024 54 Dr. Shivashankar, ISE, GAT
  • 55.
    Strengths and Weaknesses Strength •K-means is simple and can be used for a wide variety of data types. • It is also quite efficient, even though multiple runs are often performed. • This algorithm is very easy to understand and implement. • This algorithm is efficient, Robust, and Flexible • If data sets are distinct and spherical clusters, then give the best result Weaknesses • This algorithm needs prior specification for the number of cluster centers that is the value of K. • It cannot handle outliers and noisy data, as the centroids get deflected • It does not work well with a very large set of datasets as it takes huge computational time. 7/19/2024 55 Dr. Shivashankar, ISE, GAT
  • 56.
    Cont… Problem 1: Dividethe given sample data into two clusters [2] using K means algorithm S={2,3,4,10,11,12,20,25,30}. Given K=2, for new data point 15, identify the cluster belongs to. Solution: 1. Choose 2 random clusters from the given data sets C1=4, C2=12. 2. Find the distance between given samples and centroids, put the sample in the nearest cluster. 3. Repeat the same for all data points. Cluster k1={2,3,4} -------------(2-4=2, 3-4=1, 4-4=0, 10-4=6,…….. 2-12=10, 3-12=9, 5-12=7, 10-12=2,…….. so 2,3 and 4 are placed in cluster 1 as its distance is nearest to C1=4 and Cluster K2={10,11,12,20,25,30} 4. Compute new centroids K1={2,3,4} K2={10,11,12,20,25,30} C1={2+3+4/3}=3 C2={10+11+12+20+25+30}/6=18 So C1=3 C2=18 7/19/2024 56 Dr. Shivashankar, ISE, GAT
  • 57.
    Cont… 5. Find newclustering C1=3 and C2=18 K1={2,3,4,10} K2={11,12,20,25,30} C1=2+3+4+10/4 =4.75 K2=11+12+20+25+30/5=19.6 6. Find new clustering C1=4.75 and C2=19.6 K1={2,3,4,10,11,12} K2-{20,25,30} C1=2+3+4+10+11+12/6=7 C2=20+25+30/3=25 7. Find new clustering C1=7 and C2=25 K1={2,3,4,10,11,12} K2-{20,25,30} Since clustering and centroid values remains same. So the given dataset is dividing into 2 clusters as K1={2,3,4,10,11,12} K2-{20,25,30} With centroids C1=7 and C2=25. 8. Identify the cluster for new data points 15 Distance between 15 and C1(15-7)=8 Distance between 15 and C2(15-25)=10 Since distance between 15 and C1 is less, new data point 15 belongs to C1(=7). 7/19/2024 57 Dr. Shivashankar, ISE, GAT
  • 58.
    Cont… Problem 2: Dividethe following data points into two clusters using K-mean and identify (5,4) belongs to which cluster. Solution: Step 1: Choosing randomly 2 clusters centers C1=(2,1) and C2=(2,3) Step 2: Finding distance between two clusters centers and each data point (Apply Euclidean distance) For data points, (1,1) and C1(2,1): d= 1 − 2 2 + 1 − 1 2 = 1 (2,1) and (2,1): d= 2 − 2 2 + 1 − 1 2 = 0 (2,3) and (2,1): d= 2 − 2 2 + 3 − 1 2 = 2 and so on 7/19/2024 58 Dr. Shivashankar, ISE, GAT X 1 2 2 3 4 5 Y 1 1 3 2 3 5 Data points Distance from C1 (2,1) Distance from C2(2,3) New clusters (1,1) 1 2.24 C1 (2,1) 0 2 C1 (2,3) 2 0 C2 (3,2) 1.41 1.41 C1 (4,3) 2.83 2 C2 (5,5) 5 3.61 C2
  • 59.
    Cont… Step 3: cluster1 of C1={ (1,1), (2,1), (3,2)} cluster 2 of C2={ (2,3), (4,3), (5,5)} Step 4: Recalculate cluster center C1= 1 3 [(1,1)+(2,1)+(3,2)]= 1 3 [6,4]= (2,1.33) C2= 1 3 [(2,3)+(4,3)+(5,5)]= 1 3 [11,11]= (3.67,3.67) Step 5: Repeat the step 2 until we get same cluster center or same cluster elements 7/19/2024 59 Dr. Shivashankar, ISE, GAT Data points Distance from C1(2,1.33) Distance from C2(3.67,3.67) New clusters (1,1) 1.05 3.78 C1 (2,1) 0.33 3.15 C1 (2,3) 1.67 1.8 C1 (3,2) 1.204 1.8 C1 (4,3) 2.605 0.75 C2 (5,5) 4.74 1.88 C2
  • 60.
    Cont… cluster 1 ofC1={ (1,1), (2,1),(2,3), (3,2)} cluster 2 of C2={ (4,3), (5,5)} Step 6: Recalculate cluster center C1= 1 4 [(1,1)+(2,1)+(2,3)+(3,2)]= 1 4 [8,7]= (2,1.75) C2= 1 2 [(4,3)+(5,5)]= 1 2 [9,8]= (4.5,4) Step 7: Repeat the step 2 until we get same cluster center or same cluster elements Step 8: cluster 1 of C1={ (1,1), (2,1),(2,3), (3,2)} cluster 2 of C2={ (4,3), (5,5)} Since cluster elements are same as compared to previous iteration, stop. 7/19/2024 60 Dr. Shivashankar, ISE, GAT Data points Distance from C1(2,1.75) Distance from C2(4.5,4) New clusters (1,1) 1.25 4.61 C1 (2,1) 0.75 3.9 C1 (2,3) 1.25 2.69 C1 (3,2) 1.03 2.5 C1 (4,3) 2.36 1.12 C2 (5,5) 4.42 1.12 C2
  • 61.
    Cont… Problem 3 UseK-means clustering to cluster the following data into two groups. Data points {2,4,10,12,3,20,30,11,25}, initial cluster centroids are M1=4 and M2=11. Solution: Initial centroids: M1=4, M2=11. Distance to is calculated by d(𝑥2, 𝑥1) = 𝑥2 − 𝑥1 2 Therefore, C1={2,4,3} M1=(2+4+3)/3=3 C2={10,12,20,30,11,25} M2=(10+12+20+30+11+25)/6=18 so new centroids: M1=3, M2=18 7/19/2024 61 Dr. Shivashankar, ISE, GAT Data points Distance to Cluster New cluster M1(4) M2(11) 2 2 9 C1 4 0 7 C1 10 6 1 C2 12 8 1 C2 3 1 8 C1 20 16 9 C2 30 26 19 C2 11 7 0 C2 25 21 14 C2
  • 62.
    Cont… Current centroids: M1=3,M2=18 Therefore, C1={2,4,20,3} C2={12,20,30,11,25} So, New centroids: M1=4.75 M2=19.6 7/19/2024 62 Dr. Shivashankar, ISE, GAT Data points Distance to Cluster New cluster M1 M2 2 1 16 C1 C1 4 1 14 C1 C1 10 7 8 C2 C1 12 9 6 C2 C2 3 0 15 C1 C1 20 17 2 C2 C2 30 27 12 C2 C2 11 8 7 C2 C2 25 22 7 C2 C2
  • 63.
    Cont… Current centroids: M1=4.75,M2=19.6 Therefore, C1={2,4,10,11,12,3} C2={20,30,25} So, New centroids: M1=7 M2=25 7/19/2024 63 Dr. Shivashankar, ISE, GAT Data points Distance to Cluster New cluster M1 M2 2 2.75 17.6 C1 C1 4 0.75 15.6 C1 C1 10 5.25 9.6 C1 C1 12 7.25 7.6 C2 C1 3 1.75 16.6 C1 C1 20 15.25 0.4 C2 C2 30 25.25 10.4 C2 C2 11 6.25 8.6 C2 C1 25 20.25 5.4 C2 C2
  • 64.
    Cont… Current centroids: M1=7,M2=25 Therefore, final cluster are • C1=(2,4,10,11,12,13} • C2={20,30,5} 7/19/2024 64 Dr. Shivashankar, ISE, GAT Data points Distance to Cluster New cluster M1 M2 2 5 23 C1 C1 4 3 21 C1 C1 10 3 15 C1 C1 12 5 13 C1 C1 3 4 22 C1 C1 20 13 5 C2 C2 30 23 5 C2 C2 11 4 14 C1 C1 25 18 0 C2 C2
  • 65.
    Cont… Problem 4: UseK-means clustering to cluster and suppose that the data mining task is to cluster points into 3 cluster. Where the data points are A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4) and C1(1,2), c2(4,9). Suppose initially we assign A1, B1 and C1 as the center of each cluster respectively. Solution: Initial centroids: A1=(2,10), B1=(5,8), C1=(1,2) Distance to is calculated by d(𝑃1, 𝑃2) = 𝑥2 − 𝑥1 2 + 𝑦2 − 𝑦1 2 Therefore, C1={2,10} C2={(8,5,7,6,4) (4,8,5,4,9)} C3={(2,1)(5,2)} So new centroids: A1=(2,10), B1=(6,6) and C1=(1.5,3.5) 7/19/2024 65 Dr. Shivashankar, ISE, GAT Data points Distance to Cluster New cluster 2 10 5 8 1 2 A1 2 10 0 3.61 8.06 1 A2 2 5 5 4.24 3.16 3 A3 8 4 8.49 5 7.28 2 B1 5 8 3.61 0 7.21 2 B2 7 5 7.07 3.61 6.71 2 B3 6 4 7.21 4.12 5.39 2 C1 1 2 8 7.21 0 3 C2 4 9 2.24 1.41 7.62 2
  • 66.
    Cont… Current centroids: A1=(2,10),B1=(6,6), C1=(1.5,3.5) Therefore, C1={2,4) (10,9)} C2={(8,5,7,6) (4,8,5,4)} C3={(2,1)(5,2)} So new centroids: A1=(3,9.5), B1=(6.5,5.25) and C1=(1.5,3.5) 7/19/2024 66 Dr. Shivashankar, ISE, GAT Data points Distance to Cluster New cluster 2 10 6 6 1.5 3.5 A1 2 10 0 5.66 6.52 1 1 A2 2 5 5 4.12 1.58 3 3 A3 8 4 8.49 2.83 6.52 2 2 B1 5 8 3.61 2.24 5.7 2 2 B2 7 5 7.07 1.41 5.7 2 2 B3 6 4 7.21 2.00 4.53 2 2 C1 1 2 8.06 6.46 1.58 3 3 C2 4 9 2.24 3.61 6.04 2 1
  • 67.
    Cont… Current centroids: A1=(3,9.5),B1=(6.5,5.25), C1=(1.5,3.5) Therefore, the new centroids: A1=(3.6, 7.9), B1=(7,4.33) and C1=(1.5,3.5) 7/19/2024 67 Dr. Shivashankar, ISE, GAT Data points Distance to Cluster New cluster 3 9.5 6.5 65. 25 1.5 3.5 A1 2 10 1.12 6.54 6.52 1 1 A2 2 5 4.61 4.51 1.58 3 3 A3 8 4 7.43 1.95 6.52 2 2 B1 5 8 2.5 3.13 5.7 2 1 B2 7 5 6.02 0.56 5.7 2 2 B3 6 4 6.26 1.35 4.53 2 2 C1 1 2 7.76 6.39 1.58 3 3 C2 4 9 1.12 4.51 6.04 1 1
  • 68.
    Cont… Current centroids: A1=(3.6,7.9), B1=(7,4.33), C1=(1.5,3.5) Therefore, the final clusters: C1={(2,5,4)(10,8,9)}, C2={(8,7,6)(4,5,4)} C3={(2,1)(5,2)} 7/19/2024 68 Dr. Shivashankar, ISE, GAT Data points Distance to Cluster New cluster 3. 6 7.9 7 4.3 3 1.5 3.5 A1 2 10 1.94 7.56 6.52 1 1 A2 2 5 4.33 5.04 1.58 3 3 A3 8 4 6.62 1.05 6.52 2 2 B1 5 8 1.67 4.18 5.70 1 1 B2 7 5 5.21 0.67 5.70 2 2 B3 6 4 5.52 1.05 4.53 2 2 C1 1 2 7.49 6.44 1.58 3 3 C2 4 9 0.33 5.55 6.04 1 1
  • 69.
    Hierarchical Clustering • Hierarchicalclustering is another unsupervised machine learning algorithm, which is used to group the unlabeled datasets into a cluster. • It is a connectivity-based clustering model that groups the data points together that are close to each other based on the measure of similarity or distance. • The assumption is that data points that are close to each other are more similar or related than data points that are farther apart. • It is based on the idea of creating a hierarchy of clusters, where each cluster is made up of smaller clusters that can be further divided into even smaller clusters. • This hierarchical structure makes it easy to visualize the data and identify patterns within the data. Hierarchical clustering is of two types. Agglomerative clustering Divisive clustering 7/19/2024 69 Dr. Shivashankar, ISE, GAT
  • 70.
    Agglomerative Clustering • Agglomerativeclustering is a type of data clustering method used in unsupervised learning. • It begins with N groups, each containing initially one entity, and then the two most similar groups merge at each stage until there is a single group containing all the data. • It is an iterative process that groups similar objects into clusters based on some measure of similarity. • It uses a bottom-up approach for dividing data points into clusters. • The algorithm begins by assigning each object to its own cluster. • It then uses a distance metric to determine the similarity between objects and clusters. • If two clusters have similar elements, they are merged together into a larger cluster. • This continues until all objects are grouped into one final cluster. 7/19/2024 70 Dr. Shivashankar, ISE, GAT
  • 71.
    Agglomerative Hierarchical ClusteringAlgorithm • Step 1: Consider each dataset as a single cluster and calculate the distance of one cluster from all the other clusters. • Step 2: In the second step, comparable clusters are merged together to form a single cluster. Let’s say cluster (B) and cluster (C) are very similar to each other, therefore we merge them in the second step similarly to cluster (D) and (E) and at last, we get the clusters [(A), (BC), (DE), (F)] • Step 3: We recalculate the proximity according to the algorithm and merge the two nearest clusters([(DE), (F)]) together to form new clusters as [(A), (BC), (DEF)] • Step 4: Repeating the same process; The clusters DEF and BC are comparable and merged together to form a new cluster. We’re now left with clusters [(A), (BCDEF)]. • Step 4: At last, the two remaining clusters are merged together to form a single cluster [(ABCDEF)]. 7/19/2024 71 Dr. Shivashankar, ISE, GAT
  • 72.
    Cont… The average linkageclustering uses the average formula, i.e. distance between two clustering A & B d(A,B)=avg{d(a,y): x𝜖𝐴, 𝑦𝜖𝐵} d(A,B)= ∈𝑑 𝑥,𝑦 :x𝜖𝐴,𝑦𝜖𝐵 𝐴 𝐵 7/19/2024 72 Dr. Shivashankar, ISE, GAT Fig.9. Concept of Agglomerative Clustering
  • 73.
    Key Issues inHierarchical Clustering Lack of a Global Objective Function: • Agglomerative hierarchical clustering techniques use various criteria to decide locally, at each step, which clusters should be merged (or split for divisive approaches). This approach yields clustering algorithms that avoid the difficulty of attempting to solve a hard combinatorial optimization problem. • Do not have problems with local minima or difficulties in choosing initial points. Ability to Handle Different Cluster Sizes: • There are two approaches: weighted, which treats all clusters equally, and unweighted, which takes the number of points in each cluster into account. • Treating clusters of unequal size equally gives different weights to the points in different clusters, while taking the cluster size into account gives points in different clusters the same weight. Merging Decisions are Final: • Agglomerative hierarchical clustering algorithms tend to make good local decisions about combining two clusters since they can use information about the pairwise similarity of all points. • This approach prevents a local optimization criterion from becoming a global optimization criterion. 7/19/2024 73 Dr. Shivashankar, ISE, GAT
  • 74.
    Advantage and disadvantagesof Agglomerative Hierarchical Clustering Algorithm Advantages 1. Performance: It is effective in data observation from the data shape and returns accurate results 2. Easy: It is easy to use and provides better user guidance with good community support. So much content and good documentation are available for a better user experience. 3. More Approaches: Two approaches are there using which datasets can be trained and tested, agglomerative and divisive. 4. Performance on Small Datasets: The hierarchical clustering algorithms are effective on small datasets and return accurate and reliable results with lower training and testing time. Disadvantages 1. Time Complexity: As many iterations and calculations are associated, the time complexity of hierarchical clustering is high. In some cases, it is one of the main reasons for preferring K-Means clustering. 2. Space Complexity: As many calculations of errors with losses are associated with every epoch, the space complexity of the algorithm is very high. Due to this, while implementing the hierarchical clustering, the space of the model is considered. In such cases, we prefer K-Means clustering. 3. Poor performance on Large Datasets: When training a hierarchical clustering algorithm for large datasets, the training process takes so much time with space which results in poor performance of the algorithms. 7/19/2024 74 Dr. Shivashankar, ISE, GAT
  • 75.
    Exercise problems Problem 1:Consider the following set of 6 one dimensional data points : 18,22,25, 42,27,43. merge the clusters using minimum distance and update proximity matrix accordingly. Show proximity matrix to each iteration. Solution: Since minimum distance is 1—(42,43) or (43,42), so ,merge 42 and 43 From matrix 2, since 2 is minimum distance, merge (25,27) 7/19/2024 75 Dr. Shivashankar, ISE, GAT 18 22 25 27 42 43 18 0 4 7 9 24 25 22 4 0 3 5 20 21 25 7 3 0 2 17 18 27 9 5 2 0 15 16 42 24 20 17 15 0 1 43 25 21 18 16 1 0 18 22 25 27 42,43 18 0 4 7 9 24 22 4 0 3 5 20 25 7 3 0 2 17 27 9 5 2 0 15 42,43 24 20 17 15 0
  • 76.
    Exercise problems Since 3is minimum distance, merge 22,25.and 27---{22,(25,27)} Since 4 is minimum distance, merge 18,22,25,27---[18,{22,(25,27)}] Draw the dendrogram for the merged data points. 7/19/2024 76 Dr. Shivashankar, ISE, GAT 18 22 25,27 42,43 18 0 4 7 24 22 4 0 3 20 25,27 7 3 0 15 42,43 24 20 15 0 18 22,25,27 42,43 18 0 4 24 22,25,27 4 0 15 42,43 24 15 0
  • 77.
    Problems Problem 2: Forthe given dataset, find the clusters using a single link technique. Use Euclidean distance and draw the dendrogram. Solution: Step 1: Compute the distance matrix using Euclidean distance. Let A(𝑥1, 𝑦1) 𝑎𝑛𝑑 B(𝑥2, 𝑦2) Then Euclidean distance between two points d(A,B)= x2 − x1 2 + y2 − y1 2 7/19/2024 77 Dr. Shivashankar, ISE, GAT Sample No X Y P1 0.40 0.53 P2 0.22 0.38 P3 0.35 0.32 P4 0.26 0.19 P5 0.08 0.41 P6 0.45 0.30
  • 78.
    Conti.. d(P1,P2)= 𝟎. 𝟐𝟐− 𝟎. 𝟒𝟎 𝟐 + 𝟎. 𝟑𝟖 − 𝟎. 𝟓𝟑 𝟐 = 0.23 d(P1,P3)= 𝟎. 𝟑𝟓 − 𝟎. 𝟒𝟎 𝟐 + 𝟎. 𝟑𝟐 − 𝟎. 𝟓𝟑 𝟐 = 0.22 d(P2,P3)= 𝟎. 𝟑𝟓 − 𝟎. 𝟐𝟐 𝟐 + 𝟎. 𝟑𝟐 − 𝟎. 𝟑𝟖 𝟐 = 0.14 and so on Step 2: Merging the two closest members Here, the minimum values is 0.10 and hence we combine P3 and P6 (as 0.10 came in the P6 row and p3 column). Now, form the clusters of elements corresponding to the minimum value and update the distance matrix. 7/19/2024 78 Dr. Shivashankar, ISE, GAT P1 P2 P3 P4 P5 P6 P1 0 P2 0.23 0 P3 0.22 0.14 0 P4 0.37 0.19 0.13 0 P5 0.34 0.14 0.28 0.23 0 P6 0.24 0.24 0.10 0.22 0.39 0
  • 79.
    Conti.. (P3,P6) Merge two closestmembers of the two clusters. The minimum value is 0.13 and hence we combine P3, P6, P4 {(P3, P6), P4} 7/19/2024 79 Dr. Shivashankar, ISE, GAT P1 P2 P3 P4 P5 P6 P1 0 P2 0.23 0 P3 0.22 0.14 0 P4 0.37 0.19 0.13 0 P5 0.34 0.14 0.28 0.23 0 P6 0.24 0.24 0.10 0.22 0.39 0 P1 P2 P3,P6 P4 P5 P1 0 P2 0.23 0 P3,P6 0.22 0.14 0 P4 0.37 0.19 0.13 0 P5 0.34 0.14 0.28 0.23 0 P1 P2 P3,P6 P4 P5 P1 0 P2 0.23 0 P3,P6 0.22 0.14 0 P4 0.37 0.19 0.13 0 P5 0.34 0.14 0.28 0.23 0 P1 P2 P3,P6,P4 P5 P1 0 P2 0.23 0 P3,P6,P4 0.22 0.14 0 P5 0.34 0.14 0.28 0
  • 80.
    Conti.. Now combined P2and P5 [{(P3, P6), P4},(P2,P5)] Now update the matrix and merge P2,P5,P3,P6 and P4 ([{(P3, P6), P4},(P2,P5)], P1) Now we have reached to the solution. 7/19/2024 80 Dr. Shivashankar, ISE, GAT P1 P2 P3,P6,P4 P5 P1 0 P2 0.23 0 P3,P6,P4 0.22 0.14 0 P5 0.34 0.14 0.28 0 P1 P2,P5 P3,P6,P4 P1 0 P2,P5 0.23 0 P3,P6,P4 0.22 0.14 0 P1 P2,P5 P3,P6,P4 P1 0 P2,P5 0.23 0 P3,P6,P4 0.22 0.14 0 P1 P2,P5,P3,P6,P 4 P1 0 P2,P5,P3,P6,P4 0.22 0
  • 81.
    Conti The dendrogram asper the solution is as follow P3 P6 P4 P2 P5 P1 Dendrogram of the cluster formed for the group P1,P2,P3,P4,P5 and P6. 7/19/2024 81 Dr. Shivashankar, ISE, GAT
  • 82.
    Conti.. Problem 3: Givena one dimensional dataset {1,5,8,10,2}, use the Agglomerative clustering algorithm with complete link with Euclidean distance to establish a hierarchical grouping relationship. By using the cutting threshold of 5, how many clusters are there? What is there membership in each group? Solution: Euclidean distance = 𝑥2 − 𝑥1 2 + 𝑦2 − 𝑦1 2 for 1 dimensional Euc-dist= 𝑥2 − 𝑥1 2 Apply 1D Euclidean distance to calculate the matrix 7/19/2024 82 Dr. Shivashankar, ISE, GAT 1 5 8 10 2 1 0 4 7 9 1 5 4 0 3 5 3 8 7 3 0 2 6 10 9 5 2 0 8 2 1 3 6 8 0 1 2 3 4 5 1 0 4 7 9 1 2 4 0 3 5 3 3 7 3 0 2 6 4 9 5 2 0 8 5 1 3 6 8 0
  • 83.
    Conti.. From the distancematrix, we can find distance between points 1 and 5 is smallest, i,e.2. Then merge {1,5}. Now recalculate the distance: d(2,{1,5}}=max{d(2,1), d(2,5)}=max(4,3)=4 d(3,{1,5}}=max{d(3,1), d(3,5)}=max(7,6)=7 d(4,{1,5}}=max{d(4,1), d(4,5)}=max(4,5)=9 From the matrix, the distance between points 3 and 4 is smallest , i.e.2 Hence they merge together as to form a cluster {3,4}. Using the complete link, we have the distance between different points/cluster as follows. d({1,5}, {3,4})=max{d({1,5},3), d ({1,5},4)}=max(7,9)=9 d(2, {3,4})=max{d(2,3), d (2,4)}=max(3,5)=5 Thus, we can update the distance matrix, where row 2 corresponds to point 2, row 1 and 3 corresponds to Cluster {1,5} and {3,4} as follows. 7/19/2024 83 Dr. Shivashankar, ISE, GAT 1,5 2 3 4 1,5 0 4 7 9 2 4 0 3 5 3 7 3 0 2 4 9 5 2 0 1,5 2 3,4 1,5 0 4 9 2 4 0 5 3,4 9 5 0
  • 84.
    Conti.. Following the sameprocedure, we merge pints 2 with the cluster {1,5} to form {1,2,5} and update the distance matrix as follows. After increase the distance threshold to 9, all clusters would merge. Fig 12: Dendogram for the given datasets 7/19/2024 84 Dr. Shivashankar, ISE, GAT [1,5],2 [3,4] [1,5],2 0 9 [3,4] 9 0
  • 85.
    Conti.. Problem 3: Giventhe data set {a,b,c,d,e} and following distance matrix. Construct a dendrogram by average linkage hierarchical clustering using the Agglomerative method. Solution: The average linkage clustering uses the average formula, i.e. distance between two clustering A & B d(A,B)=avg{d(a,y): x𝜖𝐴, 𝑦𝜖𝐵} d(A,B)= ∈𝑑 𝑥,𝑦 :x𝜖𝐴,𝑦𝜖𝐵 𝐴 𝐵 7/19/2024 85 Dr. Shivashankar, ISE, GAT a b c d e a 0 9 3 6 11 b 9 0 7 5 10 c 3 7 0 9 2 d 6 5 9 0 8 e 11 10 2 8 0
  • 86.
    Conti.. Dataset : {a,b,c,d,e} Initialclustering (Single to a sets) C1={a},{b},{c},{d},{e} From the table, the minimum distance is the distance between the clusters {c} and {e}. Also, d({c}:{e})=2 We merge {c} ad {e} to form the cluster {c,e} The new set of cluster C2 ={a},{b},{d},{c,e} 7/19/2024 86 Dr. Shivashankar, ISE, GAT a b c d e a 0 9 3 6 11 b 9 0 7 5 10 c 3 7 0 9 2 d 6 5 9 0 8 e 11 10 2 8 0 a b c,e d a 0 9 ? 6 b 9 0 ? 5 c,e ? ? 0 ? d 6 5 ? 0
  • 87.
    Conti.. Let us computethe distance of{c,e} from other clusters. d({c,e},{a})=avg{d(c,a),d(e,a)}= 3+11 2∗1 =7 d({c,e},{b})=avg{d(c,b),d(e,b)}= 7+10 2∗1 =8.5 d({c,e},{d})=avg{d(c,d),d(e,d)}= 9+8 2∗1 =7 Now update the table. From C2 table, the minimum distance is the distance between the cluster {d} and {b}. Also, d({b},{d})=5 We merge {b} and {d} to form the cluster {b,d} The new set of cluster, C3: {a},{c,e},{b,d} 7/19/2024 87 Dr. Shivashankar, ISE, GAT a b c,e d a 0 9 7 6 b 9 0 8.5 5 c,e 7 8.5 0 8.5 d 6 5 8.5 0
  • 88.
    Conti.. Let us computethe distance of {b,d} from other clusters. d({b,d},{a})=avg{d(b,a),d(d,a)} d({b,d},{a}) = 9+6 2∗1 = 7.5 d({b,d},{c,e}) =Avg{d(b,c): d(b,e),d(d,c),d(d,e)} d({b,d},{c,e})= 7+10+9+8 2∗2 = 8.5 7/19/2024 88 Dr. Shivashankar, ISE, GAT a b c,e d a 0 9 7 6 b 9 0 8.5 5 c,e 7 8.5 0 8.5 d 6 5 8.5 0 a b,d c,e A 0 ? 7 b,d ? 0 ? c,e 7 ? 0 a b,d c,e a 0 7.5 7 b,d 7.5 0 8.5 c,e 7 8.5 0
  • 89.
    Conti.. From the table,the minimum distance is the distance between the clusters {a} and {c,e} is 7. Also, d({a});{c,e})=7 We merge {a} and {b,d} to form the cluster {a,b,d} The new set of clusters C4: {a,c,e},{b,d} Let us compute the distance of {a,c,e}from other cluster. D({a,c,e}, {b,d})=Avg{d(a,b),d(a,d),d(c,b),d(c,d),d(e,b),d(e,d) D({a,c,e};{bd})= 9+6+7+9+10+8 3∗2 = 8.16 Fig 11: Dendogram for the dataset {a,b,c,d,e}. 7/19/2024 89 Dr. Shivashankar, ISE, GAT a,c,e b,d a,c,e 0 ? b,d ? 0 a,c,e b,d a,c,e 0 8.16 b,d 8.16 0
Divisive Clustering
• Divisive clustering is also a type of hierarchical clustering that is used to create clusters of data points.
• It is an unsupervised learning algorithm that begins by placing all the data points in a single cluster and then progressively splits the clusters until each data point is in its own cluster.
• It is useful for analyzing datasets that may have complex structures or patterns, as it can help identify clusters that may not be obvious at first glance.
• Divisive clustering works by first assigning all the data points to one cluster.
• Then, it looks for ways to split this cluster into two or more smaller clusters.
• This process continues until each data point is in its own cluster.
Cont…
Steps to Divisive Hierarchical Clustering
The algorithm for divisive hierarchical clustering involves several steps (a minimal sketch is given below the figure).
Step 1: Consider all objects as part of one big cluster.
Step 2: Split the big cluster into smaller clusters using any flat-clustering method, e.g. k-means.
Step 3: Select an object or subgroup to split into two smaller sub-clusters based on some distance metric such as Euclidean distance or correlation coefficients.
Step 4: The process continues recursively until each object forms its own cluster.
Fig. 12: Concept of Divisive Hierarchical Clustering
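To make Steps 1-4 concrete, here is a minimal, hedged sketch of divisive clustering implemented as bisecting k-means: keep taking the largest remaining cluster and splitting it with 2-means until every point stands alone. It is not from the slides; the sample data, parameters and function name are illustrative assumptions.

# Illustrative divisive (bisecting k-means) clustering sketch; names and data are my own.
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, min_size=1):
    """Step 1: start with one cluster holding every point, then repeatedly split
    the largest remaining cluster with 2-means (Steps 2-3) until every cluster
    has at most min_size points (Step 4). Assumes a cluster never consists of
    identical points, so 2-means always yields two non-empty sub-clusters."""
    clusters = [np.arange(len(X))]                    # clusters stored as index arrays
    while True:
        idx = max(range(len(clusters)), key=lambda k: len(clusters[k]))
        if len(clusters[idx]) <= min_size:            # nothing left to split
            return clusters
        target = clusters.pop(idx)                    # split the largest cluster
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[target])
        clusters.append(target[labels == 0])
        clusters.append(target[labels == 1])

# Tiny made-up dataset, just to show the call.
X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 9.2]])
for c in divisive_clustering(X):
    print(c)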
Cont…
Fig. 13: Differences between the Agglomerative and Divisive algorithms.
Conti..
1. The k-NN algorithm does more computation at test time than at training time.
   A) TRUE  B) FALSE
2. Which of the following distance metrics cannot be used in k-NN?
   A) Manhattan  B) Minkowski  C) Tanimoto  D) Jaccard  E) Mahalanobis  F) All can be used
3. Which of the following is true about the k-NN algorithm?
   A) It can be used for classification  B) It can be used for regression  C) It can be used for both classification and regression
4. Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?
   A) k-NN  B) Linear Regression  C) Logistic Regression
5. What is the Euclidean distance between the two data points A(1,3) and B(2,3)?
   A) 1  B) 2  C) 4  D) 8
A. K-Means clustering comes under:
   1. Supervised Learning Algorithm  2. Unsupervised Learning Algorithm  3. Reinforcement Learning  4. None of the above
B. Which of the following is true for clustering?
   1. Clustering is a technique used to group similar objects into clusters.
   2. It partitions data into groups.
   3. It divides the entire data based on patterns in the data.
   4. All of the above
C. Which of the following is true for K-Means clustering?
   1. All data points in a cluster should be similar to each other.
   2. The data points from different clusters should be as different as possible.
   3. Both 1 and 2  4. Only 1  5. Only 2
D. Which of the following applications come under clustering?
   1. Customer Segmentation  2. Targeted Marketing  3. Recommendation Engines  4. Predicting the temperature
   5. Only 1, 2, 3 and 4  6. All of the above
E. What is intra-cluster distance?
   1. Distance between points in the cluster and its centroid
   2. Distance between each pair of points in the cluster
   3. Sum of squares of distances between points
   4. None of the above
Conti..
Q1. Movie recommendation systems are an example of:
    1. Classification  2. Clustering  3. Reinforcement Learning  4. Regression
    Options: A. 2 only  B. 1 and 2  C. 1 and 3  D. 2 and 3  E. 1, 2 and 3  F. 1, 2, 3 and 4
Q2. Sentiment analysis is an example of:
    1. Regression  2. Classification  3. Clustering  4. Reinforcement Learning
    Options: A. 1 only  B. 1 and 2  C. 1 and 3  D. 1, 2 and 3  E. 1, 2 and 4  F. 1, 2, 3 and 4
Conti..
Q3. Can decision trees be used for performing clustering?
    A. True  B. False
Q4. What is the minimum number of variables/features required to perform clustering?
    A. 0  B. 1  C. 2  D. 3
Q5. For two runs of K-Means clustering, is it expected to get the same clustering results?
    A. Yes  B. No
Q6. Which of the following clustering algorithms suffer from the problem of convergence at local optima?
    1. K-Means clustering algorithm  2. Agglomerative clustering algorithm  3. Expectation-Maximization clustering algorithm  4. Diverse clustering algorithm
    Options: A. 1 only  B. 2 and 3  C. 2 and 4  D. 1 and 3  E. 1, 2 and 4  F. All of the above