Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners Part - 2 | Simplilearn

What’s in it for you?
Clustering
What is Clustering?
K-Means Clustering
Flowchart to understand K-means Clustering
Clustering of cars based on brands
Logistic Regression
What is Logistic Regression?
Logistic Regression Curve & Sigmoid function
Classify whether a tumor is malignant or benign based on features

Clustering
Suppose we have a pile of books of different genres!

Clustering
Now, we divide them into different groups like
Fiction
Horror
Educational

Well, organizing objects into groups based on their similarity is Clustering!

K-means Clustering

K-Means Clustering is an example of Unsupervised learning

It is used when you have unlabeled data, to find clusters in the data based on feature similarity!

Steps for K-Means
Suppose we have these data points and we want to assign them to clusters

STEP 1: Initialize Cluster Centroids
We pick ‘K’ clusters and assign random centroids to the clusters

STEP 2: Compute Minimum Distance
Then, we compute the distance from each object to every centroid

STEP 3: Assign Points to New Clusters
Now, we form new clusters based on minimum distance and calculate their centroids
We repeat the previous two steps iteratively until the cluster centroids stop changing their positions and become static
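
The steps above can be sketched in plain Python. This is a minimal illustration and not the tutorial's own code: the function name kmeans, the seed parameter and the demo points are all made up for this example, and the random centroids are taken to be K of the data points themselves. The loop stops when no point changes cluster, which is the same moment the centroids stop moving.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means: returns (labels, centroids) for a list of (x, y) tuples."""
    rng = random.Random(seed)
    # STEP 1: pick K starting centroids at random (assumption: K of the data points).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # STEP 2: find the nearest centroid for every point (minimum distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster; the centroids are then static.
        if new_labels == labels:
            break
        labels = new_labels
        # STEP 3: form the new clusters and recompute each centroid as the mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Made-up points forming two obvious groups (not data from the tutorial).
demo_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(demo_points, k=2)
print(labels)
print(centroids)
```

The first three demo points end up sharing one label and the last three the other; which group is called 0 and which 1 depends on the random start.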

Shall we see a flowchart to
understand?

Flowchart to understand K-Means
START
Choose K (Elbow Method)
Assign random centroids to clusters
Compute distance from objects to centroids
Form new clusters based on minimum distance and calculate their centroids
Compute distance from objects to new centroids
Repeat the previous two steps until no observations change groups

K-Means Algorithm
Suppose we have this dataset of 7 individuals and their scores on two topics (A and B):
Subject   A     B
1         1     1
2         1.5   2
3         3     4
4         5     7
5         3.5   5
6         4.5   5
7         3.5   4.5

K-Means Algorithm
Now, let’s take the two farthest-apart points, individuals 1 (1, 1) and 4 (5, 7), as the initial cluster centroids

K-Means Algorithm
Each point is then assigned to the closest cluster, based on its distance from the centroids

K-Means Algorithm
Now, we again calculate the centroids of each cluster:
            Individuals    Mean Vector (centroid)
Cluster 1   1, 2, 3        (1.8, 2.3)
Cluster 2   4, 5, 6, 7     (4.1, 5.4)

K-Means Algorithm
We compare each individual’s distance to its own cluster mean and to that of the opposite cluster, using the Euclidean distance between the points and the means. And we find:
Individual   Distance to mean (centroid) of Cluster 1   Distance to mean (centroid) of Cluster 2
1            1.5                                        5.4
2            0.4                                        4.3
3            2.1                                        1.8
4            5.7                                        1.8
5            3.2                                        0.7
6            3.8                                        0.6
7            2.8                                        1.1
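
The mean vectors and the distance table can be reproduced with a few lines of Python. This is a sketch under one assumption: the centroids are rounded to one decimal place before the distances are taken, which appears to be how the table was computed. The names scores, clusters, mean_vector and misplaced are chosen for this example only.

```python
import math

# Scores (A, B) of the seven individuals from the dataset table.
scores = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
          5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}
clusters = {1: [1, 2, 3], 2: [4, 5, 6, 7]}  # partition after the first assignment

def mean_vector(members):
    """Centroid: component-wise mean of the members' scores, rounded to 1 decimal."""
    pts = [scores[i] for i in members]
    # Assumption: rounding here mirrors the one-decimal centroids shown on the slide.
    return tuple(round(sum(v) / len(pts), 1) for v in zip(*pts))

centroids = {c: mean_vector(m) for c, m in clusters.items()}
print(centroids)

# Euclidean distance from every individual to the centroid of Cluster 1 and Cluster 2.
distances = {i: (math.dist(p, centroids[1]), math.dist(p, centroids[2]))
             for i, p in scores.items()}
for i, (d1, d2) in distances.items():
    print(i, round(d1, 1), round(d2, 1))

# Individuals that sit closer to the other cluster's centroid than to their own.
misplaced = [i for c, members in clusters.items() for i in members
             if distances[i][c - 1] > distances[i][2 - c]]
print(misplaced)
```

The last line lists only individual 3, the point that the walkthrough goes on to relocate.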

K-Means Algorithm
Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2, distance 1.8) than to the mean of its own (Cluster 1, distance 2.1), so we move point 3 to the new cluster

K-Means Algorithm
Thus, individual 3 is relocated to Cluster 2, resulting in the new partition:
Cluster 1: individuals 1, 2
Cluster 2: individuals 3, 4, 5, 6, 7

K-Means Algorithm
For the new clusters, we will find the actual cluster centroids:
            Individuals      Mean Vector (centroid)
Cluster 1   1, 2             (1.25, 1.5)
Cluster 2   3, 4, 5, 6, 7    (3.9, 5.1)

K-Means Algorithm
On comparing each individual’s distance to its own cluster mean and to that of the opposite cluster, we find that the data points are stable. Hence, we have our final clusters!
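
As a quick check of this last step, the sketch below recomputes the centroids with individual 3 moved to Cluster 2 (Cluster 1 = {1, 2}, Cluster 2 = {3, 4, 5, 6, 7}) and confirms that every individual is nearest to its own centroid. The helper names final_clusters, nearest_cluster and stable are made up for this illustration.

```python
import math

# Scores (A, B) of the seven individuals from the dataset table.
scores = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
          5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}
final_clusters = {1: [1, 2], 2: [3, 4, 5, 6, 7]}  # after moving individual 3

def mean_vector(members):
    """Centroid: component-wise mean of the members' scores."""
    pts = [scores[i] for i in members]
    return tuple(sum(v) / len(pts) for v in zip(*pts))

centroids = {c: mean_vector(m) for c, m in final_clusters.items()}
print(centroids)

def nearest_cluster(i):
    """Label of the cluster whose centroid is closest to individual i."""
    return min(centroids, key=lambda c: math.dist(scores[i], centroids[c]))

# The partition is stable when nobody is closer to the other cluster's centroid.
stable = all(nearest_cluster(i) == c
             for c, members in final_clusters.items() for i in members)
print(stable)
```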

K-Means Algorithm
To find the appropriate number of clusters in a dataset, we use the elbow method:
[Graph: WSS plotted against the number of clusters, with the elbow point marked]
Within sum of squares (WSS) is defined as the sum of the squared distances between each member of a cluster and its centroid
Finding the optimal number of clusters using the elbow of the graph is called the Elbow method
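
To make the WSS definition concrete, here is a small sketch that evaluates it on the seven-individual dataset from the worked example, once with everyone in a single cluster (K = 1) and once for the final two-cluster partition (K = 2). The function name wss is an assumption of this example. A full elbow plot would repeat the calculation for each candidate K after running K-means for that K.

```python
# Scores (A, B) of individuals 1 to 7 from the worked example, in order.
scores = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]

def wss(clusters):
    """Within sum of squares: squared distance of every member to its own centroid."""
    total = 0.0
    for members in clusters:
        centroid = [sum(v) / len(members) for v in zip(*members)]
        total += sum((x - m) ** 2 for p in members for x, m in zip(p, centroid))
    return total

wss_k1 = wss([scores])                  # K = 1: all individuals in one cluster
wss_k2 = wss([scores[:2], scores[2:]])  # K = 2: {1, 2} and {3, 4, 5, 6, 7}
print(round(wss_k1, 3), round(wss_k2, 3))
```

Here WSS falls from roughly 37.1 at K = 1 to roughly 8.5 at K = 2. The elbow is the value of K after which further increases no longer produce a drop of that size.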

Use Case
Using K-means clustering to cluster cars into brands, using parameters such as horsepower, cubic inches, make year, etc.
Dataset: Cars data with information about 3 brands of cars, namely Toyota, Honda and Nissan

Logistic Regression
Now, let’s look into
Logistic Regression

Logistic Regression
The Logistic Regression algorithm is one of the simplest classification algorithms, used for binary or multi-class classification problems

Logistic Regression
In the previous tutorial, we learnt about Linear Regression, dependent and independent variables
To brush up,
y = mx + c
The dependent variable (y) is the target class variable we are going to predict
The independent variables (x1…xn) are the features or attributes we are going to use to predict the target class

Logistic Regression
We know what a linear regression looks like, but using this graph we cannot divide the outcome into categories
[Graph: Marks (0 to 100) against No. of hours studied]
For example, a linear regression graph can tell us that with an increase in the number of hours studied, the marks of a student will increase
But it will not tell us whether the student will pass or not!

Logistic Regression
In such cases, where we need the output as a categorical value, we will use logistic regression!

Logistic Regression
Sigmoid Function
y = m*x + c
p = 1 / (1 + e^(-y))
ln(p / (1 - p)) = m*x + c
[Graphs: Marks against No. of hours studied, first as a straight line (0 to 100), then as the Sigmoid Curve (0 to 1)]
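
The relationship between the straight line, the sigmoid and the log-odds can be checked numerically. This is a minimal sketch: the slope m and intercept c are hypothetical values picked only for illustration, and the function names sigmoid and log_odds are this example's own.

```python
import math

def sigmoid(y):
    """p = 1 / (1 + e^(-y)): squashes any real y into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def log_odds(p):
    """ln(p / (1 - p)): the inverse of the sigmoid, which recovers y = m*x + c."""
    return math.log(p / (1.0 - p))

# Hypothetical slope and intercept, not fitted to any data.
m, c = 1.5, -6.0
for hours in (1, 4, 8):
    y = m * hours + c          # the linear part
    p = sigmoid(y)             # probability between 0 and 1
    print(hours, round(p, 3), round(log_odds(p), 3))
```

With these made-up coefficients the linear part is zero at 4 hours, so the probability there is exactly 0.5; fewer hours give a probability below 0.5 and more hours a probability above it.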

Logistic Regression
Threshold value
[Graph: sigmoid curve of probability (0 to 1) over an x-axis from 0 to 9, with example probabilities 0.30 and 0.82 marked]
Probability > 0.50: the value is rounded off to 1, indicating that the student will pass (for example, 0.82)
Probability < 0.50: the value is rounded off to 0, indicating that the student will fail (for example, 0.30)
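
The thresholding rule is a one-line function. In this sketch the name classify is made up, and a probability of exactly 0.50 is treated as a pass, which is an assumption because the slide only covers values above and below the threshold.

```python
def classify(probability, threshold=0.50):
    """Return 1 (pass) when the probability reaches the threshold, else 0 (fail)."""
    # Assumption: a probability exactly equal to the threshold counts as 1.
    return 1 if probability >= threshold else 0

# The two example probabilities marked on the graph.
for p in (0.30, 0.82):
    print(p, "->", classify(p))
```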

Problem statement: To classify whether a
tumor is ‘malignant’ or ‘benign’

Use Case
So, this model is
able to predict the
type of tumor with
91% accuracy!

Finally, let’s discuss the answers to the quiz asked in
Machine Learning Tutorial Part-1

Can you tell what’s happening in the
following cases?
A. Grouping documents into different categories based on the
topic and content of each document
“This is an example of Clustering where K-means
clustering can be used to group the documents by
topics using bag-of-words approach”

B. Identifying hand-written digits in images correctly
“This is an example of Classification. The traditional
approach to solving this would be to extract digit-dependent
features, such as the curvature of different digits,
and then use a classifier like SVM to distinguish
between images”

C. Behavior of a website indicating that the site is not working
as designed
“This is an example of Anomaly Detection. In this case,
the algorithm learns what is "normal" and what is "not
normal", usually by observing the logs of the website”

D. Predicting the salary of an individual based on his/her years
of experience
“This is an example of Regression. This problem can
be mathematically defined as a function between the
independent variable (years of experience) and the dependent
variable (salary of an individual)”

Summary
What is K-Means
Elbow Method to choose K
Clustering cars with K-means
What is logistic regression
Classifying tumor with logistic regression
