Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners Part - 2 | Simplilearn
The document discusses clustering, specifically k-means clustering, explaining its purpose as an unsupervised learning method for grouping unlabeled data based on feature similarity. It also covers logistic regression, a classification algorithm for binary and multi-class classification problems, highlighting its use in predicting outcomes like the malignancy of tumors. Examples and methods such as the elbow method for determining the number of clusters in k-means are also provided.
Introduces clustering, defines it as organizing objects into groups based on similarity, and offers examples like K-Means clustering and organizing books.
Explains K-Means clustering as an unsupervised learning method used for unlabeled data to form clusters based on feature similarity.
Details the steps for K-Means clustering including initialization of centroids, distance computation, and cluster assignments until stabilization.
Introduces a flowchart to visualize the K-Means clustering process from choosing K to determining stable clusters.
Presents a dataset for K-Means clustering and the method for selecting initial centroids for the clustering process.
Describes how data points are assigned to clusters, centroid recalculations, and stabilizing clusters with final outputs.
Discusses using the elbow method to select optimal clusters and provides a use case of K-Means clustering for car brands.
Begins the discussion on logistic regression, clarifying its purpose for classification problems and contrasting it with linear regression.
Explains why logistic regression is crucial for categorical outcomes and introduces the sigmoid function in the context of binary outcomes.
Illustrates a practical logistic regression use case for classifying tumors, achieving a prediction accuracy of 91%.
Explores the concept of data measures and dimensions, providing examples of clustering, classification, and regression use cases.
Summarizes the K-Means elbow method for clustering cars and logistic regression for tumor classification.
What’s in it for you?
Clustering
  What is Clustering?
  K-Means Clustering
  Flowchart to understand K-Means Clustering
  Clustering of cars based on brands
Logistic Regression
  What is Logistic Regression?
  Logistic Regression Curve & Sigmoid function
  Classify whether a tumor is malignant or benign based on features
K-Means Clustering is an example of Unsupervised learning.
It is used when you have unlabeled data!
To find clusters in the data based on feature similarity!
Flowchart to understand K-Means
START
Choose K (Elbow Method)
Assign random centroids to clusters
Compute distance from objects to centroids
Form new clusters based on minimum distance and calculate their centroids
Compute distance from objects to new centroids
Repeat until no observations change groups
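The flowchart steps can be sketched in a few lines of Python. This is only a minimal illustration, assuming NumPy, Euclidean distance, and random data points as starting centroids; the function name `k_means` and its parameters are hypothetical and not taken from the slides.

```python
import numpy as np

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-Means following the flowchart; returns (labels, centroids)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assign random centroids: pick k distinct data points as starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Compute the distance from every object to every centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # Form new clusters based on minimum distance.
        new_labels = distances.argmin(axis=1)
        # Stop when no observation changes group.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recalculate each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Two well-separated groups of made-up points, purely for illustration.
demo_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
demo_labels, demo_centroids = k_means(demo_points, k=2)
```

Choosing K itself (the first box of the flowchart) is not handled here; the function expects K to be given.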
K-Means Algorithm
Suppose we have this dataset of 7 individuals and their scores on two topics (A and B):
Subject   A     B
1         1     1
2         1.5   2
3         3     4
4         5     7
5         3.5   5
6         4.5   5
7         3.5   4.5
Now, let's take the two farthest-apart points as initial cluster centroids. In this dataset these are individual 1 at (1, 1) and individual 4 at (5, 7).
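One way to find the two farthest-apart points is to compare every pair of rows. The sketch below assumes Euclidean distance and NumPy; the variable names are illustrative only.

```python
from itertools import combinations

import numpy as np

# Scores of the 7 individuals on topics A and B (from the dataset table).
scores = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])

# Pair of row indices with the largest Euclidean distance between them.
i, j = max(combinations(range(len(scores)), 2),
           key=lambda pair: np.linalg.norm(scores[pair[0]] - scores[pair[1]]))

initial_centroids = scores[[i, j]]
print("Farthest-apart individuals:", i + 1, "and", j + 1)  # 1-based numbering
```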
Each point is then assigned to the closest cluster with respect to its distance from the centroids.
[Figure: the points grouped into Cluster 1 and Cluster 2]
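The assignment step can be sketched as follows, assuming the starting centroids are individuals 1 and 4 (the farthest-apart pair in the dataset) and Euclidean distance. Note that individual 3 is exactly equidistant from both starting centroids; in this sketch the tie goes to the first cluster because `argmin` returns the first minimum, which is an assumption about tie-breaking and not something the slides state.

```python
import numpy as np

scores = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
# Assumed starting centroids: individual 1 and individual 4.
centroids = np.array([[1.0, 1.0], [5.0, 7.0]])

# distances[n, c] is the Euclidean distance from individual n+1 to centroid c.
distances = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)

# Label 0 stands for Cluster 1, label 1 for Cluster 2.
labels = distances.argmin(axis=1)
```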
Now, we again calculate the centroids of each cluster:
            Individual    Mean Vector (centroid)
Cluster 1   1, 2, 3       (1.8, 2.3)
Cluster 2   4, 5, 6, 7    (4.1, 5.4)
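Each centroid is simply the per-column mean of the cluster's members. A small sketch, assuming NumPy and the membership listed in the table:

```python
import numpy as np

scores = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
# Membership from the table: individuals 1-3 in Cluster 1, 4-7 in Cluster 2.
labels = np.array([0, 0, 0, 1, 1, 1, 1])

# Mean vector (centroid) of each cluster.
centroids = np.array([scores[labels == c].mean(axis=0) for c in (0, 1)])
print(centroids.round(1))
```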
We compare each individual’s distance to its own cluster mean and to that of the opposite cluster, using Euclidean distance between the points and the mean. And we find:
Individual   Distance to mean (centroid) of Cluster 1   Distance to mean (centroid) of Cluster 2
1            1.5                                        5.4
2            0.4                                        4.3
3            2.1                                        1.8
4            5.7                                        1.8
5            3.2                                        0.7
6            3.8                                        0.6
7            2.8                                        1.1
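The comparison can be reproduced with a short sketch, assuming Euclidean distance and unrounded centroids. Individual values may differ from the table in the last digit depending on how the centroids and distances are rounded, but the point that ends up nearer to the opposite cluster should be the same.

```python
import numpy as np

scores = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
# Current membership: individuals 1-3 in Cluster 1 (label 0), 4-7 in Cluster 2 (label 1).
current = np.array([0, 0, 0, 1, 1, 1, 1])
centroids = np.array([scores[current == c].mean(axis=0) for c in (0, 1)])

# distances[n, c]: Euclidean distance from individual n+1 to the mean of cluster c.
distances = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)

# Individuals (1-based) that are nearer to the opposite cluster's mean.
nearest = distances.argmin(axis=1)
movers = np.flatnonzero(nearest != current) + 1
print(movers)
```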
Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2)
than its own (Cluster 1)
[Figure: moving point 3 from Cluster 1 to the new cluster, Cluster 2]
Thus, individual 3 is relocated to Cluster 2, resulting in the new partition: Cluster 1 contains individuals 1 and 2, and Cluster 2 contains individuals 3, 4, 5, 6 and 7.
For the new clusters, we will find the actual cluster centroids:
            Individual       Mean Vector (centroid)
Cluster 1   1, 2             (1.25, 1.5)
Cluster 2   3, 4, 5, 6, 7    (3.9, 5.1)
On comparing each individual’s distance to its own cluster mean and to that of the opposite cluster, we find that the data points are stable; hence we have our final clusters!
[Figure: the final Cluster 1 and Cluster 2]
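A quick way to check that the clusters are stable is to recompute the centroids after individual 3 has moved and confirm that every point is nearest to its own centroid. This is a sketch under the same assumptions as before (NumPy, Euclidean distance).

```python
import numpy as np

scores = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
# Partition after the move: individuals 1-2 in Cluster 1, 3-7 in Cluster 2.
labels = np.array([0, 0, 1, 1, 1, 1, 1])

centroids = np.array([scores[labels == c].mean(axis=0) for c in (0, 1)])
distances = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)

# Stable means no individual is nearer to the other cluster's centroid.
stable = np.array_equal(distances.argmin(axis=1), labels)
print(centroids, stable)
```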
To find the appropriate number of clusters in a dataset, we use the elbow method:
[Figure: WSS plotted against the number of clusters, with the elbow point marked]
Within sum of squares (WSS) is defined as the sum of the squared distances between each member of the cluster and its centroid.
Finding the optimal number of clusters using the elbow of the graph is called the Elbow method.
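WSS follows directly from its definition. The sketch below computes it for the seven-individual dataset with K=1 (everything in one cluster) and K=2 (using the final partition from the walkthrough, supplied by hand). An actual elbow plot would run K-Means for each K and plot these values; the helper name `wss` is hypothetical.

```python
import numpy as np

def wss(points, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for c in np.unique(labels):
        members = points[labels == c]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

scores = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                   [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])

wss_k1 = wss(scores, np.zeros(len(scores), dtype=int))    # one cluster
wss_k2 = wss(scores, np.array([0, 0, 1, 1, 1, 1, 1]))     # final two clusters
print(wss_k1, wss_k2)
```

WSS keeps falling as K grows, so the elbow is the point where adding another cluster stops giving a large drop.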
Use Case
Using K-means clustering to cluster cars into brands using parameters such as horsepower, cubic inches, make year, etc.
Dataset: Cars data having information about 3 brands of cars, namely Toyota, Honda, Nissan
Clustering
Today, we’ll dive into K-means Clustering!
Well, organizing objects into groups based on their similarity is Clustering!
Logistic Regression
The Logistic Regression algorithm is one of the simplest classification algorithms, used for binary or multi-class classification problems.
In the previous tutorial, we learnt about Linear Regression, dependent and independent variables.
To brush up,
y = mx + c
The dependent variable (y) is the target class variable we are going to predict.
The independent variables (x1…xn) are the features or attributes we are going to use to predict the target class.
[Figure: linear regression line of Marks (0 to 100) against No. of hours studied]
We know what a linear regression looks like, but using this graph we cannot divide the outcome into categories.
For example, a linear regression graph can tell us that with an increase in the number of hours studied, the marks of a student will increase.
But it will not tell us whether the student will pass or not!
In such cases, where we need the output as a categorical value, we will use logistic regression!
[Figure: logistic regression curve with the y-axis running from 0 to 1.2 and the x-axis from 0 to 9, a threshold value at 0.50, and example probabilities 0.30 and 0.82 marked]
Probability > 0.50: the value is rounded off to 1, indicating that the student will pass.
Probability < 0.50: the value is rounded off to 0, indicating that the student will fail.
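The thresholding rule can be sketched with the sigmoid function, which squashes a linear score m*x + c into a probability between 0 and 1. The slope `m` and intercept `c` below are made-up values chosen only for illustration, and treating a probability of exactly 0.50 as a fail is an assumption, since the slides do not cover that case.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, threshold=0.5):
    """Return 1 (pass) above the threshold, otherwise 0 (fail)."""
    return 1 if probability > threshold else 0

# The two example probabilities shown on the curve.
low, high = classify(0.30), classify(0.82)

# Hypothetical model: probability of passing from hours studied.
m, c = 1.2, -5.0  # assumed values, not fitted to any data
predictions = {hours: classify(sigmoid(m * hours + c)) for hours in (2, 6)}
print(low, high, predictions)
```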
Use Case
So, this model is able to predict the type of tumor with 91% accuracy!
Finally, let’s discuss the answers to the quiz asked in Machine Learning Tutorial Part-1
What do you understand from Measures and Dimensions?
Each field from the data source is automatically assigned a datatype (such as string, integer) and a role (dimension or measure).
Aggregation applied on measures is ‘Sum’ by default, but you can always change the default aggregation in the settings.
Can you tell what’s happening in the following cases?
A. Grouping documents into different categories based on the topic and content of each document
“This is an example of Clustering, where K-means clustering can be used to group the documents by topics using a bag-of-words approach.”
B. Identifying hand-written digits in images correctly
“This is an example of Classification. The traditional approach to solving this would be to extract digit-dependent features, like the curvature of different digits, and then use a classifier like SVM to distinguish between images.”
C. Behavior of a website indicating that the site is not working as designed
“This is an example of Anomaly Detection. In this case, the algorithm learns what is "normal" and what is "not normal", usually by observing the logs of the website.”
D. Predicting the salary of an individual based on his/her years of experience
“This is an example of Regression. This problem can be mathematically defined as a function between the independent variable (years of experience) and the dependent variable (salary of an individual).”
Summary
What is K-Means
Elbow Method to choose K
Clustering cars with K-means
What is logistic regression
Classifying tumor with logistic regression