From the course: Machine Learning with Python: k-Means Clustering
Choosing the right number of clusters
- [Instructor] K-means clustering is a simple and straightforward clustering technique. However, because it is an unsupervised machine learning approach, there are no ground truth labels, so we cannot simply evaluate the accuracy of our clusters against an existing set of labels. This also means that we don't always know whether the value we choose for K is the most appropriate one for the data we have. There are several common approaches to the challenge of choosing the right K. One approach is to use a priori, or domain, knowledge. With this approach, we use our prior knowledge of the expected number of clusters to inform our choice of K. This could be based on existing business requirements or other known constraints. In the absence of prior knowledge, a simple rule of thumb can also be used to choose a value of K. One such rule is to set K to the square root of half the number of observations in the dataset. As you can…
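The square-root rule of thumb described in the transcript can be sketched in a few lines of Python. The function name `rule_of_thumb_k` is hypothetical, and rounding to the nearest whole number (with a floor of one cluster) is an assumption; the transcript does not say how to handle a non-integer result.

```python
import math


def rule_of_thumb_k(n_observations: int) -> int:
    """Estimate K as the square root of half the number of observations.

    Hypothetical helper for illustration only. Rounding to the nearest
    integer is an assumption, not something the transcript specifies.
    """
    if n_observations < 1:
        raise ValueError("n_observations must be a positive integer")
    # K must be a whole number, and at least one cluster is always needed.
    return max(1, round(math.sqrt(n_observations / 2)))


# A dataset with 200 observations: sqrt(200 / 2) = sqrt(100) = 10
print(rule_of_thumb_k(200))  # → 10
```

The result is only a starting point. It depends solely on the size of the dataset, not on its structure, so it is best treated as an initial guess to be checked against domain knowledge or other methods.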