Catalit LLC
SCIKIT-LEARN TUTORIAL
Francesco Mosconi
SF Data Mining Meetup @ Google Launchpad
May 2017
Data Weekends
BEFORE WE START
Download and install MINICONDA (PYTHON 2.7) from:
https://conda.io/miniconda.html
IN THIS WORKSHOP
• Recognize problem types and choose the right ML technique
• Load and manipulate data with Pandas
• Build a classification model with Scikit-Learn
• Evaluate model performance with Scikit-Learn
ML TECHNIQUES
• Supervised learning, continuous target: REGRESSION
• Supervised learning, categorical target: CLASSIFICATION
• Unsupervised learning: CLUSTERING
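A minimal sketch of how these three techniques map onto scikit-learn estimators, assuming a recent scikit-learn on Python 3 (the deck itself targets Python 2.7). The choice of LinearRegression, LogisticRegression and KMeans, and of predicting petal width for the regression case, is illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

iris = load_iris()
X = iris.data  # four measurements per flower (cm)

# Supervised, continuous target (regression): illustrative task of
# predicting petal width from the other three measurements.
reg = LinearRegression().fit(X[:, :3], X[:, 3])
petal_width_pred = reg.predict(X[:5, :3])

# Supervised, categorical target (classification): predict the species label.
clf = LogisticRegression(max_iter=1000).fit(X, iris.target)
species_pred = clf.predict(X[:5])

# Unsupervised (clustering): group the flowers without looking at any labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_
```

All three follow the same `fit` / `predict` pattern; only clustering is fitted without a target.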
TYPES OF PROBLEMS
• Sentiment analysis
• Heart monitoring
• Book recommendation
• Caption generation
• Human recognition
• House price prediction
• Document classification
• Social network analysis
SCIKIT-LEARN
MODEL BUILDING
1. Collection
2. Processing
3. Model Building
4. Evaluation
5. Deployment
BENCHMARK
CLASSIFIERS
http://www.aboutdm.com/2013/04/history-of-machine-learning.html
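A hedged sketch of comparing several classifiers on the same data, assuming a recent scikit-learn on Python 3. The four models and the 5-fold cross-validation setup are illustrative choices, not the benchmark from these slides; the point is that every classifier is trained and scored through the same interface.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every scikit-learn classifier exposes fit/predict/score, so they can be swapped freely.
classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "support vector machine": SVC(),
}

scores = {}
for name, model in classifiers.items():
    # Mean accuracy over 5 cross-validation folds.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```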
PROCESSING
X → Transformer → X' → Transformer → X'' → Estimator (fit on X'' and y)
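The transformer/estimator chain can be sketched with scikit-learn's `Pipeline`, assuming a recent release on Python 3 (slicing a pipeline with `pipe[:-1]` needs scikit-learn 0.21 or later). StandardScaler and PCA stand in for the two transformers and LogisticRegression for the estimator; these specific steps are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),                      # transformer: X -> X'
    ("reduce", PCA(n_components=2)),                  # transformer: X' -> X''
    ("classify", LogisticRegression(max_iter=1000)),  # estimator trained on X'' and y
])

# fit() calls fit_transform on each transformer in turn, then fits the estimator.
pipe.fit(X, y)
predictions = pipe.predict(X[:5])

# Output of the two transformers alone (X''), without the final estimator.
X_double_prime = pipe[:-1].transform(X)
```

Wrapping the steps in one object ensures the same transformations are applied at training and prediction time.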
EVALUATION
CONFUSION MATRIX
• Accuracy: overall, how often is the classifier correct?
• Accuracy = (TP + TN) / total
                    Test Negative                       Test Positive
Condition Negative  TRUE NEGATIVE (TN)                  FALSE POSITIVE (FP, Type I error)
Condition Positive  FALSE NEGATIVE (FN, Type II error)  TRUE POSITIVE (TP)
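A small sketch computing the confusion matrix and accuracy with scikit-learn, assuming Python 3. The ten labels below are made-up example data. For binary labels `[0, 1]`, `confusion_matrix` puts true conditions on the rows and test results (predictions) on the columns, so `.ravel()` yields TN, FP, FN, TP in the same order as the table.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical example data: 1 = positive, 0 = negative.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # condition
y_pred = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]  # test result

# Rows: condition negative/positive; columns: test negative/positive.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
# tn=3, fp=1, fn=1, tp=5

accuracy = (tp + tn) / (tp + tn + fp + fn)
# → 0.8, the same value accuracy_score(y_true, y_pred) returns
```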
TRAIN-TEST SPLIT
All data available is split into:
• Training data → used to train the model
• Testing data → fed to the trained model to measure performance
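A minimal sketch of the train-test split, assuming a recent scikit-learn on Python 3. The 30% test fraction, the stratified split, and the decision tree model are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of all available data for testing; stratify keeps class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)                   # train on the training data only
test_accuracy = model.score(X_test, y_test)   # measure performance on unseen data
```

Scoring on held-out data gives a more honest estimate than scoring on the rows the model was trained on.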
A TALE OF FLOWERS
https://en.wikipedia.org/wiki/Iris_flower_data_set
Iris Versicolor, Iris Virginica
BINARY CLASSIFICATION
          Sepal Length  Sepal Width  Petal Length  Petal Width  Type
Flower 1  6.2           3.4          5.4           2.3          Virginica
Flower 2  5.9           3.0          5.1           1.8          Virginica
Flower 3  7.0           3.2          4.7           1.4          Versicolor
(all measurements in cm)
Features: the four measurement columns
Labels: the Type column
Data point: one row, i.e. one flower
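A hedged sketch of building this binary problem with Pandas from the copy of the Iris data that ships with scikit-learn, assuming Python 3 and a recent scikit-learn. Dropping Iris setosa keeps only the two species shown; the column names mirror the table, and LogisticRegression is an illustrative choice of classifier.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

FEATURES = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]

iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df["Type"] = [iris.target_names[t].capitalize() for t in iris.target]

# Keep only two species so the task becomes binary classification.
binary = df[df["Type"].isin(["Versicolor", "Virginica"])]

X = binary[FEATURES]   # features: one column per measurement
y = binary["Type"]     # labels
first_point = X.iloc[0]  # one data point = one row

model = LogisticRegression(max_iter=1000).fit(X, y)

# Classify the three flowers from the table above.
flowers = pd.DataFrame(
    [[6.2, 3.4, 5.4, 2.3], [5.9, 3.0, 5.1, 1.8], [7.0, 3.2, 4.7, 1.4]],
    columns=FEATURES)
predicted = model.predict(flowers)
```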
SUPERVISED LEARNING
http://www.realsafety.org/wp-content/uploads/2014/11/safety-supervisors-interaction.png
TUTORIAL
Code: dataweekends.com/ml
THANK YOU
Data Weekends
Next Data Weekends Dates:
2-day Machine Learning: May 6-7
2-day Intro Deep Learning: May 20-21
2-day Advanced Deep Learning: Jun 3-4
2-day Intro Deep Learning: Jun 17-18
