Overview of Machine Learning
A hands-on learning experience that prepares you for a career in machine learning
and your dream job.
Introductions
● Introduce yourself
○ Name
○ Education
● Share your expectations for this course
● If I am not afraid of failing, I will…
A little about Venkat
● Master's in Computer Science & MBA
● Co-founded two startups and successfully exited both
● Funded a couple of startups that have raised Series A funding
● Founded BI Engines (a BI company)
● Currently in the early stages of an IoT/ML product
What Can You Expect?
The workshop is meant to provide you with a base to build your machine learning
skills. In particular you will learn to:
● Recognize problems that can be solved with Machine Learning
● Select the right technique (is it a classification problem? a regression? does it need
preprocessing?)
● Load and manipulate data with Pandas
● Visualize and explore data with Seaborn
● Build regression models with Scikit-Learn
● Evaluate model performance with Scikit-Learn
● Solve one Kaggle project
What is Machine Learning?
● Machine learning is the art / science of programming computers so that they can learn
from data
● Tom M. Mitchell provided a widely quoted definition: "A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P if its
performance at tasks in T, as measured by P, improves with experience E."
● Due to the availability of large amounts of data (big data), machine learning has gained
much importance for making data-driven decisions rather than relying on hard-coded responses.
Where is Machine Learning Used?
Examples of Successful Machine Learning
● Spam filters….
● The heavily hyped, self-driving Google car? The essence of machine learning.
● Online recommendation offers such as those from Amazon and Netflix?
Machine learning applications for everyday life.
● Knowing what customers are saying about you on Twitter? Machine learning
combined with linguistic rule creation.
● Fraud detection? One of the more obvious, important uses in our world today.
What is Needed to Learn ML?
● Computer science fundamentals
○ Data structures (stacks, queues, trees, graphs, etc.)
○ Algorithms (searching, sorting, optimization, etc.)
○ Computability and complexity (Big-O notation).
● Probability and Statistics
○ Probability (conditional probability, Bayes' rule, likelihood, independence, etc.)
○ Statistics (uniform, normal, binomial, Poisson distributions, etc.)
○ Analysis methods (hypothesis testing, ANOVA, etc.)
○ College-level calculus and linear algebra
○ Cheat sheets: Calculus, Linear Algebra and Statistics
● General Background
○ An inquisitive mind
○ Desire to learn something new
Environment Setup
● Good Computer with Internet connection (Windows, Mac, or Linux)
● Installation of Conda and ML Workshop Files using this file.
End-to-End Supervised Machine Learning
Pipeline stages: Frame the Problem → Obtain Data → Analyze Data → Feature Engineering →
Model Selection → Tune the Model → Predict on New Cases
Machine Learning: What Is It Great For?
● Existing problems that require a lot of hand-tuning or long lists of rules
○ ML can simplify the code and perform better
● Complex problems for which there is no good traditional solution
○ ML techniques can find a solution
● Fluctuating environments
○ ML can adapt to changes in the data
● Getting insights about complex problems
○ ML can scan huge amounts of data to surface patterns
Types of Machine Learning Systems
● Whether or not they are trained with human supervision
○ Supervised, unsupervised, semi-supervised, and reinforcement learning
● Whether or not they can learn incrementally
○ Batch versus online/incremental learning
● Whether they compare new data against known instances, or detect patterns in the training
data
○ Instance-based versus model-based learning
WorkFlow - Supervised
Dataset → split into Training Dataset and Test Dataset → Model Selection & Feature Engineering →
Train the Model → Test the Model → Verification Dataset → Deploy Model
Supervised Learning
In supervised learning, the training data you feed to the algorithm includes the
desired solutions, called labels.
Some examples of supervised learning:
● Classification: the label/target is one of a given set of values. Spam
filtering is a good example of this.
● Regression: the target is a numeric, continuous value (such as a car price),
and a set of features (mileage, brand, etc.), called predictors, is used to
predict it.
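For illustration only, here is a minimal scikit-learn sketch of the two task types; the tiny
datasets and feature choices are invented, not from the workshop materials.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: the target is one of a fixed set of labels (1 = spam, 0 = not spam)
X_cls = np.array([[0.10], [0.35], [0.40], [0.75], [0.80], [0.90]])  # e.g. fraction of "spammy" words
y_cls = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[0.70]]))         # -> a class label

# Regression: the target is a continuous number (e.g. car price given mileage)
X_reg = np.array([[10_000], [30_000], [60_000], [90_000]])
y_reg = np.array([22_000, 18_000, 13_000, 9_000])
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[45_000]]))       # -> a continuous value
```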
Supervised Learning Algorithms
● k-Nearest-Neighbors
● Linear Regression
● Logistic Regression
● Support Vector Machines (SVMs)
● Decision Trees and Random Forests
● Neural Networks
Supervised Learning
ID | X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 | Target
The columns X1–X12 are the Features and the last column is the Target.
The rows are then split into a Training set and a Test set, which yields X_Train and y_Train
(training features and targets) and X_Test and y_Test (test features and targets).
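A minimal sketch of this split using pandas and scikit-learn's train_test_split; the column
names follow the slide (X1–X12, Target), while the random values are just stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(100, 12)),
                  columns=[f"X{i}" for i in range(1, 13)])
df["Target"] = rng.normal(size=100)

X = df.drop(columns="Target")    # the Features (X1..X12)
y = df["Target"]                 # the Target

# hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)   # (80, 12) (20, 12)
```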
Some Basic Math
Target = Function( Features )
Target = Fn( X1, X2, X3, …, X12 )
Example of a Linear Function
Target = C0 + C1*X1 + C2*X2 + C3*X3 + … + C12*X12
Machine Learning
● Apply the training set to estimate the coefficients (C0, C1, C2, …, C12)
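A minimal sketch, on synthetic data, of how scikit-learn's LinearRegression estimates these
coefficients: C0 ends up in intercept_ and C1–C12 in coef_. The "true" coefficient values below
are chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))                           # columns X1..X12
true_c = np.arange(1, 13).astype(float)                  # pretend C1..C12 are 1.0 .. 12.0
y = 5.0 + X @ true_c + rng.normal(scale=0.1, size=200)   # C0 = 5.0 plus a little noise

model = LinearRegression().fit(X, y)                     # estimate C0..C12 from the training data
print("C0 (intercept):", model.intercept_)               # close to 5.0
print("C1..C12:", model.coef_)                           # close to [1, 2, ..., 12]
```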
WorkFlow - Supervised
Dataset → split into Training Dataset and Test Dataset → Model Selection & Feature Engineering →
Train the Model → Test the Model → Validation Dataset → Deploy Model
Supervised Learning
For each row (ID, Features, Target), compare the Predicted value against the Actual value.
Counting the four possible outcomes (TN, FP, FN, TP) produces the confusion matrix below.
Confusion Matrix
              Predicted NO    Predicted YES
Actual NO     TN              FP
Actual YES    FN              TP
Precision Score = TP / (TP + FP)
Recall Score = TP / (TP + FN)
F1 = 2 * (Precision * Recall) / (Precision + Recall)
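These scores can be computed directly with scikit-learn; the actual/predicted labels below are
made up purely to show the calls.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

actual    = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]   # 1 = YES, 0 = NO
predicted = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(tn, fp, fn, tp)                          # TN, FP, FN, TP

print(precision_score(actual, predicted))      # TP / (TP + FP)
print(recall_score(actual, predicted))         # TP / (TP + FN)
print(f1_score(actual, predicted))             # 2 * (precision * recall) / (precision + recall)
```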
Unsupervised Learning
The training data is unlabeled, so the system tries to learn without a teacher.
An example is grouping blog visitors by shared features (a k-Means sketch follows this list).
Some algorithms are:
● Clustering
○ k-Means
○ Hierarchical Cluster Analysis (HCA)
○ Expectation Maximization
● Visualization and Dimensionality Reduction
○ Principal Component Analysis
○ Kernel PCA
○ Locally-Linear Embedding (LLE)
○ t-distributed Stochastic Neighbor Embedding (t-SNE)
● Association Rule Learning
○ Apriori
○ Eclat
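As an illustration of the list above, here is a minimal k-Means sketch; the "visitor" features
and values are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

# two made-up features per visitor: pages viewed, minutes on site
visitors = np.array([
    [2, 1.5], [3, 2.0], [2, 1.0],        # look like casual readers
    [15, 30.0], [18, 25.0], [20, 28.0],  # look like heavy readers
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(visitors)
print(kmeans.labels_)             # group assigned to each visitor (no labels were given)
print(kmeans.cluster_centers_)    # center of each discovered group
```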
Reinforcement Learning
● RL is a completely different beast
● The learning system, called an agent, can observe an environment, select
and perform actions, and get rewards in return. It must then learn by itself
what the best strategy, called a policy, is to maximize the reward over time.
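A toy sketch of the agent / action / reward / policy loop described above, using a simple
epsilon-greedy bandit; this is only an illustration, not a full reinforcement learning algorithm
and not part of the workshop materials.

```python
import random

# Hidden environment: three actions ("slot machines") with unknown payout rates
true_payout = [0.2, 0.5, 0.8]
value_estimate = [0.0, 0.0, 0.0]   # the agent's running estimate of each action's value
counts = [0, 0, 0]
epsilon = 0.1                      # fraction of steps spent exploring at random

for step in range(1000):
    if random.random() < epsilon:
        action = random.randrange(3)                          # explore
    else:
        action = value_estimate.index(max(value_estimate))    # exploit the current policy
    reward = 1.0 if random.random() < true_payout[action] else 0.0
    counts[action] += 1
    # incremental average of rewards observed for this action
    value_estimate[action] += (reward - value_estimate[action]) / counts[action]

print(value_estimate)   # should drift toward [0.2, 0.5, 0.8]
```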
Batch versus Online Learning
● In batch learning the system is incapable of learning incrementally. It must be
trained using all available data.
○ Suggest some examples
● Online / Incremental Learning: you train the system incrementally by feeding it data
instances sequentially, either individually or in small groups called mini-batches
(see the sketch after this list).
○ Suggest some examples
○ How fast the system adapts to new data is called the learning rate.
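A minimal sketch of incremental training with scikit-learn's SGDRegressor and partial_fit on
synthetic mini-batches; the data and the eta0 learning-rate value are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(1)
model = SGDRegressor(learning_rate="constant", eta0=0.01)   # eta0 is the learning rate

for _ in range(200):                        # a stream of mini-batches
    X_batch = rng.normal(size=(32, 3))
    y_batch = X_batch @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=32)
    model.partial_fit(X_batch, y_batch)     # update the model incrementally

print(model.coef_)                          # roughly [2.0, -1.0, 0.5]
```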
Instance Based VS Model-Based Learning
Another way to categorize machine learning systems is by how they generalize
● Instance-based Learning: the system learns the examples by heart and then
generalizes to new cases using a similarity measure.
● Model-based Learning: the system builds a model from a set of examples, then
uses that model to make predictions.
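A minimal sketch contrasting the two styles on the same made-up one-dimensional data:
k-nearest neighbors (instance-based) versus linear regression (model-based).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Instance-based: prediction comes from the most similar stored examples
knn = KNeighborsRegressor(n_neighbors=2).fit(X, y)
print(knn.predict([[2.5]]))    # average of the targets of the 2 nearest neighbors

# Model-based: prediction comes from the fitted model (a line), not from stored rows
lin = LinearRegression().fit(X, y)
print(lin.predict([[2.5]]))    # a point on the fitted line
```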
Main Challenges of Machine Learning
● Insufficient quantity of training data
● Non-representative training data
● Poor-quality data
● Irrelevant features
○ A critical part of a successful machine learning project is coming up with a good set of features
to train on. This process is called Feature Engineering.
■ Feature Selection: selecting the most useful features to train on among existing
features.
■ Feature Extraction: Combining existing features to produce a more useful one.
● Overfitting the training data
○ Model performs well on training data but does not generalize well.
○ Constraining the model to make it simpler and reduce the risk of overfitting is called
regularization.
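A minimal sketch of regularization using Ridge regression from scikit-learn; the dataset and the
alpha value are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(7)
X = rng.normal(size=(20, 10))                 # few rows, many features: easy to overfit
y = X[:, 0] * 3.0 + rng.normal(size=20)       # only the first feature actually matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)           # alpha controls how strongly we constrain the model

print(np.abs(plain.coef_).sum())              # typically larger: the model chases the noise
print(np.abs(ridge.coef_).sum())              # typically smaller: the constrained model is simpler
```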
