This document provides an introduction to machine learning, including definitions, types, and case studies. It begins with an agenda and overview of artificial intelligence applications. It then defines machine learning as a field that allows computers to learn without being explicitly programmed. The main types of machine learning are described as supervised, unsupervised, semi-supervised, and reinforcement learning. Example case studies on Netflix recommendations, cancer diagnosis, and Amazon inventory are outlined. The document concludes with tips on prerequisites and resources for studying machine learning, including mathematics, programming tools, and course recommendations.
The Big Data Era
Data
• Large volumes of data are produced every day.
• Everyone has a phone packed with several sensors.
Infrastructure
• The computing power of GPUs has increased dramatically.
• Cloud providers offer online computing (IaaS).
Services
• User applications: YouTube, Gmail, Facebook, Twitter.
• Online storage is available for free or at low cost.
Notable AI Achievements
ImageNet is a database of 14 million images with over 20,000 categories.
GPT-3 is a language model with 175 billion learnable parameters.
Overlapping AI-Related Terminology
• Artificial Intelligence (AI)
Trying to simulate human intelligence.
• Machine Learning (ML)
Learn by example from experience and historic data.
• Deep Learning (DL)
Learn patterns using multi-layered data processors (neural networks).
• Data Science (DS)
Uses a variety of scientific methods, processes and systems to solve problems involving data.
• Big Data
Analyze data sets that are too large or complex for traditional data-processing tools.
[Diagram: how Artificial Intelligence, Machine Learning, Deep Learning, Data Science and Big Data overlap]
What is Machine Learning
The subfield of computer science that “gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959).
Using previous data to answer future questions.
Historic data may contain answers (labeled) or may not contain answers (unlabeled).
Training → Prediction
Machine Learning vs. Traditional Programming
Traditional Programming: Data + Rules → Answers
Machine Learning: Data + Answers → Rules
Traditional Programming:
• Business requirements and data are analyzed.
• A set of hard-coded rules is programmed and tested.
• The program processes new data based on the coded rules.
Machine Learning:
• Data and their labels (answers) are fed into a model.
• The model “learns” useful features and frequent patterns to predict answers.
• The trained model is used to “predict” answers for new data.
HOUSING PRICES
Estimate housing prices based on 3 features (properties):
Area of the House, Number of Bedrooms, Number of Bathrooms
• Traditional Approach:
Price = 1.2 x Area + 0.7 x # Bedrooms + 0.3 x # Bathrooms
The pricing formula is known beforehand and is explicitly hard-coded. The formula can be deduced by manual analysis or SME domain experience.
• Machine Learning:
Price = A x Area + B x # Bedrooms + C x # Bathrooms
The pricing formula is unknown at the beginning, and the model needs to be trained to “learn” the formula coefficients A, B and C.
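The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and not from the slides. The first function fixes the coefficients by hand, while the second keeps the same formula shape and leaves A, B and C to be supplied by a training step.

```python
def price_traditional(area, bedrooms, bathrooms):
    """Hard-coded rule: the coefficients were chosen by a person."""
    return 1.2 * area + 0.7 * bedrooms + 0.3 * bathrooms


def price_model(params, area, bedrooms, bathrooms):
    """Same formula shape, but A, B and C come from a training step."""
    a, b, c = params
    return a * area + b * bedrooms + c * bathrooms


# With the hand-written coefficients plugged in, both functions agree.
hand_written = (1.2, 0.7, 0.3)
same = price_model(hand_written, 130, 3, 1) == price_traditional(130, 3, 1)
```

The only difference is where the numbers come from: a person in the first case, data in the second.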
Housing Prices
Input Dataset:
Contains prepared historic data of actual house sales.
Area    # Bedrooms    # Bathrooms    Price
130     3             1              1,200
160     3             2              1,500
90      2             1              900
…
Model:
The model learns appropriate “parameters” to “fit” the input data (diagram labels: Model, Hyperparameters, Optimization).
Price = A x Area + B x # Bedrooms + C x # Bathrooms
Output:
Parameters that complete the formula and can be generalized to predict the prices of unsold houses.
A = 1.247
B = 0.682
C = 0.319
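One simple way to "learn" A, B and C is ordinary least squares, sketched below with the Python standard library only. The slides do not say which optimization method is used, so this choice is an assumption. The data is also synthetic: it is generated from the hand-written formula (1.2, 0.7, 0.3) plus a little random noise, because the sample table shows only three rows. The helper names are hypothetical.

```python
import random


def solve_3x3(m, v):
    """Solve the 3x3 linear system m @ x = v by Gaussian elimination."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_least_squares(rows, prices):
    """Return [A, B, C] minimizing the squared error of A*x0 + B*x1 + C*x2."""
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * p for r, p in zip(rows, prices)) for i in range(3)]
    return solve_3x3(xtx, xty)


# Synthetic "historic sales": (area, bedrooms, bathrooms) and a noisy price.
rng = random.Random(0)
rows = [(rng.uniform(50, 300), rng.randint(1, 5), rng.randint(1, 3))
        for _ in range(500)]
prices = [1.2 * a + 0.7 * b + 0.3 * c + rng.gauss(0, 0.05) for a, b, c in rows]

A, B, C = fit_least_squares(rows, prices)
print(f"A = {A:.3f}, B = {B:.3f}, C = {C:.3f}")
```

The fitted values should land close to, but not exactly on, the coefficients that generated the data, which is the same kind of result as the learned parameters on the slide.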
Supervised Learning
• Learn through examples collected from historic data.
• Examples contain the desired output (labels) that will be predicted for future data.
• Is this a cat or a dog?
• Is this email spam or not?
• What is the market value of a house given its area and number of bedrooms?
Types of machine learning: Supervised, Unsupervised, Semi-Supervised, Reinforcement
Supervised Learning
Regression
Output is continuous. Predicts numerical values such as prices or temperature.
Classification
Output is discrete. Predicts categorical labels such as: Cat or Dog.
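The difference between the two output types can be seen with a small k-nearest-neighbours sketch, a technique chosen here for illustration and not named in the slides. The same neighbour lookup feeds both tasks: regression averages numeric targets, classification takes a majority vote over labels. The area and price pairs reuse the three sample rows from the housing table; the animal weights are made up.

```python
from collections import Counter


def nearest(train, query, k=3):
    """Return the targets of the k training points closest to the query."""
    by_distance = sorted(train, key=lambda item: abs(item[0] - query))
    return [target for _, target in by_distance[:k]]


def knn_regress(train, query, k=3):
    """Regression: average the neighbours' numeric targets."""
    targets = nearest(train, query, k)
    return sum(targets) / len(targets)


def knn_classify(train, query, k=3):
    """Classification: take a majority vote over the neighbours' labels."""
    return Counter(nearest(train, query, k)).most_common(1)[0][0]


# House area -> price (from the sample table), animal weight in kg -> label (made up).
area_to_price = [(130, 1200), (160, 1500), (90, 900)]
weight_to_label = [(3, "cat"), (4, "cat"), (5, "cat"),
                   (20, "dog"), (25, "dog"), (30, "dog")]

predicted_price = knn_regress(area_to_price, 140, k=2)   # a number
predicted_label = knn_classify(weight_to_label, 22)      # a category
```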
Unsupervised Learning
• Using historic data that has no labels.
• Discovers the intrinsic links of data.
• Group photos into 20 groups based on their metadata.
• Segment customer profiles based on their demographics and purchase behavior.
• Find an anomaly in credit card usage patterns.
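Grouping tasks like the ones above are often approached with clustering. The sketch below is a minimal one-dimensional k-means loop, assuming a single numeric feature (a made-up monthly spend per customer) and k of at least 2. The slides do not name a specific algorithm, so k-means is an illustrative choice.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters (k >= 2) with a basic k-means loop."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            closest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[closest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters


# Toy customer data (made up): monthly spend, with no labels attached.
spend = [12, 15, 14, 80, 85, 90, 300, 310, 295]
centres, clusters = kmeans_1d(spend, 3)
```

No labels are given to the algorithm; the three customer segments emerge only from how the values sit relative to each other.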
Unsupervised Learning
• Useful for learning structure in the data (clustering) or detecting outliers (anomaly detection).
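A very simple form of outlier detection is a z-score check: flag any value that sits more than a chosen number of standard deviations from the mean. The sketch below assumes a threshold of 2.0 and uses made-up transaction amounts. Real anomaly detection systems are usually more elaborate, so treat this as an illustration of the idea only.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Toy card transactions (made up); one amount is far from the rest.
amounts = [20, 25, 22, 30, 27, 24, 26, 500]
flagged = find_anomalies(amounts)
```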
Semi-Supervised Learning
• Historic data has a small amount of labeled data, and a large amount of unlabeled data.
• The cost of manually labeling all data is prohibitive.
• The problem is initially treated as unsupervised, to group the data into clusters.
• After that, the available labels are used to label entire clusters.
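The two steps described above (cluster first, then spread the few known labels over whole clusters) can be sketched as follows. The sketch assumes one numeric feature, exactly two clusters, values that are not all identical, and at least one labeled example per cluster. All data and function names are made up for illustration.

```python
from collections import Counter


def two_means(values, iterations=20):
    """Split numbers into two clusters (k-means with k=2); return both centres."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


def cluster_then_label(unlabeled, labeled):
    """Step 1: cluster all points. Step 2: name each cluster by its labeled members."""
    all_values = unlabeled + [v for v, _ in labeled]
    centres = two_means(all_values)

    def cluster_of(v):
        return 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1

    votes = {0: Counter(), 1: Counter()}
    for v, label in labeled:
        votes[cluster_of(v)][label] += 1
    # Assumes every cluster contains at least one labeled example.
    names = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [(v, names[cluster_of(v)]) for v in unlabeled]


# Toy animal weights in kg (made up): eight unlabeled, only two labeled.
unlabeled = [3.1, 4.2, 3.8, 5.0, 21.0, 24.5, 27.0, 30.2]
labeled = [(4.0, "cat"), (25.0, "dog")]
result = cluster_then_label(unlabeled, labeled)
```

Two labeled examples are enough here to name all eight unlabeled points, which is the cost saving the slide refers to.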
Reinforcement Learning
• An agent interacts with an environment and watches the result of the interaction.
• The environment gives feedback via a positive or negative reward signal.
• The agent learns to optimize its interactions to maximize the reward.
• An autonomous vehicle learns to put safety first, minimize ride time, and obey the law.
• A stock trading agent can decide to buy, sell or hold based on market status and transaction profit/loss.
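The agent, environment and reward loop can be illustrated with tabular Q-learning on a tiny made-up environment: five positions on a line, with a reward of +1 for reaching the rightmost one. Q-learning is one common reinforcement learning method and is an illustrative choice here, since the slides do not name an algorithm. The environment, constants and function names are all assumptions.

```python
import random

N_STATES = 5          # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Environment: move, clamp to the line, reward +1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with purely random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(2)
            next_state, reward, done = step(state, ACTIONS[a])
            # The goal is terminal, so no future value is added once it is reached.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-goal states: the action with the highest learned value.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
```

The agent is never told that moving right is correct. It discovers this from the reward signal alone, and the learned policy ends up stepping right from every position.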
ML Productizing
AI-First
AI tech is at the center and is essential to the product function. Examples: Virtual assistants, chatbots, self-driving cars.
AI-Inside
AI adds a useful function that enhances user experience. Examples: Recommendation engines, process automation, fraud detection.
Actionable Insights
AI leverages the data that you collect to make informed decisions. Examples: Sales forecast, churn analysis.
Case Study: Netflix Recommendation
Personalized recommendation using collaborative filtering.
Scale:
• Volume: 13,612 titles (2019)
• Subscribers: 159 million (2020)
Results:
• High engagement rate, low churn.
• Personalization and recommendations save Netflix more than $1 billion per year.
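To give a feel for what collaborative filtering means, here is a toy user-based version: unseen titles are ranked by other users' ratings, weighted by how similar those users are to the target user. This is a textbook-style sketch on made-up ratings and is not a description of Netflix's actual system. All names and numbers are hypothetical.

```python
from math import sqrt

# Toy ratings (made up): user -> {title: rating from 1 to 5}.
ratings = {
    "ana":   {"Drama A": 5, "Comedy B": 1, "Thriller C": 4},
    "ben":   {"Drama A": 4, "Comedy B": 2, "Thriller C": 5,
              "Drama D": 5, "Comedy E": 2},
    "chloe": {"Drama A": 1, "Comedy B": 5, "Drama D": 2, "Comedy E": 5},
}


def cosine(u, v):
    """Cosine similarity over the titles both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = sqrt(sum(u[t] ** 2 for t in shared))
    norm_v = sqrt(sum(v[t] ** 2 for t in shared))
    return dot / (norm_u * norm_v)


def recommend(user, ratings):
    """Rank unseen titles by similarity-weighted ratings from other users."""
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for title, rating in theirs.items():
            if title not in ratings[user]:
                scores[title] = scores.get(title, 0.0) + sim * rating
                weights[title] = weights.get(title, 0.0) + sim
    ranked = {t: scores[t] / weights[t] for t in scores if weights[t] > 0}
    return sorted(ranked, key=ranked.get, reverse=True)


suggestions = recommend("ana", ratings)
```

Because ana's tastes are much closer to ben's than to chloe's, ben's ratings dominate and the title he rated highly is suggested first.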
Case Study: Infervision Cancer Diagnosis
Predominantly used in early-stage lung cancer screening. Employs more than 50 deep learning algorithms to determine each diagnosis.
Scale:
• Trained using over 200,000 scans in trials at 20 hospitals.
Results:
• Helped reduce the rate of missed cancer diagnoses by 50 percent.
Case Study: Amazon Inventory Optimization
ML-powered inventory optimization ensures that inventory is stocked ahead of time to meet forecast demand.
Scale:
• Ships an average of 10 million packages per day.
Results:
• Store 40% more inventory.
• Fulfill 1-day and 2-day shipping on time.
Mathematics Study Tips
• Book: A First Course in Probability, 9th ed.
• YouTube: 3Blue1Brown (mathematics for machine learning)
• Coursera: Mathematics for Machine Learning, by Imperial College London
https://www.coursera.org/specializations/mathematics-machine-learning
• Book: Mathematics for Machine Learning
https://mml-book.github.io/
Machine Learning Study Tips
• Coursera: Machine Learning, offered by Stanford
https://www.coursera.org/learn/machine-learning
• YouTube: Stanford CS 229 – Machine Learning (Math focused)
https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU
• Book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition
https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/