Artificial Intelligence & Machine Learning Concepts
Data Science Society of Auburn
November 16, 2020
Dan O’Leary
dan.oleary@auburn.edu
Introduction
• Introduction to AI/ML concepts and Data Science
• Based on a lecture for undergrads in BET 2019
• No expectation of related knowledge
• Expanded and deepened for this audience
• Still “Big Picture”
• Provide some context, detail for those of you in deeper studies
• Introduce those of you new / interested
• Disclaimers
• Some references a little dated (yes, in 18 months)
• Not as well cited as it should be
• I am not an expert!
DATA SCIENCE SOCIETY
OF AUBURN
11/16/20 Data Science Society of Auburn // AI, ML, and DS Concepts // November 16, 2020 2
Topics
• Attempt to define Artificial Intelligence (fail)
• Attempt to describe the field of Data Science (better)
• Introduce fundamental concept of Machine Learning (pretty good!?)
• Describe state of the art methods in 1 slide each (jury’s out)
• Use cases and examples (great!)
• Discuss why now, benefits, and limitations / pitfalls (ok)
Only the tip of the tip of the iceberg
Not a formal academic presentation
Background
• 1992 – BS Mechanical Engineering
• 2019 – Master of Engineering Management, Supervised Learning
• 2020 – Grad Cert Modeling and Analytics for Operations (ISE)
• ≈2022 – PhD, Industrial and Systems Engineering
• Modeling / simulation, data science / machine learning
• Teach undergrad courses in innovation and product development
• Life-long fascination with modeling and simulation of all types
• 2 x Entrepreneur: co-Founded n-Space in 1994, funding from Sony
• 23 years, 45 games – concept to completion; major brands,
publishers, and platforms; all genres and demographics
From GTC 2018 – GPU Tech Conference
https://youtu.be/GiZ7kyrwZGQ
Embedded videos have been replaced with screenshots and YouTube links.
Question: What is Artificial Intelligence (AI)?
If you can’t explain it to a six-year-old, you don’t
understand it yourself.
- attributed to Albert Einstein
Answer: I don’t know.
What is AI?
• Imprecise, overused term
• Calculator?
• Self-driving car?
• Chatbot?
• Definition is fuzzy, changes over time
• Old, diluted, hyped term – backlash, cynicism
• Generally used to describe machines doing tasks traditionally
assigned to humans
Classic Definition of AI
An intelligently designed agent that perceives its
environment and makes decisions to maximize the
chances of achieving its goal.
• Subfields:
• Computer Vision
• Robotics / Control Theory
• Natural Language Processing
https://medium.com/machine-learning-for-humans/why-machine-learning-matters-6164faf1df12
Now considered “Machine Learning”
AI Effect
Once a machine takes over a task, humans tend to dismiss it as “not AI”
It’s part of the history of the field of artificial intelligence that every
time somebody figured out how to make a computer do something —
play good checkers, solve simple but relatively informal problems —
there was a chorus of critics to say, ‘that’s not thinking’
Pamela McCorduck, 2004
AI is whatever hasn't been done yet. – Douglas Hofstadter
AI Effect
• 1997 – IBM’s Deep Blue beats world chess
champion Garry Kasparov
• 259th most powerful supercomputer at the time
• Searched 6-8 moves ahead, sometimes 20 or more
• “Brute force methods… not real intelligence”
• Changed the canonical example of human vs. machine from Chess to Go
• Simple rules, many more possible moves
• More intuition, less susceptible to brute force
• 2016 – Google DeepMind’s AlphaGo beats Go champion Lee Sedol
• 2019 – DeepMind’s AlphaStar beats StarCraft II pros
David Robinson: http://varianceexplained.org/r/ds-ml-ai/
Data Science
• Artificial Intelligence
• Big Data
• Statistical Learning
• Predictive Analytics
• Data Mining
• Machine Learning
• Pattern Recognition
• Deep Learning
Terms, usage, and interpretation vary
Overwhelmingly expansive and fast-moving field
Dan’s Crude Model of Domains and Tools v0.01
Broad, Multi/Interdisciplinary Interest
[Figure: domains (Math, Stats, Eng, Sci, Business) and tools (Excel, Tableau, Minitab, MATLAB, Python, R) placed along a Practice-to-Theory axis. Figures not to scale!]
Source: I have only myself to blame for this slide.
The “Classic” Definition of Data Science
[Figure, 2010: the Data Science Venn diagram, drewconway.com/zia/2013/3/26/the-data-science-venn-diagram]
[Figure, 2020: Robinson, Emily, and Jacqueline Nolis. Build a Career in Data Science. Simon and Schuster, 2020.]
Data Science in Broad Terms
• Components (Skills)
• Math and Stats – methods related to data literacy
• Programming & Databases – coding, engineering, carpentry
• Domain Knowledge – subject matter expertise
• Applications (Jobs)
• Analytics – creates dashboards and reports that deliver data
• Machine Learning – creates models that run continuously
• Decision Science – creates analyses that produce recommendations
Robinson, Emily, and Jacqueline Nolis. Build a Career in Data Science. Simon and Schuster, 2020.
David Robinson: http://varianceexplained.org/r/ds-ml-ai/
Outcome-based…
• Data Science produces insights
• Various types of insight – descriptive, exploratory, causal
• Statistical inference, data visualization, and experiment design
• Machine Learning produces predictions
• Various types of predictions – regression, classification
• Artificial Intelligence produces actions
• Executed or recommended by autonomous agents
• Includes game-playing, robotics / control theory, optimization, NLP, RL
David Robinson: http://varianceexplained.org/r/ds-ml-ai/
Example: Self-Driving Car
• Machine Learning
• Object recognition model trained using many photos of streetside objects
• System predicts the presence of stop signs
• Artificial Intelligence
• Given varying road conditions and presence of a stop sign
• Autonomous agent decides when / how to act, properly applying the brakes
• Data Science
• Analyzing test data, developers gain insight into the cause of false negatives
• They generate a report summarizing their findings / recommendations
David Robinson: http://varianceexplained.org/r/ds-ml-ai/
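The split in this example, where ML produces a prediction and AI turns it into an action, can be sketched in a few lines of Python. This is a toy illustration only: the function names, the feature names, the 0.5 threshold, and the braking rule are all assumptions made up for this sketch, not how any real vehicle works.

```python
# Toy sketch: an ML model outputs a prediction, an agent turns it into an action.
# Every name, weight, and threshold below is hypothetical.

def predict_stop_sign(features):
    """Stand-in for a trained object-recognition model.

    Returns a made-up probability that a stop sign is present."""
    # A real model would be learned from labeled photos; this is a fixed rule.
    score = 0.6 * features["red_fraction"] + 0.4 * features["octagon_score"]
    return max(0.0, min(1.0, score))

def choose_action(p_stop_sign, speed_mps, road_is_wet):
    """Stand-in for the autonomous agent: decides when / how to brake."""
    if p_stop_sign < 0.5:
        return "continue"
    # Start braking early on wet roads or at higher speed.
    if road_is_wet or speed_mps > 15:
        return "brake_early"
    return "brake_normal"

frame = {"red_fraction": 0.9, "octagon_score": 0.8}
p = predict_stop_sign(frame)           # Machine Learning: a prediction
action = choose_action(p, 12, True)    # Artificial Intelligence: an action
print(f"P(stop sign) = {p:.2f} -> {action}")
```

The Data Science step would happen offline: collecting cases where `predict_stop_sign` missed a real sign and working out why.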
Another Angle – Archetypes
• Four Components of Data Science
• Analysis – insights
• Modeling – prediction
• Engineering – deployment
• Mechanics – cleaning / prep*
• Five Archetypes of Data Scientists
• Generalist – proficient at everything
• Detective – master of analysis
• Oracle – master of modeling
• Maker – master of engineering
https://e2eml.school/data_science_archetypes.html
Not Shown
• Few all-around masters!
• Everyone cleans data!
https://e2eml.school/data_science_archetypes.html
What is AI?
• I’m still not sure…
• Let’s go with this:
• It includes ML and DL
• Actions → AI
• Predictions → ML
• Usually
• Ignore grander AI visions,
claims, speculation
Most of what we think of as “AI” today is based on Deep Learning methods
Much of AI’s imagined potential remains distant
Machine Learning & Deep Learning are very real, here now, everywhere
Machine Learning
• Gives “computers the ability to learn without being explicitly
programmed.” – Arthur Samuel, 1959
• Identifies patterns in observed data
Unsupervised Learning
https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
Supervised Learning
https://towardsdatascience.com/what-is-machine-learning-a-short-note-on-supervised-unsupervised-semi-supervised-and-aed1573ae9bb
Machine Learning
• Linear Regression to Deep Neural Nets
• Ingredient technology
• “Macroscope” (inverted microscope) – sees things too big to view
• Deep Neural Nets with tens of millions of parameters
• Image data sets on the order of 1M x 1M+, video much larger
• Entire USPTO archive (text and images), over 4M patents going back to 1976
• Many data sets much larger
• Learns by finding statistical structure in training examples
• Meaningful transformation / representations of data
• Largely empirical methods
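"Learns by finding statistical structure in training examples" can be made concrete with the simplest method on this slide, linear regression. The sketch below fits a line by ordinary least squares in plain Python. The training data is invented for illustration and `fit_line` is a name chosen here, not a library function.

```python
# Ordinary least squares for one input variable, written out by hand.
# The training examples are invented for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error on the examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # noisy observations of roughly y = 2x

slope, intercept = fit_line(xs, ys)

def predict(x):
    """Use the learned parameters on an input the model has not seen."""
    return slope * x + intercept

print(f"learned: y = {slope:.2f} * x + {intercept:.2f}")
print(f"prediction for x = 6: {predict(6.0):.2f}")
```

Nothing in the code states the rule "y is about twice x". The two parameters are recovered from the examples, which is the same idea a deep network applies with millions of parameters.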
Transformations / Representations of Data
Example: Classification
Chollet, François. Deep Learning with Python. Manning Publications Company, 2017.
Lots of ways to do it…
Best method? It depends.
• Bayesian
• Decision Tree
• Dimensionality Reduction
• Instance Based
• Clustering
• Regression
• Rule System
• Regularization
• Neural Networks
• Ensemble
• Deep Learning
SVM?!
“State of the Art”
For Kaggle Contests, at least
• Gradient Boosting
• LightGBM, XGBoost
• For structured data
• Python or R
• Deep Learning
• Keras/TensorFlow, fastai/PyTorch
• For perceptual problems
• Python
Gradient Boosting in 1 Slide
• Series of decision trees*
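• Each tree is trained to correct the errors left by the trees before it
• Classic (AdaBoost-style) boosting re-weights examples by how hard they are to classify; gradient boosting fits each new tree to the remaining errors (residuals)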
• Repeat and combine results
https://datascience.eu/machine-learning/gradient-boosting-what-you-need-to-know/
*Decision trees can be thought of as giant “if-then” structures converting inputs to outputs based on features
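The boosting loop can be sketched in plain Python. This is a minimal, hand-rolled version for regression with squared-error loss, where each "tree" is a one-split stump (a single if-then rule). It is not how LightGBM or XGBoost are implemented, and the data, `n_trees`, and `learning_rate` values are assumptions chosen for illustration. It also assumes the inputs contain at least two distinct values.

```python
# Minimal gradient boosting for regression with squared-error loss.
# Each "tree" is a decision stump: if x <= threshold, predict A, else B.

def fit_stump(xs, residuals):
    """Find the single threshold split on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Build an additive model: mean of y plus a series of small stumps."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        # For loss 1/2 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def model(x):
        return base + learning_rate * sum(s(x) for s in stumps)
    return model

def mse(model, xs, ys):
    """Mean squared error of the model on the given examples."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]   # step-like toy data

one_tree = gradient_boost(xs, ys, n_trees=1)
many_trees = gradient_boost(xs, ys, n_trees=50)
print(f"training MSE with 1 tree:   {mse(one_tree, xs, ys):.3f}")
print(f"training MSE with 50 trees: {mse(many_trees, xs, ys):.3f}")
```

The error reported here is on the training data only. In practice the number of trees and the learning rate are tuned against held-out data.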
Deep Learning in 1 Slide
Bonus: Neural Network!
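[Figure: neural network with inputs $x_1, \dots, x_6$ and output $y$; more than one hidden layer (>1) → “Deep”]
Each neuron computes $f(\mathbf{w} \cdot \mathbf{x} + b)$
$f(\cdot)$ is the activation function
Non-linear: sigmoid, tanh, etc.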
http://neuralnetworksanddeeplearning.com/index.html
Chollet, François. Deep Learning with Python. Manning Publications Company, 2017.
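The neuron formula on this slide, an activation function applied to a weighted sum of inputs plus a bias, is short enough to write out directly. The sketch below is a forward pass only (no training). The weights, biases, and layer sizes are arbitrary numbers picked for illustration.

```python
import math

def sigmoid(z):
    """A common non-linear activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    """Compute f(w . x + b) for a single neuron."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

def layer(x, weights, biases, activation=sigmoid):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(x, w, b, activation) for w, b in zip(weights, biases)]

# Arbitrary example values: 3 inputs -> 2 hidden neurons -> 1 output.
x = [0.5, -1.0, 2.0]
hidden = layer(x,
               weights=[[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6]],
               biases=[0.0, 0.1])
y = neuron(hidden, w=[1.0, -1.0], b=0.0)
print(f"hidden activations: {hidden}")
print(f"output y: {y}")
```

Stacking more calls to `layer` between the inputs and the output is what makes the network "deep". Training means adjusting the weights and biases so that `y` matches known examples.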
Common Current ML/DL Use Cases
• Natural Language Processing
• Google Translate, Siri/Cortana/Alexa, Auto-correct
• Recommendation Systems
• Netflix, Amazon, Facebook
• Customer Relationship Management
• Direct marketing, mobile advertising, chatbots
• Finance
• Credit score, loan approval, fraud detection ($100B-$1T), algorithmic trading
• Image Recognition
• Pose detection, facial recognition, medical image processing
Tip of the Iceberg – ML/DL is Everywhere!
https://youtu.be/ayPqjPekn7g
https://youtu.be/0FW99AQmMc8
Trains only using the score – reinforcement learning
https://youtu.be/TmPfTpjtdgg
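"Trains only using the score" is the core of reinforcement learning: the agent is never told which action is correct, it only sees the reward that follows. A much simpler stand-in for the game-playing agent in the video is an epsilon-greedy bandit, sketched below. The payout probabilities, step count, and `epsilon` are assumptions chosen for illustration.

```python
import random

def run_bandit(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action scores best using only observed rewards."""
    rng = random.Random(seed)
    counts = [0] * len(payout_probs)
    values = [0.0] * len(payout_probs)   # running average score per action
    total_score = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(payout_probs))
        else:
            # Exploit: take the action with the best average score so far.
            action = max(range(len(values)), key=lambda a: values[a])
        # The environment returns a score; the agent never sees payout_probs.
        reward = 1 if rng.random() < payout_probs[action] else 0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        total_score += reward
    return values, counts, total_score

payout_probs = [0.2, 0.5, 0.8]   # hidden from the agent
values, counts, total_score = run_bandit(payout_probs)
best_action = max(range(len(values)), key=lambda a: values[a])
print(f"estimated values: {[round(v, 2) for v in values]}")
print(f"times each action was chosen: {counts}")
print(f"action the agent prefers: {best_action}")
```

Real game-playing agents add state (what is on the screen) and delayed rewards, but the feedback signal is the same: the score.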
https://youtu.be/kSLJriaOumA
https://youtu.be/LBd5FZqhUVk
https://youtu.be/0jcigK65mpc
https://youtu.be/DjERMBnvTEE
Why Now?
• 50+ years of research
• Algorithm / SW dev
• Huge Investments
• Democratization
Why Now?
• Compute power
• GPU – graphics processing unit
• Originally developed for 3D graphics
• Massively parallel matrix operations
• Orders of magnitude better performance
• Deeper Blue (1997)
• 11.38 GFLOPS
• ~ $100M
• NVIDIA GTX 1080 (2016)
• 8,873 GFLOPS
• $499 MSRP
• 150 million times more GF / $
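The "150 million times" figure follows from the numbers on this slide. The arithmetic below uses only those figures, with the approximate $100M cost taken at face value and no inflation adjustment.

```python
# GFLOPS per dollar, using the figures quoted on the slide.
deep_blue_gflops = 11.38
deep_blue_cost_usd = 100e6      # "~ $100M", approximate
gtx_1080_gflops = 8873
gtx_1080_cost_usd = 499         # MSRP

deep_blue_gf_per_dollar = deep_blue_gflops / deep_blue_cost_usd
gtx_1080_gf_per_dollar = gtx_1080_gflops / gtx_1080_cost_usd
ratio = gtx_1080_gf_per_dollar / deep_blue_gf_per_dollar

print(f"1997: {deep_blue_gf_per_dollar:.3e} GFLOPS per dollar")
print(f"2016: {gtx_1080_gf_per_dollar:.2f} GFLOPS per dollar")
print(f"improvement: about {ratio / 1e6:.0f} million times")
```

The result lands a little above 150 million, so the slide's figure is a fair rounding given how rough the cost estimate is.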
Why Now?
• Access to data
• 175 zettabytes annually by 2025
• 1 zettabyte = 1 trillion gigabytes
• The Internet
• Infrastructure to facilitate
• Instrumentation of everything
https://www.digitalinformationworld.com/2018/06/infographics-data-never-sleeps-6.html
Benefits of Data Science
• Domain independent technology; “metascience”
• Informs decision making
• Empowers organizational learning
• Improves operational efficiency
• Leverages underutilized by-product of work
• Delivers actionable results
• Automates scientific discovery
… a user’s data can be purchased for about half a cent, but the average
user’s value to the Internet advertising ecosystem is estimated at
$1,200 per year.
Credit: Predictive Analytics, Eric Siegel, p. 54
Limitations and Pitfalls
• Accurate prediction (extrapolation) is generally not possible.
• “Prediction is very difficult, especially if it’s about the future.” – N. Bohr
• High value from relatively low predictive power; targeted optimization
• Does not answer WHY or HOW.
• Correlation does not imply causation; many models opaque, empirical
• Value comes from the prediction, not understanding cause
• Vast search / Multiple comparisons trap
• Possibility of being fooled by randomness - real trend or random artifact
• Importance of domain knowledge and disciplined research
• Bias / Variance tradeoff
• Fit vs Predictive Power
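The "vast search / multiple comparisons" pitfall is easy to reproduce. In the sketch below everything is pure random noise, yet searching across enough candidate features turns up one that looks strongly correlated with the target. The sample size, feature count, and seed are assumptions chosen for illustration.

```python
import random

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)
n_samples, n_features = 20, 1000

# Target and every "feature" are independent random noise.
target = [rng.gauss(0, 1) for _ in range(n_samples)]
features = [[rng.gauss(0, 1) for _ in range(n_samples)]
            for _ in range(n_features)]

correlations = [abs(correlation(f, target)) for f in features]
best = max(correlations)
typical = sorted(correlations)[n_features // 2]   # median

print(f"typical |correlation| with the target: {typical:.2f}")
print(f"best |correlation| found by searching: {best:.2f}")

# Re-measure the "winning" relationship on fresh data: still just noise.
fresh_feature = [rng.gauss(0, 1) for _ in range(n_samples)]
fresh_target = [rng.gauss(0, 1) for _ in range(n_samples)]
print(f"same kind of feature on new data: "
      f"{abs(correlation(fresh_feature, fresh_target)):.2f}")
```

This is why held-out data and domain knowledge matter: the best result of a large search overstates how strong the relationship really is.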
Takeaways
• Data Science is a broad, fast-moving field, with hype and confusion
• The “promise” of AI cannot be met by current or near future tech
• We are surrounded by current use cases, many more emerging
• Its recent growth is fueled by data, compute, algorithms, software, and $$$
• Leverages existing data to improve operational efficiencies
• Identifies unexpected connections but does not explore causation
• It is not fool-proof and requires expert oversight
• Cannot be fully explained (even introduced) in one short talk…
Resources
Additional Resources for Deep Learning
• http://neuralnetworksanddeeplearning.com/index.html – free, online
only, starts with writing a simple backprop NN from scratch in Python
• https://www.manning.com/books/deep-learning-with-python – build
models in Keras and TensorFlow, written by the creator of Keras, 2nd
edition coming soon!
• https://course.fast.ai – alternative to Keras built on PyTorch, all work
is done inside Jupyter Notebooks
Thank You.
Contact Information:
Dan O’Leary
dan.oleary@auburn.edu
Blog / Portfolio / Links: bit.ly/aboutdjo
