Azure Machine Learning and ML on Premises

Agenda
• Introduction to Machine Learning
• Azure ML & ML Studio
• Choosing the right algorithm
• Creating better ML models
• Evaluating model performance
• ML on-premises
• Demo
Software Architect @
15 years professional experience
.NET Web Development MCPD
External Expert Horizon 2020
External Expert Eurostars-Eureka & IFD
Business Interests
Web Development, SOA, Integration
Security & Performance Optimization
IoT, Computer Intelligence
Contact
ivelin.andreev@icb.bg
www.linkedin.com/in/ivelin
www.slideshare.net/ivoandreev
Supervised Learning
• The majority of practical ML uses supervised learning
• A mapping function is approximated from experience
  o Regression: f(X) = Y, where Y is a real number
  o Classification: f(X) = Y, where Y is a category label
• Training
  o Labeled positive and negative examples
  o From unseen input, predict the corresponding output
  o Learning continues until acceptable performance is achieved
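A minimal sketch of the two supervised flavors above, assuming scikit-learn is available (synthetic data, not from the deck):

```python
# Minimal supervised-learning sketch with scikit-learn (assumed available).
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: f(X) = Y, Y is a category label
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(Xc, yc)      # learn from labeled examples
print(clf.predict(Xc[:3]))                  # predict labels for new input

# Regression: f(X) = Y, Y is a real number
Xr, yr = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print(reg.predict(Xr[:3]))                  # predict real-valued outputs
```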
Unsupervised Learning
• Discover hidden relations and learn about the data
  o Clustering: f(X) = [X1,…, Xk], k disjoint subsets
  o Association: f(Xi, Xj) = R, a relation
• Training
  o All examples are positive
  o No labeling / no teacher
  o No single correct answer
• Practical usage
  o Derive groups that are not explicitly labeled
  o Market basket analysis (association among items)
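A minimal clustering sketch, again assuming scikit-learn (synthetic data for illustration):

```python
# Unsupervised sketch: k-means clustering with scikit-learn (assumed available).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels never used
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])        # derived groups, not explicitly labeled
print(km.cluster_centers_)    # one center per disjoint subset
```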
Azure Machine Learning
• Primary goal: make deployable and scalable web services from ML models
• Though the experience for creating ML models is great, it is not intended as a place to create and export models
• Part of the Cortana Intelligence Suite
1. Dataset
2. Training Experiment
3. Predictive Experiment
4. Publish Web Service
5. Retrain Model
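Step 4 exposes the trained experiment as a REST endpoint. A hedged sketch of calling such a service follows; the URL, workspace/service IDs, column names, and API key are placeholders, not real values:

```python
# Sketch: calling a published Azure ML Studio web service (request-response API).
# All identifiers below are placeholders for illustration only.
import json
import urllib.request

url = ("https://ussouthcentral.services.azureml.net/workspaces/<workspace-id>"
       "/services/<service-id>/execute?api-version=2.0&details=true")
api_key = "<your-api-key>"

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["feature1", "feature2"],   # hypothetical schema
            "Values": [["0.5", "1.2"]],
        }
    },
    "GlobalParameters": {},
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer " + api_key},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))
```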
Platform Comparison

| | Azure ML | BigML | Amazon ML | Google Prediction | IBM Watson ML |
|---|---|---|---|---|---|
| Flexibility | High | High | Low | Low | Low |
| Usability | High | Med | High | Low | High |
| Training time | Low | Low | High | Med | High |
| Accuracy (AUC) | High | High | High | Med | High |
| Cloud / On-premises | + / - | + / + | + / - | + / - | + / - |
| Algorithms | Classification, Regression, Clustering, Anomaly detection, Recommendations | Classification, Regression, Clustering, Anomaly detection, Recommendations | Classification, Regression | Classification, Regression | Semantic mining, Hypothesis ranking, Regression |
| Customizations | Model parameters, R-script, Python, Evaluation support | Own models; C#, R, Node.js | Few parameters | - | - |
Pricing Comparison

| | Azure ML | BigML | Amazon ML | Google Prediction | IBM Watson ML |
|---|---|---|---|---|---|
| Model building | $1.00 / h; $10.00 / user / month | $30 - $10,000 / month | $0.42 / h | $0.002 / MB | $0.45 / h; $10.00 / service |
| Retraining (per 1,000) | $0.50 | - | N/A | $0.05 | - |
| Prediction (per 1,000) | $0.50 | - | $0.10 | $0.50 | $0.50 |
| Compute (per hour) | $2.00 | - | $0.42 | - | - |
| Free usage | 1,000 / month; 2 h compute | Dataset size max 16 MB | N/A | 10,000 / month | 5,000 / month; 5 h compute |
| Notes | - | Private deployment: $55,000 / year | - | Shut down April 30, 2018; successor: Cloud ML Engine (TensorFlow) | - |
“I spent last semester building a
regression model in Python, and I
just did the same thing in 10
minutes with Azure ML”
• Answer: It is not required, but it would definitely help
• Data science is what is really necessary
  o An interdisciplinary field about the processes and methods for extracting relations and knowledge from data
• Working around the math
  o Select the right algorithm → use cheat sheets instead
  o Choose parameter settings → experiment; use a parameter sweep (see the sketch below)
  o Identify underfitting and overfitting → compare training and validation scores
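A sketch of the last two workarounds, assuming scikit-learn: sweep a hyperparameter and compare training vs. validation scores to spot under- and overfitting.

```python
# Sketch: parameter sweep with scikit-learn's GridSearchCV (assumed available).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
sweep = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 3, 5, 10, None]},  # swept hyperparameter
    return_train_score=True,
    cv=5,
)
sweep.fit(X, y)
for depth, train, val in zip(sweep.cv_results_["param_max_depth"],
                             sweep.cv_results_["mean_train_score"],
                             sweep.cv_results_["mean_test_score"]):
    # train >> validation hints at overfitting; both low hints at underfitting
    print(depth, round(train, 3), round(val, 3))
print("best:", sweep.best_params_)
```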
I selected an appropriate ML algorithm, but…
why does my ML model fail?
A troubleshooting checklist:
• Is it an ML task? If not, use another solution
• Is it the correct ML scenario? If not, try another scenario
• Is a suitable model identified?
• Do you have enough data?
• Is the model overly complicated?
• Are the correct features used?
• Was feature engineering performed?
• Are the evaluation metrics appropriate?
• Is the evaluation set good?
1. Is it an ML task?
   Hard: X is independent of Y: X = <Name, Age, Income>, Height = ?
   Easy: X is a set with limited variations. Configure Y = F(X)
2. Appropriate ML scenario?
   Supervised learning (classification, regression, anomaly detection)
   Unsupervised learning (clustering, pattern learning)
3. Appropriate model?
   Data size (small data -> linear model; large data -> consider non-linear)
   Sparse data (requires normalization to perform better)
   Imbalanced data (special treatment of the minority class is required)
   Data quality (noise and missing values require a suitable loss function, e.g. L2)
4. Enough training data?
   Investigate how precision improves with more data (see the learning-curve sketch below)
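One way to investigate step 4, assuming scikit-learn: plot how the validation score changes as the training set grows. If it is still rising, more data is likely to help.

```python
# Sketch: checking whether more data helps, via scikit-learn's learning_curve.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=1000, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=[0.1, 0.3, 0.5, 0.7, 1.0], cv=5)
for n, v in zip(sizes, val_scores.mean(axis=1)):
    print(n, "samples -> validation score", round(v, 3))
```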
5. Feature quality
   Have you identified all useful features? Use the domain knowledge of an expert.
   Include any feature that can be found and investigate model performance.
6. Feature engineering
   The best strategy to improve performance and reveal important input
   Encode features, normalize to [0:1], combine features, resolve dependencies (see the sketch after this list)
7. Combine models
   If multiple models have similar performance, there is a chance for improvement
   Use one model for one subset of the data and another model for the other
8. Model validation and tuning
   Use an appropriate performance indicator (Accuracy, Precision, Recall, F1, etc.)
   How well does the model describe the data? (AUC)
   Data is typically divided into training and validation sets
   Tune model hyperparameters (e.g. number of iterations)
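A sketch of step 6, assuming scikit-learn and pandas; the column names are hypothetical:

```python
# Sketch: encode a categorical feature and normalize numeric features to [0, 1].
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

df = pd.DataFrame({
    "age": [25, 40, 61],          # numeric -> scaled to [0, 1]
    "income": [30000, 85000, 52000],
    "segment": ["a", "b", "a"],   # categorical -> one-hot encoded
})
prep = ColumnTransformer([
    ("scale", MinMaxScaler(), ["age", "income"]),
    ("encode", OneHotEncoder(), ["segment"]),
])
print(prep.fit_transform(df))
```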
Appropriate algorithms are determined
1st by the Goal, 2nd by the Data
• Linear Algorithms
  • Classification – classes separated by a straight line
  • Support Vector Machine – a wide gap instead of a line
  • Regression – linear relation between variables and label
• Non-Linear Algorithms
  • Decision Trees and Jungles – divide the space into regions
  • Neural Networks – complex and irregular boundaries
• Special Algorithms
  • Ordinal Regression – ranked values (e.g. a race)
  • Poisson Regression – discrete distribution (e.g. number of events)
  • Bayesian – assumes normal distribution of errors (bell curve)
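A quick illustration of the linear/non-linear split, assuming scikit-learn: on data that is not linearly separable, a region-splitting tree typically beats a straight-line classifier.

```python
# Sketch: linear vs. non-linear model on non-linearly separable data.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
print("linear:", LogisticRegression().fit(Xtr, ytr).score(Xte, yte))
print("tree:  ", DecisionTreeClassifier(random_state=0).fit(Xtr, ytr).score(Xte, yte))
```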
• Binary classification outcomes: {negative; positive}
• ROC Curve
  o TP Rate = True Positives / All Positives
  o FP Rate = False Positives / All Negatives
• Example

| Threshold | TP Rate | FP Rate | 1 - FP Rate |
|---|---|---|---|
| 5 | 0.56 | 0.99 | 0.01 |
| 7 | 0.78 | 0.81 | 0.19 |
| 9 | 0.91 | 0.42 | 0.58 |

• ROC AUC (Area Under the Curve)
  o KPI for model performance and model comparison
  o 0.5 = random prediction, 1 = perfect match
• For multiclass – average over all ROC curves
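A sketch of computing the curve and its AUC, assuming scikit-learn:

```python
# Sketch: ROC curve and AUC with scikit-learn (assumed available).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
scores = LogisticRegression().fit(Xtr, ytr).predict_proba(Xte)[:, 1]

fpr, tpr, thresholds = roc_curve(yte, scores)   # one point per threshold
print("AUC:", roc_auc_score(yte, scores))       # 0.5 = random, 1.0 = perfect
```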
• Probability Threshold
  o The cost of one error type can be much higher than the cost of the other
  o (e.g. spam filter – it is more expensive to miss a real mail)
• Accuracy
  o For symmetric, 50/50 data
• Precision
  o Fraction of predicted positives that are correct (FP are expensive)
  o (e.g. 1000 devices, 6 fail, 8 predicted, 5 true failures: 5/8 = 0.625)
• Recall
  o Fraction of actual positives correctly predicted (FN are expensive)
  o (e.g. 5/6 = 0.83)
• F1 (balanced error cost)
  o Balanced cost of Precision/Recall
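The device example from the slide, worked through in plain Python:

```python
# 1000 devices, 6 actual failures, 8 predicted failures, 5 of them correct.
tp, fp, fn = 5, 3, 1            # 8 predicted = 5 TP + 3 FP; 6 actual = 5 TP + 1 FN
precision = tp / (tp + fp)      # 5/8 = 0.625
recall = tp / (tp + fn)         # 5/6 ≈ 0.833
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, round(f1, 3))
```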
Coefficient of Determination (R2)
  o Single numeric KPI – how well the data fits the model
  o R2 > 0.6 – good, R2 > 0.8 – very good, R2 = 1 – perfect
Mean Absolute Error / Root Mean Squared Error
  o Deviation of the estimates from the observed values
  o Compares errors of models measured in the SAME units
Relative Absolute Error / Relative Squared Error
  o % deviation from the real value
  o Compares errors of models measured in DIFFERENT units
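A sketch of these metrics, assuming scikit-learn and NumPy; the RAE/RSE formulas follow the usual definition of deviation scaled by deviation from the mean:

```python
# Sketch: the regression metrics above, on toy values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.4, 6.9, 9.5])

r2 = r2_score(y_true, y_pred)                      # coefficient of determination
mae = mean_absolute_error(y_true, y_pred)
rmse = mean_squared_error(y_true, y_pred) ** 0.5
rae = np.abs(y_true - y_pred).sum() / np.abs(y_true - y_true.mean()).sum()
rse = ((y_true - y_pred) ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
print(r2, mae, rmse, rae, rse)
```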
Creating Your First Machine Learning Model
• Building blocks for creating experiments (rough code analogues in the sketch below)
• Split Data
  • Splits data into training and validation sets
• Train Model
  • Trains a model from an untrained model and a dataset
  • Number of arguments = algorithm flexibility
• Score Model
  • Confirms model results against a data set
• Evaluate Model
  • Gets key quality factors of the results, or evaluates against another model
• Tune Model
  • Sweeps model parameters for optimal settings
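The Studio modules are drag-and-drop; purely as a conceptual mapping, their rough scikit-learn analogues (an assumption of this write-up, not part of the product):

```python
# Sketch: rough scikit-learn analogues of the Studio building blocks.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)   # Split Data
model = LogisticRegression().fit(Xtr, ytr)                    # Train Model
pred = model.predict(Xte)                                     # Score Model
print(classification_report(yte, pred))                       # Evaluate Model
tuned = GridSearchCV(LogisticRegression(),
                     {"C": [0.1, 1, 10]}).fit(Xtr, ytr)       # Tune Model
print(tuned.best_params_)
```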
Neural Networks
Nodes, organized in layers, with weighted connections (see the forward-pass sketch below)
• Layers
  o Input (1), Output (1)
  o Shallow – 1 hidden layer
  o Deep – multiple hidden layers
• Azure ML Net# language
  • Defines DNN layers
  • Bundles (connections)
  • Activation functions
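Net# syntax itself is beyond this summary; purely as an illustration of "nodes in layers with weighted connections", a minimal NumPy forward pass (all sizes and weights arbitrary):

```python
# Illustration only: a forward pass through one hidden layer in NumPy.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # input layer: 4 nodes
W1 = rng.normal(size=(4, 8))      # weighted connections, input -> hidden
W2 = rng.normal(size=(8, 3))      # weighted connections, hidden -> output

hidden = np.maximum(0, x @ W1)    # activation function (ReLU)
output = hidden @ W2              # output layer: 3 nodes
print(output)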
• ML on-premises: model definition in C++, Python, C# (beta), BrainScript (CNTK)
Microsoft Virtual Academy free e-book: http://bit.ly/a4r-mlbook
https://studio.azureml.net/
