Azure Machine Learning and ML on Premises

Agenda
• Introduction to Machine Learning
• Azure ML & ML Studio
• Choosing the right algorithm
• Creating better ML models
• Evaluating model performance
• ML on-premises
• Demo
Software Architect @
15 years professional experience
.NET Web Development MCPD
External Expert Horizon 2020
External Expert Eurostars-Eureka & IFD
Business Interests
Web Development, SOA, Integration
Security & Performance Optimization
IoT, Computer Intelligence
Contact
ivelin.andreev@icb.bg
www.linkedin.com/in/ivelin
www.slideshare.net/ivoandreev
Supervised Learning
• The majority of practical ML uses supervised learning
• A mapping function is approximated from experience
  o Regression: f(X) = Y, where Y is a real number
  o Classification: f(X) = Y, where Y is a category label
• Training
  o Labeled positive and negative examples
  o From unseen input, predict the corresponding output
  o Learning continues until acceptable performance is achieved
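A minimal sketch of the two supervised flavors above, assuming scikit-learn is available (synthetic data, not from the deck):

```python
# Minimal supervised-learning sketch with scikit-learn (assumed available).
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: f(X) = Y, Y is a category label
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(Xc, yc)      # learn from labeled examples
print(clf.predict(Xc[:3]))                  # predict labels for new input

# Regression: f(X) = Y, Y is a real number
Xr, yr = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print(reg.predict(Xr[:3]))                  # predict real-valued outputs
```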
Unsupervised Learning
• Discover hidden relations and learn about the data
  o Clustering: f(X) = [X1,…, Xk], k disjoint subsets
  o Association: f(Xi, Xj) = R, a relation
• Training
  o All examples are positive
  o No labeling / no teacher
  o No single correct answer
• Practical usage
  o Derive groups that are not explicitly labeled
  o Market basket analysis (association among items)
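A minimal clustering sketch, again assuming scikit-learn (synthetic data for illustration):

```python
# Unsupervised sketch: k-means clustering with scikit-learn (assumed available).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels never used
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])        # derived groups, not explicitly labeled
print(km.cluster_centers_)    # one center per disjoint subset
```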
Azure Machine Learning
• Primary goal: make deployable and scalable web services from ML models
• Though the experience for creating ML models is great, it is not intended as a place to create and export models
• Part of the Cortana Intelligence Suite
1. Dataset
2. Training Experiment
3. Predictive Experiment
4. Publish Web Service
5. Retrain Model
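Step 4 exposes the trained experiment as a REST endpoint. A hedged sketch of calling such a service follows; the URL, workspace/service IDs, column names, and API key are placeholders, not real values:

```python
# Sketch: calling a published Azure ML Studio web service (request-response API).
# All identifiers below are placeholders for illustration only.
import json
import urllib.request

url = ("https://ussouthcentral.services.azureml.net/workspaces/<workspace-id>"
       "/services/<service-id>/execute?api-version=2.0&details=true")
api_key = "<your-api-key>"

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["feature1", "feature2"],   # hypothetical schema
            "Values": [["0.5", "1.2"]],
        }
    },
    "GlobalParameters": {},
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer " + api_key},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))
```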
Platform Comparison

| | Azure ML | BigML | Amazon ML | Google Prediction | IBM Watson ML |
|---|---|---|---|---|---|
| Flexibility | High | High | Low | Low | Low |
| Usability | High | Med | High | Low | High |
| Training time | Low | Low | High | Med | High |
| Accuracy (AUC) | High | High | High | Med | High |
| Cloud / On-premises | + / - | + / + | + / - | + / - | + / - |
| Algorithms | Classification, Regression, Clustering, Anomaly detection, Recommendations | Classification, Regression, Clustering, Anomaly detection, Recommendations | Classification, Regression | Classification, Regression | Semantic mining, Hypothesis ranking, Regression |
| Customizations | Model parameters, R-script, Python, Evaluation support | Own models; C#, R, Node.js | Few parameters | - | - |
Pricing Comparison

| | Azure ML | BigML | Amazon ML | Google Prediction | IBM Watson ML |
|---|---|---|---|---|---|
| Model building | $1.00 / h; $10.00 / user / month | $30 - $10,000 / month | $0.42 / h | $0.002 / MB | $0.45 / h; $10.00 / service |
| Retraining (per 1,000) | $0.50 | - | N/A | $0.05 | - |
| Prediction (per 1,000) | $0.50 | - | $0.10 | $0.50 | $0.50 |
| Compute (per hour) | $2.00 | - | $0.42 | - | - |
| Free usage | 1,000 / month; 2 h compute | Dataset size max 16 MB | N/A | 10,000 / month | 5,000 / month; 5 h compute |
| Notes | - | Private deployment: $55,000 / year | - | Shut down April 30, 2018; successor: Cloud ML Engine (TensorFlow) | - |
“I spent last semester building a
regression model in Python, and I
just did the same thing in 10
minutes with Azure ML”
• Answer: It is not required, but it would definitely help
• Data science is what is really necessary
  o An interdisciplinary field about the processes and methods for extracting relations and knowledge from data
• Working around the math
  o Select the right algorithm → use cheat sheets instead
  o Choose parameter settings → experiment; use a parameter sweep (see the sketch below)
  o Identify underfitting and overfitting → compare training and validation scores
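A sketch of the last two workarounds, assuming scikit-learn: sweep a hyperparameter and compare training vs. validation scores to spot under- and overfitting.

```python
# Sketch: parameter sweep with scikit-learn's GridSearchCV (assumed available).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
sweep = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 3, 5, 10, None]},  # swept hyperparameter
    return_train_score=True,
    cv=5,
)
sweep.fit(X, y)
for depth, train, val in zip(sweep.cv_results_["param_max_depth"],
                             sweep.cv_results_["mean_train_score"],
                             sweep.cv_results_["mean_test_score"]):
    # train >> validation hints at overfitting; both low hints at underfitting
    print(depth, round(train, 3), round(val, 3))
print("best:", sweep.best_params_)
```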
I selected an appropriate ML algorithm, but…
why does my ML model fail?
A troubleshooting checklist:
• Is it an ML task? If not, use another solution
• Is it the correct ML scenario? If not, try another scenario
• Is a suitable model identified?
• Do you have enough data?
• Is the model overly complicated?
• Are the correct features used?
• Was feature engineering performed?
• Are the evaluation metrics appropriate?
• Is the evaluation set good?
1. Is it an ML task?
   Hard: X is independent of Y: X = <Name, Age, Income>, Height = ?
   Easy: X is a set with limited variations. Configure Y = F(X)
2. Appropriate ML scenario?
   Supervised learning (classification, regression, anomaly detection)
   Unsupervised learning (clustering, pattern learning)
3. Appropriate model?
   Data size (small data -> linear model; large data -> consider non-linear)
   Sparse data (requires normalization to perform better)
   Imbalanced data (special treatment of the minority class is required)
   Data quality (noise and missing values require a suitable loss function, e.g. L2)
4. Enough training data?
   Investigate how precision improves with more data (see the learning-curve sketch below)
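One way to investigate step 4, assuming scikit-learn: plot how the validation score changes as the training set grows. If it is still rising, more data is likely to help.

```python
# Sketch: checking whether more data helps, via scikit-learn's learning_curve.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=1000, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=[0.1, 0.3, 0.5, 0.7, 1.0], cv=5)
for n, v in zip(sizes, val_scores.mean(axis=1)):
    print(n, "samples -> validation score", round(v, 3))
```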
5. Feature quality
   Have you identified all useful features? Use the domain knowledge of an expert.
   Include any feature that can be found and investigate model performance.
6. Feature engineering
   The best strategy to improve performance and reveal important input
   Encode features, normalize to [0:1], combine features, resolve dependencies (see the sketch after this list)
7. Combine models
   If multiple models have similar performance, there is a chance for improvement
   Use one model for one subset of the data and another model for the other
8. Model validation and tuning
   Use an appropriate performance indicator (Accuracy, Precision, Recall, F1, etc.)
   How well does the model describe the data? (AUC)
   Data is typically divided into training and validation sets
   Tune model hyperparameters (e.g. number of iterations)
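A sketch of step 6, assuming scikit-learn and pandas; the column names are hypothetical:

```python
# Sketch: encode a categorical feature and normalize numeric features to [0, 1].
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

df = pd.DataFrame({
    "age": [25, 40, 61],          # numeric -> scaled to [0, 1]
    "income": [30000, 85000, 52000],
    "segment": ["a", "b", "a"],   # categorical -> one-hot encoded
})
prep = ColumnTransformer([
    ("scale", MinMaxScaler(), ["age", "income"]),
    ("encode", OneHotEncoder(), ["segment"]),
])
print(prep.fit_transform(df))
```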
Appropriate algorithms are determined
1st by the Goal, 2nd by the Data
• Linear Algorithms
  • Classification – classes separated by a straight line
  • Support Vector Machine – a wide gap instead of a line
  • Regression – linear relation between variables and label
• Non-Linear Algorithms
  • Decision Trees and Jungles – divide the space into regions
  • Neural Networks – complex and irregular boundaries
• Special Algorithms
  • Ordinal Regression – ranked values (e.g. a race)
  • Poisson Regression – discrete distribution (e.g. number of events)
  • Bayesian – assumes normal distribution of errors (bell curve)
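A quick illustration of the linear/non-linear split, assuming scikit-learn: on data that is not linearly separable, a region-splitting tree typically beats a straight-line classifier.

```python
# Sketch: linear vs. non-linear model on non-linearly separable data.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
print("linear:", LogisticRegression().fit(Xtr, ytr).score(Xte, yte))
print("tree:  ", DecisionTreeClassifier(random_state=0).fit(Xtr, ytr).score(Xte, yte))
```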
• Binary classification outcomes: {negative; positive}
• ROC Curve
  o TP Rate = True Positives / All Positives
  o FP Rate = False Positives / All Negatives
• Example

| Threshold | TP Rate | FP Rate | 1 - FP Rate |
|---|---|---|---|
| 5 | 0.56 | 0.99 | 0.01 |
| 7 | 0.78 | 0.81 | 0.19 |
| 9 | 0.91 | 0.42 | 0.58 |

• ROC AUC (Area Under the Curve)
  o KPI for model performance and model comparison
  o 0.5 = random prediction, 1 = perfect match
• For multiclass – average over all ROC curves
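A sketch of computing the curve and its AUC, assuming scikit-learn:

```python
# Sketch: ROC curve and AUC with scikit-learn (assumed available).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
scores = LogisticRegression().fit(Xtr, ytr).predict_proba(Xte)[:, 1]

fpr, tpr, thresholds = roc_curve(yte, scores)   # one point per threshold
print("AUC:", roc_auc_score(yte, scores))       # 0.5 = random, 1.0 = perfect
```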
• Probability Threshold
  o The cost of one error type can be much higher than the cost of the other
  o (e.g. spam filter – it is more expensive to miss a real mail)
• Accuracy
  o For symmetric, 50/50 data
• Precision
  o Fraction of predicted positives that are correct (FP are expensive)
  o (e.g. 1000 devices, 6 fail, 8 predicted, 5 true failures: 5/8 = 0.625)
• Recall
  o Fraction of actual positives correctly predicted (FN are expensive)
  o (e.g. 5/6 = 0.83)
• F1 (balanced error cost)
  o Balanced cost of Precision/Recall
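The device example from the slide, worked through in plain Python:

```python
# 1000 devices, 6 actual failures, 8 predicted failures, 5 of them correct.
tp, fp, fn = 5, 3, 1            # 8 predicted = 5 TP + 3 FP; 6 actual = 5 TP + 1 FN
precision = tp / (tp + fp)      # 5/8 = 0.625
recall = tp / (tp + fn)         # 5/6 ≈ 0.833
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, round(f1, 3))
```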
Coefficient of Determination (R2)
  o Single numeric KPI – how well the data fits the model
  o R2 > 0.6 – good, R2 > 0.8 – very good, R2 = 1 – perfect
Mean Absolute Error / Root Mean Squared Error
  o Deviation of the estimates from the observed values
  o Compares errors of models measured in the SAME units
Relative Absolute Error / Relative Squared Error
  o % deviation from the real value
  o Compares errors of models measured in DIFFERENT units
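A sketch of these metrics, assuming scikit-learn and NumPy; the RAE/RSE formulas follow the usual definition of deviation scaled by deviation from the mean:

```python
# Sketch: the regression metrics above, on toy values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.4, 6.9, 9.5])

r2 = r2_score(y_true, y_pred)                      # coefficient of determination
mae = mean_absolute_error(y_true, y_pred)
rmse = mean_squared_error(y_true, y_pred) ** 0.5
rae = np.abs(y_true - y_pred).sum() / np.abs(y_true - y_true.mean()).sum()
rse = ((y_true - y_pred) ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
print(r2, mae, rmse, rae, rse)
```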
Creating Your First Machine Learning Model
• Building blocks for creating experiments (rough code analogues in the sketch below)
• Split Data
  • Splits data into training and validation sets
• Train Model
  • Trains a model from an untrained model and a dataset
  • Number of arguments = algorithm flexibility
• Score Model
  • Confirms model results against a data set
• Evaluate Model
  • Gets key quality factors of the results, or evaluates against another model
• Tune Model
  • Sweeps model parameters for optimal settings
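The Studio modules are drag-and-drop; purely as a conceptual mapping, their rough scikit-learn analogues (an assumption of this write-up, not part of the product):

```python
# Sketch: rough scikit-learn analogues of the Studio building blocks.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)   # Split Data
model = LogisticRegression().fit(Xtr, ytr)                    # Train Model
pred = model.predict(Xte)                                     # Score Model
print(classification_report(yte, pred))                       # Evaluate Model
tuned = GridSearchCV(LogisticRegression(),
                     {"C": [0.1, 1, 10]}).fit(Xtr, ytr)       # Tune Model
print(tuned.best_params_)
```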
Neural Networks
Nodes, organized in layers, with weighted connections (see the forward-pass sketch below)
• Layers
  o Input (1), Output (1)
  o Shallow – 1 hidden layer
  o Deep – multiple hidden layers
• Azure ML Net# language
  • Defines DNN layers
  • Bundles (connections)
  • Activation functions
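Net# syntax itself is beyond this summary; purely as an illustration of "nodes in layers with weighted connections", a minimal NumPy forward pass (all sizes and weights arbitrary):

```python
# Illustration only: a forward pass through one hidden layer in NumPy.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # input layer: 4 nodes
W1 = rng.normal(size=(4, 8))      # weighted connections, input -> hidden
W2 = rng.normal(size=(8, 3))      # weighted connections, hidden -> output

hidden = np.maximum(0, x @ W1)    # activation function (ReLU)
output = hidden @ W2              # output layer: 3 nodes
print(output)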
• ML on-premises: model definition in C++, Python, C# (beta), BrainScript (CNTK)
Microsoft Virtual Academy free e-book: http://bit.ly/a4r-mlbook
https://studio.azureml.net/
