ICML 2017 Overview
& Some Topics
September 18th, 2017
Tatsuya Shirakawa
ABEJA, Inc. (Researcher)
- Deep Learning
- Computer Vision
- Natural Language Processing
- Graph Convolution / Graph Embedding
- Mathematical Optimization
- https://github.com/TatsuyaShiraka
tech blog → http://tech-blog.abeja.asia/
(featured posts: Poincaré Embeddings, Graph Convolution)
We are hiring! → https://www.abeja.asia/recruit/
→ https://six.abejainc.com/
1. ICML Intro & Stats
2. Trends and Topics
Table of Contents
3
1. ICML Intro & Stats
2. Trends and Topics
Table of Contents
4
International Conference on Machine Learning
• Top ML Conference
• 434 oral presentations over 3 days
• 9 parallel tracks
• 1,629 submitted papers
• 4 invited talks
• 9 tutorial talks
• 9 (parallel) × 3 (sessions) × 3 (days) = 81 sessions in the main conference
ICML 2017 in Sydney
5
Demos
15
Schedule
16
8/6   Tutorial Session        9 tutorials (3 parallel)     max attendable: 1/3
8/7   Main Conference Day 1   27 sessions (9 parallel)     max attendable: 1/9
8/8   Main Conference Day 2   27 sessions (9 parallel)     max attendable: 1/9
8/9   Main Conference Day 3   27 sessions (9 parallel)     max attendable: 1/9
8/10  Workshop Day 1          11 sessions (11 parallel)    max attendable: 1/11
8/11  Workshop Day 2          11 sessions (11 parallel)    max attendable: 1/11
1. ICML Intro & Stats
2. Trends and Topics
Table of Contents
17
• Deep learning is still the biggest trend
• Autonomous vehicles
• Health care / computational biology
• Human interpretability and visualization
• Multitask learning for small data or hard tasks
• Reinforcement learning
• Imitation learning (inverse reinforcement learning)
• Language and speech processing
• GANs / CNNs / RNNs / LSTMs are default options
• RNNs and their variants
• Optimization
• Online learning / bandits
• Time series modeling
• Applications Session
Some Trends (highly biased)
18
• Gluon is a new deep learning wrapper framework that integrates
dynamic DL frameworks (Chainer, PyTorch) and static DL frameworks
(Keras, MXNet) to get the best of both worlds via "hybridize" (sketched below)
• Great resources, including many of the latest models

https://github.com/apache/incubator-mxnet/tree/master/example
• Looks easy to write
• Alex Smola was the presenter
• … not very fast yet?

[Tutorial] Distributed Deep Learning with MXNet Gluon
19
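A minimal sketch of the "hybridize" idea, assuming MXNet is installed; the layer sizes and input shape are illustrative only. You define the network imperatively (define-by-run, as in Chainer/PyTorch), then call hybridize() to compile it into a static graph (as in Keras/MXNet symbolic):

```python
# Minimal Gluon "hybridize" sketch (illustrative layer sizes and input).
from mxnet import gluon, nd

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Dense(128, activation="relu"))
    net.add(gluon.nn.Dense(10))
net.initialize()

net.hybridize()  # compile the define-by-run network into a static graph
out = net(nd.random.uniform(shape=(4, 64)))  # first call triggers compilation
print(out.shape)  # (4, 10)
```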
http://www-bcf.usc.edu/~liu32/icml_tutorial.pdf
• RNNs work well
• + pretraining (combining other clinics’ data)
• + expert-defined features
• + new models for missing data
• CNNs work well on image data and have achieved super-human accuracy
• Some features of health care data:
• Small sample sizes
• Missing values
• Medical domain knowledge
• Interpretation
• Use gradient-boosted trees to mimic deep learning models (cool idea! sketched below)
• Hard to annotate, even for experts
• “Big Small Data”
• Limited data available to train age-specific or disease-specific models
[Tutorial] Deep Learning Models for Health Care:
Challenges and Solutions
20
Future Directions:
- Modeling heterogeneous data sources
- Model interpretation
- More complex output
“Interpretable Deep Models for ICU Outcome Prediction”, 2016
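The "mimic" idea above is a form of model distillation. A hedged sketch follows (the function names and data are illustrative, not the tutorial's code): train any deep model, then fit gradient-boosted trees on its soft predictions so the tree ensemble can be inspected.

```python
# Sketch of mimic learning: distill a deep model into gradient-boosted
# trees (illustrative; assumes scikit-learn and a trained "teacher").
from sklearn.ensemble import GradientBoostingRegressor

def distill_to_gbt(teacher_predict, X_train):
    """Fit an interpretable GBT on the teacher's soft predictions."""
    soft_labels = teacher_predict(X_train)  # e.g. predicted risk scores
    student = GradientBoostingRegressor(n_estimators=200, max_depth=3)
    student.fit(X_train, soft_labels)
    return student

# Usage: student = distill_to_gbt(deep_model.predict, X_train)
# The student's feature importances then hint at what the deep model uses.
```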
• Deep neural networks are “black boxes”
• Sensitivity analysis methods can be applied
• e.g., Grad-CAM (sketched below)
[Tutorial] Interpretable Machine Learning
21
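For reference, a minimal Grad-CAM sketch in PyTorch (illustrative; in practice load trained weights and real image preprocessing; the choice of layer is an assumption). The class-score gradient is global-average-pooled to weight the feature maps:

```python
# Minimal Grad-CAM sketch (illustrative model, input, and layer choice).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18().eval()  # plug in trained weights in practice
acts, grads = {}, {}

layer = model.layer4  # last conv block (illustrative choice)
layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
score = model(x)[0].max()        # score of the top class
score.backward()

w = grads["g"].mean(dim=(2, 3), keepdim=True)  # global-average-pool grads
cam = F.relu((w * acts["a"]).sum(dim=1))       # weighted sum of feature maps
cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:],
                    mode="bilinear", align_corners=False)  # upsample
```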
• Generating periodic patterns with GANs
• Local/global/periodic latent vectors (sketched below)
“Learning Texture Manifolds with the Periodic Spatial
GAN”
22
[Figure: example textures with varying periodicity, generated from local, global, and periodic latent vectors]
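A sketch of how a PSGAN-style latent field can be assembled (numpy, with illustrative dimensions; in the paper the wave frequencies are learned, here they are random stand-ins). Local noise varies per location, a global vector is broadcast everywhere, and the periodic part is sinusoids of the spatial coordinates:

```python
# Assembling a PSGAN-style latent field (illustrative dimensions).
import numpy as np

H, W = 16, 16                 # spatial size of the latent field
d_l, d_g, d_p = 20, 40, 4     # local / global / periodic dims

local_part = np.random.randn(H, W, d_l)                        # per-location noise
global_part = np.tile(np.random.randn(1, 1, d_g), (H, W, 1))   # one vector everywhere

K = np.random.randn(d_p, 2)                  # stand-in for learned frequencies
phi = np.random.uniform(0, 2 * np.pi, d_p)   # phases
ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
periodic_part = np.sin(ii[..., None] * K[:, 0] + jj[..., None] * K[:, 1] + phi)

z = np.concatenate([local_part, global_part, periodic_part], axis=-1)
print(z.shape)  # (16, 16, 64), fed to a fully convolutional generator
```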
• Sequence revision with generative/inference models
• Generative model P(x, y, z) = P(x, y|z) P(z)
• x: input sequence, y: goodness of x, z: latent variable
• Inference models P(z|x), P(y|z)
• Input x0 → infer z0 → search for a better z (higher goodness F(z)) → decode the revised x (sketched below)
“Sequence to Better Sequence: Continuous Revision of
Combinatorial Structures”
23
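A hedged sketch of that revise-in-latent-space loop (PyTorch; `encoder`, `decoder`, and `goodness` are stand-ins for trained models, not the paper's code):

```python
# Revise a sequence by gradient ascent on F(z) in latent space.
import torch

def revise(x0, encoder, decoder, goodness, steps=50, lr=0.1):
    z = encoder(x0).detach().requires_grad_(True)  # infer z0 from the input
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -goodness(z)   # ascend the expected-goodness surrogate F(z)
        loss.backward()
        opt.step()
    return decoder(z)         # decode the improved sequence
```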
• Generating a new step chart from a raw audio track
“Dance Dance Convolution”
24
• A new stochastic algorithm and theoretical analysis for
sum-of-norms (SON) clustering
• SON (2011)
• Assigns a center to each data point and applies a
regularizer that pulls the centers together (objective below)
• Convex problem!
“Clustering by Sum of Norms: Stochastic Incremental
Algorithm, Convergence and Cluster Recovery”
25
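For reference, the SON objective has the following form (reconstructed from the standard formulation; λ controls how strongly centers are fused):

```latex
% Sum-of-norms clustering: each point x_i gets its own center u_i;
% the second term fuses centers, and points sharing a center form a cluster.
\min_{u_1,\dots,u_n}\;
  \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \lambda \sum_{i<j} \lVert u_i - u_j \rVert_2
```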
Image Compression using Deep Learning
• VAE (rough reconstruction) + GAN (refinement)
• Faster than JPEG on a GPU, but takes several seconds on a CPU
“Real-Time Adaptive Image Compression”
26
• Subgoals
• Breaking the problem up into subgoals
• Learning sub-policies to achieve them
• StreetLearn
• Transfer Learning
• Progressive Neural Networks
• Distral: Robust Multitask Reinforcement Learning
[Invited Talk] “Towards Reinforcement Learning in the
Complex World” - Raia Hadsell (DeepMind)
27
• GAN generators are well approximated (with high probability) by a
discrete distribution supported on finitely many samples
• Sufficient sample size: Õ(p log(p/ε) / ε²)
• p = discriminator size (number of parameters), ε = approximation error
• “The birthday paradox” test (sketched below):
• Sample m images from the generator
• Check whether any are (near-)duplicates
• Estimate the support size from the collision rate
“Generalization and Equilibrium in Generative Adversarial Nets” 

& “Do GANs actually learn the distribution? Some theory and empirics”
28
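A sketch of the birthday-paradox test (illustrative; `generate` and `near_duplicate` stand in for the GAN sampler and a human or heuristic duplicate check):

```python
# Birthday-paradox support-size test: duplicates among m samples
# suggest the generator's support is only about m^2.
import itertools

def estimate_support(generate, near_duplicate, m=400):
    samples = [generate() for _ in range(m)]
    collisions = sum(
        near_duplicate(a, b) for a, b in itertools.combinations(samples, 2)
    )
    return m * m if collisions > 0 else None  # None: support likely >> m^2
```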
• Deterministic Rounding vs. Stochastic Rounding
• Theoretical explanation of why SGD with stochastic
rounding does not converge well
• Each update is too noisy (rounding sketched below)
• Won the Google Best Student Paper Award
“Towards a Deeper Understanding of Training
Quantized Networks”
29
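To make the comparison concrete, here is a numpy sketch of the two rounding modes (the grid step `delta` is illustrative). Stochastic rounding is unbiased in expectation, which is exactly why each individual update carries extra noise:

```python
# Deterministic vs. stochastic rounding to a fixed-point grid.
import numpy as np

def round_deterministic(w, delta=0.1):
    return delta * np.round(w / delta)

def round_stochastic(w, delta=0.1):
    scaled = w / delta
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional part, so that
    # E[round_stochastic(w)] == w (unbiased, but each update is noisy).
    return delta * (floor + (np.random.rand(*w.shape) < scaled - floor))

w = np.array([0.04, 0.13, -0.07])
print(round_deterministic(w), round_stochastic(w))
```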
• RL produces much better sequences than log-likelihood-based training
• Why is RL so effective? (beam search issues?)
“Sequence-Level Training of Neural Models for Visual
Dialog”
30
• Google’s Expander, which enhances a broad range of tasks using graph structure
• smart reply, personal assistant
• image recognition
• An integrated framework for
• zero-shot/one-shot learning
• multi-modal learning
• semi-supervised learning
• multi-task learning
• “Neural Graph Machines”
• introduce graph regularization into DL (sketched below)
• adjacent nodes (data points) are constrained to have nearby vector representations
Neural Graph Learning
31
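A hedged sketch of a graph-regularized loss in the spirit of Neural Graph Machines (PyTorch; `model.hidden`, `model.head`, `edges`, and `alpha` are hypothetical stand-ins, not the paper's exact formulation):

```python
# Supervised loss plus a penalty pulling neighbors' representations together.
import torch
import torch.nn.functional as F

def ngm_loss(model, x, y, edges, alpha=0.1):
    h = model.hidden(x)        # hypothetical hidden representations
    logits = model.head(h)     # hypothetical classification head
    sup = F.cross_entropy(logits, y)
    u, v = edges               # index tensors of adjacent node pairs
    reg = ((h[u] - h[v]) ** 2).sum(dim=1).mean()  # neighbor closeness
    return sup + alpha * reg
```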
2019: ICML + CVPR!
2021: Asia/Pacific!
Future ICMLs
32
Any Questions?
