MACHINE LEARNING TODAY: CURRENT RESEARCH AND ADVANCES FROM AMLAB, UVA
DANIEL WORRALL
WHO ARE WE?
~30 researchers working under Max Welling and Joris Mooij
- 4 industrially funded ‘labs’
- Everyone works in deep learning
- We do fundamental research in machine learning
WHAT IS MACHINE LEARNING?
Mauna Loa is one of five volcanoes that form the Island of Hawaii in the U.S.
state of Hawaii in the Pacific Ocean. The largest subaerial volcano in both
mass and volume, Mauna Loa has historically been considered the largest
volcano on Earth, dwarfed only by Tamu Massif.
WHAT IS MACHINE LEARNING?
In machine learning we use past data to make predictions about the future.
$p(y^* \mid x^*, \mathcal{D}) = \mathcal{N}\left(y^* \mid \mu(x^*), \sigma^2(x^*)\right)$
Here $\mathcal{D}$ is the data, $x^*$ the test input, $y^*$ the test output, and $\mathcal{N}$ a Gaussian.
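To make this concrete, here is a hedged sketch of producing such a Gaussian predictive distribution with a Gaussian-process regressor. The classic demo uses the Mauna Loa CO2 record; the toy sine data and hyperparameters below are my own illustrative assumptions, not the slides' experiment.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy stand-in for "past data" D = {(x_i, y_i)}: a noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.01)
gp.fit(X, y)

# p(y* | x*, D) = N(y* | mu(x*), sigma^2(x*)): mean and std at a test input x*.
x_star = np.array([[5.0]])
mu, sigma = gp.predict(x_star, return_std=True)
print(mu[0], sigma[0])
```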
WHAT IS MACHINE LEARNING?
Predictions are probability distributions.
Our main tool, conditional distributions:
$p(x \mid \theta)$, where $x$ is the data and $\theta$ the parameters/models/unknowns.
- How do we choose $p$? Symmetry constraints, domain choice, flexibility.
- How do we learn $\theta$? Approximations, computation, memory.
WHAT IS MACHINE LEARNING?
Some terminology:
- Probability: given parameters $\theta$, generate data $\{x_1, x_2, \ldots\}$ via $p(x \mid \theta)$.
- Statistics: given data $\{x_1, x_2, \ldots\}$, infer $\theta$.
- Machine learning: given data $\{x_1, x_2, \ldots\}$, predict unseen data $\{x^*\}$.
WHAT WE DO
- Variational methods
- Normalizing flows
- Graphs
- Symmetry
- Reinforcement learning
- Transfer learning
- Medical imaging
- Generative modelling
- Compression
- Low-precision neural networks
- Spiking neural networks
- Semi-supervised learning
VARIATIONAL METHODS
Approximate inference
The posterior is intractable:
$$p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})} = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{\int p(\mathcal{D} \mid \theta)\, p(\theta)\, \mathrm{d}\theta}$$
So we approximate it with a tractable $q_\phi(\theta)$, chosen to minimise the KL divergence (a 'distance' between distributions):
$$\begin{aligned}
q_\phi(\theta) &= \arg\min_\phi D_{\mathrm{KL}}\left[ q_\phi(\theta) \,\|\, p(\theta \mid \mathcal{D}) \right] \\
&= \arg\min_\phi \int q_\phi(\theta) \log \frac{q_\phi(\theta)}{p(\theta \mid \mathcal{D})} \, \mathrm{d}\theta \\
&= \arg\min_\phi \int q_\phi(\theta) \log \frac{q_\phi(\theta)\, p(\mathcal{D})}{p(\mathcal{D} \mid \theta)\, p(\theta)} \, \mathrm{d}\theta \\
&= \arg\max_\phi \underbrace{\mathbb{E}_{q_\phi(\theta)}[\log p(\mathcal{D} \mid \theta)]}_{\text{log-likelihood}} - \underbrace{D_{\mathrm{KL}}\left[ q_\phi(\theta) \,\|\, p(\theta) \right]}_{\text{regulariser}}
\end{aligned}$$
The maximised quantity is the ELBO (evidence lower bound).
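As a minimal illustration of this objective (my addition, not from the slides), the sketch below Monte-Carlo estimates the ELBO for a toy conjugate Gaussian model with a Gaussian $q$; the model and helper names are assumptions for the example only.

```python
import numpy as np

# Toy model: theta ~ N(0, 1), x_i | theta ~ N(theta, 1),
# approximate posterior q(theta) = N(m, s^2).
rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=1.0, size=20)          # observed data D

def log_gauss(z, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (z - mu) ** 2 / var)

def elbo(m, s, n_samples=5000):
    theta = m + s * rng.normal(size=n_samples)       # samples from q(theta)
    log_lik = np.array([log_gauss(x, t, 1.0).sum() for t in theta])
    # ELBO = E_q[log p(D | theta)] - KL[q(theta) || p(theta)], KL in closed form
    kl = 0.5 * (s ** 2 + m ** 2 - 1.0 - np.log(s ** 2))
    return log_lik.mean() - kl

# Exact posterior is N(m*, s*^2) with s*^2 = 1/(n + 1), m* = s*^2 * sum(x).
n = len(x)
s_star2 = 1.0 / (n + 1)
m_star = s_star2 * x.sum()
print(elbo(0.0, 1.0))                  # a poor q: low ELBO
print(elbo(m_star, np.sqrt(s_star2)))  # optimal q: ELBO approaches log p(D)
```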
VARIATIONAL METHODS
Approximate inference
If we give each $x$ a latent $z$, the same objective becomes the variational auto-encoder:
$$\arg\max_\phi \; \mathbb{E}_{p(x)}\left[ \mathbb{E}_{q_\phi(z \mid x)}[\log p(x \mid z)] - D_{\mathrm{KL}}\left[ q_\phi(z \mid x) \,\|\, p(z) \right] \right]$$
where the approximate posterior $q_\phi(z \mid x)$ (and typically the likelihood $p(x \mid z)$) is parameterised by a neural network.
Kingma and Welling (2013)
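Below is a hedged PyTorch sketch of this objective, not the reference implementation of Kingma and Welling (2013); the layer sizes, Bernoulli decoder, and fake data are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal VAE sketch: Gaussian encoder q(z|x), Bernoulli decoder p(x|z)."""

    def __init__(self, x_dim=784, z_dim=20, h_dim=200):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.enc_mu = nn.Linear(h_dim, z_dim)      # q(z|x) mean
        self.enc_logvar = nn.Linear(h_dim, z_dim)  # q(z|x) log-variance
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterisation
        logits = self.dec(z)
        # E_q(z|x)[log p(x|z)] with a Bernoulli decoder
        rec = -F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
        # KL[q(z|x) || p(z)] against a standard normal prior, in closed form
        kl = -0.5 * torch.sum(1 + logvar - mu ** 2 - logvar.exp())
        return -(rec - kl)                                        # negative ELBO (loss)

x = torch.rand(16, 784)          # fake batch of "images" in [0, 1]
loss = VAE()(x)
loss.backward()                  # train by minimising the negative ELBO
print(loss.item())
```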
VARIATIONAL METHODS
NORMALIZING FLOWS
What is a flexible probability distribution?
e.g. $p(x) = \mathcal{N}(x \mid \mu, \sigma^2)$
e.g. $p(x) = \sum_i \pi_i\, \mathcal{N}(x \mid \mu_i, \sigma_i^2)$
Implicitly define a distribution via a change of variables:
$$x = f_\theta(z), \quad z \sim p(z) \;\implies\; p(x) = p(z)\left| \det \frac{\partial z}{\partial x} \right| = p(z)\left| \det \frac{\partial f}{\partial z} \right|^{-1}$$
The determinant is rather expensive. Goal: design flexible $f$ with cheap determinants.
[Figure: target density vs. flow approximation]
Rezende & Mohamed (2016)
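A tiny numerical check of the change-of-variables formula (my addition): for a 1-D affine flow the pushforward density computed via the Jacobian matches the analytic Gaussian. The affine map and `flow_density` helper are assumptions for the example.

```python
import numpy as np
from scipy.stats import norm

# Affine flow x = f_theta(z) = a * z + b with base density z ~ N(0, 1):
#   p(x) = p(z) * |det dz/dx| = p(f^{-1}(x)) / |a|.
a, b = 2.0, 1.0

def flow_density(x):
    z = (x - b) / a                       # invert the flow, z = f^{-1}(x)
    return norm.pdf(z) / np.abs(a)        # base density times |det dz/dx|

# The pushforward of N(0, 1) through an affine map is N(b, a^2), so the two agree.
xs = np.linspace(-5, 7, 7)
print(np.allclose(flow_density(xs), norm.pdf(xs, loc=b, scale=np.abs(a))))  # -> True
```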
NORMALIZING FLOWS
$x = f_\theta(z), \quad z \sim p(z)$
Kingma & Dhariwal (2018)
NORMALIZING FLOWS: INVERTIBLE LAYERS
Typical layer: $y = g(Wx + b)$
Householder flow: volume preservation
$$z_t = \left( I - 2\,\frac{v_t v_t^\top}{\|v_t\|^2} \right) z_{t-1} = H_t z_{t-1}$$
where $v_t$ is predicted by a neural network.
Tomczak and Welling (2016)
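As a quick sanity check of the volume-preservation claim, here is a minimal numpy sketch (not the authors' code; `householder_step` is a hypothetical helper name) of one Householder step:

```python
import numpy as np

def householder_step(z, v):
    """One Householder flow step: z_t = (I - 2 v v^T / ||v||^2) z_{t-1}."""
    v = v / np.linalg.norm(v)          # normalise so H = I - 2 v v^T
    return z - 2.0 * v * (v @ z)       # apply H z without forming H explicitly

# The reflection H is orthogonal, so |det H| = 1 (volume-preserving) and the
# log-det-Jacobian contribution of the flow step is zero.
rng = np.random.default_rng(0)
v = rng.normal(size=4)
H = np.eye(4) - 2.0 * np.outer(v, v) / (v @ v)
print(np.abs(np.linalg.det(H)))                    # -> 1.0 (up to float error)

z = rng.normal(size=4)
print(np.allclose(householder_step(z, v), H @ z))  # -> True
```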
Sylvester normalising flows
$$z_t = z_{t-1} + A\, h(B z_{t-1} + b)$$
Using $\det(I + AB) = \det(I + BA)$:
$$\det \frac{\partial z_t}{\partial z_{t-1}} = \det\left( I + \operatorname{diag}\!\left(h'(B z_{t-1} + b)\right) B A \right)$$
van den Berg et al. (2018)
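A tiny numpy check (my addition) of the Sylvester determinant identity that makes this Jacobian cheap: an $M \times M$ determinant replaces a $D \times D$ one when $A$ is $D \times M$ and $B$ is $M \times D$.

```python
import numpy as np

# Numerical check of det(I + A B) = det(I + B A).
rng = np.random.default_rng(0)
D, M = 6, 2
A = rng.normal(size=(D, M))
B = rng.normal(size=(M, D))

lhs = np.linalg.det(np.eye(D) + A @ B)   # expensive: D x D determinant
rhs = np.linalg.det(np.eye(M) + B @ A)   # cheap: M x M determinant
print(np.isclose(lhs, rhs))              # -> True
```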
Emerging convolutions
Hoogeboom et al. (2019)
GRAPHS
A lot of data is graph-based: social networks, particle interactions, human
skeletal data, molecular structures, 3D graphics meshes
GRAPHS
[Figure: a graph convolution combines the graph structure (adjacency) with learned weights.]
Kipf and Welling (2017)
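For illustration, here is a hedged numpy sketch of a single graph-convolution layer in the style of Kipf and Welling (2017); the toy graph, feature sizes, and `gcn_layer` helper are assumptions, not the paper's code.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU( D^{-1/2} (A + I) D^{-1/2} H W ).

    A: (N, N) adjacency, H: (N, F_in) node features, W: (F_in, F_out) weights.
    """
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d = A_hat.sum(axis=1)                          # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))         # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetric normalisation
    return np.maximum(0.0, A_norm @ H @ W)         # propagate, then ReLU

# Toy example: 3-node chain graph, 2 input features, 4 output features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.random.default_rng(0).normal(size=(3, 2))
W = np.random.default_rng(1).normal(size=(2, 4))
print(gcn_layer(A, H, W).shape)                    # -> (3, 4)
```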
GRAPHS: MOTION PREDICTION
Kipf et al. (2018)
SYMMETRY?
Symmetry is a property of functions/tasks, e.g. classification, disentangling (the cocktail party problem), signal discovery/detection. It is the set of input transformations leaving the function invariant:
$$f(I) = f(T_\theta[I])$$
where $f$ is the function/feature mapping, $I$ the image, and $T_\theta$ the transformation.
Notational aside:
- $T_\theta[I](x) = I(x - \theta)$, e.g. geometric translation
- $T_\theta[I](x) = I(R_\theta^{-1} x)$, e.g. geometric rotation
- $T[I] = (I - \mu)/\sigma$, e.g. pixel normalisation
EQUIVARIANCE
$$S_\theta[f](I) = f(T_\theta[I])$$
$S_\theta$ is the transformation in feature space. The mapping preserves the algebraic structure of the transformation: $S_\theta$ and $T_\theta$ are different representations of the same transformation.
https://github.com/vdumoulin/conv_arithmetic
Convolution (and correlation)
$$[I * W](x - \theta) = [T_\theta[I] * W](x)$$
When $S_\theta = \mathrm{Id}$ we recover invariance.
Convolutions ⟷ symmetry.
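A quick numerical check of this equivariance (my addition): circular convolution commutes with circular shifts, so transforming the input and then convolving equals convolving and then transforming the output.

```python
import numpy as np

rng = np.random.default_rng(0)
I = rng.normal(size=32)          # 1-D "image"
W = rng.normal(size=32)          # filter, same length for circular convolution

def circ_conv(a, b):
    """Circular convolution via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

shift = 5
lhs = circ_conv(np.roll(I, shift), W)     # transform input, then convolve
rhs = np.roll(circ_conv(I, W), shift)     # convolve, then transform output
print(np.allclose(lhs, rhs))              # -> True
```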
GROUP EXAMPLES
Examples: translations, rotations, reflections, roto-translations, scalings*.
Non-example: occlusions.
*Current research direction: scalings are probably better modelled as semigroups, i.e. groups without the invertibility condition.
GROUP CONVOLUTIONS
“Convolution” examples:
- Translation: $[I * W](y) = \sum_{x \in \mathbb{Z}^2} I(x)\, W(x - y)$
- Rotation: $[I * W](\theta) = \sum_{x \in \mathbb{Z}^2} I(x)\, W(R_\theta^{-1} x)$
- Roto-translation: $[I * W](\theta, y) = \sum_{x \in \mathbb{Z}^2} I(x)\, W(R_\theta^{-1}(x - y))$
Group convolution: $[I * W](\theta) = \sum_{x \in \mathbb{Z}^2} I(x)\, T_\theta[W](x)$
Semigroup convolution: $[I * W](\theta) = \sum_{x \in \mathbb{Z}^2} T_\theta[I](x)\, W(x)$
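To ground the rotation case, here is a hedged numpy sketch of a lifting group convolution over the 4-fold rotation group: correlate the image with every rotated copy of the filter, giving an output indexed by (rotation, position). Square inputs, 90° rotations, and the `rotation_group_conv` name are assumptions for the example.

```python
import numpy as np

def rotation_group_conv(I, W):
    """Correlate I with each 90-degree rotation of W (T_theta[W])."""
    out = []
    for k in range(4):                                   # rotations by k * 90 degrees
        Wk = np.rot90(W, k)                              # rotated filter T_theta[W]
        H, F = I.shape[0], Wk.shape[0]
        resp = np.array([[np.sum(I[i:i+F, j:j+F] * Wk)   # 'valid' cross-correlation
                          for j in range(H - F + 1)]
                         for i in range(H - F + 1)])
        out.append(resp)
    return np.stack(out)                                 # shape (4, H-F+1, H-F+1)

rng = np.random.default_rng(0)
I = rng.normal(size=(8, 8))
W = rng.normal(size=(3, 3))
print(rotation_group_conv(I, W).shape)                   # -> (4, 6, 6)
```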
EXAMPLES
[Figure: DenseNet vs. rotation-equivariant DenseNet; input, mean prediction, and standard deviation shown for each.]