The Molecular Autoencoder
Dan Elton
P.W. Chung Group Meeting, 1/24/2018
What is Machine Learning?
"Machine Learning is a field of study that gives computers the ability to learn without
being explicitly programmed" - Arthur Samuel, 1959
"A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by
P, improves with experience E." - Tom M. Mitchell.
Supervised learning: model y = f(x) to match data (x, y)
• Regression
• Classification
• Parametric models
  • Linear models
  • Polynomial models
  • Logistic models
  • Neural network models
  • Convolutional neural networks
• Non-parametric models
  • Kernel ridge regression
  • Decision trees
  • Gaussian process regression
  • Kernel SVM
Unsupervised learning
• Clustering
• Dimensionality reduction
• Autoencoders
Reinforcement learning
• Robotics, etc.
Supervised learning workflow
Source: scikit-learn.org
What is a neural network?
(Figure: a biological neuron. Dendrites are the input wires; terminal axons are the output wires.)
What is a neural network?
Input layer, hidden layer, output layer.
The activations of layer i are computed from the input (or from the activations of layer i-1) as a_i = g(W_i a_{i-1}), where W_i are the weights and g() is the activation function.
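A minimal NumPy sketch of this forward pass (the explicit bias term and the layer sizes below are standard assumptions for illustration, not taken from the slide):

import numpy as np

def dense_layer(a_prev, W, b, g):
    # Compute a_i = g(W_i a_{i-1} + b_i) for one fully connected layer
    return g(W @ a_prev + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                        # input (activations of layer i-1)
W = rng.normal(size=(4, 3))                   # weights of layer i: 3 inputs -> 4 units
b = np.zeros(4)                               # bias of layer i
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
a = dense_layer(x, W, b, sigmoid)             # activations of layer i
print(a.shape)                                # (4,)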
Activation functions
• Binary step: closest to biological neurons, but no gradient info =(
• Logistic/sigmoid
• arctan()
• Rectified linear unit (ReLU): maintains a nice large gradient
• Exponential linear unit (ELU)
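These functions are one-liners in NumPy (an illustrative sketch; the ELU scale parameter alpha = 1 is an assumption):

import numpy as np

def binary_step(z):
    return np.where(z >= 0, 1.0, 0.0)            # no useful gradient anywhere

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def arctan(z):
    return np.arctan(z)

def relu(z):
    return np.maximum(0.0, z)                    # gradient is 1 for all z > 0

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.linspace(-3, 3, 7)
for f in (binary_step, sigmoid, arctan, relu, elu):
    print(f.__name__, np.round(f(z), 2))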
What is convolution?
(Figure: a 1-dimensional convolution of an input with a filter, aka "kernel", producing an output; the second panel shows convolution with stride = 2.)
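A small NumPy sketch of 1-D convolution with a stride (the signal and filter values are made up for illustration; as in most deep-learning libraries, this is really cross-correlation, i.e. the filter is not flipped):

import numpy as np

def conv1d(x, w, stride=1):
    # Slide the filter w over the signal x, taking a dot product at each step
    n_out = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(w)], w)
                     for i in range(n_out)])

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
w = np.array([1.0, 0.0, -1.0])            # an illustrative 3-tap filter ("kernel")
print(conv1d(x, w))                        # stride = 1 -> 4 outputs
print(conv1d(x, w, stride=2))              # stride = 2 -> 2 outputs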
What is convolution?
Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
(Figure: a "feature map" produced by 2-dimensional convolution of an image with the 3x3 filter
1 0 1
0 1 0
1 0 1
Note that the edges were lost: the feature map is smaller than the input. There are ways to prevent this, such as padding the edges with zeros.)
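The same operation in code, using scipy on a made-up 5x5 image with the filter shown above (the kernel is symmetric, so true convolution and cross-correlation coincide here):

import numpy as np
from scipy.signal import convolve2d

image = np.arange(25, dtype=float).reshape(5, 5)      # illustrative 5x5 "image"
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=float)

# 'valid' keeps only full overlaps, so the 5x5 input shrinks to a 3x3 feature map
print(convolve2d(image, kernel, mode='valid').shape)  # (3, 3) -- edges are lost

# 'same' pads the edges with zeros so the feature map stays 5x5
print(convolve2d(image, kernel, mode='same').shape)   # (5, 5)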
What are convolutional neural nets?
By most accounts the CNN was invented by Yann LeCun, who developed "LeNet" in 1998 at AT&T Bell Laboratories for reading handwritten digits.
Architecture of LeNet:
What are convolutional neural nets?
“2D” images are actually 3D, because they have 3 color channels.
A 3D diagram best conveys what a CNN actually does. The depth of the non-input layers is the # of filters. Typically the # of filters in each successive layer increases while the size of the filters decreases:
What are convolutional neural nets?
By many accounts the current deep learning boom began when Krizhevsky, Sutskever, and Hinton used a CNN to win the 2012 ImageNet image classification competition. The resulting publication has 13,000+ citations.
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, 1097-1105 (2012)
The architecture they used has 60 million parameters and 650,000 neurons:
Why do CNNs work so well?
They learn a hierarchical set of features the same way the mammalian visual cortex does!
Hubel & Wiesel, 1959: receptive fields of single neurons in the cat's striate cortex.
(Slide from Yann LeCun.)
What is an autoencoder?
• The “latent space” is also called the “low dimensional manifold”, “compressed
representation”, or “thought vector”
• See “Decoding the Thought Vector” for amazing examples of how faces are
compressed: http://gabgoh.github.io/ThoughtVectors/
Source: keras blog
What is a variational autoencoder?
• During training, the latent vector is sampled from the enforced distribution via the reparameterization trick, z = mean + random_noise * standard_deviation; at test time the mean is used directly.
• In addition to the reconstruction loss, minimize the Kullback–Leibler divergence between the encoder's distribution and the prior.
D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," International Conference on Learning Representations (ICLR), Banff, 2014 (arXiv preprint).
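A minimal NumPy sketch of these two ingredients, assuming a standard normal prior as in Kingma & Welling (the variable names are mine):

import numpy as np

def sample_latent(mean, log_var, training=True):
    # Reparameterization trick: z = mean + noise * std during training,
    # z = mean at test time
    if not training:
        return mean
    eps = np.random.normal(size=mean.shape)
    return mean + eps * np.exp(0.5 * log_var)

def kl_divergence(mean, log_var):
    # KL( N(mean, var) || N(0, 1) ), summed over latent dimensions
    return -0.5 * np.sum(1.0 + log_var - mean**2 - np.exp(log_var))

mean, log_var = np.array([0.1, -0.2]), np.array([-1.0, -2.0])
print(sample_latent(mean, log_var))            # stochastic during training
print(sample_latent(mean, log_var, False))     # deterministic at test time
print(kl_divergence(mean, log_var))            # added to the reconstruction loss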
What are recurrent neural networks?
Recurrent neural networks (RNNs) have loops.
The simplest RNN, shown on the left, contains one feedback loop.
The mathematics and the calculation of gradients (i.e., backpropagation) can be made isomorphic to that of a feed-forward neural network via time unrolling.
(The figure labels mark the inputs at each time step and the output we are interested in.)
All of these beautiful figures are taken from http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (copyright Christopher Olah).
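A sketch of what time unrolling means for the simplest RNN, as a plain NumPy loop (the tanh nonlinearity and the weight shapes are standard choices, not taken from the slides):

import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, h0):
    # Unroll the loop: one pass through the body per time step
    h, ys = h0, []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)   # the feedback loop
        ys.append(W_hy @ h)                # output at this time step
    return np.array(ys), h

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 5, 2, 4
xs = rng.normal(size=(T, n_in))
ys, h_last = rnn_forward(xs,
                         rng.normal(size=(n_hid, n_in)),
                         rng.normal(size=(n_hid, n_hid)),
                         rng.normal(size=(n_out, n_hid)),
                         np.zeros(n_hid))
print(ys.shape)   # (4, 2): one output per time step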
What are recurrent neural networks?
RNNs can be run many different ways…
• Video classification: input all the frames of a video, output a classification for each frame.
• Translation ("seq2seq"): input Spanish, output English.
• Sentiment analysis: input text, output a positive or negative sentiment.
• Image captioning: input an image, output a sequence of words.
What is a gated recurrent unit?
RNNs have trouble capturing long-range dependencies.
Suppose we need the output at time t+1 to depend on x0 and x1, which occurred in the distant past of the input stream.
Technically this is called the vanishing gradient problem: the dependence (gradient) becomes exponentially small with the number of layers it has to pass through. There is also an exploding gradient problem, where the gradient increases exponentially.
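A toy numerical illustration of both problems (the factors are made-up numbers standing in for the per-step Jacobian magnitudes):

# In backpropagation through time, the gradient is a product of one factor per
# time step. A factor below 1 shrinks the product exponentially (vanishing);
# a factor above 1 blows it up (exploding).
for steps in (10, 50, 100):
    print(steps, "steps:", 0.9 ** steps, 1.1 ** steps)
# 0.9**100 is about 2.7e-05 (vanished); 1.1**100 is about 1.4e+04 (exploded)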
What is an LSTM?
Sepp Hochreiter and Jürgen Schmidhuber (right) invented the Long Short-Term Memory (LSTM) unit in 1997 to solve the vanishing gradient problem. LSTMs were recently used by Google for human-level-accuracy machine translation, Apple uses LSTMs in Siri, etc.
The LSTM looks complicated, but it is actually based on an extremely simple idea: add a memory cell that is carried forward alongside the output state:
How does an LSTM work?
(Figure: the "forget" gate, the "input" gate, and the read-out gate; the nonlinearities are sigmoid/logistic and tanh().)
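One LSTM time step written out in NumPy (a sketch of the standard equations; the stacked-weight layout and names are mine, not from the slide):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # W stacks the four gates, shape (4*d, n + d); b has shape (4*d,)
    d = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0*d:1*d])        # "forget" gate: what to erase from the cell
    i = sigmoid(z[1*d:2*d])        # "input" gate: what to write to the cell
    o = sigmoid(z[2*d:3*d])        # read-out gate
    g = np.tanh(z[3*d:4*d])        # candidate cell values
    c = f * c_prev + i * g         # the memory cell carries information forward
    h = o * np.tanh(c)             # output state
    return h, c

rng = np.random.default_rng(0)
n, d = 3, 2
h, c = lstm_step(rng.normal(size=n), np.zeros(d), np.zeros(d),
                 rng.normal(size=(4 * d, n + d)), np.zeros(4 * d))
print(h, c)   # note: 4*d*(n+d) weights + 4*d biases = 4*d*(n+d+1) parameters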
LSTM vs. Gated Recurrent Unit (GRU)
The GRU unit1 makes major changes to the LSTM:
• The output and memory cells are merged
• The "forget" and "input" gates are merged into a single "update" gate
• Performance is similar to the LSTM3 or slightly better2,4, but with fewer free parameters (6 vs. 12 for a 1D input/output):
1. K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation" (2014). arXiv:1406.1078
2. R. Jozefowicz et al., "An Empirical Exploration of Recurrent Network Architectures," Proceedings of the 32nd International Conference on Machine Learning (2015).
3. K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A Search Space Odyssey" (2015). arXiv:1503.04069
4. J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling" (2014). arXiv:1412.3555
If the dimensionality of the input is n and the dimensionality of the output is d, then:

Unit   # of parameters
LSTM   4*d*(n+d+1)
GRU    3*d*(n+d)
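The counts in the table are easy to check (a two-function sketch using the bias convention the table implies, i.e. biases for the LSTM and none in this GRU count):

def lstm_params(n, d):
    return 4 * d * (n + d + 1)   # 4 gates, each with (n + d) weights and a bias

def gru_params(n, d):
    return 3 * d * (n + d)       # 3 gates/candidates, no biases in this count

print(lstm_params(1, 1), gru_params(1, 1))   # 12 6, as quoted above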
What are SMILES strings?
SMILES (simplified molecular-input line-entry system) strings encode 2D molecular graphs as 1D text.
Examples: CC(=O)NCCC1=CNc2c1cc(OC)cc2, CN1CCC[C@H]1C2=CN=CC=C2, FC(F)FCCC(=O)O
The only ambiguity in SMILES strings:
• They do not capture 3D structure. However, for small molecules and most application areas this doesn't matter much, since such molecules generally have only one conformation, so the 3D structure is implicitly contained. It would only matter for something like proteins, which may fold into more than one conformation, or when the molecules are interacting with something like an interface.
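SMILES strings can be parsed and canonicalized with RDKit (RDKit is not mentioned on the slide; this is just a hypothetical illustration using the second example above):

from rdkit import Chem

smiles = "CN1CCC[C@H]1C2=CN=CC=C2"         # example SMILES from the slide
mol = Chem.MolFromSmiles(smiles)           # parse the 2D molecular graph
if mol is not None:                        # None means the string failed to parse
    print(mol.GetNumAtoms())               # number of heavy atoms
    print(Chem.MolToSmiles(mol))           # canonical SMILES form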
One-hot encoding
There are 35 characters in the alphabet (C, N, O, @, -, =, etc.).
The maximum string length is 120 characters; shorter molecules are padded with 0s.
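A sketch of this one-hot encoding (the character set below is a truncated stand-in; the real alphabet has 35 characters):

import numpy as np

CHARSET = list("CNOH()=@-cn123456789[]#+/\\ ")    # illustrative subset of the alphabet
CHAR_TO_IDX = {c: i for i, c in enumerate(CHARSET)}
MAX_LEN = 120                                     # maximum SMILES length on the slide

def one_hot(smiles, max_len=MAX_LEN):
    # Encode a SMILES string as a (max_len, alphabet size) 0/1 matrix
    x = np.zeros((max_len, len(CHARSET)))
    for t, ch in enumerate(smiles):
        x[t, CHAR_TO_IDX[ch]] = 1.0               # one 1 per character position
    return x                                      # rows past the end stay all zeros (padding)

x = one_hot("CN1CCC[C@H]1C2=CN=CC=C2")
print(x.shape)   # (120, 27) with this toy alphabet; (120, 35) with the real one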
Overall autoencoder architecture:
• Three 1-dimensional convolution layers
• Gated recurrent unit (GRU) layers with 501-element memory cells
• A "time-distributed dense layer" (a separate dense layer applied to each timestep)
• "Flattening" (reshapes a 2D array into a 1D array)
• Two dense (fully connected) neural network layers, with 435 and 292 neurons, respectively
• Latent layer: mean and standard deviation units
• A custom layer to sample the Gaussian distributions during training
Architecture details:
• One-hot inputs
• 9 convolution filters of length 9
• 9 convolution filters of length 9
• 11 convolution filters of length 10
• Dense (fully connected) neural network layer, 292 neurons
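Putting the layer list together, a hedged Keras-style sketch of the network (it follows the sizes quoted above; it is only an approximation of the published implementation, and the loss/compilation step is omitted):

from tensorflow.keras import layers, Model, backend as K

MAX_LEN, N_CHARS, LATENT = 120, 35, 292

# Encoder: three 1-D convolutions, then dense layers down to the latent space
x_in = layers.Input(shape=(MAX_LEN, N_CHARS))
h = layers.Conv1D(9, 9, activation='relu')(x_in)    # 9 filters of length 9
h = layers.Conv1D(9, 9, activation='relu')(h)       # 9 filters of length 9
h = layers.Conv1D(11, 10, activation='relu')(h)     # 11 filters of length 10
h = layers.Flatten()(h)
h = layers.Dense(435, activation='relu')(h)
z_mean = layers.Dense(LATENT)(h)                    # latent mean
z_log_var = layers.Dense(LATENT)(h)                 # latent log-variance

def sample(args):                                   # custom Gaussian sampling layer
    mean, log_var = args
    eps = K.random_normal(shape=K.shape(mean))
    return mean + K.exp(0.5 * log_var) * eps

z = layers.Lambda(sample)([z_mean, z_log_var])

# Decoder: dense layer, three GRU layers, then a per-timestep softmax over characters
d = layers.Dense(LATENT, activation='relu')(z)
d = layers.RepeatVector(MAX_LEN)(d)                 # copy the latent vector to every timestep
d = layers.GRU(501, return_sequences=True)(d)
d = layers.GRU(501, return_sequences=True)(d)
d = layers.GRU(501, return_sequences=True)(d)
x_out = layers.TimeDistributed(layers.Dense(N_CHARS, activation='softmax'))(d)

vae = Model(x_in, x_out)
vae.summary()   # the KL term would be added to the reconstruction loss when compiling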
How does one determine architecture?
The JSON file for the molecular autoencoder reveals 200+ hyperparameters.
The most important are:
• Number of layers
• Types of layers
• Size and # of filters in CNN layers
• # of hidden cells in GRU layers (also called # of units)
• Number of latent variables
There are various ways of regularizing that can be turned on in several or all
layers:
• L1/ L2 weight regularization
• Weight sharing
• Dropout (currently most popular)
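For instance, the L1/L2 and dropout options can be switched on per layer in Keras (an illustrative snippet, not taken from the autoencoder's configuration):

from tensorflow.keras import Sequential, layers, regularizers

model = Sequential([
    layers.Input(shape=(292,)),
    layers.Dense(435, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),   # L2 weight penalty
    layers.Dropout(0.2),             # randomly zero 20% of activations during training
    layers.Dense(35, activation='softmax'),
])
model.summary()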
How does one determine architecture?
• Historically, the design of deep networks has been a black art; this is part of the reason deep learning jobs have such high salaries.1 There are many heuristics but no overarching theory guiding design yet.
• Bayesian optimization is one approach.2
• People at Google use reinforcement learning and genetic algorithms to design complex deep networks, like the GoogLeNet shown above; these searches can create designs that perform as well as those from human designers.3
• People have even used neural networks to design neural nets.4

1. This Week in Machine Learning (TWiML) podcast, interview with Matthew Zeiler and others.
2. J. Snoek, H. Larochelle, and R. P. Adams, "Practical Bayesian Optimization of Machine Learning Algorithms," Advances in Neural Information Processing Systems, 2951-2959 (2012).
3. Google Research Blog, "Using Machine Learning to Explore Neural Network Architecture."
4. S. C. Smithson, G. Yang, W. J. Gross, and B. H. Meyer, "Neural Networks Designing Neural Networks: Multi-Objective Hyper-Parameter Optimization," arXiv:1611.02120
Latent space projection into 2D via t-SNE
Two data sets are shown: 250,000 commercially available drug-like molecules from the ZINC database, and 150,000 organic LED molecules, combinatorially generated.1

1. R. Gómez-Bombarelli et al., "Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach," Nat. Mater. 15, 1120–1127 (2016)
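Projections like these can be produced with scikit-learn's t-SNE (a sketch on random stand-in data; in practice the input would be the 292-dimensional latent vectors of the encoded molecules):

import numpy as np
from sklearn.manifold import TSNE

latent = np.random.normal(size=(1000, 292))       # stand-in for encoded molecules
xy = TSNE(n_components=2, perplexity=30, init='pca',
          random_state=0).fit_transform(latent)
print(xy.shape)                                   # (1000, 2) points to scatter-plot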
Data sets that are available
Name                               Description                        # of molecules   Size
GDB-17 set                         http://gdb.unibe.ch/downloads/     50,000,000
GDB-13, C-N molecules
GDB-13, C-N-O molecules
ZINC database (zinc.docking.org)   Commercially available molecules   22,724,825
Adversarially trained autoencoder
1. Goodfellow, Ian J.; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio, Yoshua (2014)
"Generative Adversarial Networks". arXiv:1406.2661
2. A. Makhzani, J. Shlens, N. Jaitly, and I. Goodfellow, in International Conference on Learning Representations, (2016), arxiv.org:1511.05644
3. Kadurin, Artur et al. “The Cornucopia of Meaningful Leads: Applying Deep Adversarial Autoencoders for New Molecule Development in
Oncology.” Oncotarget 8.7 (2017): 10883–10890. PMC. Web. 2 Aug. 2017.
Generative adversarial networks1 (GANs) have exploded in popularity since 2014. Adversarial autoencoders2 (AAEs) apply the GAN framework to variational autoencoder training.
The adversarial autoencoder is an autoencoder that is regularized by matching the aggregated posterior q(z), derived from the data distribution, to an arbitrary prior p(z). Here p(z) is the normal distribution N(5, 1).
Application to oncology molecular lead discovery (2017)3
“Molecular Tinder” for screening OLED molecules
From the Aspuru-Guzik group: http://chimad.northwestern.edu/docs/DDD_WS_II/12_Aspuru_Guzik.p
Teacher forcing
(During training, the GRU decoder is fed the ground-truth previous character rather than its own prediction.)

Molecular autoencoder