© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark
GluonNLP
A Natural Language Processing toolkit
gluon-nlp.mxnet.io
Three common myths …
Motivations for GluonNLP
Common myth 1
• I will write clean and reusable code when I’m prototyping this time.
• Variant: I will write clean and reusable code next time.
=> Well-crafted reusable APIs
(Slide annotations: function or script? hard-coded parameter?)
Common myth 2
• My code will still run next year.
• Sometimes, it’s not our fault.
=> Integrated testing of examples
Common myth 3
• I will finish setting up the baseline model this afternoon.
• Though it may not be our fault again.
=> Re-implementation of SOTA results
Goals
1. Problem: prototype code is not reusable without copying.
Solution: carefully designed API for versatile needs.
2. Problem: code may break due to API changes.
Solution: integrated testing for examples.
3. Problem: setting up baselines for NLP tasks is hard.
Solution: implementation for state-of-the-art models.
• Designed for engineers and researchers
• Enable fast prototyping for NLP applications and research
GluonNLP goals
GluonNLP Community
• Internal users
• Amazon Comprehend
• Amazon Lex
• Amazon Transcribe
• Amazon Translate
• Amazon Personalize
• Alexa NLU
• Alexa Brain
• External users
• High-level packages
• gluonnlp.data, gluonnlp.model, gluonnlp.embedding
• Low-level packages
• gluonnlp.data.batchify, gluonnlp.model.StandardRNN
• Datasets:
• gluonnlp.data.SQuAD, gluonnlp.data.WikiText103
Designed for practitioners: researchers and engineers
http://gluon-nlp.mxnet.io/api/modules/data.html#public-datasets
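The low-level batchify utilities collate variable-length sequences into a single padded mini-batch. As a rough illustration of what a `Pad`-style batchify function does (the real `gluonnlp.data.batchify.Pad` returns NDArrays; this is a minimal pure-Python sketch with an assumed helper name):

```python
def pad_batchify(samples, pad_val=0):
    """Pad variable-length token-id lists to the longest sample in the batch."""
    max_len = max(len(s) for s in samples)
    return [list(s) + [pad_val] * (max_len - len(s)) for s in samples]

batch = pad_batchify([[4, 7, 2], [9, 1], [3]])
# Every row now has length 3; shorter rows are padded with 0.
```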
GluonNLP Models
• Language Modeling
• Machine Translation
• Word Embedding (100+)
• Text Classification
• Text Generation
• Sentence Embedding
• Dependency Parsing
• Entailment
• Question Answering
• Named Entity Recognition
• Keyphrase Extraction
• Semantic Role Labeling
• Summarization
(Status legend from slide: Released / WIP / Planned)
APIs: Data Loading: Bucketing
How do we generate the mini-batches?
• No bucketing + directly pad the samples: average padding = 11.7. Be frugal! Use bucketing.
• Sorted bucketing: average padding = 3.7
• Fixed bucketing: average padding = 1.7. Shorter sequences can have larger batch sizes.
• Fixed bucketing + length-aware batch size: average padding = 1.8, with per-bucket batch sizes of 8, 11, and 18. Better throughput! ✌️
(Figure annotations: batch-size ratio; length of the buckets.)
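The padding savings above can be reproduced in a few lines. A toy sketch (illustrative lengths, not the slide's exact data) comparing the padding wasted by naive in-order batching against sorted bucketing:

```python
def avg_padding(lengths, batch_size):
    """Average padding per sample when each batch is padded to its own max length."""
    total_pad, n = 0, len(lengths)
    for i in range(0, n, batch_size):
        batch = lengths[i:i + batch_size]
        total_pad += sum(max(batch) - l for l in batch)
    return total_pad / n

lengths = [3, 15, 4, 12, 7, 2, 14, 5, 9, 6, 11, 8]
naive = avg_padding(lengths, batch_size=4)             # pad in arrival order
bucketed = avg_padding(sorted(lengths), batch_size=4)  # sort first, then batch
# Sorting groups similar lengths together, so bucketed <= naive.
```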
Improvement over published results
Table 1: fastText n-gram embedding scores, trained on the Text8 dataset, evaluated on WordSim353
Table 2: Machine translation model BLEU scores, same standard and settings
Table 3: AWD [1] language model on WikiText2, test perplexity:
• GluonNLP: 66.9 (250 epochs)
• PyTorch: 67.8 (250 epochs)
• Diff: -0.9
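For reference, test perplexity is the exponential of the average per-token negative log-likelihood on the test set. A minimal sketch with toy numbers (not WikiText2 data):

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(nlls) / len(nlls))

# Toy per-token NLLs; a model with mean NLL of ln(66.9) would score 66.9 PPL.
print(perplexity([4.2, 4.2, 4.2]))
```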
Machine Translation: Google Neural Machine Translation (GNMT)
• Encoder: bidirectional LSTM + residual
• Decoder: LSTM + residual + MLP attention
• GluonNLP: BLEU 26.22 on IWSLT2015, 10 epochs, beam size = 10
• Tensorflow/nmt: BLEU 26.10 on IWSLT2015, beam size = 10
Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).
Machine Translation: Transformer
• Encoder: 6 layers of self-attention + feed-forward
• Decoder: 6 layers of masked self-attention, attention over the encoder output, + feed-forward
• GluonNLP: BLEU 26.81 on WMT2014 en_de, 40 epochs
• Tensorflow/t2t: BLEU 26.55 on WMT2014 en_de
Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
Transfer learning: ELMo (Embeddings from Language Models)
• Feature-based approach
• Pre-training a bidirectional language model
• Character embedding + stacked bidirectional LSTMs
• GluonNLP tutorial
Deep contextualized word representations, Peters et al., 2018
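In the feature-based approach, a downstream task consumes a learned weighted sum of the biLM's layer representations. A minimal pure-Python sketch of that combination step (toy 2-d vectors; the function name and weight values are illustrative, not GluonNLP API):

```python
import math

def elmo_combine(layer_reps, scalars, gamma=1.0):
    """Softmax-normalize per-layer scalars, then mix the layer vectors."""
    exps = [math.exp(s) for s in scalars]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(layer_reps[0])
    return [gamma * sum(w * layer[i] for w, layer in zip(weights, layer_reps))
            for i in range(dim)]

# Three layers (char-CNN output + two biLSTM layers), toy 2-d vectors:
reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = elmo_combine(reps, scalars=[0.0, 0.0, 0.0])  # equal scalars -> equal weights
```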
Transfer Learning: BERT (Bidirectional Encoder Representations from Transformers)
• Fine-tuning approach
• Pre-training: masked language model + next sentence prediction
• Stacked Transformer encoder + BPE
• GluonNLP tutorial
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., 2018
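BERT's masked-language-model objective replaces a fraction of input tokens (15% in the paper) with a [MASK] symbol and trains the model to recover them. A minimal, deterministic sketch of just the masking step (hypothetical helper; the real recipe also sometimes keeps or randomizes the chosen token):

```python
def mask_tokens(tokens, positions, mask_token="[MASK]"):
    """Replace the tokens at the given positions and record the targets to predict."""
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets

masked, targets = mask_tokens(["the", "cat", "sat", "down"], positions=[1])
# masked  -> ["the", "[MASK]", "sat", "down"]
# targets -> {1: "cat"}
```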
BERT model zoo
Go build!
• http://gluon-nlp.mxnet.io/
Get help:
• https://discuss.mxnet.io/


Editor's Notes

  • #2 First-call deck for a high-level introduction to Apache MXNet.
  • #8 This is the core value proposition of GluonNLP: SOTA results and reproducing scripts for baselines; APIs that reduce implementation complexity; tutorials to get people started in NLP. GluonNLP provides a dynamic-graph workload and motivates static memory for Gluon, dynamic graph optimization, and the round-up GPU memory pool.
  • #17 Over 300 pre-trained word embeddings; intrinsic evaluation tools and datasets; embedding training. Transformer: 13.36 no static, 59.02 static.