Neural Architectures for Named
Entity Recognition
GUILLAUME LAMPLE, MIGUEL BALLESTEROS, SANDEEP SUBRAMANIAN,
KAZUYA KAWAKAMI AND CHRIS DYER
NAACL-HLT 2016
Named Entity Recognition
• The task of identifying proper names in text and classifying them into a set of predefined
categories of interest
• Three universally accepted categories:
- Person, location and organization
• Other common entity types:
- Dates, times, email addresses, measures, etc.
• Other domain-specific entities:
- Names of drugs, genes, bibliographic references, etc.
2
Named Entity Recognition
• Example:
Lady Gaga [Person] is playing a concert for the Bushes [Person] in Texas [Location] next September [Time]
• Why NER?
- Machine Translation
- Question Answering
- Information retrieval
- Text-to-speech
3
Challenges
• Small amounts of supervised training data
- Language-specific features and knowledge resources are required
- Costly to develop for new languages or domains
• Unsupervised learning offers an alternative
- Existing systems [1,2] rely on unsupervised features to augment hand-engineered features
4
Solution
• Neural architectures
- No language-specific resources or features
- Only a small amount of supervised training data plus unlabeled corpora
• Two models
1. A bidirectional LSTM with a sequential conditional random field (CRF) layer above it: LSTM-CRF
2. A transition-based chunking model using stack LSTMs: S-LSTM
5
Intuitions
• Names often consist of multiple tokens
- LSTM-CRF : Captures dependencies across labels
- S-LSTM : Constructs labeled chunks of the input sequence directly
• Evidence that a token is a name includes both orthographic evidence and
distributional evidence
- Orthographic representation – character-based word representation
- Distributional representation – word embedding
6
Word Representation
• Character-based representation + word embedding
• Character-based model
- Proposed by Ling et al. [3]
- Randomly initialized character embedding matrix
- A bidirectional LSTM captures the orthographic information of a word
• Word embedding
- Pretrained embeddings using skip-n-gram [4]
• Dropout encourages the model to depend on both representations (sketched below)
7
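A minimal sketch of this word representation (PyTorch; the class, dimensions and dropout rate are illustrative assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    """Character-BiLSTM output concatenated with a word embedding, then dropout."""
    def __init__(self, n_chars, n_words, char_dim=25, word_dim=100, p_drop=0.5):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)   # randomly initialized
        self.char_lstm = nn.LSTM(char_dim, char_dim, bidirectional=True, batch_first=True)
        self.word_emb = nn.Embedding(n_words, word_dim)   # would be loaded from pretrained skip-n-gram vectors
        self.dropout = nn.Dropout(p_drop)

    def forward(self, char_ids, word_id):
        # char_ids: (1, word_length), word_id: (1,)
        _, (h_n, _) = self.char_lstm(self.char_emb(char_ids))
        char_repr = torch.cat([h_n[0], h_n[1]], dim=-1)   # forward + backward final states: 25 + 25 = 50
        word_repr = self.word_emb(word_id)                # 100-dimensional pretrained embedding
        return self.dropout(torch.cat([char_repr, word_repr], dim=-1))  # 150-dimensional word representation
```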
Word Representation
8
[Figure: the final word representation is the 100-dimensional pretrained word embedding concatenated with the 50-dimensional (25 forward + 25 backward) character-BiLSTM vector, with dropout applied to the result]
Why not a CNN?
• Existing approaches [5, 6] use CNNs for character-based word representations
• CNNs are designed to discover position-invariant features
• Within a word, important information is position-dependent
- e.g. prefixes and suffixes
9
LSTM-CRF Model
• A bidirectional LSTM transforms each word representation into a context representation
10
i_t = σ(W_i [x_t, h_{t-1}, c_{t-1}] + b_i)
o_t = σ(W_o [x_t, h_{t-1}, c_{t-1}] + b_o)
c_t = (1 - i_t) ⊙ c_{t-1} + i_t ⊙ tanh(W_c [x_t, h_{t-1}, c_{t-1}] + b_c)
h_t = o_t ⊙ tanh(c_t)
h'_t = [→h_t ; ←h_t]   (concatenation of the forward and backward hidden states)
[Figure: LSTM memory cell, showing the input x_t, gates i_t and o_t, cell state c_{t-1} → c_t, and hidden state h_{t-1} → h_t]
LSTM-CRF Model
• The output of the bidirectional LSTM is projected onto a hidden layer:
P = W_p h'_t
• P is of size n x k
- n : number of words in the sentence
- k : number of distinct tags
- P_ij : score of the j-th tag for the i-th word
• P is the input to the next layer, the CRF (a shape sketch follows this slide)
11
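A minimal shape sketch of the BiLSTM encoder and the projection to tag scores (PyTorch; the sentence length, dimensions and tag count below are illustrative assumptions):

```python
import torch
import torch.nn as nn

n, word_dim, hidden_dim, k = 9, 150, 100, 9   # sentence length, input dim, LSTM size, number of tags
words = torch.randn(1, n, word_dim)           # word representations from the previous slides

bilstm = nn.LSTM(word_dim, hidden_dim, bidirectional=True, batch_first=True)
proj = nn.Linear(2 * hidden_dim, k)           # W_p

h_prime, _ = bilstm(words)                    # (1, n, 2*hidden_dim): [forward ; backward] per word
P = proj(h_prime).squeeze(0)                  # (n, k): P[i, j] = score of tag j for word i
```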
LSTM-CRF Model
12
LSTM-CRF Model
13
LSTM-CRF Model
• The best tag sequence y* and the sum of scores over all possible tag sequences are
computed using dynamic programming (the scoring function is written out below)
• The log-probability of the correct tag sequence is maximized during training
• Example:
14
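Concretely, following the paper's formulation (A is the matrix of tag-transition scores, P the score matrix from the previous slides, and y_0, y_{n+1} are special start and end tags):

s(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}
p(y | X) = exp(s(X, y)) / Σ_{y'} exp(s(X, y'))

Training maximizes log p(y | X); decoding returns y* = argmax_{y'} s(X, y').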
Tagging scheme
• General tagging scheme – IOB format
- B-label : Beginning of a named entity
- I-label : Inside a named entity
- O-label : Outside any named entity
• Dai et al. [7] showed that a more expressive tagging scheme can improve performance
• No significant performance improvement was observed with IOBES (see the example below)
- S : Singleton entities
- E : End of a named entity
15
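An illustrative comparison of the two tagging schemes on a short sentence (hypothetical snippet, not from the paper):

```python
tokens = ["Mark", "Watney", "visited", "Mars"]
iob    = ["B-PER", "I-PER", "O", "B-LOC"]   # IOB: Begin / Inside / Outside
iobes  = ["B-PER", "E-PER", "O", "S-LOC"]   # IOBES adds End and Singleton tags
for token, a, b in zip(tokens, iob, iobes):
    print(f"{token:10s} {a:8s} {b}")
```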
Stack LSTM
• Proposed by Dyer et al. [8] for dependency parsing
• An LSTM augmented with a stack pointer
- The output of the cell under the stack pointer gives a summary of the stack
- It is used to decide c_{t-1} and h_{t-1} for the new input
• Stack operations are simulated using the pointer (sketched below)
- Push : Adds a new input to the LSTM
- Pop : Moves the stack pointer back to the previous element
16
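A minimal stack-LSTM sketch (PyTorch; an assumed implementation for illustration, not the authors' code). Earlier states are never overwritten: push computes a new state from the state under the pointer, while pop only moves the pointer back:

```python
import torch
import torch.nn as nn

class StackLSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.cell = nn.LSTMCell(input_dim, hidden_dim)
        zero = torch.zeros(1, hidden_dim)
        self.states = [(zero, zero)]      # state of the empty stack
        self.ptr = 0                      # stack pointer

    def push(self, x):                    # x: (1, input_dim)
        h, c = self.cell(x, self.states[self.ptr])
        self.states.append((h, c))        # old states are kept, never overwritten
        self.ptr = len(self.states) - 1

    def pop(self):
        self.ptr -= 1                     # just move the pointer back

    def summary(self):
        return self.states[self.ptr][0]   # output of the cell under the pointer
```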
Stack LSTM
• Example:
• Contents are never overwritten
17
Transition-Based Chunking Model
• Directly constructs representations of multi-token names
• Uses three stack LSTMs :
1. output : Contains completed chunks
2. stack : Contains partially completed chunks
3. buffer : Keeps words that have yet to be processed
18
Transition-Based Chunking Model
• Three actions
1. SHIFT : Moves a word from the buffer to the stack
2. OUT : Moves a word from the buffer directly to the output
3. REDUCE(y) : Pops the entire stack content, labels it y, and moves the chunk to the output
• The algorithm stops when the buffer and the stack are both empty
19
Transition-Based Chunking Model
• Example: the sequence of operations required to process the sentence Mark
Watney visited Mars (reconstructed in the sketch below)
20
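A reconstruction of that operation sequence as a small runnable sketch (the helper functions are hypothetical; the action order follows the paper's Mark Watney example):

```python
buffer = ["Mark", "Watney", "visited", "Mars"]
stack, output = [], []

def shift():
    stack.append(buffer.pop(0))              # buffer -> stack

def out():
    output.append(buffer.pop(0))             # buffer -> output (word is not part of an entity)

def reduce_to(label):
    output.append((" ".join(stack), label))  # label the whole chunk currently on the stack
    stack.clear()

for action in (shift, shift, lambda: reduce_to("PER"), out, shift, lambda: reduce_to("LOC")):
    action()

print(output)   # [('Mark Watney', 'PER'), 'visited', ('Mars', 'LOC')]
```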
Transition-Based Chunking Model
• The probability distribution over possible actions at each time step is computed from:
1. The current contents of the stack, buffer and output
2. The history of actions taken (a schematic form is given below)
• The maximum-probability action is chosen greedily
• Greedy decoding is not guaranteed to find the global optimum
21
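Schematically (our notation, not necessarily the paper's exact parametrization), with s_t, b_t, o_t, a_t the stack-LSTM summaries of the stack, buffer, output and action history, and A(s_t, b_t) the set of valid actions:

p(z_t | z_{<t}, X) ∝ exp( g_{z_t} · [s_t ; b_t ; o_t ; a_t] + q_{z_t} ),  normalized over z_t ∈ A(s_t, b_t)

where g_z is a learned representation of action z and q_z its bias.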
Transition-Based Chunking Model
22
Transition-Based Chunking Model
23
Novelty
• LSTM-CRF
- The word representation proposed by Ling et al. [3] is applied to language-independent NER
• Stack-LSTM
- The stack LSTM of Dyer et al. [8] is applied to language-independent NER
24
Evaluations
• Two datasets:
- CoNLL-2002 and CoNLL-2003
• Four languages:
- English, Spanish, German and Dutch
• Four types of named entities:
- Person, Location, Organization and Miscellaneous
25
Evaluations
Model F1
Lin and Wu (2009) 83.78
Passos et al. (2014) 90.05
Chiu and Nichols (2015) 90.69
LSTM-CRF 90.94
S-LSTM 90.33
26
English NER results (CoNLL-2003 test set) compared with models
trained with no external labeled data
Evaluations
Model F1
Collobert et al. (2011) 89.59
Lin and Wu (2009) 90.90
Huang et al. (2015) 90.10
Luo et al. (2015) + gaz 89.9
Luo et al. (2015) + gaz + linking 91.2
Passos et al. (2014) 90.90
Chiu and Nichols (2015) 90.77
LSTM-CRF 90.94
S-LSTM 90.33
27
English NER results (CoNLL-2003 test set) compared with models trained
with external labeled data
Evaluations
Model F1
Florian et al. (2003)* 72.41
Ando and Zhang (2005a) 75.27
Qi et al. (2009) 75.72
Gillick et al. (2015) 72.08
Gillick et al. (2015) * 76.22
LSTM-CRF 78.76
S-LSTM 75.66
28
German NER results (CoNLL-2003 test set)
Evaluations
Model F1
Carreras et al. (2002) 77.05
Nothman et al. (2013) 78.6
Gillick et al. (2015) 78.08
Gillick et al. (2015) * 82.84
LSTM-CRF 81.74
S-LSTM 79.88
29
Dutch NER results (CoNLL-2002 test set)
Evaluations
Model F1
Carreras et al. (2002) 81.39
Santos and Guimaraes (2015) 82.21
Gillick et al. (2015) 81.83
Gillick et al. (2015) * 82.95
LSTM-CRF 85.75
S-LSTM 83.93
30
Spanish NER results (CoNLL-2002 test set)
Evaluations
31
CoNLL-2002 and CoNLL-2003 test set results
Evaluations
Model Variant F1
LSTM char + dropout + pretrain 89.15
LSTM-CRF char + dropout 83.63
LSTM-CRF pretrain 88.39
LSTM-CRF char + pretrain 89.77
LSTM-CRF dropout + pretrain 90.20
LSTM-CRF char + dropout + pretrain 90.94
32
English NER results for variants of the LSTM-CRF model
Evaluations
Model Variant F1
S-LSTM char + dropout 80.88
S-LSTM pretrain 86.67
S-LSTM char + pretrain 89.32
S-LSTM dropout + pretrain 87.96
S-LSTM char + dropout + pretrain 90.33
33
English NER results for variants of the S-LSTM model
Evaluations
• The LSTM-CRF model achieves state-of-the-art performance in German and Spanish
• The LSTM-CRF model outperforms all existing models that do not use any
external labeled data
• The Stack-LSTM model is more dependent on the character-based representation
than LSTM-CRF
• Dropout on the word representation layer significantly improves performance
34
Predictions of LSTM-CRF
• Some correct predictions
Brokers__O said__O blue__O chips__O like__O IDLC__B-ORG ,__O Bangladesh__B-ORG
Lamps__I-ORG ,__O Chittagong__B-ORG Cement__I-ORG and__O Atlas__B-ORG
Bangladesh__I-ORG were__O expected__O to__O rise.__O
Jones__B-ORG Medical__I-ORG completes__O acquisition__O .__O
The__O Dow__B-ORG Chemical__I-ORG Co__I-ORG of__O the__O United__B-LOC
States__I-LOC will__O invest__O $__O 4__O billion__O to__O build__O an__O
ethylene__O plant__O in__O Tianjin__B-LOC city__O in__O northern__O China__B-LOC
,__O the__O China__B-ORG Daily__I-ORG said__O on__O Saturday__O .__O
35
Predictions of LSTM-CRF
• Some bad predictions
Jordan__B-LOC defends__O his__O decision__O to__O make__O the__O film__O ,__O
whose__O screenplay__O he__O wrote__O himself__O after__O years__O of__O
research__O ,__O saying__O it__O was__O more__O about__O history__O than__O
any__O political__O statement.__O
Cofinec__B-ORG said__O Petofi__B-LOC general__O manager__O Laszlo__B-PER
Sebesvari__I-PER had__O submitted__O his__O resignation__O and__O will__O be__O
leaving__O Petofi__B-ORG but__O will__O remain__O on__O Petofi__B-ORG 's__O
board__O of__O directors.__O
36
Related Work
• Neural Architectures
1. CNN-CRF [1]
2. LSTM-CRF [9]
• Language independent NER
1. Bayesian approach [10]
• NER with character-based representation
1. Byte-level processing of strings [11]
2. CNN-LSTM [12]
37
Strengths
• Captures the language-independent nature of NER
• Achieves state-of-the-art results in multiple languages with a simple architecture
• The reasoning behind each design decision is explained clearly
38
Weaknesses
• No architectural improvements over the existing models
• Lack of explanation of the S-LSTM model
39
Future Work
• Extend the CRF layer to a higher-order CRF [13]
• Evaluate the models when jointly learning NER and other NLP tasks [14]
40
Reference
[1] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.
[2] Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proc. ACL.
[3] Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W Black, and Isabel Trancoso. 2015b. Finding function in form: Compositional character models for open vocabulary word representation. In Proc. EMNLP.
[4] Wang Ling, Chu-Cheng Lin, Yulia Tsvetkov, Silvio Amir, Ramón Fernandez Astudillo, Chris Dyer, Alan W Black, and Isabel Trancoso. 2015a. Not all contexts are created equal: Better word representations with variable attention. In Proc. EMNLP.
[5] Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, pages 649–657.
[6] Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. 2015. Character-aware neural language models. CoRR, abs/1508.06615.
[7] Hong-Jie Dai, Po-Ting Lai, Yung-Chun Chang, and Richard Tzong-Han Tsai. 2015. Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization. Journal of Cheminformatics, 7(Suppl 1):S14.
41
Reference
[8] Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. In Proc. ACL.
[9] Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. CoRR, abs/1508.01991.
[10] Jacob Eisenstein, Tae Yano, William W. Cohen, Noah A. Smith, and Eric P. Xing. 2011. Structured databases of named entities from Bayesian nonparametrics. In Proceedings of the First Workshop on Unsupervised Learning in NLP, pages 2–12.
[11] Dan Gillick, Cliff Brunk, Oriol Vinyals, and Amarnag Subramanya. 2015. Multilingual language processing from bytes. arXiv preprint arXiv:1512.00103.
[12] Jason P. C. Chiu and Eric Nichols. 2015. Named entity recognition with bidirectional LSTM-CNNs. arXiv preprint arXiv:1511.08308.
[13] Sunita Sarawagi and William W. Cohen. 2004. Semi-Markov conditional random fields for information extraction. In NIPS.
[14] Gang Luo, Xiaojiang Huang, Chin-Yew Lin, and Zaiqing Nie. 2015. Joint named entity recognition and disambiguation. In Proc. EMNLP.
42
43
Q & A
44
Thank You
