The document presents an ensemble model for chunking natural language text that combines a transformer (RoBERTa) with a bidirectional LSTM-CNN model. The authors train both models on standard chunking datasets, CoNLL 2000 and the English Penn Treebank. Because the two models compensate for each other's weaknesses, the ensemble achieves state-of-the-art chunking results, with an F1 score of 97.3% on CoNLL 2000, exceeding previous work. The transformer supplies attention-based contextual embeddings, while the RNN-CNN model uses custom embeddings, including POS tags, to improve accuracy on tags the transformer struggles with.
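The summary does not specify how the two models' outputs are combined, so the sketch below is only a minimal illustration of one common ensembling scheme: interpolating per-token tag distributions from the two models and taking the argmax. All names here (`TAGS`, `ensemble_predict`, the toy inputs) are hypothetical, not from the paper.

```python
import numpy as np

# Hypothetical chunk tag inventory (BIO scheme over a few CoNLL-2000 phrase types).
TAGS = ["B-NP", "I-NP", "B-VP", "I-VP", "B-PP", "O"]

def ensemble_predict(transformer_probs: np.ndarray,
                     rnn_cnn_probs: np.ndarray,
                     weight: float = 0.5) -> list[str]:
    """Combine per-token tag distributions from two chunking models.

    transformer_probs, rnn_cnn_probs: (n_tokens, n_tags) arrays of
    per-token tag probabilities from each model. `weight` interpolates
    between them; 0.5 is a plain average.
    """
    mixed = weight * transformer_probs + (1.0 - weight) * rnn_cnn_probs
    return [TAGS[i] for i in mixed.argmax(axis=1)]

# Toy example: 3 tokens, with random distributions standing in for model outputs.
rng = np.random.default_rng(0)
p_transformer = rng.dirichlet(np.ones(len(TAGS)), size=3)
p_rnn_cnn = rng.dirichlet(np.ones(len(TAGS)), size=3)
print(ensemble_predict(p_transformer, p_rnn_cnn))
```

With a weight other than 0.5, the same function lets one model dominate on ties, which is one simple way an ensemble can lean on whichever model is stronger for a given tag set.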