What is BERT?
BERT is a state-of-the-art NLP method trained on a very large dataset of texts: the entirety of English-language Wikipedia (roughly 2.5 billion words) and a corpus of English-language books (roughly 800 million words). Thanks to this large amount of training data and its distinctive neural network architecture, BERT, and subsequent methods like it (e.g., GPT-2), can model human language significantly better than previous NLP methods. For example, BERT can identify whether a sentence expresses positive or negative sentiment, predict whether one sentence should follow another in a paragraph, and disambiguate between words with multiple meanings, all with never-before-seen levels of accuracy.
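To make these capabilities concrete, here is a minimal sketch using the Hugging Face transformers library (an assumption; the passage above does not name any particular library or model checkpoint). It shows two of the tasks mentioned: classifying the sentiment of a sentence and predicting a masked word from its surrounding context, which is the kind of contextual modeling that lets BERT tell apart the different senses of a word.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# Sentiment classification: the default sentiment pipeline loads a distilled
# BERT-family model fine-tuned on movie-review sentences.
sentiment = pipeline("sentiment-analysis")
print(sentiment("I loved this book, although the ending felt rushed."))
# Expected shape of output: [{'label': 'POSITIVE', 'score': 0.99...}]

# Masked-word prediction with the original BERT checkpoint: the model uses the
# full sentence context to guess the hidden word.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for guess in fill_mask("She deposited the check at the [MASK]."):
    print(guess["token_str"], round(guess["score"], 3))
```

In the masked-word example, BERT ranks candidate words by how well they fit the whole sentence, so "bank" in this context is read as a financial institution rather than a riverbank; this is the sense-disambiguation behavior described above, sketched with hypothetical example sentences.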