Annotated Bibliography
This is a crowdsourced annotated bibliography of research and resources related to BERT-like models.
If you’d like to add to the bibliography, you can do so in this Dropbox document. We will update the bibliography on this web page periodically.
Technical Readings
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018)
  - The original paper that introduced BERT, authored by researchers at Google AI.
- Contextual Embeddings: When are They Worth It? by Simran Arora, Avner May, Jian Zhang, and Christopher Ré (2020)
  - This paper compares the performance of contextual word embeddings (e.g. BERT) to static embeddings (e.g. GloVe) and describes when using contextual embeddings leads to large performance increases.
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf (2020)
  - Helpful for teaching students how to use BERT-like models without extensive computational resources; a minimal loading sketch follows at the end of this list.
- A Primer in BERTology: What We Know About How BERT Works by Anna Rogers, Olga Kovaleva, and Anna Rumshisky (2020)
  - A survey of 150+ studies of BERT that explores what BERT “knows” and how it might be improved. Very technical, with a strong focus on model architecture.
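As a companion to the DistilBERT entry above, here is a minimal sketch of running a distilled model through the Hugging Face transformers library on modest hardware; the fill-mask task, checkpoint name, and example sentence are illustrative choices, not prescribed by the paper.

```python
# A minimal sketch: masked-word prediction with DistilBERT on a CPU-only machine.
# Assumes `pip install transformers torch`; the checkpoint and sentence are illustrative.
from transformers import pipeline

# DistilBERT keeps most of BERT's accuracy with roughly 40% fewer parameters,
# so it loads and runs comfortably without a GPU.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

# The [MASK] token asks the model to fill in the blank.
for prediction in fill_mask("The novel's narrator is deeply [MASK]."):
    print(f"{prediction['token_str']:>12}  (score={prediction['score']:.3f})")
```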
Tutorials & Primers
- Transformer: A Novel Neural Network Architecture for Language Understanding by Jakob Uszkoreit (August 2017)
  - An introductory description of the transformer architecture.
- The Illustrated Transformer by Jay Alammar (June 2018)
  - A helpful, but technical, dive into the transformer architecture; the attention sketch at the end of this list shows its core computation in code.
- Neural machine translation with a Transformer and Keras
  - A detailed description of the transformer architecture with corresponding TensorFlow code.
- A Gentle Introduction to Transfer Learning for Deep Learning by Jason Brownlee (September 2019)
  - A beginner’s guide to transfer learning. Includes links to other sources and some examples.
- [NLP for Developers: Transfer Learning](https://www.youtube.com/watch?v=hJ1hzEJE16c) by Rasa (December 2020)
  - A very accessible ~7 minute video introduction to transfer learning.
- The Illustrated BERT, ELMo, and Co. (How NLP Cracked Transfer Learning) by Jay Alammar (December 2018)
  - Helpful but very technical for a humanities audience.
- Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer by Adam Roberts and Colin Raffel (February 2020)
  - A somewhat technical summary of the paper introducing the T5 model, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
- How GPT-3 Works - Visualizations and Animations by Jay Alammar (July 2020)
  - A helpful, slightly technical description of the GPT-3 model.
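The primers in this section all build on the transformer’s scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V. The sketch below shows that single computation for one attention head in plain NumPy; the shapes and random inputs are illustrative only.

```python
# A minimal sketch of single-head scaled dot-product attention,
# the core operation the transformer tutorials above describe.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, d_k or d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # weighted average of the values

# Toy example: a "sentence" of 3 tokens with 4-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)            # (3, 4)
```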
Risks & Ethical Concerns
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 by Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell (2021)
  - This paper discusses the risks and ethical concerns of large language models like BERT, including biased and poorly documented training data as well as financial and environmental costs.
- Extracting Training Data from Large Language Models by Nicholas Carlini et al. (December 2020)
  - This paper demonstrates that an adversary can extract personal data from large language models whose training data contains that information.
- Privacy Considerations in Large Language Models by Nicholas Carlini (December 2020)
  - A blog post describing the results of Extracting Training Data from Large Language Models (above) in a more approachable manner.
Applied Humanities
- Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4 by Kent Chang, Mackenzie Cramer, Sandeep Soni, and David Bamman (2023) [Code]
  - This paper presents a task for determining which novels an LLM has memorized and uses it to assess which books, and what kinds of books, GPT models have memorized.
- Do Humanists Need BERT? by Ted Underwood (July 2019)
  - An overview of BERT and an assessment of its usefulness when applied to sentiment analysis of movie reviews and genre classification of books.
- Literary Event Detection by Matthew Sims, Jong Ho Park, and David Bamman (2019)
  - This paper releases an annotated dataset of events in literature and evaluates several models on their ability to predict those events.
- An Annotated Dataset of Coreference in English Literature by David Bamman, Olivia Lewke, and Anya Mansoor (2020)
  - This paper releases a dataset of coreference annotations in English literary texts, a valuable resource for training and evaluating literary coreference systems.
- Latin BERT: A Contextual Language Model for Classical Philology by David Bamman and Patrick Burns (2022)
  - This paper presents a version of BERT for Latin.
- Adapting vs. Pre-Training Language Models for Historical Languages by Enrique Manjavacas and Lauren Fonteyn (2022) [Models]
  - This paper assesses whether it is more effective to adapt pre-existing language models to historical English or to train new models from scratch, and it releases the best-performing model, MacBERTh.
- Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling by Xiaochuang Han and Jacob Eisenstein (2019)
  - This paper applies domain-adaptive fine-tuning to Early Modern English and Twitter text.
- What about Grammar? Using BERT Embeddings to Explore Functional-Semantic Shifts of Semi-Lexical and Grammatical Constructions by Lauren Fonteyn (2020)
  - This paper uses BERT embeddings to detect shifts in word usage; the embedding-comparison sketch at the end of this list illustrates the basic technique.
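Several entries above, Fonteyn’s study of functional-semantic shifts in particular, rest on extracting contextual token embeddings from BERT and comparing them. Below is a minimal sketch of that basic move using the Hugging Face transformers library; the sentences, target word, and choice of the last hidden layer are illustrative assumptions, not the papers’ exact procedures.

```python
# A minimal sketch: compare BERT's contextual embeddings of the same word
# in two different sentences, as a rough proxy for a shift in usage.
# Assumes `pip install transformers torch`; sentences and layer choice are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Mean of the last-layer vectors for the subword tokens of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (num_tokens, 768)
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    positions = [i for i, tok in enumerate(enc["input_ids"][0].tolist()) if tok in word_ids]
    return hidden[positions].mean(dim=0)

a = word_embedding("She is going to write a letter.", "going")   # future-marking use
b = word_embedding("He was going down the road.", "going")       # motion use
print(torch.cosine_similarity(a, b, dim=0).item())               # lower = more distinct usages
```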
Critical Humanities
- Playing With Unicorns: AI Dungeon and Citizen NLP by Minh Hua and Rita Raley (2020)
  - This paper explores what AI-human collaboration could and should look like through a study of the indie text adventure game AI Dungeon 2.
Tools
- transformers from Hugging Face
  - A Python library that provides pretrained transformer models, including BERT and DistilBERT, behind a unified API.
- Easy-Bert by Rob Rua
  - A simple API for accessing BERT.
- CLIP-as-Service from Jina AI
  - A service for easy image and text embedding; a client-side sketch follows this list.
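For the CLIP-as-Service entry, here is a minimal client-side sketch, assuming a CLIP-as-Service server is already running and reachable at the gRPC address shown; the address, port, and example inputs are assumptions for illustration, not part of the entry above.

```python
# A minimal sketch of querying a running CLIP-as-Service server for embeddings.
# Assumes `pip install clip-client` and a server listening at the address below
# (the host, port, and inputs are illustrative assumptions).
from clip_client import Client

client = Client("grpc://0.0.0.0:51000")

# Texts and image URIs can be embedded into the same vector space.
vectors = client.encode([
    "a woodcut illustration of a whale",
    "https://picsum.photos/id/237/200/300",   # any reachable image URL
])
print(vectors.shape)   # (2, embedding_dim)
```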
Educational Resources
- Using BERT for next sentence prediction by Ted Underwood (adapted and used in Dan Sinykin’s Emory course “Practical Approaches to Data Science with Text” in 2020)
  - A notebook for teaching students about BERT; the sketch below illustrates the underlying next-sentence-prediction task.
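Underwood’s notebook centers on BERT’s next-sentence-prediction head. Here is a minimal sketch of that task with the Hugging Face transformers library; the example sentences and checkpoint are illustrative choices, not taken from the notebook itself.

```python
# A minimal sketch of BERT's next-sentence-prediction task:
# score whether sentence B plausibly follows sentence A.
# Assumes `pip install transformers torch`; the sentences are illustrative.
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "It was a dark and stormy night."
sentence_b = "The rain fell in torrents."

inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 = "B follows A", index 1 = "B is a random sentence".
probs = torch.softmax(logits, dim=-1)[0]
print(f"P(B follows A) = {probs[0]:.3f}")
```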