Annotated Bibliography
This is a crowdsourced annotated bibliography of research and resources related to BERT-like models.
If you’d like to add to the bibliography, you can do so in this Dropbox document. We will update the bibliography on this web page periodically.
Technical Readings
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018)
  - The original paper that introduced BERT, authored by researchers at Google AI.
- Contextual Embeddings: When are They Worth It? by Simran Arora, Avner May, Jian Zhang, and Christopher Ré (2020)
  - This paper compares the performance of contextual word embeddings (e.g. BERT) to static embeddings (e.g. GloVe) and describes when using contextual embeddings leads to large performance increases.
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf (2020)
  - Helpful for teaching students how to use BERT-like models without extensive computational resources; a minimal loading sketch follows at the end of this list.
- A Primer in BERTology: What We Know About How BERT Works by Anna Rogers, Olga Kovaleva, and Anna Rumshisky (2020)
  - A survey of 150+ studies of BERT that explores what BERT “knows” and how it might be improved. Very technical, with a strong focus on model architecture.
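As a companion to the DistilBERT entry above, here is a minimal sketch of running a distilled model through the Hugging Face transformers library on modest hardware; the fill-mask task, checkpoint name, and example sentence are illustrative choices, not prescribed by the paper.

```python
# A minimal sketch: masked-word prediction with DistilBERT on a CPU-only machine.
# Assumes `pip install transformers torch`; the checkpoint and sentence are illustrative.
from transformers import pipeline

# DistilBERT keeps most of BERT's accuracy with roughly 40% fewer parameters,
# so it loads and runs comfortably without a GPU.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

# The [MASK] token asks the model to fill in the blank.
for prediction in fill_mask("The novel's narrator is deeply [MASK]."):
    print(f"{prediction['token_str']:>12}  (score={prediction['score']:.3f})")
```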
Tutorials & Primers
- Transformer: A Novel Neural Network Architecture for Language Understanding by Jakob Uszkoreit (August 2017)
  - An introductory description of the transformer architecture.
- The Illustrated Transformer by Jay Alammar (June 2018)
  - A helpful, but technical, dive into the transformer architecture; the attention sketch at the end of this list shows its core computation in code.
- Neural machine translation with a Transformer and Keras
  - A detailed description of the transformer architecture with corresponding TensorFlow code.
- A Gentle Introduction to Transfer Learning for Deep Learning by Jason Brownlee (September 2019)
  - A beginner’s guide to transfer learning. Includes links to other sources and some examples.
- [NLP for Developers: Transfer Learning](https://www.youtube.com/watch?v=hJ1hzEJE16c) by Rasa (December 2020)
  - A very accessible ~7 minute video introduction to transfer learning.
- The Illustrated BERT, ELMo, and Co. (How NLP Cracked Transfer Learning) by Jay Alammar (December 2018)
  - Helpful but very technical for a humanities audience.
- Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer by Adam Roberts and Colin Raffel (February 2020)
  - A somewhat technical summary of the paper introducing the T5 model, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
- How GPT-3 Works - Visualizations and Animations by Jay Alammar (July 2020)
  - A helpful, slightly technical description of the GPT-3 model.
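The primers in this section all build on the transformer’s scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V. The sketch below shows that single computation for one attention head in plain NumPy; the shapes and random inputs are illustrative only.

```python
# A minimal sketch of single-head scaled dot-product attention,
# the core operation the transformer tutorials above describe.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, d_k or d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # weighted average of the values

# Toy example: a "sentence" of 3 tokens with 4-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)            # (3, 4)
```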
Risks & Ethical Concerns
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 by Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell (2021)
  - This paper discusses the risks and ethical concerns of large language models like BERT, including biased and poorly documented training data as well as financial and environmental costs.
- Extracting Training Data from Large Language Models by Nicholas Carlini et al. (December 2020)
  - This paper demonstrates that an adversary can extract personal data from large language models whose training data contains that information.
- Privacy Considerations in Large Language Models by Nicholas Carlini (December 2020)
  - A blog post describing the results of Extracting Training Data from Large Language Models (above) in a more approachable manner.
Applied Humanities
- Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4 by Kent Chang, Mackenzie Cramer, Sandeep Soni, and David Bamman (2023) [Code]
  - This paper presents a task for determining which novels an LLM has memorized and uses it to assess which books, and what kinds of books, GPT models have memorized.
- Do Humanists Need BERT? by Ted Underwood (July 2019)
  - An overview of BERT and an assessment of its usefulness when applied to sentiment analysis of movie reviews and genre classification of books.
- Literary Event Detection by Matthew Sims, Jong Ho Park, and David Bamman (2019)
  - This paper releases an annotated dataset of events in literature and evaluates several models on their ability to predict those events.
- An Annotated Dataset of Coreference in English Literature by David Bamman, Olivia Lewke, and Anya Mansoor (2020)
  - This paper releases a dataset of coreference annotations in English literary texts, a valuable resource for training and evaluating literary coreference systems.
- Latin BERT: A Contextual Language Model for Classical Philology by David Bamman and Patrick Burns (2022)
  - This paper presents a version of BERT for Latin.
- Adapting vs. Pre-Training Language Models for Historical Languages by Enrique Manjavacas and Lauren Fonteyn (2022) [Models]
  - This paper assesses whether it is more effective to adapt pre-existing language models to historical English or to train new models from scratch, and it releases the best-performing model, MacBERTh.
- Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling by Xiaochuang Han and Jacob Eisenstein (2019)
  - This paper applies domain-adaptive fine-tuning to Early Modern English and Twitter text.
- What about Grammar? Using BERT Embeddings to Explore Functional-Semantic Shifts of Semi-Lexical and Grammatical Constructions by Lauren Fonteyn (2020)
  - This paper uses BERT embeddings to detect shifts in word usage; the embedding-comparison sketch at the end of this list illustrates the basic technique.
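Several entries above, Fonteyn’s study of functional-semantic shifts in particular, rest on extracting contextual token embeddings from BERT and comparing them. Below is a minimal sketch of that basic move using the Hugging Face transformers library; the sentences, target word, and choice of the last hidden layer are illustrative assumptions, not the papers’ exact procedures.

```python
# A minimal sketch: compare BERT's contextual embeddings of the same word
# in two different sentences, as a rough proxy for a shift in usage.
# Assumes `pip install transformers torch`; sentences and layer choice are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Mean of the last-layer vectors for the subword tokens of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (num_tokens, 768)
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    positions = [i for i, tok in enumerate(enc["input_ids"][0].tolist()) if tok in word_ids]
    return hidden[positions].mean(dim=0)

a = word_embedding("She is going to write a letter.", "going")   # future-marking use
b = word_embedding("He was going down the road.", "going")       # motion use
print(torch.cosine_similarity(a, b, dim=0).item())               # lower = more distinct usages
```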
Critical Humanities
- Playing With Unicorns: AI Dungeon and Citizen NLP by Minh Hua and Rita Raley (2020)
  - This paper explores what AI-human collaboration could and should look like through a study of the indie text adventure game AI Dungeon 2.
Tools
- transformers from Hugging Face
  - A Python library that provides pretrained transformer models, including BERT and DistilBERT, behind a unified API.
- Easy-Bert by Rob Rua
  - A simple API for accessing BERT.
- CLIP-as-Service from Jina AI
  - A service for easy image and text embedding; a client-side sketch follows this list.
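For the CLIP-as-Service entry, here is a minimal client-side sketch, assuming a CLIP-as-Service server is already running and reachable at the gRPC address shown; the address, port, and example inputs are assumptions for illustration, not part of the entry above.

```python
# A minimal sketch of querying a running CLIP-as-Service server for embeddings.
# Assumes `pip install clip-client` and a server listening at the address below
# (the host, port, and inputs are illustrative assumptions).
from clip_client import Client

client = Client("grpc://0.0.0.0:51000")

# Texts and image URIs can be embedded into the same vector space.
vectors = client.encode([
    "a woodcut illustration of a whale",
    "https://picsum.photos/id/237/200/300",   # any reachable image URL
])
print(vectors.shape)   # (2, embedding_dim)
```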
Educational Resources
- Using BERT for next sentence prediction by Ted Underwood (adapted and used in Dan Sinykin’s Emory course “Practical Approaches to Data Science with Text” in 2020)
  - A notebook for teaching students about BERT; the sketch below illustrates the underlying next-sentence-prediction task.
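Underwood’s notebook centers on BERT’s next-sentence-prediction head. Here is a minimal sketch of that task with the Hugging Face transformers library; the example sentences and checkpoint are illustrative choices, not taken from the notebook itself.

```python
# A minimal sketch of BERT's next-sentence-prediction task:
# score whether sentence B plausibly follows sentence A.
# Assumes `pip install transformers torch`; the sentences are illustrative.
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "It was a dark and stormy night."
sentence_b = "The rain fell in torrents."

inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 = "B follows A", index 1 = "B is a random sentence".
probs = torch.softmax(logits, dim=-1)[0]
print(f"P(B follows A) = {probs[0]:.3f}")
```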