Case Study: Text Annotation Made Easy

The OED Labs initiative is helping to shape the future of the Oxford English Dictionary as it evolves into a vital tool for the digital humanities. Our aim is to offer researchers new, more direct, and more flexible ways to access the massive curated dataset the OED offers, and to gain richer insights into the English language than ever before.

In this case study Dr Claudia Roberta Combei (University of Bologna) shares her experience as an early adopter of the OED Text Annotator tool.


How did you apply the OED Text Annotator tool to your research?

As a corpus (and applied) linguist, I am always on the prowl for new techniques and resources to improve the effectiveness of my research.

At the end of March 2021, I read about the Oxford English Dictionary (OED) Text Annotator prototype on Linguist List and I was curious to know more about it. I joined the OED Researchers Advisory Group and then I applied to become an early adopter of this experimental tool.

I have had the chance to test the OED Text Annotator on a corpus of inaugural addresses given by several US Presidents from 1789 to 2009. The case study adopts a diachronic approach and aims to measure lexical variation and change in political discourse. All the functions available through the OED Text Annotator have been useful for this work.

What was your experience of using the OED Text Annotator tool?

One of the things I like most about the OED Text Annotator is its ease of use. No coding skills are required, which makes the tool suitable for a large pool of potential users (e.g., linguists, lexicographers, terminologists, digital humanists, language teachers, and students).

The intuitive interface of the OED Text Annotator takes as input any kind of post-1750 text written in English. The processing time is reasonable even with large corpora, and the output is the annotated version of the corpus in .csv format (appropriate for further processing in R, for example).

Not only does the output include state-of-the-art corpus lemmatization and POS-tagging, but it also provides rich etymological annotation (e.g., first known use and the language of immediate origin) and frequency information (e.g., in-text frequency vs. modern/contemporary frequency) for each token in the corpus.

For instance, the etymological annotation of Thomas Jefferson’s 1801 and 1805 inaugural addresses reveals that words such as “demoralizing”, “domiciliary”, “implement”, “infuriated”, and “mercantile” were actually considered neologisms at that time, as they were first attested at the end of the 18th century and at the beginning of the 19th century.
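As a rough illustration of the kind of follow-up processing described above, the sketch below reads an annotated .csv file and lists the lemmas first attested shortly before the text was written. It is only a sketch under stated assumptions: the column names `lemma` and `first_use`, the 30-year window, and the function name `recent_lemmas` are all hypothetical, and the sample rows are invented placeholders rather than real OED Text Annotator output. The actual export may name and structure its columns differently.

```python
import csv
import io


def recent_lemmas(rows, text_year, window=30,
                  lemma_col="lemma", first_use_col="first_use"):
    """Return sorted lemmas first attested within `window` years before `text_year`.

    The column names are assumptions, not the tool's documented schema.
    """
    found = set()
    for row in rows:
        raw = (row.get(first_use_col) or "").strip()
        if not raw.isdigit():
            continue  # skip tokens with no usable first-use year
        year = int(raw)
        if 0 <= text_year - year <= window:
            found.add(row[lemma_col])
    return sorted(found)


# Invented placeholder rows for illustration only (not real OED data).
SAMPLE = """token,lemma,pos,first_use
foo,foo,NN,1795
bars,bar,NN,1350
bazzed,baz,VB,
quxes,qux,NN,1800
foo,foo,NN,1795
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = recent_lemmas(rows, text_year=1801)
print(result)
```

With a real export, `io.StringIO(SAMPLE)` would be replaced by an opened file, and the column arguments adjusted to match the actual headers.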


Dr Combei presented her research and discussed her experience with the OED Text Annotator in more detail at our webinar:

Webinar: The Oxford English Dictionary Text Annotator prototype tool

The OED Text Annotator, an experimental OED research tool, will allow users to input a digital version of a chosen text for analysis. The user will receive back a version in which each lexical token has been annotated with OED information, including etymology and date range. The version currently in development is optimized for post-1750 text, but if our users are excited by the tool, we plan to extend it to work across all periods of the English language.

Emily Hoyland, Product Manager, Tania Styles, Revision Editor, and James McCracken, Language Engineering Manager, demonstrate the tool and its capabilities in this short online talk, and explain how you can test it yourself.

Our guest speaker, Dr Claudia Roberta Combei, NLP Manager at the Università di Bologna, Italy, details how she has been using the Text Annotator in her research mapping language variation and change in political discourse, focusing on inaugural addresses by US Presidents from 1789 to 2009.

The opinions and other information contained in the OED blog posts and comments do not necessarily reflect the opinions or positions of Oxford University Press.
