The OED, the HT, and the HTOED – Part II: Revisions and Updates

In the first of this series of blog posts, I described the process by which the Historical Thesaurus of English was created from dictionary data, especially the Oxford English Dictionary. Today, I’ll look at some of the ways in which the data has been updated to keep it relevant as time marches unavoidably on. It is important to say at the outset that I can speak only for the Glasgow Historical Thesaurus (HT) team, not the excellent OED staff working on the version of the Thesaurus integrated into the OED site, although I will discuss the connections between these resources.

Updates to the HT and the OED Historical Thesaurus come in three main forms – updates to word data (such as attestation dates), the addition of new words, and updates to the categorisation. As with the original print HTOED (corresponding to HT version 1), word data and the words themselves are incorporated into the OED and HT as a result of research for the 3rd edition of the OED (OED3). Data is now shared on a regular basis between the HT and OED, and both resources are constantly under revision. What, then, are some of the ways in which the HT and the OED Historical Thesaurus have been updated since their publication?

Word updates are the most readily apparent, and the data for them comes directly from work on OED3, whose team the HT team are ever grateful to for their support and collegiality. The revision of the dictionary makes the data which can be ingested into the thesaurus ever more accurate. Not only are newer words added, but previous gaps in coverage are filled (thus allowing the HT to correct the lamentable absence of Bigfoot in the category of miscellaneous hairy, human-like mythical creatures!). Additionally, word attestation dates can be reliably updated, and the continued currency of many words can be verified, thanks to OED3 work. New words are also added to the OED Historical Thesaurus by its team, especially where appropriate categories exist, and the HT team are working hard to reflect these updates in our own ongoing revision of the HT 2nd edition (a painstaking process which is a story for another time, if ever!).

Before discussing categorisation updates, it is worth noting that a couple of disparities between the HT and the OED are the result of editorial policy decisions rather than revision. Most notably, there was editorial disagreement over the appropriate place for The Universe in the thesaurus. Because of its long historical reach, HT categorisation is based on an explicit folk taxonomy; the Universe as distinct from the World emerges relatively recently in history, and so the HT editors placed it underneath the World category. The OED editors decided in this instance to deviate from this taxonomy and move the Universe to be superordinate to the World, reflecting a more scientific worldview. As a result, the mirrored hierarchies of the Historical Thesaurus and the OED depart slightly from one another here. There are also legitimate differences in subcategories within main sequence categories; some HT categories contain only Old English words which did not survive into later periods of the language and are therefore not within the purview of the OED. These subcategories are, consequently, not represented in the OED hierarchy.

Category revision of the HT has been shaped by its use as a research tool at the University of Glasgow where, upon completion of the thesaurus, projects exploring the published data began to launch (more on these in my final blog post!). Most notable of these revisions was the adjustment of part of the hierarchy in the first of the three major branches of the HT, The World, which deals with concepts connected to the natural world. This is the largest of the three top-level categories (the other two are The Mind and Society) and was also surprisingly bottom-heavy – that is to say, there were more categories and words in the lower levels of the hierarchy than would be expected. For better balance, the Glasgow team revised the structure to raise the positioning of a large section of The World so that major concepts like Food and Drink are more accessible to a user. This increased the number of ‘level 2’ categories from twenty-six to thirty-seven.

Similarly, amongst the updates to the HT 2nd edition was a revision of the hierarchy in which the category Life became Life and Death. Death was always an incongruous daughter category of Life, but it was not feasible to move it up to be a sister category of Life as it was too small a branch of the hierarchy tree to warrant such a high position. Instead, the insertion of Life and Death acknowledges the presence of both these connected concepts. This necessitated moving the previous Life contents into a ‘new’ Life category subordinate to Life and Death but at the same level as Source/principle of life and its siblings.
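For readers who like to see this kind of move spelled out, the Life and Death change can be sketched as a small tree operation. This is only an illustration under my own assumptions: the `Category` class, the `insert_combined_heading` function and the placeholder words are hypothetical, and none of it reflects how the HT data is actually stored or edited.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical stand-in for a thesaurus category (not the real HT schema)."""
    name: str
    words: list[str] = field(default_factory=list)
    children: list[Category] = field(default_factory=list)


def insert_combined_heading(old: Category, combined_name: str) -> Category:
    """Return a new heading that takes over the old category's position.

    The old category's own words move into a 'new' child that keeps the
    old name; the old daughter categories become siblings of that child.
    """
    new_child = Category(old.name, words=list(old.words))
    return Category(combined_name, children=[new_child, *old.children])


# Placeholder words and a shortened list of daughters, for illustration only.
life = Category(
    "Life",
    words=["word_a", "word_b"],
    children=[Category("Source/principle of life"), Category("Death")],
)

life_and_death = insert_combined_heading(life, "Life and Death")

for child in life_and_death.children:
    print(child.name)
```

In this sketch Death does not change level at all: it remains a daughter of the heading above it, and only the label of that heading and the location of the former Life words change.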

Movements of this type are rare but do happen. In some areas of the hierarchy, widespread rearrangement of categories and their contents is necessary, especially in the category of Computing, whose vocabulary has expanded massively since publication of OED2. This project is likely to be highly demanding of staff time, however, which means that it has not yet been accomplished, although the HT and OED teams continue to consider how to make it possible. The addition and expansion of concepts such as those for new technologies is one of the exciting ways in which the semantic inventory of English continues to grow and change.

Now that we’ve looked at the creation and updating of the Historical Thesaurus, my third and final post will consider how it has been used in academic research and the projects which have explored its data and the possibilities for utilising its unique hierarchy.

