Investigating the Linguistic DNA of life, body, and soul

The Linguistic DNA project and the OED have been working together since 2018 to apply cutting-edge computational linguistic tools to the writing, editing, and revising of dictionary entries. The aim of the Linguistic DNA project is to investigate linguistic meaning in large text collections, and we have been testing how the project’s tools and outputs might be used to inform the ways that senses and sub-senses of words are divided and ordered in the OED.

When we study word meaning, or lexical semantics, we often systematically investigate words that occur together, known as lexical co-occurrence. Lexical co-occurrence is a key facet of studying meaning not just for individual words, but also for passages of text. The meanings of co-occurring words interact with each other to contribute to the meaning of discursive passages like paragraphs.

Since 2015, the Linguistic DNA project has been analysing a very specific kind of lexical co-occurrence, which isn’t generally studied in linguistics. Instead of simply studying co-occurring word pairs (like, for example, ham and cheese), we’ve been studying co-occurring trios (like, for example, ham, cheese, and picnic; or ham, cheese, and farming). That third word can add a lot.

In addition, instead of studying these lexical co-occurrence ‘trios’ within a given phrase or sentence, we’ve been analysing trios across large spans of up to 100 words, roughly equivalent to a paragraph.

Instead of deciding in advance which words we’re interested in, we analyse every content word (that is, we exclude grammatical words like prepositions and conjunctions) in very large text collections, identifying billions of trios and ranking them by statistical measures of how strongly their words co-occur.
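To make the idea concrete, here is a minimal sketch in Python of how trios might be counted. It is an illustration only, not the project's actual pipeline: the function name count_trios, the tiny stop list, the choice to cut the text into non-overlapping 100-word windows, and the use of a raw count in place of a proper statistical measure are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop list of grammatical words;
# a real list would be far longer.
STOPWORDS = {"a", "and", "as", "if", "in", "is", "my", "o", "of", "the", "thy", "to"}


def count_trios(tokens, window=100):
    """Count unordered trios of distinct content words.

    Assumption: the text is cut into non-overlapping windows of `window`
    tokens, and each trio is counted at most once per window.
    """
    counts = Counter()
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        # Keep each distinct content word once, sorted so that every trio
        # always appears in the same (alphabetical) order.
        content = sorted({t.lower() for t in chunk if t.lower() not in STOPWORDS})
        counts.update(combinations(content, 3))
    return counts


# A single line from the Bate passage, already split into words.
text = "thy spirit o lord is the life of my soule as my spirit is the life of my body"
trios = count_trios(text.split())

# Raw frequency stands in here for a real measure of co-occurrence strength.
for trio, n in trios.most_common(3):
    print("-".join(trio), n)
```

On a real collection, the raw count would be replaced by a statistical measure, and spelling variants such as soule and soul would need to be brought together before counting.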

Our primary data set has been the Text Creation Partnership edition of Early English Books Online (EEBO-TCP), containing nearly 60,000 texts and over 1.2 billion words of printed English from the sixteenth and seventeenth centuries.

OED lexicographers are using this data to analyse individual words, looking at all ranked trios that include a given word while writing or revising its dictionary entry, and testing whether trio evidence can help identify sense divisions or re-order sub-senses.

What are some common trios in EEBO-TCP? Some of the most common ones are:

  • life-body-soul
  • world-heaven-earth
  • power-heaven-earth
  • life-death-soul
  • child-son-father

What do these trios look like in practice? Here’s an example of life-body-soul from John Bate’s 1625 The Psalm of Mercy:

Thy Spirit, O Lord, is the life of my soule, as my spirit is the life of my body
If my spirit faile, my body perisheth
If thy Spirit desert my soule, my soule can not but fall irrecoverably.

It might be that life, body, and soul can be easily understood here – or perhaps not. It’s not unusual for soul and body to be presented in parallel like this. What is a soul? It’s difficult to understand. But bodies are much easier to understand, so the author here employs body as a metaphorical tool for understanding soul. Indeed, the author is working to explain the world by defining these blurry notions. Both the body and soul have life. What is life? The life of body and soul is indicated by a fourth word, related to the others and absolutely essential here, spirit. Each of the co-occurring words provides some information about how to interpret the others, and they build into a discourse, which we can see as not just a passage of text, but also the network of meaning relations that reflect and define a perspective on the world.

Here’s an example from John Trevisa in 1515:

And like as he had done to be taken from him his natural life, therefore he should do beside four tapers to burn perpetually about his body that for the extinction of his bodily life his soul may ever be remembered and live in heaven in spiritual life.

Unlike Bate, above, Trevisa isn’t at pains to explain the world by defining terms. Instead, he’s describing a funeral – a context when body, life, and soul are pushed to the front of our attention. Here as above, body and soul are set in parallel with each other – and each has a life. Spirit (or, this time, spiritual) is again essential. This time, spiritual is set in contrast to natural, and all of these words contribute to the meaning networks of the passage.

But life-body-soul can be far more complex than these two examples suggest.

What thing is there more unlike than is the body and the soul
And yet with how straight an amity hath nature bound these two together
Certainly the separation of them declareth it, therefore as life is nothing else
But the society of the body and the soul, so the health of all the qualities of the body is concord.

Here, in Paynell’s 1559 translation of Erasmus, body and soul are as unlike as can be, and it is life that somehow binds the two opposites together. Again, nature is a crucial element, but so now are amity and society between body and soul, which reflect not only life, but also health.

If Paynell’s text differs in important (if subtle) ways from the previous two, then the following text by Edward Phillips in 1699 is even further removed:

If the Gentlewoman will take the pains to nurse him, his body may perhaps return again to his soul, otherwise he dies like a Silkworm, having spun out himself to pleasure others. To his Mistriss: O Thou the dear inflamer of my eyes, life of my soul, and hearts eternal prize! How delectable is thy love, how pure, how apt to vanish, able to allure a frozen soul.

Here, it is not God’s spirit but a lover that is the life of the soul. And it is not the soul that leaves the body in death, as in Paynell’s text, but the body that returns to the soul through romantic love. This is a significantly altered network of meanings, reflecting a very different – and more secular – worldview.

These are just four of the many thousands of instances of life-body-soul in EEBO-TCP. Each offers a fascinating way into a text collection that is far too large to read in its entirety. And each trio is a richer way into this very large text data than a simple search for a single word.

One characteristic of trios that occur extremely frequently and exhibit a high statistical strength of co-occurrence is a high degree of vagueness and polysemy. Words like life, soul, and body are fuzzy in their boundaries and underspecified in their attributes – and they carry a wide range of meanings. As a result, they demand a great deal from readers, in the early modern period as well as today. They require active interpretation, and discrimination of meaning through context – including an understanding of the network of co-occurring words, as well as broader knowledge about society and culture. A trio like life-body-soul can indicate a wide range of discourses, which differ from one author to the next, and from one decade to the next. Right now, the Linguistic DNA team is working to address whether the trio life-body-soul, and others like it, might indicate or embody processes of secularisation through the sixteenth and seventeenth centuries, and whether it is their very volatility – the high degree of vagueness and polysemy, the active work required to discern their meaning, and the many interpretations that are possible for a single trio – that renders them so indispensable to discourse.

Image: Vanitas Still Life, Pieter Symonsz. Potter, 1646

