Digitizing the OED: the making of the Second Edition

Over the course of our anniversary celebrations for the OED’s First Edition, we have explored a number of milestones in the evolution of the English language and the dictionary itself. One such milestone was the publication of a new edition of the OED, some 61 years after the First Edition’s final fascicle went to print. In this blog post, the Second Edition’s co-Chief Editor Edmund Weiner shares the remarkable story of how ‘the supreme authority’ on the English language was transformed for a new audience.

30 March 2019 will be the thirtieth anniversary of the publication of the Second Edition of the OED. It was an unprecedented dictionary project: mooted in 1982, launched at the start of 1984, and completed according to all specifications, on schedule, and within budget, in March 1989.

At first it was envisaged as ‘the biggest scissors and paste job in history’: the 12-volume first edition of the OED (published between 1884 and 1928) needed to be merged with the four-volume OED Supplement (1972-86, containing words and meanings added since the first edition) as a way of protecting OUP’s copyright in the Dictionary. But we had long recognized that the OED as a whole needed updating: the wording of definitions, the information about etymology and pronunciation, the quotation evidence, all needed extensive revision. We quickly realized that, if the two texts were first digitized, both the merger and the subsequent revision could be done far more efficiently, and moreover that this would result in an electronic version of the Dictionary which could be made accessible for look-up and research in completely new ways. So right from the start, the two foundation principles of the OED project were established: updating the entire text and making it accessible as widely as possible.

We also appreciated very early the size of the task of revising and updating the OED, so we split the project into two phases. Phase 1 was the publication, in book form and electronically, of the integrated first edition and Supplement as the Second Edition. Phase 2, the current revision programme (OED3), began in 1993.

Image: OED Second Edition markup ladder

Very little of what we were aiming to do had ever been done before, least of all by OUP, and we suddenly found ourselves at the cutting edge of information technology. Looking for help with the necessary computer processing, we did the rounds of the big computer companies who at that time dominated the digital scene, with disappointing results. But unexpectedly the research wing of IBM UK saw the potential of the project and stunned us by deciding to donate a mainframe computer and the services of several software engineers. Digitization was another huge challenge; optical character recognition wouldn’t work, and it was by another chance that we happened upon an American company, a subsidiary of Reed International, who had the right expertise and a very large workforce of keyboarders to undertake the data capture. A third lucky break led us to the University of Waterloo in Canada, who were developing software for searching very large bodies of data; in exchange for access to the OED data they promised to let us use whatever software tools they should develop.

Another coincidence was that ‘Standard Generalized Markup Language’ (SGML, ancestor of HTML and XML) had just been invented. After swift initiation into the way textual markup worked, I went off with a set of my children’s crayons to colour in the different categories of information represented in an OED entry (above, right); from this we developed a markup schema.

Keying started in 1984. A new problem arose: how to elaborate the markup tags to make the dictionary text more susceptible to intelligent manipulation. We didn’t want just ‘italic’, we wanted ‘label’ or ‘linguistic form’. Providentially, Waterloo produced a suitable parser that was developed onsite at OUP. A computer system was rapidly built. Data from OED1 and the Supplement was fed in and automatically merged. This didn’t always go according to plan: the Supplement’s entry LIBBER (‘women’s libber’) should have been labelled with a superscript 2, but since it wasn’t, it got merged with OED1’s LIBBER (‘sow castrator’), with amusing results in the quotation department.
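For readers curious how a missing homograph number can cause this kind of collision, here is a minimal sketch in Python. It is not the system we built: the field names (`headword`, `homograph`, `quotations`), the function `merge_entries`, and the rule that an unnumbered entry counts as homograph 1 are all assumptions made for illustration, and the quotation strings are placeholders rather than real OED text.

```python
from collections import defaultdict


def merge_entries(oed1, supplement):
    """Merge two lists of entries, keyed on (headword, homograph number)."""
    merged = defaultdict(list)
    for source in (oed1, supplement):
        for entry in source:
            # Assumption of this sketch: an entry with no homograph number
            # is treated as homograph 1. That default is what lets two
            # unrelated words with the same spelling collide.
            key = (entry["headword"].lower(), entry.get("homograph", 1))
            merged[key].extend(entry["quotations"])
    return dict(merged)


# Placeholder data, not real dictionary text.
oed1 = [{"headword": "libber", "quotations": ["<sow-castrator quotation>"]}]
supp_unlabelled = [
    {"headword": "libber", "quotations": ["<women's-libber quotation>"]}
]
supp_labelled = [
    {"headword": "libber", "homograph": 2,
     "quotations": ["<women's-libber quotation>"]}
]

bad = merge_entries(oed1, supp_unlabelled)   # one entry, quotations mixed
good = merge_entries(oed1, supp_labelled)    # two separate entries
```

In the unlabelled case both sets of quotations end up under a single key; with the homograph number present, the two words stay apart.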

Extensive editing was needed, but no suitable software existed; IBM, however, happened to have an editor under development that could be adapted (we called it ‘Oedipus’). Computer storage was so limited that we could work on only a one-fortieth-size chunk of the Dictionary at any time; we couldn’t search the text, but had to request a search and work from results printed out on paper. At any given time, half a dozen different stages of editing, correcting, and proof-reading would be going on simultaneously in different chunks of the text. To keep track, my 9-year-old son and I made several sets of cardboard rectangles, each set in a different colour representing one process, and as a process was finished for each chunk of text these were stuck on to a painted background, gradually building a pictorial wall (above, left).

The publication of the twenty printed volumes of OED2 in 1989 was very exciting, but in many ways the appearance of the CD-ROM version in 1992 was the real culmination of the project: people around the world were amazed at the power they now had, on a disk the size of a beermat, to search the entire contents of the OED.

Header image: (from left) Edmund Weiner, Penny Silva, and John Simpson. OED Archives, copyright Norman McBeath

