![API model](img/api_model.png)




------------------------------------------

# The OED researcher API

The OED API is an interface that lets clients query and retrieve information derived from the OED.
* 'Clients' = primarily programs and applications, rather than people.

--------------------------------

### Usage

Documentation and sign-up:
https://languages.oup.com/research/oed-researcher-api/

Base URL: https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/


---------------------------------

### Sample queries

Entry or entries for the word _monitor_: 
https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/words/?lemma=monitor

Senses of the word _monitor_ that existed in 1700: 
https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/senses/?lemma=monitor&current_in=1700

Words formed with the suffix _-esque_: 
https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/word/esque_su01/derivatives/

Senses to do with tennis: 
https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/senses/?topic=Tennis

Quotations by women authors between 1780 and 1800: 
https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/quotations/?year=1780-1800&author_gender=female

… and the same where these provide the earliest evidence for a word: 
https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/quotations/?year=1780-1800&author_gender=female&first_in_word=true

Words derived from Hungarian: 
https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/words/?etymon_language=Hungarian
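
The sample queries above share one pattern: the base URL, an endpoint path, then a query string. The following is a minimal sketch of assembling such URLs with the standard library. `build_query_url` is a hypothetical helper introduced here for illustration, not part of the API, and no request is sent (a real request also needs the authorization headers shown later).

```python
from urllib.parse import urlencode

API_BASE_URL = 'https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/'


def build_query_url(endpoint, **params):
    """Hypothetical helper: join base URL, endpoint and query string."""
    url = API_BASE_URL + endpoint.strip('/') + '/'
    if params:
        # urlencode escapes values and joins them with '&'
        url += '?' + urlencode(params)
    return url


print(build_query_url('words', lemma='monitor'))
print(build_query_url('senses', lemma='monitor', current_in=1700))
print(build_query_url('quotations', year='1780-1800', author_gender='female'))
```

Building the query string from a dict avoids hand-typed separators, which are easy to get wrong when URLs are copied between documents.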

----------------------------------

# Basic API usage

### Imports and constants

In [None]:
import re
import json
import pprint
import requests


API_BASE_URL = 'https://oed-researcher-api.oxfordlanguages.com/oed/api/v0.2/'
# Parameters to be included as headers to each API request
# - required for authorization.
with open('credentials.json') as f:
    credentials = json.load(f)
HEADERS = {
    'app_id': credentials.get('APP_ID'),
    'app_key': credentials.get('APP_KEY'),
}


def _make_api_request(endpoint, query_params, show_url=False):
    """
    Make the API request.

    Parameters
    ----------
    endpoint : str
        The API endpoint, e.g. 'senses'.

    query_params : dict
        Additional query parameters to include in the request.

    show_url : bool, optional
        If True, print the full request URL. Defaults to False.

    Returns
    -------
    list
        A list of dicts, each dict being the JSON representation
        of a word, sense, etc., as returned by the API.
    """
    response = requests.get(
        API_BASE_URL + endpoint + '/',
        params=query_params,
        headers=HEADERS,
    )
    if show_url:
        print(response.url + '\n')
    if response.status_code != 200:
        _error_report(response)
        # Raise an exception instead of calling exit(), so that
        # callers can handle the failure
        raise RuntimeError('OED API request failed')
    return response.json()['data']

 
def _error_report(response):
    """
    Print out an error report for any response that does
    not have a 200 status code.

    Parameters
    ----------
    response : requests.Response object
    """
    print('! Status code {code} returned by URL {url}'.format(
        code=response.status_code,
        url=response.url,
    ))

### List senses for a lemma
Using the OED API _/senses/_ endpoint.

* retrieves all the senses of a lemma;
* (optionally) filters for the subset of senses current in a given year (pass `year=yyyy` to `display_senses`, which sends it to the API as `current_in`);
* returns senses in date order.

In [None]:
def list_possible_meanings(formatted=True):
    lemma = input('WORD: ')
    year = input('YEAR: ').strip()
    display_senses(
        lemma.strip(),
        # The year is optional: leave it blank to list all senses
        year=int(year) if year else None,
        formatted=formatted,
    )

 
def display_senses(lemma, year=None, formatted=True):
    """
    Fetch and display the senses of a lemma.

    Parameters
    ----------
    lemma : str
        The lemma (word) for which senses are sought.

    year : int, optional
        If specified, results are filtered so that only
        senses that were current in this year are included.
        Defaults to None.

    formatted : bool, optional
        If True, display a formatted version of the sense.
        If False, display the raw JSON representation of
        the sense. Defaults to True.
    """
    # Set the parameters for the API request
    query_params = {'lemma': lemma, 'current_in': year}
    senses = _make_api_request('senses', query_params, show_url=True)
    for sense in senses:
        if formatted:
            _display_formatted(sense)
        else:
            _display_raw(sense)


def _display_raw(sense):
    """
    Display the raw JSON of a sense returned by the API.

    Parameters
    ----------
    sense : dict
        JSON representation of an OED sense, as returned by the API.
    """
    pprint.pprint(sense, indent=2, width=80, compact=False, sort_dicts=False)
    print('')


def _display_formatted(sense):
    """
    Display a formatted view of selected features of a sense
    returned by the API.

    Parameters
    ----------
    sense : dict
        JSON representation of an OED sense, as returned by the API.
    """
    print('{pos}: {defn}\n\t{date}\n\t{ref} {url}\n'.format(
        pos=sense['part_of_speech'],
        defn=sense['definition'],
        date=sense['daterange']['rangestring'],
        ref=sense['oed_reference'],
        url=sense['oed_url'],
    ))


list_possible_meanings(formatted=True)

------------------------------

# Parsing a piece of text

### From John Marston's satire _The scourge of villanie_ (1598)
> But I am vexed, when swarmes of _Iulians_ 
> Are still manur'd by lewd Precisians: 
> Who scorning Church rites, take the simbole vp 
> As slouenly, as carelesse Courtiers slup 
> Their mutton gruell. Fie, who can with-hold, 
> But must of force make his milde Muse a scold?

------------------------------

### Processing each token in a sentence
Using the OED API _/lemmatizetext/_ endpoint:
* tokenizes the input sentence;
* skips punctuation and core vocabulary tokens;
* identifies possible lemmatizations for non-core vocabulary.

Candidate lemmatizations are returned in order of likelihood, taking into account:
* the date of the text;
* some basic part-of-speech tagging. (This can be improved by pre-processing the text.)
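
The ranking itself happens on the API side and its method is not described here. Purely as an illustration of the date criterion, the sketch below shows how a client could check whether a candidate was in use in the year of the text. The sample candidates are hand-made, not real API output. The `start` key of `daterange` is used later in this notebook; the `end` key, and the use of `None` for words still current, are assumptions.

```python
def current_in_year(candidate, year):
    """True if `year` falls within the candidate's date range.

    Assumes `daterange` holds an integer `start` and an `end` that
    is either an integer or None (still in use); `end` is an assumption.
    """
    daterange = candidate['daterange']
    start, end = daterange['start'], daterange.get('end')
    return start <= year and (end is None or year <= end)


# Hand-made sample candidates, not real API output
candidates = [
    {'lemma': 'example_a', 'daterange': {'start': 1850, 'end': None}},
    {'lemma': 'example_b', 'daterange': {'start': 1400, 'end': 1650}},
    {'lemma': 'example_c', 'daterange': {'start': 1200, 'end': 1500}},
]

plausible = [c['lemma'] for c in candidates if current_in_year(c, 1598)]
print(plausible)
```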


In [None]:
TEXT = """
But I am vexed, when swarmes of Iulians
Are still manur'd by lewd Precisians:
Who scorning Church rites, take the simbole vp
As slouenly, as carelesse Courtiers slup
Their mutton gruell.
"""


def parse_text(text, year):
    """
    Use the OED API to parse a string of text.

    Parameters
    ----------
    text : str
        The string of text to be parsed.

    year : int
        The (approximate) date of the text.
    """
    text = text.replace('\n', ' ').strip()
    # Set the parameters for the API request
    query_params = {'text': text, 'year': year}
    tokens = _make_api_request('lemmatizetext', query_params)
    for token in tokens:
        process_token(token, year)

def process_token(token, year):
    """
    Print out information for a single token.

    Parameters
    ----------
    token : dict
        The dict of features for a single token
        (see documentation for the OED API /lemmatizetext/
        endpoint).

    year : int
        The (approximate) date of the source text.
    """
    print(' ' + token['token'])
    for entry in (t['word'] for t in token['lemmatizations']):
        print(' {e} ({date})'.format(
            e=entry['oed_reference'],
            date=entry['daterange']['rangestring'],
        ))
        # Candidates come in order of likelihood; show only the first
        break


parse_text(TEXT, 1598)

------------------------------

### Meanings
Using the OED API _/word/{id}/senses/_ endpoint.

For a given word, this:
* retrieves all the senses of the word, as listed in OED;
* (optionally) filters for the subset of senses current in a given period;
* returns senses in date order.

For simplicity:
* we assume that the first lemmatization candidate is correct - so we're only retrieving senses for this word;
* we're skipping higher-frequency words - we're only interested in senses for lower-frequency words.
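
The frequency check can be sketched in isolation. This assumes, as the indexing in this notebook implies, that an entry's `frequency` field is a list of pairs whose last item holds the most recent value in its second position. The threshold of 2 mirrors the code in this notebook; the sample values are made up, and `is_high_frequency` is a hypothetical helper name.

```python
FREQUENCY_THRESHOLD = 2  # same cut-off as the notebook's code


def is_high_frequency(entry, threshold=FREQUENCY_THRESHOLD):
    """True if the entry's most recent frequency value exceeds the threshold.

    Assumes entry['frequency'] is a list of [period, value] pairs in
    chronological order; a missing or empty list counts as low-frequency.
    """
    frequency = entry.get('frequency')
    if not frequency:
        return False
    return frequency[-1][1] > threshold


# Made-up sample entries, not real API output
common = {'frequency': [[1750, 0.5], [2010, 6.3]]}
rare = {'frequency': [[1750, 4.0], [2010, 0.1]]}
unknown = {'frequency': []}

print(is_high_frequency(common), is_high_frequency(rare), is_high_frequency(unknown))
```

Only the last pair is consulted, so a word that was once common but is now rare is still treated as low-frequency.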

In [None]:
def process_token(token, year):
    """
    Print out information for a single token.

    Parameters
    ----------
    token : dict
        The dict of features for a single token
        (see documentation for the OED API /lemmatizetext/
        endpoint).

    year : int
        The (approximate) date of the source text.
    """
    print(' ' + token['token'])
    if token['lemmatizations']:
        entry = token['lemmatizations'][0]['word']
        print(' {e} ({date})'.format(
            e=entry['oed_reference'],
            date=entry['daterange']['rangestring'],
        ))
        fetch_senses(entry, year)


def fetch_senses(entry, year):
    """
    Fetch and print out the set of senses belonging to a given word,
    filtered for the subset of senses that were current in the year
    specified.

    Parameters
    ----------
    entry : dict
        The entry whose senses are sought.

    year : int
        The year used to filter senses for currency.
    """
    # Bail out if this is a high-frequency word
    if entry['frequency'] and entry['frequency'][-1][1] > 2:
        return

    query_params = {'current_in': year}
    endpoint = 'word/{id}/senses'.format(id=entry['id'])
    senses = _make_api_request(endpoint, query_params)
    for sense in senses[0:3]:  # just the first 3 senses
        print(' \u2043 "{defn}..." ({date})'.format(
            defn=sense['definition'][0:80],
            date=sense['daterange']['rangestring'],
        ))
        fetch_synonyms(sense['id'], year)


def fetch_synonyms(sense_id, year):
    pass  # stub


parse_text(TEXT, 1598)

------------------------------

### Synonyms
Using the OED API _/sense/{id}/synonyms/_ endpoint.

For a given sense, this:
* retrieves all senses in the same node of the semantic taxonomy (~synonyms);
* (optionally) filters for the subset of synonyms current in a given period;
* returns synonyms in alphabetical order by lemma. (Here we post-process the API response to re-sort into date order.)
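
The re-sorting step can be tried offline on hand-made data. The sketch below is an illustration under stated assumptions, not the notebook's exact code: `sort_by_first_date` is a hypothetical helper, the sample records are invented, and the handling of a missing start date (`None`, sorted last) is an assumption about data that may never occur in real responses.

```python
def sort_by_first_date(senses, exclude_id=None):
    """Return senses ordered by the start of their date range.

    The sense whose id equals `exclude_id` is dropped; entries with
    no start date (None) are placed last.
    """
    remaining = [s for s in senses if s['id'] != exclude_id]
    return sorted(
        remaining,
        key=lambda s: (
            s['daterange']['start'] is None,
            s['daterange']['start'] or 0,
        ),
    )


# Hand-made sample data, not real API output
sample = [
    {'id': 's1', 'lemma': 'alpha', 'daterange': {'start': 1600}},
    {'id': 's2', 'lemma': 'beta', 'daterange': {'start': 1450}},
    {'id': 's3', 'lemma': 'gamma', 'daterange': {'start': None}},
    {'id': 's4', 'lemma': 'delta', 'daterange': {'start': 1520}},
]

ordered = sort_by_first_date(sample, exclude_id='s1')
print([s['lemma'] for s in ordered])
```

Using `sorted` returns a new list and leaves the API response untouched, which helps if the original alphabetical order is needed later.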

In [None]:
def fetch_synonyms(sense_id, year):
    """
    Fetch and print out the set of synonyms for a given sense,
    filtered for the subset of synonyms that were current in
    the year specified.

    Parameters
    ----------
    sense_id : str
        The ID of the sense whose synonyms are sought.

    year : int
        The year used to filter synonyms for currency.
    """
    query_params = {
        'current_in': year,
    }
    endpoint = 'sense/{id}/synonyms'.format(id=sense_id)
    synonyms = _make_api_request(endpoint, query_params)
    # Re-sort synonyms into date order
    synonyms.sort(key=lambda s: s['daterange']['start'])
    for synonym in synonyms:
        if synonym['id'] == sense_id:
            continue
        print(' \u2023 {lemma} ({date})'.format(
            lemma=synonym['lemma'],
            date=synonym['daterange']['rangestring'],
        ))


parse_text(TEXT, 1598)