
Every writing technology has advantages and drawbacks. The papyrus codex, for example, was (for its time) user-friendly and easy to search, because its architecture was rather similar to that of the world wide web — composed of a large number of short pages, rather than a continuous scroll.

It did, on the other hand, have drawbacks for copying, which was very labour-intensive, and also for permanent archiving, due to its fragility, which may be seen from this illustration of Charles Hedrick working on the Nag Hammadi codices.


Similar factors still apply in the modern world. Information flow is optimised when the message doesn't depend entirely on the medium, but can be translated across a variety of vehicles. For this reason, we are composing the Cambridge Greek Lexicon using XML technology (the letters stand for 'extensible markup language'). This means that our pages are not just formatted for appearance, as with word-processing software, but are also typeset for publication, and configured for online display and searching in electronic editions.

Here is a comparison of the two systems. First, a page of the lexicon composed using a word-processing program looks something like this:


This electronic typing standardises and preserves formatting quite well. However, extensive proof-reading has to be undertaken, and, in order to be translated into other media, the structure underlying the formatting also needs to be recorded. For example, the plain-text passages sometimes express the definition of the headword, but when bracketed, they may express an introductory or following explanatory remark, or encyclopaedic information, and these need to be identified if the lexicon is to be searched.

Such information can be preserved using XML 'tagging' (XML is a close relative of the HTML which is used to format WWW pages; both descend from SGML). The basis of the system is an extended use of tags (labels inside pointed brackets), which in HTML are used to mark format: instead of making a section of text bold by changing the font style, the passage is simply enclosed within "<b>" tags.

In XML, the tags can define structure as well as format, and we can configure our own tags, so we can mark the headword or lemma by enclosing it in a specific tag. We can stipulate that this tag always marks the headword (a structural function), and that the text inside it is always in a bold Greek font (a formatting requirement). Similarly, we can tag the inflection, dialect forms, principal parts, definitions, and contextual information.

We have found that 100 different tags suffice to cover every type of entry in the lexicon. Here is the start of the page shown above, now marked up in XML: 


At first glance, this may look rather forbidding, but it soon becomes as natural as setting styles in a 'Word' document. We select the tags as we write. For example, the first entry, for libazomai, is enclosed in tags marked 'VE', because it is a 'verb entry'. Within that 'wrapper', there is a 'verbal head group' (vHG), which contains the lemma (HL), the part of speech label (PS), and the etymology (Ety). These elements may contain others within them, in a hierarchical structure. And some of the elements are primarily there to facilitate searching: for example, inside the etymology tag, the related word libas is enclosed in 'Ref' tags, which indicate that it refers to another headword in the lexicon.

The definitions and translations appear inside 'S1' tags, and there are also 'S2' tags (not shown here) for subsections illustrating nuances of meaning. Within these 'S' elements are many others marking authors and contextual information, such as the subjects and objects which a verb takes, examples of nouns qualified by an adjective, or verbs modified by an adverb.
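As a rough illustration of the hierarchy just described, the following Python sketch parses a made-up verb entry and prints its structure. The tag names (VE, vHG, HL, PS, Ety, Ref, S1) are those mentioned above; the exact nesting, the part-of-speech label 'vb.' and the placeholder definition are assumptions made for the example, not the lexicon's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry. Tag names come from the text; the nesting details,
# the label "vb." and the placeholder definition are assumptions.
ENTRY = """
<VE>
  <vHG>
    <HL>libazomai</HL>
    <PS>vb.</PS>
    <Ety><Ref>libas</Ref></Ety>
  </vHG>
  <S1>(placeholder definition)</S1>
</VE>
"""


def outline(element, depth=0):
    """Return one indented line per element: its tag and its own text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


entry = ET.fromstring(ENTRY)
print("\n".join(outline(entry)))

# Because the structure is explicit, individual parts can be picked out.
headword = entry.findtext("vHG/HL")
cross_refs = [ref.text for ref in entry.iter("Ref")]
print(headword, cross_refs)
```

The point of the sketch is only that a tagged entry can be queried by role (headword, cross-reference) rather than by appearance.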

This level of precision means that we can immediately translate the page into print-quality format, producing a PDF page.

This gives us an accurate picture of the finished product, enabling us to identify typing errors and unwanted variations of style and content while we are writing. There are other advantages too: because we have organised the tags within a specific structure, we are encouraged to be consistent in the way we write each entry, and so we can maintain a 'house style'.

The final step is to combine these individual pages into a single paginated PDF document to produce the final typeset copy.
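The step from tagged structure to formatted output can be sketched in a few lines of Python. This is a simplified illustration, not the production typesetting route: the style rules (bold headword, italic part-of-speech label, bracketed etymology, automatically numbered S1 sections) and the HTML-like output are assumptions, and the entry content is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical style rules: the wrapper each tag receives on output.
# Tag names come from the text; the styles themselves are assumptions.
STYLES = {
    "HL": ("<b>", "</b>"),   # headword in bold
    "PS": ("<i>", "</i>"),   # part-of-speech label in italics
    "Ety": ("[", "]"),       # etymology in square brackets
}

ENTRY = (
    "<VE><vHG><HL>libazomai</HL><PS>vb.</PS>"
    "<Ety><Ref>libas</Ref></Ety></vHG>"
    "<S1>(first placeholder sense)</S1>"
    "<S1>(second placeholder sense)</S1></VE>"
)


def render(element, counter):
    """Render an element as a string, styling tags and numbering S1 sections."""
    pieces = []
    if element.text and element.text.strip():
        pieces.append(element.text.strip())
    for child in element:
        pieces.append(render(child, counter))
        if child.tail and child.tail.strip():
            pieces.append(child.tail.strip())
    body = " ".join(pieces)
    if element.tag == "S1":
        # Section numbers are generated, never typed by the writer.
        counter["S1"] += 1
        return f"{counter['S1']} {body}"
    before, after = STYLES.get(element.tag, ("", ""))
    return f"{before}{body}{after}"


rendered = render(ET.fromstring(ENTRY), {"S1": 0})
print(rendered)
```

Changing a rule in the style table restyles every entry at once, which is the practical benefit of keeping structure and appearance separate.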

We may sum up the advantages of XML authoring under five headings:

1: An integrated, flexible writing and publishing environment

We can cope with any technical problems which might arise as we proceed, and produce precise formatting for the typesetters, which will greatly ease the task of proof-reading.

2: A consistent writing style

Inconsistencies are almost unavoidable in typed copy, especially when articles are written by more than one person. For example, when citing Euripides' Antiope, LSJ refers to "Antiop.iv B, [line number] A" and also "Antiop.iv B line ... Arn." and sometimes "Antiop.iv B line ... Arnim", or else "Antiop.p.21 A" or "Antiop.B 58 p.21 A". All these citations refer to the same fragment (fr.10 in Page's Select Papyri). Consistency could have been maintained if the authors of LSJ had been able to compare all their citations easily.

XML allows us to apply maximal constraints to entries, and so enables all the members of the editorial team to maintain consistent style and format.
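The idea of constraining entries can be illustrated with a small Python check. In practice such rules would more likely be expressed in a DTD or schema (the text above does not say which is used); the rule tables below are hypothetical and cover only a handful of the tags named earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical content rules (assumptions for illustration only).
ALLOWED = {                      # which child tags each tag may contain
    "VE": {"vHG", "S1"},
    "vHG": {"HL", "PS", "Ety"},
    "Ety": {"Ref"},
}
REQUIRED = {                     # which child tags each tag must contain
    "VE": ["vHG"],
    "vHG": ["HL", "PS"],
}


def violations(element):
    """Return a list of messages for every rule the element tree breaks."""
    problems = []
    allowed = ALLOWED.get(element.tag, set())
    for child in element:
        if child.tag not in allowed:
            problems.append(
                f"<{child.tag}> is not allowed inside <{element.tag}>"
            )
        problems.extend(violations(child))
    for tag in REQUIRED.get(element.tag, []):
        if element.find(tag) is None:
            problems.append(f"<{element.tag}> is missing required <{tag}>")
    return problems


good = ET.fromstring(
    "<VE><vHG><HL>libazomai</HL><PS>vb.</PS></vHG><S1>(sense)</S1></VE>"
)
bad = ET.fromstring("<VE><vHG><HL>libazomai</HL></vHG><Ety/></VE>")

print(violations(good))
print(violations(bad))
```

An entry that omits a required element, or puts one in the wrong place, is reported as soon as it is checked, whoever wrote it.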

3: A structure which reflects our methodology

Our aim has been to create structures which impose constraints on the writing, yet remain flexible enough to contain the range of information which we may wish to enter. We achieve this most importantly through the innovation of using dedicated structures for each part of speech. This enables us to maintain a balance between extended definitions, translation glosses, and contextual and encyclopaedic information, so that we are helped to write the last entries in the same style as we wrote the first.

4: A product which is translatable across publishing media

There will be an electronic edition on the Perseus site. The system means that it can be easily and accurately searched. Dictionaries which are tagged after they were written necessarily contain fewer tagged elements than ours (as they were not composed with such a precise structure), so fewer types of search are possible. A reader of our lexicon can, for example, see how vocabulary changes across the range of Ancient and Koiné Greek, because we mark usage in a corpus of 70 authors, from Homer to Plutarch, and so we can compare word frequency in different writers. And our system will also be linked to other Perseus databases, to images as well as to texts.
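To suggest how author tagging makes such searches possible, here is a minimal Python sketch that builds an index from author to headwords. The author tag name 'Au', the 'Lexicon' wrapper and the headwords 'word-a' and 'word-b' are all hypothetical placeholders; the data are invented and do not represent real attestations.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample data. "Au" is a hypothetical author tag; "Hom." and
# "Plu." stand for Homer and Plutarch; the headwords are placeholders.
LEXICON = """
<Lexicon>
  <VE><vHG><HL>word-a</HL></vHG><S1>(sense) <Au>Hom.</Au> <Au>Plu.</Au></S1></VE>
  <VE><vHG><HL>word-b</HL></vHG><S1>(sense) <Au>Plu.</Au></S1></VE>
</Lexicon>
"""


def headwords_by_author(root):
    """Map each author label to the set of headwords citing that author."""
    index = defaultdict(set)
    for entry in root:
        headword = entry.findtext("vHG/HL")
        for author in entry.iter("Au"):
            index[author.text].add(headword)
    return dict(index)


index = headwords_by_author(ET.fromstring(LEXICON))
for author in sorted(index):
    print(author, sorted(index[author]))
```

A dictionary tagged after the event could support this kind of query only if author names had been marked consistently, which is the advantage claimed above for tagging during composition.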

5: Better-organised material

XML releases us from the constraints on space of the printed book. Most usefully, our 'annotation' element allows us to incorporate editorial notes in each entry, for reference during the writing, and as a permanent archive of our research.

And cross-reference elements enable us to perform electronic searches during the writing and proof-reading stages. That has a number of advantages:

(a) We shall be able to group related words together, so we can easily compare all words sharing the same stem, and write the entry for a simple form before dealing with its derivatives. It is useful to compare the entries for all the compounds from the verb bainw (go), which can take the preverbs ana-, anti-, apo-, dia-, eis-, ek-, epi-, kata-, exana-, meta-, para-, peri-, poti-, pro-, pros-, sum-, huper-, and hupo-, rather than only treating them in alphabetical order.

(b) We can investigate the range of meanings of the prefixes themselves, across the different primary forms (as in the derivatives of bainw listed above), and compare this with their uses as independent prepositions and adverbs.

(c) We can incorporate cultural information. For example, colour terms constitute a group which is currently the subject of considerable semantic interest, and XML tagging enables us to study them not only in their primary forms, but also in compounds, where they may appear as stems, as in akro-kelainiown, with black surface; dia-melainw, become quite dark; hupo-glaukos, somewhat grey; huperuthros, rather red. They also appear as prefixes combined with a noun like aspis ('shield'): we find leuk-aspis ('white-shielded'), phoinik-aspis ('red-shielded'), chalk-aspis ('bronze-shielded'), and chrus-aspis ('gold-shielded').
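The grouping of compounds described in (a) and (b) can be sketched in Python using the preverbs listed above. This is a deliberately naive illustration working on transliterated strings: it assumes a compound is simply preverb plus base, and ignores elision, assimilation and accent. The function name and the sample word list are hypothetical.

```python
# Preverbs as listed in the text for the verb bainw.
PREVERBS = {
    "ana", "anti", "apo", "dia", "eis", "ek", "epi", "kata", "exana",
    "meta", "para", "peri", "poti", "pro", "pros", "sum", "huper", "hupo",
}


def compounds_of(words, base="bainw"):
    """Return sorted (preverb, word) pairs for the base and its compounds.

    The simple form is reported with an empty preverb. Words that are not
    the base, or a listed preverb followed by the base, are ignored.
    """
    family = []
    for word in words:
        if word == base:
            family.append(("", word))
        elif word.endswith(base) and word[: -len(base)] in PREVERBS:
            family.append((word[: -len(base)], word))
    return sorted(family)


# Hypothetical headword list, in alphabetical order as in a printed lexicon.
headwords = ["anabainw", "bainw", "exanabainw", "katabainw", "libazomai"]
family = compounds_of(headwords)
for preverb, word in family:
    print(preverb or "(simple)", word)
```

Sorting puts the simple form first, so that it can be written before its derivatives, and the same pairing of preverb and base could be reused to compare one preverb across different verbs.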

Attention to these details of word formation enables the writers to compose more precise definitions, which in turn can help the student gain a deeper understanding of Greek word meaning. Electronic searching during the writing process will help us produce a more consistent, coherent, and consequently more useful lexicon.

The XML environment is a little more challenging for the writers, because we have to become accustomed to manipulating the tags. However, it also saves effort, because the text formatting is largely automated: we don't need to select bold or italic fonts, or to insert brackets or even section numbers.

And the advantages are that mistakes and inconsistencies can be avoided, the writing and publication processes are integrated, and the usefulness of the lexicon can be maximised and extended in the future, as new ways of integrating verbal and visual information are discovered. We believe that this will help students to explore the richness of the Ancient Greek vocabulary in the most effective way possible.

