
Every writing technology has advantages and drawbacks. The papyrus codex, for example, was (for its time) user-friendly and easy to search, because its architecture was rather similar to that of the World Wide Web: it was composed of a large number of short pages rather than a continuous scroll.

On the other hand, it had drawbacks for copying, which was very labour-intensive, and for permanent archiving, because of its fragility, as can be seen from this illustration of Charles Hedrick working on the Nag Hammadi codices.


Similar factors still apply in the modern world. Information flow is optimised when the message doesn't depend entirely on the medium, but can be translated across a variety of vehicles. For this reason, we are composing the Cambridge Greek Lexicon using XML technology (the letters stand for 'extensible markup language'). This means that our pages are not just formatted for appearance, as with word-processing software, but are also typeset for publication, and configured for online display and searching in electronic editions.

Here is a comparison of the two systems. First, a page of the lexicon composed using a word-processing program looks something like this:


This electronic typing standardises and preserves formatting quite well. However, extensive proof-reading has to be undertaken, and, if the text is to be translated into other media, the structure underlying the formatting also needs to be recorded. For example, plain-text passages sometimes express the definition of the headword, but when bracketed they may express an introductory or following explanatory remark, or encyclopaedic information, and these functions need to be identified if the lexicon is to be searched.

Such information can be preserved using XML 'tagging' (XML is a close relative of HTML, the markup language used to format web pages). The system is based on an extended use of tags (labels inside pointed brackets), which in HTML are used mainly to mark format: instead of making a passage bold by changing the font style, the writer simply encloses it within "<b>" tags.

In XML, the tags can define structure as well as format, and we can configure our own tags, so we can mark the headword or lemma by enclosing it in a specific tag. We can stipulate that this tag always marks the headword (a structural function), and that the text inside it is always in a bold Greek font (a formatting requirement). Similarly, we can tag the inflection, dialect forms, principal parts, definitions, and contextual information.

We have found that 100 different tags suffice to cover every type of entry in the lexicon. Here is the start of the page shown above, now marked up in XML: 


At first glance, this may look rather forbidding, but it soon becomes as natural as setting styles in a 'Word' document. We select the tags as we write. For example, the first entry, for libazomai, is enclosed in tags marked 'VE', because it is a 'verb entry'. Within that 'wrapper', there is a 'verbal head group' (vHG), which contains the lemma (HL), the part of speech label (PS), and the etymology (Ety). These elements may contain others within them, in a hierarchical structure. And some of the elements are primarily there to facilitate searching: for example, inside the etymology tag, the related word libas is enclosed in 'Ref' tags, which indicate that it refers to another headword in the lexicon.
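To make the nesting concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The tag names (VE, vHG, HL, PS, Ety, Ref, S1) are those described above, but the entry itself is a simplified, hypothetical stand-in rather than the project's actual markup, and the bracketed contents are placeholders. The sketch prints the element hierarchy and then retrieves the cross-referenced word by its tag rather than by its position in the text.

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical verb entry using the tag names described above.
# Bracketed contents are placeholders, not text from the lexicon.
ENTRY = """<VE>
  <vHG>
    <HL>libazomai</HL>
    <PS>[part of speech]</PS>
    <Ety><Ref>libas</Ref></Ety>
  </vHG>
  <S1>[definition]</S1>
</VE>"""


def outline(elem, depth=0):
    """Return the element hierarchy as a list of indented tag names."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


entry = ET.fromstring(ENTRY)
print("\n".join(outline(entry)))

# Because the related word is tagged, a search can find it by its role
# (a cross-reference) rather than by where it happens to sit in the text.
print([ref.text for ref in entry.iter("Ref")])  # ['libas']
```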

The definitions and translations appear inside 'S1' tags, and there are also 'S2' tags (not shown here) for subsections illustrating nuances of meaning. Within these 'S' elements are many others marking authors and contextual information, such as the subjects and objects which a verb takes, examples of nouns qualified by an adjective, or verbs modified by an adverb.

This level of precision means that we can immediately translate the page into print-quality format, producing a PDF page.

This gives us an accurate picture of the finished product, enabling us to identify typing errors and unwanted variations of style and content while we are writing. There are other advantages too: because we have organised the tags within a specific structure, we are encouraged to be consistent in the way we write each entry, and so we can maintain a 'house style'.

The final step is to combine these individual pages into a single paginated PDF document to produce the final typeset copy.

We may sum up the advantages of XML authoring under five headings:

1: An integrated, flexible writing and publishing environment

We can deal with technical problems as they arise, and produce precise formatting for the typesetters, which greatly eases the task of proof-reading.

2: A consistent writing style

Inconsistencies are almost unavoidable in typed copy, especially when articles are written by more than one person. For example, when citing Euripides Antiope, LSJ refers to "Antiop.iv B, [line number] A" and also "Antiop.iv B line ... Arn." and sometimes "Antiop.iv B line ... Arnim", or else "Antiop.p.21 A" or "Antiop.B 58 p.21 A". All these citations refer to the same fragment (fr.10 in Page's Select Papyri). Consistency could have been maintained if the authors of LSJ had been able to compare all their citations easily.

XML allows us to apply maximal constraints to entries, and so enables all the members of the editorial team to maintain consistent style and format.
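Part of the benefit comes from recording a citation's components rather than its printed form. The Python sketch below assumes a hypothetical Cit element whose attributes hold author, work, fragment and edition; the element and attribute names and the printed style are illustrative, not the project's scheme, and the headwords "A", "B" and "C" are placeholders. Because the printed form is generated by one function, all citations of Antiope fr.10 can be collected and compared in a single pass, and the deliberate slip in entry C is reported at once.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup: each citation stores its parts (author, work,
# fragment, edition) as attributes instead of as free text.
# Headwords "A", "B", "C" are placeholders; entry C has a deliberate slip.
ENTRIES = """<Lexicon>
  <Entry hw="A"><Cit au="E." work="Antiop." fr="10" ed="Page"/></Entry>
  <Entry hw="B"><Cit au="E." work="Antiop." fr="10" ed="Page"/></Entry>
  <Entry hw="C"><Cit au="E." work="Antiop." fr="10" ed="Pag."/></Entry>
</Lexicon>"""


def render(cit):
    """Generate the printed citation from its fields (assumed house style)."""
    return f"{cit.get('au')} {cit.get('work')} fr.{cit.get('fr')} {cit.get('ed')}"


def citations_by_fragment(root):
    """Collect (headword, printed form) for every citation, keyed by fragment."""
    groups = defaultdict(list)
    for entry in root.iter("Entry"):
        for cit in entry.iter("Cit"):
            key = (cit.get("au"), cit.get("work"), cit.get("fr"))
            groups[key].append((entry.get("hw"), render(cit)))
    return dict(groups)


def inconsistent(groups):
    """Return the fragments that are printed in more than one form."""
    result = {}
    for key, uses in groups.items():
        forms = sorted({form for _, form in uses})
        if len(forms) > 1:
            result[key] = forms
    return result


groups = citations_by_fragment(ET.fromstring(ENTRIES))
print(inconsistent(groups))
```

A real system could go further and restrict the edition attribute to a fixed list, so that a slip like the one in entry C could not be typed at all.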

3: A structure which reflects our methodology

Our aim has been to create structures which impose constraints on the writing, yet remain flexible enough to contain the range of information which we may wish to enter. We achieve this most importantly through the innovation of using dedicated structures for each part of speech. This enables us to maintain a balance between extended definitions, translation glosses, and contextual and encyclopaedic information, so that we are helped to write the last entries in the same style as we wrote the first.
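One way to enforce such constraints is to check each entry against a content model for its part of speech. In practice this is usually the job of a DTD or schema; the hand-rolled Python check below is only a hypothetical stand-in, and its rules (a VE must contain a vHG followed by at least one S1, and a vHG must give HL before PS) are assumptions for illustration, not the project's actual structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical content models: for each wrapper tag, the child tags that
# must be present, in this order. Only the verb entry (VE) is sketched;
# other parts of speech would get their own rows in this table.
REQUIRED = {
    "VE": ["vHG", "S1"],
    "vHG": ["HL", "PS"],
}


def check(elem, problems=None):
    """Collect messages for wrappers whose required children are missing
    or out of order; optional children are ignored."""
    if problems is None:
        problems = []
    required = REQUIRED.get(elem.tag, [])
    children = [child.tag for child in elem]
    # Order is judged by the first occurrence of each required tag,
    # so repeated senses (several S1 elements) count once.
    seen = []
    for tag in children:
        if tag in required and tag not in seen:
            seen.append(tag)
    if seen != required:
        problems.append(f"{elem.tag}: expected {required}, found {children}")
    for child in elem:
        check(child, problems)
    return problems


good = ET.fromstring(
    "<VE><vHG><HL>libazomai</HL><PS>[pos]</PS></vHG><S1>[definition]</S1></VE>")
bad = ET.fromstring(
    "<VE><vHG><PS>[pos]</PS><HL>libazomai</HL></vHG><S1>[definition]</S1></VE>")
print(check(good))  # []
print(check(bad))
```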

4: A product which is translatable across publishing media

There will be an electronic edition on the Perseus site, and the tagging means that it can be searched easily and accurately. Dictionaries that are tagged after they have been written necessarily contain fewer tagged elements than ours (as they were not composed with such a precise structure), so fewer types of search are possible. A reader of our lexicon can, for example, see how vocabulary changes across the range of Ancient and Koiné Greek: because we mark usage in a corpus of 70 authors, from Homer to Plutarch, word frequency can be compared across different writers. Our system will also be linked to other Perseus databases, to images as well as to texts.
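A search of this kind can be sketched as follows, assuming a hypothetical Au tag that marks each cited author inside a sense (the text says authors are tagged but does not name the tag). The headwords and counts are placeholders; the point is that once authors are tagged, finding who uses which word is a simple traversal of the tree.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: <Au> (an assumed tag name) marks each author
# cited for a sense. Headwords and citations are placeholders.
LEXICON = """<Lexicon>
  <VE><vHG><HL>word-a</HL></vHG>
    <S1><Au>Hom.</Au><Au>Plu.</Au></S1>
  </VE>
  <VE><vHG><HL>word-b</HL></vHG>
    <S1><Au>Plu.</Au></S1>
    <S2><Au>Plu.</Au></S2>
  </VE>
</Lexicon>"""


def authors_by_headword(root):
    """Map each headword to a count of the author tags in its entry."""
    return {entry.find("vHG/HL").text: Counter(au.text for au in entry.iter("Au"))
            for entry in root.iter("VE")}


def headwords_used_by(root, author):
    """List the headwords whose entries cite the given author."""
    return [entry.find("vHG/HL").text
            for entry in root.iter("VE")
            if any(au.text == author for au in entry.iter("Au"))]


root = ET.fromstring(LEXICON)
print(authors_by_headword(root))
print(headwords_used_by(root, "Hom."))  # ['word-a']
```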

5: Better-organised material

XML releases us from the constraints on space of the printed book. Most usefully, our 'annotation' element allows us to incorporate editorial notes in each entry, for reference during the writing, and as a permanent archive of our research.
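Because editorial notes live in their own element, one source file can serve both purposes: the writers keep the notes, and the print version is produced from a copy with the notes removed. The Python sketch below assumes the element is literally named annotation and uses placeholder contents. It also takes care to keep any text that followed a removed note, which ElementTree stores as that element's tail.

```python
import copy
import xml.etree.ElementTree as ET

# Working copy of a hypothetical entry, with an editorial note embedded.
SOURCE = ("<VE><vHG><HL>libazomai</HL></vHG>"
          "<S1>[definition]<annotation>[editorial note]</annotation>"
          " [more text]</S1></VE>")


def strip_annotations(root):
    """Remove every <annotation> element in place, keeping any text
    that followed it (ElementTree stores that text as the element's tail)."""
    # Collect (parent, note) pairs first: removing while iterating is unsafe.
    pairs = [(parent, child) for parent in root.iter()
             for child in parent if child.tag == "annotation"]
    for parent, note in pairs:
        if note.tail:
            index = list(parent).index(note)
            if index > 0:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + note.tail
            else:
                parent.text = (parent.text or "") + note.tail
        parent.remove(note)
    return root


working = ET.fromstring(SOURCE)
printable = strip_annotations(copy.deepcopy(working))
print(ET.tostring(printable, encoding="unicode"))
# The working copy still holds the note for the editors.
print(working.find(".//annotation").text)
```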

And cross-reference elements enable us to perform electronic searches during the writing and proof-reading stages. That has a number of advantages:

(a) We shall be able to group related words together, so we can easily compare all words sharing the same stem, and write the entry for a simple form before dealing with its derivatives. It is useful to compare the entries for all the compounds of the verb bainw (go), which can take the preverbs ana-, anti-, apo-, dia-, eis-, ek-, epi-, kata-, exana-, meta-, para-, peri-, poti-, pro-, pros-, sum-, huper-, and hupo-, rather than only treating them in alphabetical order.

(b) We can investigate the range of meanings of the prefixes themselves, across the different primary forms (as in the derivatives of bainw listed above), and compare this with their uses as independent prepositions and adverbs.

(c) We can incorporate cultural information. For example, colour terms constitute a group which is currently the subject of considerable semantic interest, and XML tagging enables us to study them not only in their primary forms, but also in compounds, where they may appear as stems, as in akro-kelainiown, with black surface; dia-melainw, become quite dark; hupo-glaukos, somewhat grey; hup-eruthros, rather red. They also appear as prefixes combined with a noun like aspis ('shield'): we find leuk-aspis ('white-shielded'), phoinik-aspis ('red-shielded'), chalk-aspis ('bronze-shielded'), and chrus-aspis ('gold-shielded').
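Grouping by stem can be done by following the cross-references rather than the alphabet. The Python sketch below assumes that each compound's etymology points to its base word with a Ref tag, as libas is referenced in the libazomai entry, and adds a hypothetical Pv tag for the preverb; the entries are reduced to placeholders and deliberately listed out of alphabetical order. It gathers the compounds of bainw together and lists the preverbs they use, the raw material for points (a) and (b); the same lookup would collect compounds sharing a colour stem.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily simplified entries. <Ref> marks the base word;
# <Pv> (an assumed tag) marks the preverb.
LEXICON = """<Lexicon>
  <VE><vHG><HL>anabainw</HL><Ety><Pv>ana-</Pv><Ref>bainw</Ref></Ety></vHG></VE>
  <VE><vHG><HL>huperbainw</HL><Ety><Pv>huper-</Pv><Ref>bainw</Ref></Ety></vHG></VE>
  <VE><vHG><HL>libazomai</HL><Ety><Ref>libas</Ref></Ety></vHG></VE>
  <VE><vHG><HL>apobainw</HL><Ety><Pv>apo-</Pv><Ref>bainw</Ref></Ety></vHG></VE>
</Lexicon>"""


def group_by_base(root):
    """Map each referenced base word to the headwords that point to it."""
    groups = defaultdict(list)
    for entry in root:
        headword = entry.find(".//HL").text
        for ref in entry.iter("Ref"):
            groups[ref.text].append(headword)
    return dict(groups)


def preverbs_with(root, base):
    """List the preverbs found in entries whose etymology refers to base."""
    return [pv.text
            for entry in root
            if any(ref.text == base for ref in entry.iter("Ref"))
            for pv in entry.iter("Pv")]


root = ET.fromstring(LEXICON)
print(group_by_base(root))
print(preverbs_with(root, "bainw"))
```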

Attention to these details of word formation enables the writers to compose more precise definitions, which in turn can help the student gain a deeper understanding of Greek word meaning. Electronic searching during the writing process will help us produce a more consistent, coherent, and consequently more useful lexicon.

The XML environment is a little more challenging for the writers, because we have to become accustomed to manipulating the tags. However, it also saves effort, because the text formatting is largely automated: we don't need to select bold or italic fonts, or to insert brackets or even section numbers.
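The automation can be pictured as a single function that turns tags into typography. In the Python sketch below, which produces simple HTML, the choices of bold for the headword, italic for the part of speech, square brackets round the etymology, and numbered senses are assumed house-style rules for illustration; the lexicon's real conventions and its PDF pipeline are not shown, and the element contents are placeholders.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical entry; "PS-label" and the sense texts are placeholders.
ENTRY = ("<VE><vHG><HL>libazomai</HL><PS>PS-label</PS>"
         "<Ety><Ref>libas</Ref></Ety></vHG>"
         "<S1>first sense</S1><S1>second sense</S1></VE>")


def text_of(elem):
    """All text inside an element, including nested tags, HTML-escaped."""
    return html.escape("".join(elem.itertext()).strip())


def render(entry):
    """Apply assumed house-style rules: the writer never types fonts,
    brackets or sense numbers; they are derived from the tags."""
    head = entry.find("vHG")
    parts = [f"<b>{text_of(head.find('HL'))}</b>",
             f"<i>{text_of(head.find('PS'))}</i>"]
    ety = head.find("Ety")
    if ety is not None:
        parts.append(f"[{text_of(ety)}]")
    for number, sense in enumerate(entry.findall("S1"), start=1):
        parts.append(f"{number} {text_of(sense)}")
    return " ".join(parts)


print(render(ET.fromstring(ENTRY)))
```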

And the advantages are that mistakes and inconsistencies can be avoided, the writing and publication processes are integrated, and the usefulness of the lexicon can be maximised and extended in the future, as new ways of integrating verbal and visual information are discovered. We believe that this will help students to explore the richness of the Ancient Greek vocabulary in the most effective way possible.

