
Tagging

Every writing technology has advantages and drawbacks. The papyrus codex, for example, was (for its time) user-friendly and easy to search, because its architecture was rather similar to that of the World Wide Web: it was composed of a large number of short pages rather than a continuous scroll.

On the other hand, it had drawbacks: copying was very labour-intensive, and its fragility made permanent archiving difficult, as this illustration of Charles Hedrick working on the Nag Hammadi codices shows.

[Illustration: Charles Hedrick working on the Nag Hammadi codices]

Similar factors still apply in the modern world. Information flows best when the message does not depend entirely on one medium but can be carried across a variety of vehicles. For this reason, we are composing the lexicon not with word-processing software but with XML technology (the letters stand for 'Extensible Markup Language'). This means that our pages are not just formatted for appearance: they are also structured for print publication and configured for online display and searching in electronic editions.

Here is a comparison of the two systems. First, a page of the lexicon composed using a word-processing program looks something like this:

[Figure: a page of the lexicon as composed in a word-processing program]

This electronic typing standardises and preserves formatting quite well. However, it requires extensive proof-reading, and if the text is to be translated into other media, the structure underlying the formatting also needs to be recorded. For example, plain-text passages sometimes express the definition of the headword, but bracketed passages may express an introductory or following explanatory remark, or encyclopaedic information, and these need to be identified if the lexicon is to be searched.

Such information can be preserved using XML 'tagging' (XML is a close relative of HTML, the markup language used to format web pages). The basis of the system is an extended use of tags (labels inside angle brackets), which in HTML are used to mark format: instead of making a section of text bold by changing the font style, the writer simply encloses the passage within "<b>" tags.

In XML, the tags can define structure as well as format, and we can configure our own tags, so we can mark the headword or lemma by enclosing it in a specific tag. We can stipulate that this tag always marks the headword (a structural function), and that the text inside it is always in a bold Greek font (a formatting requirement). Similarly, we can tag the inflection, dialect forms, principal parts, definitions, and contextual information.
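The separation between what a tag means and how it looks can be sketched in a few lines of Python, using the standard-library `xml.etree.ElementTree` module. This is a minimal illustration, not the project's actual setup: `HL` is the headword tag named in this article, but the `STYLES` table and the `style_of` helper are hypothetical, and the transliterated lemma stands in for Greek script.

```python
import xml.etree.ElementTree as ET

# The tag records *what* the text is (the headword), not how it looks.
# The transliterated lemma stands in for Greek script.
fragment = ET.fromstring("<VE><HL>libazomai</HL></VE>")

# Hypothetical style table: formatting is attached to the tag once,
# so every headword in the lexicon is rendered the same way.
STYLES = {
    "HL": {"weight": "bold", "script": "Greek"},
}

def style_of(element):
    """Return the formatting rule for an element's tag (empty dict if none)."""
    return STYLES.get(element.tag, {})

headword = fragment.find("HL")   # structural query: locate the headword
print(headword.text)             # libazomai
print(style_of(headword))        # {'weight': 'bold', 'script': 'Greek'}
```

Changing the entry in `STYLES` would restyle every headword at once, while searches that look for `HL` are unaffected.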

We have created over 100 different tags to cover every type of entry in the lexicon. Here is the start of the page shown above, now marked up in XML: 

[Figure: the start of the same page marked up in XML]

At first glance, this may look rather forbidding, but it soon becomes as natural as setting styles in a Word document. We select the tags as we write. For example, the first entry, for libazomai, is enclosed in tags marked 'VE', because it is a 'verb entry'. Within that 'wrapper', there is a 'verbal head group' (vHG), which contains the lemma (HL), the part of speech label (PS), and the etymology (Ety). These elements may contain others within them, in a hierarchical structure. And some of the elements are primarily there to facilitate searching: for example, inside the etymology tag, the related word libas is enclosed in 'Ref' tags, which indicate that it refers to another headword in the lexicon.
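The hierarchy described above can be made concrete with a small sketch. The element names (`VE`, `vHG`, `HL`, `PS`, `Ety`, `Ref`) are the ones given in this article, but the text inside them is placeholder content, and the helpers `outline` and `cross_references` are hypothetical. The sketch walks the tree to show the nesting, then checks that every `Ref` points to a headword that exists.

```python
import xml.etree.ElementTree as ET

# Element names follow the article; the text content is placeholder only.
ENTRY = """
<VE>
  <vHG>
    <HL>libazomai</HL>
    <PS>verb</PS>
    <Ety>cf. <Ref>libas</Ref></Ety>
  </vHG>
</VE>
"""

def outline(element, depth=0):
    """Yield (depth, tag) pairs, walking the element tree top-down."""
    yield depth, element.tag
    for child in element:
        yield from outline(child, depth + 1)

def cross_references(element):
    """Collect the text of every Ref element at any depth."""
    return [ref.text for ref in element.iter("Ref")]

entry = ET.fromstring(ENTRY.strip())
for depth, tag in outline(entry):
    print("  " * depth + tag)
# VE
#   vHG
#     HL
#     PS
#     Ety
#       Ref

# Hypothetical set of headwords standing in for the whole lexicon:
# a Ref is only useful if it points to one of them.
headwords = {"libazomai", "libas"}
unresolved = [r for r in cross_references(entry) if r not in headwords]
print(unresolved)  # []
```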

The definitions and translations appear inside 'S1' tags, and there are also 'S2' tags (not shown here) for subsections illustrating nuances of meaning. Within these 'S' elements are many others marking authors and contextual information, such as the subjects and objects which a verb takes, examples of nouns qualified by an adjective, or verbs modified by an adverb.
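Because senses and contextual information are tagged, they can be queried directly. The sketch below assumes hypothetical `Def` (definition) and `Obj` (object of the verb) tags; the article says such information is tagged but does not give the tag names, so these are illustrative only. It lists the objects recorded under each `S1` and `S2` sense.

```python
import xml.etree.ElementTree as ET

# Placeholder entry: an S1 sense with a nested S2 nuance.
# Def (definition) and Obj (object of the verb) are hypothetical tag names.
ENTRY = """
<VE>
  <S1>
    <Def>sense one</Def>
    <Obj>object A</Obj>
    <S2>
      <Def>nuance of sense one</Def>
      <Obj>object B</Obj>
    </S2>
  </S1>
</VE>
"""

def objects_by_sense(entry):
    """Map each S1/S2 definition to the objects recorded directly under it."""
    result = {}
    for sense in entry.iter():
        if sense.tag in ("S1", "S2"):
            # findall("Obj") matches direct children only, so the object in
            # the nested S2 is not also credited to the enclosing S1.
            result[sense.findtext("Def")] = [o.text for o in sense.findall("Obj")]
    return result

entry = ET.fromstring(ENTRY.strip())
print(objects_by_sense(entry))
# {'sense one': ['object A'], 'nuance of sense one': ['object B']}
```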

This level of precision means that we can immediately translate the page into print-quality format, using a PDF (portable document format) transformation. That gives us a more accurate picture of the finished product than is possible in a word-processing program. The same page of the lexicon looks like this in PDF format:

[Figure: the same page rendered in PDF format]

This is very close to the appearance of the final copy, so it helps us to identify variations of style and content. There are other advantages too: because we have organised the tags within a specific structure, we are encouraged to be consistent in the way we write each entry, and so we can maintain a 'house style'.

We may sum up the advantages of XML authoring under five headings:

1: An integrated, flexible writing and publishing environment

We can cope with any technical problems which might arise as we proceed, and produce precise formatting for the typesetters, so proof-reading will be much easier.

2: A consistent writing style

Inconsistencies are almost unavoidable in typed copy, especially when articles are written by more than one person. For example, when citing Euripides' Antiope, LSJ refers to "Antiop.iv B, [line number] A" and also "Antiop.iv B line ... Arn." and sometimes "Antiop.iv B line ... Arnim", or else "Antiop.p.21 A" or "Antiop.B 58 p.21 A". All these citations refer to the same fragment (fr.10 in Page's Select Papyri). Consistency could have been maintained if the authors of LSJ had been able to compare all their citations easily.
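The kind of comparison that would have caught these inconsistencies can be sketched as follows. In a tagged lexicon the work title would sit in its own element; here, as a stand-in, the citations are plain strings quoted from the LSJ examples above, the work abbreviation is assumed to be everything up to the first full stop, and the final repeated citation is added only for illustration.

```python
from collections import defaultdict

# Citation forms quoted from LSJ above; "..." and "[line number]" stand
# for line references. The last item is an illustrative repeat.
citations = [
    "Antiop.iv B, [line number] A",
    "Antiop.iv B line ... Arn.",
    "Antiop.iv B line ... Arnim",
    "Antiop.p.21 A",
    "Antiop.B 58 p.21 A",
    "Antiop.iv B line ... Arn.",
]

def forms_by_work(cits):
    """Group distinct citation forms by work abbreviation, assumed here to be
    everything up to and including the first full stop."""
    groups = defaultdict(set)
    for c in cits:
        work, _, _ = c.partition(".")
        groups[work + "."].add(c)
    return groups

for work, forms in forms_by_work(citations).items():
    if len(forms) > 1:  # more than one form: flag for the editors
        print(f"{work} cited in {len(forms)} different forms:")
        for form in sorted(forms):
            print("   " + form)
```

Here the report flags five distinct forms for one work, which an editor could then reduce to a single house form.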

XML allows us to apply maximal constraints to entries, and so enables all the members of the editorial team to maintain consistent style and format.
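In practice such constraints are usually written in a schema language (a DTD, W3C XML Schema or RELAX NG) and enforced by a validating editor or parser. The Python below is only a hand-written stand-in using the standard library: the `REQUIRED_CHILDREN` rules are hypothetical and simply require that a verb entry begins with its head group, and that the head group gives the lemma before the part of speech.

```python
import xml.etree.ElementTree as ET

# Hypothetical content rules, in the spirit of a DTD or schema:
# each element name maps to the child elements it must start with, in order.
REQUIRED_CHILDREN = {
    "VE": ["vHG"],
    "vHG": ["HL", "PS"],
}

def check(element, path=""):
    """Return a list of messages for elements that break REQUIRED_CHILDREN."""
    here = f"{path}/{element.tag}"
    problems = []
    required = REQUIRED_CHILDREN.get(element.tag, [])
    actual = [child.tag for child in element][: len(required)]
    if actual != required:
        problems.append(f"{here}: expected {required}, found {actual}")
    for child in element:  # apply the same rules all the way down the tree
        problems.extend(check(child, here))
    return problems

good = ET.fromstring("<VE><vHG><HL>libazomai</HL><PS>verb</PS></vHG></VE>")
bad = ET.fromstring("<VE><vHG><PS>verb</PS><HL>libazomai</HL></vHG></VE>")
print(check(good))  # []
print(check(bad))   # ["/VE/vHG: expected ['HL', 'PS'], found ['PS', 'HL']"]
```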

3: A structure which reflects our methodology

Our aim has been to create structures which impose constraints on the writing, yet remain flexible enough to contain the range of information which we may wish to enter. We achieve this most importantly through the innovation of using dedicated structures for each part of speech. This enables us to maintain a balance between extended definitions, translation glosses, and contextual and encyclopaedic information, so that we are helped to write the last entries in the same style as we wrote the first.

4: A product which is translatable across publishing media

There will be an electronic edition on the Perseus site, and the tagging means that it can be searched easily and accurately. Dictionaries which are tagged after they have been written necessarily contain fewer tagged elements than ours (because they were not composed with such a precise structure), so fewer types of search are possible. A reader of our lexicon can, for example, see how vocabulary changes across the range of Ancient and Koiné Greek: because we mark usage in a corpus of 70 authors, from Homer to Plutarch, we can compare word frequency in different writers. Our system will also be linked to other Perseus databases, to images as well as to texts.
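A sketch of the kind of search this makes possible, assuming a hypothetical `Au` tag marks each cited author inside an entry: the code counts author citations under each headword and then lists the headwords a given author attests. The headwords and author names are placeholders, not data from the lexicon.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Placeholder lexicon; Au is a hypothetical tag for a cited author.
LEXICON = """
<lexicon>
  <VE><vHG><HL>word-a</HL></vHG>
    <S1><Au>Author A</Au><Au>Author A</Au><Au>Author B</Au></S1></VE>
  <VE><vHG><HL>word-b</HL></vHG>
    <S1><Au>Author B</Au></S1></VE>
</lexicon>
"""

def authors_per_headword(root):
    """Count how often each author is cited under each headword."""
    return {
        entry.findtext("vHG/HL"): Counter(au.text for au in entry.iter("Au"))
        for entry in root
    }

def headwords_used_by(table, author):
    """List headwords with at least one citation from the given author."""
    return sorted(hw for hw, counts in table.items() if counts[author] > 0)

table = authors_per_headword(ET.fromstring(LEXICON.strip()))
print(table["word-a"])                       # Counter({'Author A': 2, 'Author B': 1})
print(headwords_used_by(table, "Author A"))  # ['word-a']
print(headwords_used_by(table, "Author B"))  # ['word-a', 'word-b']
```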

5: Better-organised material

XML releases us from the constraints on space of the printed book. Most usefully, our 'annotation' element allows us to incorporate editorial notes in each entry, for reference during the writing, and as a permanent archive of our research.

And cross-reference elements enable us to perform electronic searches during the writing and proof-reading stages. That has a number of advantages:

(a) We shall be able to group related words together, so we can easily compare all words sharing the same stem, and write the entry for a simple form before dealing with its derivatives. It is useful to compare the entries for all the compounds from the verb bainw (go), which can take the preverbs ana-, anti-, apo-, dia-, eis-, ek-, epi-, kata-, exana-, meta-, para-, peri-, poti-, pro-, pros-, sum-, huper-, and hupo-, rather than only treating them in alphabetical order.

(b) We can investigate the range of meanings of the prefixes themselves, across the different primary forms (as in the derivatives of bainw listed above), and compare this with their uses as independent prepositions and adverbs.

(c) We can incorporate cultural information. For example, colour terms constitute a group which is currently the subject of considerable semantic interest, and XML tagging enables us to study them not only in their primary forms but also in compounds, where they may appear as stems, as in akro-kelainiown, 'with black surface'; dia-melainw, 'become quite dark'; hupo-glaukos, 'somewhat grey'; and huperuthros, 'rather red'. They also appear as prefixes combined with a noun like aspis ('shield'): we find leuk-aspis ('white-shielded'), phoinik-aspis ('red-shielded'), chalk-aspis ('bronze-shielded'), and chrus-aspis ('gold-shielded'). Attention to these details of word formation enables the writers to compose more precise definitions, which in turn can help the student gain a deeper understanding of Greek word meaning. Electronic searching during the writing process will help us produce a more consistent, coherent, and consequently more useful lexicon.
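Searches like (a) to (c) amount to indexing headwords by their components. The sketch below assumes, purely for illustration, that compounds are split at the morpheme boundary by a hyphen, as in the transliterations above; a tagged lexicon would more likely mark prefix and stem as separate elements. The compounds of bainw are generated from the preverb list in (a), and the colour compounds are those quoted in (c).

```python
from collections import defaultdict

# Hypothetical headword list, with compounds split at the morpheme
# boundary by a hyphen (transliterations follow the text above).
PREVERBS = ["ana", "anti", "apo", "dia", "eis", "ek", "epi", "kata", "exana",
            "meta", "para", "peri", "poti", "pro", "pros", "sum", "huper", "hupo"]
HEADWORDS = [f"{p}-bainw" for p in PREVERBS] + [
    "akro-kelainiown", "dia-melainw", "hupo-glaukos",
    "leuk-aspis", "phoinik-aspis", "chalk-aspis", "chrus-aspis",
]

def component_index(headwords):
    """Map every component (prefix or stem) to the headwords containing it."""
    index = defaultdict(list)
    for hw in headwords:
        for part in hw.split("-"):
            index[part].append(hw)
    return index

index = component_index(HEADWORDS)
print(len(index["bainw"]))  # 18: every compound of bainw in one group, as in (a)
print(index["aspis"])       # the four '-shielded' colour compounds, as in (c)
print(index["hupo"])        # ['hupo-bainw', 'hupo-glaukos']: one prefix, two stems, as in (b)
print(index["dia"])         # ['dia-bainw', 'dia-melainw']
```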

The XML environment is a little more challenging for the writers, because we have to become accustomed to manipulating the tags. However, it does save some effort, too, as we don't need to select bold or italic fonts, or to insert brackets, or even section numbers: all that is done automatically.
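The automatic formatting can be illustrated with a small renderer. It is a rough sketch, not the project's transformation (a real print pipeline, for example one using XSLT to produce PDF, would be far more elaborate): the `Def` and `Rem` (explanatory remark) tags and the numbering scheme, with arabic numerals for `S1` and letters for `S2`, are assumptions. The writer types none of the bold markup, numbers or brackets; the renderer supplies them.

```python
import xml.etree.ElementTree as ET
from string import ascii_lowercase

# Placeholder entry: the writer types no numbers, brackets or bold.
# Def and Rem (explanatory remark) are hypothetical tag names.
ENTRY = """
<VE>
  <vHG><HL>libazomai</HL></vHG>
  <S1><Def>first sense</Def><Rem>remark</Rem>
    <S2><Def>nuance</Def></S2>
    <S2><Def>another nuance</Def></S2>
  </S1>
  <S1><Def>second sense</Def></S1>
</VE>
"""

def render(entry):
    """Return display lines with bold, numbering and brackets added."""
    lines = ["<b>" + entry.findtext("vHG/HL") + "</b>"]  # headword in bold
    for n, s1 in enumerate(entry.findall("S1"), start=1):
        text = s1.findtext("Def")
        remark = s1.findtext("Rem")
        if remark:  # remarks are bracketed automatically
            text += f" ({remark})"
        lines.append(f"{n}. {text}")
        # Sub-senses are lettered in order: a, b, c, ...
        for letter, s2 in zip(ascii_lowercase, s1.findall("S2")):
            lines.append(f"   {letter}. {s2.findtext('Def')}")
    return lines

for line in render(ET.fromstring(ENTRY.strip())):
    print(line)
# <b>libazomai</b>
# 1. first sense (remark)
#    a. nuance
#    b. another nuance
# 2. second sense
```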

And the advantages are that mistakes and inconsistencies can be avoided, the writing and publication processes are integrated, and the usefulness of the lexicon can be maximised and extended in the future, as new ways of integrating verbal and visual information are discovered. We believe that this will help students to explore the richness of the Ancient Greek vocabulary in the most effective way possible.
