Our work on the 'slips' database, and on writing in XML, has involved us in more fundamental research into text manipulation and storage, which we undertook in 2002-2005 as a participant in an international research project.
The project was devoted to digital library technology, and specifically the development of:
- an infrastructure for digital libraries;
- IT tools for end-users that are designed to be adaptable to different uses;
- a framework for sharing metadata, data, and tools across multiple digital libraries;
- a distributed archive allowing for long-term preservation of, and easy access to, digital data.
The participants (four European and four U.S.) are listed here, with their specialist areas and researchers:
- Imperial College London, Centre for the History of Science, Technology, and Medicine (Newton's papers in Latin) —Rob Iliffe and Scott Mandelbrote; (multimedia digital libraries) —Stefan Rueger
- Classics Faculty, Cambridge University (Greek lexicography, text markup, XML development) —Bruce Fraser
- Istituto di Linguistica Computazionale del CNR, Pisa (classical text archive; text processing and imaging) —Andrea Bozzi
- Arnamagnaeanske Institut, Copenhagen University (Old Norse) —Matthew Driscoll
- The Stoa Consortium, University of Kentucky —Ross Scaife
- University of Missouri at Kansas City, Department of English (database design and programming) —Jeff Rydberg-Cox
- Perseus Project, Tufts University (text archives and display, morphological analysis) —Gregory Crane
- UCLA, Institute for Social Science Research (Old Norse) —Timothy Tangherlini
Each participant brought a digital collection to the project. Linked together, these collections formed a small international digital library that served as a test bed for developing structural models and software. The work undertaken by each partner has helped to develop an 'infrastructure model' for digital libraries. Now that the model is operational at each of the partner sites, the individual 'work packages' continue to proceed in tandem within a unified working environment.
All these work packages are organised around a series of specialised digital library applications that have been integrated into a single system, three of which use corpora as test beds for new applications. The methodology relies on an 'indexing architecture' that can be applied to a range of languages, which allows us to apply the same tools to every text in the system.
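As a rough illustration of what a language-independent indexing architecture could look like, the sketch below keeps one shared index and confines the language-specific work to small, swappable normalisation functions. It is not the project's actual software: the class name `Index`, the functions `fold_latin` and `fold_default`, and the language codes are all hypothetical, and a real system would use proper morphological analysis rather than these simple spelling rules.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical per-language normalisers. A real system would plug in
# morphological analysis here; these only fold spelling variants.
Normaliser = Callable[[str], str]


def fold_latin(token: str) -> str:
    # Illustrative assumption: lower-case and merge u/v and i/j spellings.
    return token.lower().replace("v", "u").replace("j", "i")


def fold_default(token: str) -> str:
    # Fallback for languages without a dedicated normaliser.
    return token.lower()


NORMALISERS: Dict[str, Normaliser] = {"la": fold_latin}


class Index:
    """One index shared by every text, whatever its language."""

    def __init__(self) -> None:
        # (language, normalised form) -> set of (text id, token position)
        self.postings: Dict[Tuple[str, str], Set[Tuple[str, int]]] = defaultdict(set)

    def add_text(self, text_id: str, lang: str, text: str) -> None:
        normalise = NORMALISERS.get(lang, fold_default)
        for position, token in enumerate(text.split()):
            token = token.strip(".,;:!?()'\"")
            if token:
                self.postings[(lang, normalise(token))].add((text_id, position))

    def lookup(self, lang: str, word: str) -> List[Tuple[str, int]]:
        normalise = NORMALISERS.get(lang, fold_default)
        return sorted(self.postings.get((lang, normalise(word)), set()))
```

Because the indexing and lookup code never inspects the language beyond choosing a normaliser, adding a new language in this sketch means registering one more function in `NORMALISERS`; the same search tool then works on every text already in the index.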
Although we brought existing corpora to the project, the infrastructure we have developed will also allow us to integrate other texts at very low cost. The corpora that we created and integrated into our systems can therefore make a substantial contribution to future research on our shared linguistic heritage.