Approaches towards the Long Term Preservation of Archival Digital Records
Background
The main problem surrounding the preservation of authentic electronic records is technological obsolescence. As the pace of technological change continues to accelerate, the question arises of what to do with records that were created using old and now obsolete hardware and software. Unless action is taken now, there is no guarantee that records created in the current computing environment will be accessible and readable in future computing environments. In 1999, the research report 'Digital Preservation: Carrying Authentic, Understandable and Usable Digital Records Through Time' was drawn up. The report explored the possibilities of several technologies and approaches for the long-term preservation of digital records, and showed that it is not yet possible to make a responsible choice among them. Following the recommendations of the researchers, the Ministry of the Interior and Kingdom Relations and the Ministry of Education, Culture and Sciences (the National Archives) decided to establish a 'Testbed' to gain the essential knowledge and experience. The Digital Preservation Testbed is carrying out experiments according to pre-defined research questions to establish the best preservation approach or combination of approaches.
The Testbed will be focusing its attention on three different digital preservation approaches - Migration; Emulation; and XML - evaluating the effectiveness of these approaches, their limitations, costs, risks, uses, and resource requirements.
Migration
Migration is currently the most widely adopted short-term approach to digital preservation, yet it is also the one that appears to attract the most criticism. Digital Migration is defined within the scope of the Testbed as the transfer of record(s) from one hardware/software configuration/platform to another. A simple example of this would be the migration of a record from Word 6 to Word 7; a more complex example would be the migration of a record from a Macintosh OS to a Windows OS.
The Migration approach receives a lot of criticism because its results are often unpredictable, due mainly to a lack of testing and documentation. When new software is brought out, it is common for many people to simply ‘update’ their documents. Yet this often results in the loss of information, whether that be record content, format, behaviour, or appearance. The new application reads the record in a different manner from that in which it was designed to be read, and during the migration process some processing instructions, content, and functionality may be lost or even gained. The extent of this loss varies, depending on the extent and nature of the migration performed. Migration results are difficult to predict unless a substantial amount of work is done in advance on the source and target format specifications. Migration can affect a record’s status as authentic, and any record which is preserved must be preserved authentically, otherwise its meaning and validity cannot be assured. This has legal as well as archival implications.
The Testbed will be experimenting with the migration of text documents, spreadsheets, databases and e-mails of different size, format, nature and complexity, and will track the changes produced in the records through the process of migration to assess the extent to which it is an effective means of preservation. Many people criticise migration as a preservation technique because of the changes to the record that it often unintentionally makes, but few have so far provided research results concerning the precise nature of the changes in a large scale environment.
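One way to picture the change tracking described above is to record a set of properties for a record before and after migration and then compare the two sets. The Python sketch below is illustrative only: the function name compare_properties and the example property names and values are assumptions made for this example, not part of the Testbed's own tooling.

```python
def compare_properties(before, after):
    """Report which properties a migration lost, gained, or changed.

    `before` and `after` are dictionaries of record properties
    (content, format, behaviour, appearance) captured on the
    source and target platforms respectively.
    """
    lost = sorted(k for k in before if k not in after)
    gained = sorted(k for k in after if k not in before)
    changed = sorted(
        k for k in before if k in after and before[k] != after[k]
    )
    return {"lost": lost, "gained": gained, "changed": changed}


# Hypothetical properties of a text document before and after migration.
source = {
    "text": "Annual report",
    "font": "Times",
    "page_count": 12,
    "macro": "AutoDate",
}
target = {
    "text": "Annual report",
    "font": "Arial",
    "page_count": 13,
    "revision_id": "r1",
}

report = compare_properties(source, target)
for kind, names in report.items():
    print(kind, names)
```

A comparison of this kind only covers the properties that were measured in the first place, so deciding which properties matter for a given record type remains a matter of judgement.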
Emulation
The theory behind Emulation as a Digital Preservation approach is that digital documents are inherently software-dependent, regardless of their format. Emulation proposes by-passing the problem of hardware/software obsolescence by enabling the recreation of the old software and the environment needed to run it inside of new and future hardware. By preserving not only the record but also the software on which it was written and originally intended to be run, the record will not undergo any changes and its preservation and authenticity can be assured.
There exist differing theories on how emulation as a digital preservation approach should be carried out. Jeff Rothenberg, whose report to the Ministry of the Interior in 1999 formed the basis of the Testbed project, proposes the writing of an emulator specification program at the time of platform obsolescence which can be activated in the future when access to the records is required. IBM, on the other hand, are currently working on the creation of a UVC (Universal Virtual Computer) on which files can be read using ‘views’ instead of the original software. The dependency on unknown future generations of hardware would therefore not exist, and the emulation specifications would be much simpler.
Emulation has attracted similar criticisms to migration, on the grounds that it can be costly, highly technical, and labour intensive. The criticism is not always justified - as there is currently no tried and tested specific methodology for emulation, the future costs cannot yet be predicted and may or may not turn out to be less than for migration. This is a situation to which the Testbed should be able to contribute, through the results of the research. The Testbed will not be designing and building an Emulator of its own, but will be working closely with others on this approach. Again, a range of document types will be tested: text, spreadsheets, databases and e-mail, testing and identifying the true potential of emulation as a digital preservation approach.
XML
The Testbed will also be researching the potential of what has been termed 'The XML Approach'. XML stands for Extensible Markup Language, and has been hailed as the silver bullet for information storage and processing. XML is platform independent, which allows for easy transfer of information sets from one machine to another, without having to worry whether the recipient has the same software applications as the originator to open the document. This has positive implications for XML as a longer-term storage format.
The term XML is used to refer to both the language and the file format. The project will be utilising XML in various forms, including the possibility of using XML to associate preservation metadata with records. However, the main interest lies in XML’s capabilities as a storage format for preservation. Another project currently working with aspects of XML is the NARA/SDSC collaboration.
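As a rough illustration of how XML might carry a record and its preservation metadata together, the Python sketch below wraps both in a single XML document using the standard library. The element names (record, metadata, property, content), the function wrap_record, and the sample values are all assumptions made for this example; they do not represent a schema adopted by the Testbed.

```python
import xml.etree.ElementTree as ET


def wrap_record(content, metadata):
    """Return an XML string holding a record's content and its metadata.

    The element names used here are hypothetical, chosen only to
    show the idea of keeping metadata and content in one file.
    """
    root = ET.Element("record")
    meta = ET.SubElement(root, "metadata")
    for name, value in metadata.items():
        # One <property name="..."> element per metadata item.
        ET.SubElement(meta, "property", name=name).text = value
    ET.SubElement(root, "content").text = content
    return ET.tostring(root, encoding="unicode")


xml_text = wrap_record(
    "Minutes of the meeting",
    {"creator": "Records Office", "approach": "XML"},
)

# Any XML parser on any platform can read the result back.
parsed = ET.fromstring(xml_text)
print(parsed.find("content").text)
```

Because the result is plain text with a documented structure, it can be read back without the application that originally created the record, which is the property that makes XML attractive as a storage format.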
Metadata & Authenticity
Metadata, to put it simply, means 'data about data'. There is a strong tradition of metadata utilisation in record-keeping and archival professions, but the term 'metadata' really only came into prominence within the archival profession with the emergence of digital records and their associated problems in the 1990s. A useful definition of metadata is provided by Professor A J Gilliland-Swetland: 'the sum total of what one can say about any information object at any level of aggregation', encompassing information about the record with regard to content, context and structure (and often in the case of digital records, also behaviour and appearance). Library catalogues, for example, provide us with metadata about publications, such as author, title, date of publication, and ISBN. Digital archival records, however, require a much more extensive set of metadata, so that the record can be confirmed as authentic.
Authenticity is a pre-requisite for any record - if a record cannot be proved beyond reasonable doubt to be authentic, then its contents cannot be verified and the information it contains cannot be guaranteed to be consistent with what it contained at point of capture. Appropriate metadata provided about a record at its point of ingest into a Depot/Archive is relied upon to prove the record’s authenticity. Different records have differing authenticity requirements. Document originators generally provide the best indication of whether their records have any specific authenticity requirements, for example, the use of PKI. They are also the source for much of the non-technical metadata about the record, such as the business process. Authenticity requirements and metadata requirements for digital preservation are inextricably linked and are vital to any digital preservation approach.
The Testbed will be conducting research to discover if there is an inviolable way to associate metadata with records and to assess the limitations such an approach may incur. We are also working on the provision of a proposed set of preservation metadata that will contain information about the preservation approach taken and any specific authenticity requirements.
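One commonly used building block for linking metadata to a record is a cryptographic digest of the record's bytes, stored as part of the metadata. The Python sketch below assumes this technique purely for illustration; the text above does not say that the Testbed uses it, and the function names and metadata keys are hypothetical. A digest can show that the stored bytes have not changed since the metadata was created, but on its own it neither proves authenticity nor makes the association inviolable.

```python
import hashlib


def fingerprint(record_bytes):
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(record_bytes).hexdigest()


def make_metadata(record_bytes, approach):
    """Build a small, hypothetical preservation metadata set."""
    return {
        "preservation_approach": approach,
        "sha256": fingerprint(record_bytes),
    }


def is_unchanged(record_bytes, metadata):
    """Check that the record still matches the digest in its metadata."""
    return fingerprint(record_bytes) == metadata["sha256"]


original = b"Decision of 12 March: budget approved."
meta = make_metadata(original, "migration")

print(is_unchanged(original, meta))         # same bytes as at capture
print(is_unchanged(original + b" ", meta))  # one extra byte appended
```

Note that a digest of this kind would no longer match after a migration, since migration changes the stored bytes by design; a new digest would have to be recorded alongside information about the migration itself.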
Further Reading:
MIGRATION:
Risk Management of Digital Information: A File Format Investigation, Lawrence, Gregory W et al.
http://www.clir.org/pubs/reports/pub93/pub93.pdf
Migration: A CAMiLEON Discussion Paper, Holdsworth, David.
http://www.personal.leeds.ac.uk/~issprw/camileon/migration.htm
Farewell my Floppy: A strategy for migration of digital information, Woodyard, Deborah.
http://www.nla.gov.au/nla/staffpaper/valadw.html
EMULATION:
Reality and Chimeras in the Preservation of Electronic Records, Bearman, David.
http://www.dlib.org/dlib/april99/bearman/04bearman.html
Emulation as a Digital Preservation Strategy, Granger, Stewart.
http://www.dlib.org/dlib/october00/granger/10granger.html
Emulation, Preservation & Abstraction, Holdsworth, David & Wheatley, Paul.
http://129.11.152.25/CAMiLEON/dh/ep5.html
Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Preserving Digital Information, Rothenberg, Jeff.
http://www.clir.org/pubs/reports/rothenberg/contents.html
XML:
Persistent Object Preservation: Advanced computing infrastructure for Digital Preservation, Thibodeau, Ken.
http://europa.eu.int/ISPO/dlm/dlm99/Proceed99-down_en.htm
METADATA & AUTHENTICITY:
Introduction to Metadata: Setting the Stage, Gilliland-Swetland, Anne J.
http://www.getty.edu/research/institute/standards/intrometadata/2_articles/index.html
Authenticity in a Digital Environment, Cullen, Charles T et al.
http://www.clir.org/pubs/reports/pub92/contents.html
I’m me and you’re you but is that that? Ashley, Kevin.
http://www.rlg.org/events/pres-2000/ashley.html
Preservation Metadata for Digital Objects: A Review of the State of the Art, OCLC/RLG White Paper.
http://www.oclc.org/research/pmwg/presmeta_wp.pdf