University of Cambridge and MIT: Exploring Strategies for Digital Preservation for DSpace@Cambridge

Downing, Jim 
Carpenter, Grace 

Cambridge University Library and MIT Libraries submit this proposal to share the outcomes of the digital preservation research work conducted through the DSpace@Cambridge project, concentrating on two main areas: Process Automation and Preservation Planning.

Automation

Digital preservation in its current form commonly demands a high level of human effort. In mediated archiving, the archivist's effort does not scale well; in self-archiving, the effort required of depositors can be a barrier to adopting digital preservation at all. It behoves us, therefore, to look for opportunities to make the process more efficient through automation.

The potential for sustainable preservation of a digital object can be improved by accurate identification of its file type, validation of the file against the type's specification, and extraction of technical metadata. Recently available software (e.g. JHOVE [1]) provides identification and validation for a number of popular types, and older technologies (e.g. the 'file' command) offer some useful functionality in this area. DSpace@Cambridge will evaluate the different tools, investigate storage strategies for technical metadata, attempt to gauge the utility of particular kinds of technical metadata, and provide a technique for integrating these tools into institutional repositories.
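As a rough illustration of the identification and technical metadata steps described above, the following Python sketch recognises a file type from its leading "magic" bytes, in the spirit of the 'file' command, and records a few basic technical properties. It is not the project's implementation and does not use JHOVE; the signature table, the function names (identify, technical_metadata) and the choice of metadata fields are assumptions made for the example. Note that it performs identification only, not validation against a format specification.

```python
import hashlib

# Hypothetical, deliberately tiny signature table: (leading bytes, MIME type).
# Real identification tools cover far more formats and look beyond the header.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

UNKNOWN_TYPE = "application/octet-stream"


def identify(data: bytes) -> str:
    """Return a MIME type guessed from the leading bytes of the content."""
    for magic, mime_type in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return mime_type
    return UNKNOWN_TYPE


def technical_metadata(data: bytes) -> dict:
    """Collect a minimal technical metadata record for a bitstream.

    The field names are illustrative assumptions, not a metadata standard.
    """
    return {
        "format": identify(data),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    sample = b"%PDF-1.4\n% sample content only, not a valid PDF\n"
    print(technical_metadata(sample))
```

A record of this kind could be stored alongside the deposited bitstream; how best to store it is one of the questions the project proposes to investigate.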

Preservation Planning

DSpace@Cambridge intends to develop strategy templates that will assist institutions with the preservation planning process. Building on the work on format action plans done at the Florida Center for Library Automation as part of the DAITSS project [2], we hope to create a system of machine-readable preservation strategies that can evolve to support future rendering processes, yet retain enough information for those processes to be validated by a human. Although the initial focus will be on migration, we hope that the technique can be extended to emulation and Universal Virtual Computer approaches. We aim to prove this approach by writing a migration tool, for one or two formats, capable of supporting migration on ingest or migration on-the-fly. It should be possible to share strategy templates between institutions.
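To make the idea of a machine-readable, shareable strategy template more concrete, here is a minimal Python sketch under stated assumptions. The record layout (PreservationStrategy and its fields), the JSON serialisation and the helper names are all hypothetical and chosen only for illustration; they are not the DAITSS format action plan structure or a design the project has settled on. Each record pairs a machine-usable rule with a free-text rationale so that a person can still review the plan.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PreservationStrategy:
    """One hypothetical rule in a strategy template."""
    source_format: str   # format the rule applies to, e.g. a MIME type
    action: str          # e.g. "migrate"; "emulate" would be a later extension
    target_format: str   # format produced by the action
    when: str            # "on-ingest" or "on-the-fly"
    rationale: str       # human-readable justification, kept for validation


def to_template(strategies: List[PreservationStrategy]) -> str:
    """Serialise strategies to JSON text that could be shared between institutions."""
    return json.dumps([asdict(s) for s in strategies], indent=2, sort_keys=True)


def from_template(text: str) -> List[PreservationStrategy]:
    """Load strategies from JSON text produced by to_template."""
    return [PreservationStrategy(**item) for item in json.loads(text)]


def select(strategies: List[PreservationStrategy],
           source_format: str, when: str) -> Optional[PreservationStrategy]:
    """Return the first strategy matching the format and trigger, or None."""
    for strategy in strategies:
        if strategy.source_format == source_format and strategy.when == when:
            return strategy
    return None


if __name__ == "__main__":
    plan = [
        PreservationStrategy(
            source_format="image/gif",
            action="migrate",
            target_format="image/png",
            when="on-ingest",
            rationale="Example entry only; not a recommendation.",
        )
    ]
    shared = to_template(plan)
    print(shared)
    print(select(from_template(shared), "image/gif", "on-ingest"))
```

Keeping the template as plain structured text is one way an institution could publish, exchange and amend strategies without sharing code; a migration tool would then consult the selected rule to decide which conversion to run.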

