Digital preservation at Big Data scales: proposing a step-change in preservation system architectures


Type

Article

Authors

Mooney, JE 
Thompson, D 

Abstract

Purpose: The purpose of this paper is to consider how digital preservation system architectures will support business analysis of large-scale collections of preserved resources, and the use of Big Data analyses by future researchers.

Design/methodology/approach: This paper reviews the architecture of existing systems, then discusses experimental surveys of large digital collections using existing digital preservation tools at Big Data scales. Finally, it introduces the design of a proposed new architecture to work with Big Data volumes of preserved digital resources, also based upon experience of managing a collection of 30 million digital images.

Findings: Modern visualisation tools enable business analyses based on file-related metadata, but most currently available systems need more of this functionality “out-of-the-box”. Scalability of preservation architecture to Big Data volumes depends upon the ability to run preservation processes in parallel, so indexes that enable effective sub-division of collections are vital. Not all processes scale easily, and those that do not scale require complex management.

Practical implications: The complexities caused by scaling up to Big Data volumes can be seen as being at odds with preservation, where simplicity matters. However, the sustainability of preservation systems relates directly to their usefulness, and maintaining usefulness will increasingly depend upon being able to process digital resources at Big Data volumes. An effective balance between these conflicting situations must be struck.

Originality/value: Preservation systems are at a step-change as they move to Big Data scale architectures and respond to more technical research processes. This paper is a timely illustration of the state of play at this pivotal moment.

Keywords

Big Data, Architecture, Digital preservation, Image processing, Business analytics, Digitization

Journal Title

Library Hi Tech

Journal ISSN

0737-8831

Volume Title

36

Publisher

Emerald

Sponsorship

The Polonsky Foundation