April 2009

New Project: Keeping Research Data Safe 2

I am pleased to report that Charles Beagrie Ltd will be the lead contractor for Keeping Research Data Safe 2: a new JISC-funded study of the identification of long-lived digital datasets for the purposes of cost analysis.

The study aims to build on the work of the original Keeping Research Data Safe consultancy. It is being undertaken by a consortium of four partners from the original study (University of Cambridge, Charles Beagrie Ltd, OCLC, and University of Southampton) and four new partners (the Archaeology Data Service, University of Oxford, UK Data Archive, and University of London Computer Centre) that have significant data collections and interests in preservation costs. All the partners bring considerable relevant expertise, knowledge and resources to the project.

The new study will identify and analyse sources of long-lived data and develop longitudinal data on associated preservation costs and benefits. We believe these outcomes will be critical to developing preservation costing tools and cost benefit analyses for justifying and sustaining major investments in repositories and data curation.

The project will use the Keeping Research Data Safe cost framework as a tool for organising and scoping its work. We will undertake a combination of desk research; a data survey; analytical work with national and disciplinary digital archives that have existing historic cost data for the preservation of digital research data; and interaction with digital archives in research universities that have little or no historic cost data but a strong interest in this study, to identify criteria and metrics for capturing cost data in future.

A project website will be available shortly and regular updates on the study will be posted to this blog.

A Future Combination of PRONOM and GDFR?

An interesting emerging development in digital preservation is the Unified Digital Formats Registry (UDFR), which combines efforts from the UK National Archives’ PRONOM service and Harvard University’s Global Digital Formats Registry (GDFR).

The GDFR website notes that in April 2009 the GDFR initiative joined forces with the UK National Archives’ PRONOM registry initiative under a new name, the Unified Digital Formats Registry (UDFR). The UDFR will support the requirements and use cases compiled for GDFR and will be seeded with PRONOM’s software and formats database. A new website is being constructed for the UDFR and will be available at www.udfr.org.

To quote from the UDFR Proposal and Roadmap:

“There are two major efforts underway to create a format registry with complementary strengths and weaknesses. PRONOM, created by The National Archives (TNA) in the UK, has a strong technological base, and has been building a database of original information about various digital formats. PRONOM at this point however is owned and maintained by a single organization, making it vulnerable to changes in that institution. The Global Digital Formats Registry (GDFR) effort, hosted by Harvard University, has developed a model for a registry based on shared governance, cooperative data contribution, and distributed data hosting. However, GDFR is technically less far along in development, and has not yet begun database building.

Given the paucity of resources in the digital preservation community it would be highly unfortunate if these efforts were to compete for resources. Therefore a group of involved and interested institutions have agreed to join together to create a single shared formats registry drawing on the individual strengths of the two existing efforts. The initiative would:

  • be technically based on the existing PRONOM system and database;
  • create a community governance model for the registry involving all institutions willing to contribute to its development;
  • develop a mechanism for the distribution of the registry data in such a way as to support local extensions and additions to the database;
  • develop both technical and organizational support for distributed input to the registry, including some form of quality vetting of contributed data.”

Further details of the proposal are available from the GDFR website.