Digital Curation

Scholarly Journals introduce Supplementary Data Archiving Policy

An important editorial has just appeared online in the February issue of The American Naturalist.
To promote the preservation and fuller use of data, The American Naturalist, Evolution, the Journal of Evolutionary Biology, Molecular Ecology, Heredity, and other key journals in evolution and ecology will soon introduce a new data archiving policy. The policy has been enacted by the Executive Councils of the societies owning or sponsoring the journals. For example, the policy of The American Naturalist will state:

This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

This policy will be introduced approximately a year from now, after a period when authors are encouraged to voluntarily place their data in a public archive. Data that have an established standard repository, such as DNA sequences, should continue to be archived in the appropriate repository, such as GenBank. More idiosyncratic data can be placed in a more flexible digital data library, such as the National Science Foundation–sponsored Dryad Archive.

The authors of the editorial, Michael C. Whitlock, Mark A. McPeek, Mark D. Rausher, Loren Rieseberg, and Allen J. Moore, present the case for the importance of data archiving in science. This is the first of several coordinated editorials soon to appear in major journals.

New Charles Beagrie Projects for 2009/2010

We are starting up and partnering in a number of new and interesting consultancy projects which run into 2010 as follows:

Dryad is an emerging digital repository for supplementary data underlying published works in ecology, evolution, and related fields. It is being developed by a consortium of the National Evolutionary Synthesis Center (NESCent) in the US and relevant scientific societies and academic journals. Its goals are to:

  • preserve all the underlying data reported in a paper at the time of publication, when there is the greatest incentive and ability for authors to share their data. This is particularly important in the case of data for which a specialized repository does not exist.
  • lower the burden of data sharing by providing one-stop data deposition via handshaking with specialized repositories.
  • assign globally unique identifiers to datasets, thus enabling data citations.
  • allow end-users to perform sophisticated searches over data (not only by publication, but also by taxon, geography, geological age, biological concept, etc.).
  • allow journals and societies to pool their resources for one shared repository.
  • enable bidirectional search and retrieval with data repositories from related disciplines.

The strategic priorities for Dryad emerged from a May 2007 workshop on “Data Preservation, Sharing, and Discovery: Challenges for Small Science in the Digital Era”, at which a variety of stakeholder journals and societies were represented.

I am pleased to announce that Charles Beagrie Limited will be working with the Dryad project team to develop a business and sustainability plan for the Dryad repository. Neil Beagrie and Julia Chruszcz will lead the consultancy with research support from Peter Williams. Further information on Dryad, the partners and the latest developments can be found on the Dryad website.

I2S2 – The Infrastructure for Integration in Structural Sciences (I2S2) project is funded under the Research Data Management Infrastructure strand of the JISC’s Managing Research Data Programme, with a duration of 18 months (October 2009 to March 2011). It will identify requirements for a data-driven research infrastructure in “Structural Science”, focussing on the domain of Chemistry, but with a view towards inter-disciplinary application.

Two research data management pilots will examine the business processes of research and highlight the benefits of an integrated approach. Both pilots will address the crossing of administrative boundaries from institutions to national facilities, as well as issues of scale (from the local laboratory to the national facilities, the DIAMOND synchrotron and ISIS respectively).

A key component of the infrastructure will be a harmonised Integrated Information Model covering all stages of the Data Life Cycle. A “before and after” cost-benefit analysis will be performed using the Keeping Research Data Safe (KRDS2) model, which will be extended to address specific requirements in I2S2. We are looking forward to working with UKOLN (University of Bath and DCC), the Universities of Southampton and Cambridge, and the Science and Technology Facilities Council (STFC) in the project.

Keeping Research Data Safe2 Survey exemplar added to project webpage

The completed response from the eCrystals repository at the University of Southampton to the KRDS2 Survey has been added to the project webpage.

So far around 12 organisations from the UK and internationally have responded to the Survey. The eCrystals response is provided as an exemplar for those still considering a response and for anyone interested in the information the Survey will contain.

Further information on the KRDS2 Survey is contained in an earlier blog posting on the Survey and on the project webpage. KRDS2 invites you to contribute to the Survey if you have research datasets and associated cost information that you feel may be of interest to the study.

We anticipate that no organisation will have complete information on costs but most will have cost information in some areas. The aim of the survey is to compile an overview of what preservation cost information is collected.

The Survey proforma is available to download as an Acrobat form (requires Adobe Reader 8+ installed) or a Word form (requires Microsoft Word installed). It is available as a single main questionnaire or, alternatively, if you have multiple cost datasets, you can complete a separate organisational cover sheet and multiple collection details as required. It should take less than 30 minutes to complete, and KRDS2 is seeking responses (to info@beagrie.com) by the end of October 2009.

Keeping Research Data Safe2: Data Survey added to project website

The Keeping Research Data Safe2 project (KRDS2) commenced on 31 March 2009 and will complete in December 2009. The project is identifying long-lived datasets (including from social sciences and humanities research) for the purpose of cost analysis and is building on the work of the first “Keeping Research Data Safe” study completed in 2008.

We are currently undertaking detailed analysis of available cost information from three of our project partners. From this we aim to develop guidance on how cost metrics can be captured and applied in future.

In addition we have now added a survey proforma to the project website to help us identify other research data collections with information on preservation costs and issues. We invite you to contribute to the data survey if you have research datasets and associated cost information that you feel may be of interest to the study.

We anticipate that no organisation will have complete information on costs but most will have cost information in some areas. The aim of the survey is to compile an overview of what preservation cost information is collected.

The Survey proforma is available to download as an Acrobat form (requires Adobe Reader 8+ installed) or a Word form (requires Microsoft Word installed). It should take less than 30 minutes to complete, and we are seeking responses (to info@beagrie.com) by the end of October 2009. The proforma is available as a single main questionnaire or, alternatively, if you have multiple cost datasets, you can complete a separate organisational cover sheet and multiple collection details as required. Please do not hesitate to contact us at info@beagrie.com if you have any difficulty or questions.

Just Published: Survey of Researchers’ Views on Research Data Preservation and Access

The latest issue of Ariadne (Issue 60, July 2009) publishes an article based on recent work by Charles Beagrie Limited and Serco Consulting for the UK Research Data Service (UKRDS) Feasibility Study. It should be of interest to an international as well as a UK audience, as many of the issues addressed apply to research and research data in any national context.

Research Data Preservation and Access: The Views of Researchers presents findings from a UKRDS survey of researchers’ views on and practices for preservation and dissemination of research data in four UK universities (Bristol, Leeds, Leicester, and Oxford) and places them in the wider UK and international context.

A preliminary report from the Survey was included in the UKRDS Interim Report. Elements of the Survey and its findings were also incorporated in the Final Report of the UKRDS Feasibility Study submitted to HEFCE. However, space constraints precluded presentation of all the data and findings in full in these reports, and they were mainly included in a separate unpublished appendix. This article therefore aims to publish more of this material and set it in its context with updates from more recent published studies.

Keeping Research Data Safe 2 – Project webpage and project plan now available

The project plan and project webpage for the JISC-funded Keeping Research Data Safe 2 project (KRDS2) are now available on the Charles Beagrie website. The webpage has been set up to support dissemination of information on the project and to provide the background to the work, details of the project partners, and the project plan.

The first Keeping Research Data Safe study funded by JISC made a major contribution to the study of preservation costs by developing a cost model and identifying cost variables for preserving research data in UK universities.

KRDS2 aims to extend this previous work on digital preservation costs. It is identifying long-lived datasets for the purpose of cost analysis and building on the work of the first “Keeping Research Data Safe” study completed in 2008.

The KRDS2 project commenced on 31 March 2009 and will complete in December 2009. For further information see the project plan.

UK Research Data Service (UKRDS) International Conference

160 people gathered today at the Royal Society for the one-day international conference on the UK Research Data Service (UKRDS) Feasibility Study.

The eight-page management summary from the final report has been made available on the UKRDS website to coincide with the conference. It recommends to HEFCE that the UKRDS is feasible and should be funded over a period of at least 5 years. In the first instance it recommends that a 2-year Pathfinder phase be funded at a cost of £5.31m. It estimates overall savings delivered by a scaled-up UKRDS service to be the financial equivalent of 63.5 FTEs over a period of five years.
The presentations from the day are also available online.

HEFCE is still considering the report but is said to regard it favourably. A final decision is awaited.

New International Society for Biocuration launched

A potentially important development in digital curation is the creation of a new International Society for Biocuration.

The mission of the Society will be to:

1. Define the work of biocurators for the scientific community and the public funding agencies;
2. Propose a discussion forum for interested biocurators, developers, scientists and students;
3. Organize a regular meeting where biocurators will be able to present their work and discuss their projects;
4. Lobby to obtain increased and stable funding for biocuration resources that are essential to research;
5. Build a relationship with publishers and establish a link between researchers and databases through journal publishers;
6. Organize a regular workshop where new biocurators or interested students can be trained in the use of the common tools needed for their work;
7. Provide documentation on the use of common database and bioinformatics tools;
8. Provide ‘Gold Standards’ for databases, such as the use of unique, traceable identifiers, use of shared tools, etc.;
9. Share documentation on standards and annotation procedures with the aim of developing Standard Operating Procedures (SOPs);
10. Foster connections with user communities to ensure that databases and accompanying tools meet specific user needs;
11. Maintain a biocurator job market forum.

The new Society will have its official launch at the 3rd International Biocuration Conference 16-19 April 2009 in Berlin.

A Digital Preservation Workshop at The Hague, April 2009

The Dutch National Library (the KB), the Ligue des Bibliothèques Européennes de Recherche (Association of European Research Libraries – LIBER), and the Dutch Digital Preservation Coalition (NCDD) are holding a digital preservation workshop titled “e-Merging New Roles and Responsibilities in the European Landscape” on 17 April at the KB, The Hague, Netherlands.

The workshop aims to develop a basic understanding of the issues presented by long-term digital curation and preservation of resources which are (to be) deposited in institutional and subject-based repositories – both within research institutions and research communities. It will highlight the state of the art in digital curation and will cover best practices, including possibilities for outsourcing.

I will be chairing the afternoon session on “Policy, preconditions and costs: opportunities and pitfalls in long-term digital preservation” with Marcel Ras, Head of the e-Depot at the KB. Attendees registering for the workshop have the opportunity to list a specific question or problem they would like to see covered, so the session content will be tailored to your suggestions! For further information see the workshop webpages linked above.

ComputerWeekly tips digital preservation as an emerging technology

Digital Preservation has been tipped as an emerging technology to watch by a leading IT magazine.

Yesterday’s ComputerWeekly has an article in its IT Management section, “How to beat the recession using underutilised technology”, by Michael Pincher. It focuses on how IT vendors can look at emerging technologies and customer requirements to innovate and begin to buck the recession.

It’s an interesting article looking at overlooked areas of corporate innovation, key markets, “hype cycles”, and emerging technologies.

The emerging technologies section particularly caught my eye, as it identifies digital preservation as a growth area in data management. Related areas such as regulatory compliance technologies, content management and repositories, infrastructure protection, storage management, and risk management are also highlighted.

The list of emerging technologies is provided to give food for thought and to help advise on business and innovation potential in the marketplace. However, the content of the article should be of interest to a much wider readership, and I highly recommend reading it.
