e-Research

Just published: A Comparative Study of e-Journal Archiving Solutions

I am pleased to announce that the JISC-funded report A Comparative Study of e-Journal Archiving Solutions has just been published and is now available to download as a pdf from the JISC Collections website. It has been a great pleasure to work with Julia Chruszcz, Maggie Jones and Terry Morrow on this study over the last few months.

The report is the result of a call by the JISC, issued in January 2008, for a Comparative Study of e-Journal Archiving Solutions. The Invitation to Tender asked for a report that “will be published for wide use by institutions to inform policies and investment in e-journal archiving solutions.” The ITT further stated that the report should “also inform negotiations undertaken by JISC Collections and NESLi2 when seeking publishers’ compliance to deposit content with at least one e-journal archiving solution.”

The report contains chapters covering: Approaches to e-journal preservation, Publisher licensing and legal deposit, Comparisons of Six Current e-Journal Archiving Programmes (LOCKSS, CLOCKSS, Portico, the KB e-depot, OCLC’s Electronic Collections Online, and the British Library’s e-journal Digital Archive), Practical experience of e-journal archiving solutions, Evaluation of four common scenarios/trigger events, and Criteria for judging relevance and value of new archiving initiatives. There are two appendices on Publisher Participation in different programmes.

The report has the following recommendations:

  1. When negotiating NESLi2 agreements, JISC’s negotiators should take the initiative by specifying archiving requirements, including a short-list of approved archiving solutions.
  2. To help quantify the insurance risk and the necessary appropriate investment, bodies representing publishers and other trade organisations should gather and share statistical information on the likelihood of the trigger events outlined in this report.
  3. Post-cancellation access conditions should be defined in the licensing agreement between libraries and publishers. Publishers should be strongly encouraged to cooperate with one or more external e-journal archiving solutions as well as provide their own post-cancellation service (at minimal cost).
  4. The publisher (or subscription agent) should state their policy on perpetual access under the four scenarios described in section 9.
  5. When titles are sold on to other publishers, the Transfer Code of Practice (see section 9.3) should be followed.
  6. Archiving service providers and publishers should work together to develop standard cross-industry definitions of trigger events and protocols on the conditions for release of archived content. Project Transfer is a potential exemplar. The ground rules for any post-trigger event negotiation should be clear and transparent and established in advance.
  7. Archive service providers must provide greater clarity on coverage details, including not only publishers and titles, but also the years and issues included in the archive.
  8. Using the scenarios outlined in this report, libraries should carry out a risk assessment on the impact of loss of access to e-journals by their institution, and a cost/benefit analysis, in order to judge the value and relevance of the archiving solutions on offer.
  9. Relevant UK bodies and institutions should use whatever influence they can bring to bear to ensure that archiving solutions cover publishers and titles of particular value to UK libraries.
  10. The findings of this study should be reviewed and updated at regular intervals to reflect continuing developments in the field of e-journal archiving and preservation.

Its publication comes hot on the heels of two related studies: the Portico/Ithaka e-journal archiving survey of US Library Directors and the JISC-funded UK LOCKSS Pilot Programme Evaluation Report. A further blog entry will follow!

Just published: Research Data Preservation Costs Report

I have posted two previous entries to the blog, in January and March, detailing progress with the JISC-funded research data preservation costs study. I am pleased to report that the online executive summary and the full report (pdf file), titled “Keeping Research Data Safe: a cost model and guidance for UK Universities”, are now published and can be downloaded from the JISC website.

It has been a very intensive piece of work over four months and I am extremely grateful to the many colleagues who contributed and made this possible. We have uncovered a lot of valuable data and approaches, and hope that future studies, implementation and testing can build on them. To facilitate this we have attempted to “show our workings” as far as possible, so the text of the report is accompanied by extensive appendices.

We have made 10 recommendations on future work and implementation. For further information see the Executive Summary online.

The report itself has chapters covering the Introduction, Methodology, Benefits of Research Data Preservation, Describing the Cost Framework and its Use, Key Cost Variables and Units, the Activity Model and Resources Template, Overviews of the Case Studies, Issues Universities Need to Consider, Different Service Models and Structures, Conclusions and Recommendations. There are also four detailed case studies covering the University of Cambridge, King’s College London, the University of Southampton, and the Archaeology Data Service (University of York).

Although focused on the UK and UK universities in particular, it should be of interest to anyone involved with research data or interested generally in the costs of digital preservation.


Comments and feedback welcome!

Research Data and the Computing Cloud: NSF/Google and IBM

Research in the Cloud: Providing Cutting Edge Computational Resources to Scientists is an interesting recent post to the Google Research Blog. It provides Google’s take on its participation in the National Science Foundation/Google/IBM collaboration within The Cluster Exploratory Program (CluE).

The NSF solicitation for proposals was released last week. To quote from the call:
“In addition to the widespread societal impact of data-intensive computing, this computational paradigm also promises significant opportunities to stimulate advances in science and engineering research, where large digital data collections are increasingly prevalent. Well-known examples include the Sloan Digital Sky Survey, the Visible Human, the IRIS Seismology Data Base, the Protein Data Bank and the Linguistic Data Consortium, however other valuable data collections or federations of data collections are being assembled on an ongoing basis. In many fields, it is now possible to pose hypotheses and test them by looking in databases of already collected information.   Further, the possibility of significant discovery by interconnecting different data sources is extraordinarily appealing. In data-intensive computing, the sheer volume of data is the dominant performance parameter.  Storage and computation are co-located, enabling large-scale parallelism over terabytes of data. This scale of computing supports applications specified in high-level programming primitives, where the run-time system manages parallelism and data access. Supporting architectures must be extremely fault-tolerant and exhibit high degrees of reliability and availability.
The Cluster Exploratory (CluE) program has been designed to provide academic researchers with access to massively-scaled, highly-distributed computing resources supported by Google and IBM.  While the main focus of the program is the stimulation of research advances in computing, the potential to stimulate simultaneous advances in other fields of science and engineering is also recognized and encouraged.”

It will be interesting to see how this collaboration evolves and which datasets it includes. For more information see the Cluster Exploratory (CluE) program call text.

OR2008 - Presentations available


The Open Repositories conference (OR2008) repository is available at http://pubs.or08.ecs.soton.ac.uk/ as a permanent record of the conference activities.

The repository contains papers, presentations and poster artwork for 144 different conference contributions from the main conference sessions (Interoperability, Legal, Models, Architectures & Frameworks, National Perspectives, Scientific Repositories, Social Networking, Sustainability, Usage, Web 2.0), the Poster session, User Group sessions (DSpace, EPrints, Fedora), Birds of a Feather sessions, the Repository Managers session and the ORE Information day.

My PowerPoint presentation from the plenary keynote for the Fedora International Users’ Meeting is also available there. Titled “Keeping alert: issues to know today for long-term digital preservation with repositories”, it focussed on research data and sustainability. It drew heavily on the forthcoming JISC Research Data Preservation Costs study and its draft final report, “Keeping Research Data Safe: A Cost Model and Guidance for UK Universities”, and concluded by outlining tentative findings from that report and their implications for repositories.

Digital Preservation Cost Models

I blogged back in January on the JISC Research Data Preservation Costs study and promised an update at the end of March. Well, the draft final report titled “Keeping Research Data Safe: A Cost Model and Guidance for UK Universities” is now with JISC and being peer-reviewed.

It’s been a significant effort and I think it should make a major contribution to thinking on digital preservation cost models and costs in general; hopefully the final report will be out later this spring. In short, we have produced:

• A cost framework consisting of:

o A list of key cost variables divided into economic adjustments (inflation/deflation, depreciation, and costs of capital), and service adjustments (volume and number of deposits, user services, etc);

o An activity model divided into pre-archive, archive, and support services;

o A resources template including the major cost categories in TRAC (a methodology for Full Economic Costing used by UK universities), divided by the major phases of our activity model and by duration of activity.

Typically the activity model will help identify resources required or expended, the economic adjustments help spread and maintain these over time, and the service adjustments help identify and adjust resources to specific requirements. The resources template provides a framework to draw these elements together so that they can be implemented in a TRAC-based cost model. Normally the cost model will implement these as a spreadsheet, populated with data and adjustments agreed by the institution.

The three parts of the cost framework can be used in this way to develop and apply local cost models. The exact application may depend on the purpose of the costing, which might include: identifying current costs; identifying former or future costs; or comparing costs across different collections and institutions which have used different variables. These purposes are progressively more difficult. The model may also be used to develop a charging policy or to determine appropriate archiving costs to charge to projects.
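To make the interplay of the three parts a little more concrete, here is a minimal Python sketch of how activity-model costs, economic adjustments and service adjustments might be combined, assuming a much-simplified framework. It is not the report’s spreadsheet or its method: the names `PhaseCost`, `projected_costs`, `present_value` and `cost_per_unit`, the single flat inflation, growth and discount rates, and all figures are hypothetical, and depreciation and the TRAC cost categories are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhaseCost:
    """Baseline annual cost for one phase of the activity model (hypothetical structure)."""
    phase: str             # e.g. "pre-archive", "archive" or "support"
    staff: float           # staff cost per year
    equipment: float       # equipment/storage cost per year
    one_off: bool = False  # True if the cost falls only in year 0


def projected_costs(phases: List[PhaseCost], years: int,
                    inflation: float, volume_growth: float) -> List[float]:
    """Total cost for each year of the planning period.

    Economic adjustment: every cost is scaled by (1 + inflation) ** year.
    Service adjustment: recurring costs are also scaled by
    (1 + volume_growth) ** year, a crude proxy for growth in deposits.
    """
    totals = []
    for year in range(years):
        year_total = 0.0
        for p in phases:
            if p.one_off and year > 0:
                continue  # one-off costs (e.g. initial ingest set-up) fall in year 0 only
            base = p.staff + p.equipment
            if not p.one_off:
                base *= (1 + volume_growth) ** year
            year_total += base * (1 + inflation) ** year
        totals.append(year_total)
    return totals


def present_value(totals: List[float], discount_rate: float) -> float:
    """Discount each year's cost back to year 0 (a simple cost-of-capital adjustment)."""
    return sum(cost / (1 + discount_rate) ** year for year, cost in enumerate(totals))


def cost_per_unit(totals: List[float], units_preserved: float) -> float:
    """Total cost over the period divided by the units preserved (e.g. gigabytes)."""
    return sum(totals) / units_preserved


# Illustrative figures only; they are not taken from the report.
phases = [
    PhaseCost("pre-archive", staff=10000, equipment=0, one_off=True),
    PhaseCost("archive", staff=5000, equipment=2000),
    PhaseCost("support", staff=3000, equipment=0),
]
totals = projected_costs(phases, years=3, inflation=0.0, volume_growth=0.0)
print(totals)                      # [20000.0, 10000.0, 10000.0]
print(cost_per_unit(totals, 400))  # 100.0
```

A real institutional model would replace the flat rates with locally agreed adjustments and map each phase onto TRAC categories, which is where a spreadsheet populated by the institution, as described above, remains the more practical tool.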

In addition to the cost framework there are:

• A series of case studies from Cambridge University, King’s College London, Southampton University, and the Archaeology Data Service at York University, illustrating different aspects of costs for research data within HEIs;

• A cost spreadsheet based on the study, developed by the Centre for e-Research at King’s College London for its own forward planning and provided as a confidential supplement to its case study in the report;

• Recommendations for future work and use/adaptation of software costing tools to assist implementation.

Watch this space (well, this blog) for a future announcement of the final report and the URL for the download.

First African Digital Curation Conference

Most digital curation and preservation news seems to come from Europe and North America, so it is interesting to see emerging interest in digital curation and digital preservation issues in developing economies. With that in mind, I’m flagging up the first African Digital Curation Conference, held in Pretoria on 12-13 February, which concluded today. The conference was organised under the auspices of the South African Department of Science and Technology, three science councils (the CSIR, the Human Sciences Research Council and the National Research Foundation), the University of Pretoria and the Academy of Science of South Africa.

The conference programme looks interesting.

During the first day, international speakers shared perspectives mainly from the UK, the European Union and the USA, whilst also looking at new roles and opportunities. The South African Minister of Science and Technology, Mr Mosibudi Mangena, talked on the implications of the OECD declaration on Data Sharing for Publicly Funded Research Data for African and South African policy on research data and information management.

Curation of African digital content and practices in specific science domains was the focus of day two of the event. Proceedings concluded with discussion on a formalised network of African data and information curation centres.

I hope there will be conference proceedings or reports, and perhaps some colleagues who attended will blog the event; if so, I will add a future post to the blog.

New UK National Nuclear Archive to be established

Colleagues may have missed the announcement that the UK Nuclear Decommissioning Authority (NDA) will invest £8 million in plans to create the UK’s National Nuclear Archive (NNA) in Caithness, Scotland. The money will be invested over three years and will help get the £20 million project off the ground.

For those interested in the digital preservation issues involved in the NNA, I would refer you to an informative presentation by Simon Tucker, Information Manager at the NDA, given to the “Nuclear Information over the Millennia Workshop” held in November 2006.

The NNA will potentially hold between 20 and 30 million digital, paper and photographic records primarily concerning the history, development and decommissioning of the UK’s civil nuclear industry since the 1940s. Around 20 specialist jobs will be created by the project. The archive will take about four years to build and many more to establish as an exemplar in its field. Land near the airport, currently owned by the local authority, has been earmarked as a potential site.

The development will undoubtedly be an important one, and is a good reminder that some electronic records retain their value over centuries, and of the digital preservation issues this raises in key industries.

Archaeology Data Service Charging Policy

I’m currently looking closely at various efforts by different organisations to capture and model digital preservation costs as part of our work for JISC on developing a preservation cost model for research data.

As part of desk research for that work I have re-visited the Archaeology Data Service (ADS) Charging Policy now in its 4th edition (November 2007). I remember its first edition 10 years ago and being invited to comment on it when I was at the Arts and Humanities Data Service. It has continued to develop over the last 10 years but lost none of its accessibility and (professional) interest.

In short, it is a very user-friendly, concise and informative document aimed at depositors in the archaeological data community, but its treatment of digital preservation costs and the thorny issue of charging is likely to make it of much wider interest; hence this blog entry!

Digital preservation costs are categorised and briefly explained under four headings:

  • Management and administration
  • Ingest
  • Dissemination
  • Storage and refreshment

The document identifies charges for standard deposits and levels of service and indicates potential variants and additional costs. There is an accompanying webpage on refreshment costs.

It’s a fascinating (honest!) and short read; highly recommended.

For those following the aftermath of the AHRC decision to stop funding the AHDS, the following snippet from the charging policy may also be of interest:
“The ADS currently receives some core funding from the Arts and Humanities Research Council (AHRC). The AHRC have indicated that the ADS should investigate a move toward a responsive mode funding for archives created by AHRC funded projects in the long term. In the past the ADS has waived deposit charges for researchers based in UK Higher Education Institutions. Due to the change in our core funding arrangements, from 1st January 2008 ALL deposits, whether from projects created within or outwith UK Higher Education will be subject to some level of charge.”

Google to host research datasets

The Wired Blog gives advance notice that the domain http://research.google.com will soon provide a home for terabytes of open-source scientific datasets. Storage will be free to scientists and access to the data will be free for all. The project, known as Palimpsest, missed its original launch date this week but will debut soon. Wired suggests that Palimpsest will fill a major need for scientists who want to share their data openly, and will allow public access to an unprecedented amount of data. For example, two planned datasets are the full 120 terabytes of Hubble Space Telescope data and the images of the 10th-century manuscript known as the Archimedes Palimpsest.

Those with long memories (hopefully prevalent amongst digital preservationists!) will also remember the Google/NASA memorandum of understanding signed in September 2005 that “outlines plans for cooperation on a variety of areas, including large-scale data management, massively distributed computing, bio-info-nano convergence, and encouragement of the entrepreneurial space industry”, so perhaps we should expect more major announcements along these lines from Google and NASA in the months to come.

JISC Research Data Preservation Costs Study

I’m pleased to announce on the blog that JISC awarded Charles Beagrie Limited the contract in December to complete a study of research data preservation costs. It’s an important and topical study: a joint NSF/JISC/Mellon Blue Ribbon Taskforce is about to start its two-year assignment this month to look at sustainable digital preservation and access, and there are moves to undertake a feasibility study during 2008 for a shared service for the preservation of research data in UK universities.

The study has a demanding timescale (we have to report by the end of March), but it will be a pleasure to work with our associate Julia Chruszcz, Brian Lavoie at OCLC and colleagues at the Universities of Cambridge and Southampton and King’s College London on this assignment. Work is now well underway.

Very briefly, the JISC expects that the study should:

  1. Investigate the costs (direct and indirect) of preserving research data, from an institution’s point of view
  2. Construct a list of issues which universities will need to consider when determining the medium to long-term costs of data preservation
  3. Attempt to establish a methodology which will help institutions estimate the cost per unit of research data preserved
  4. Compare the costs of each different model of preservation (e.g. shared services, institutional repository, discipline-focused, centralised)
  5. Consider the direct and indirect costs of data preservation in the next 5-10 years and beyond.

I will post further information on the study and draft outcomes at the end of March 2008.
