Science and Industry

Government Data

The new UK coalition government has been making some interesting policy decisions around government data, extending some of the work already under way under the previous Labour administration. For example, see the Prime Minister’s Letter to Government departments on opening up data, issued on Monday 31 May 2010.

The Conservative Party (majority partner in the coalition) technology manifesto is well worth looking over for anyone interested in data and IT policy in the UK, and is an indicator of what might still be coming out of the new government.

In addition to plans to open up government data and spending information, it refers to research by Rufus Pollock et al. at Cambridge University on the economic value of open data, which estimated that open data will create £6 billion in additional value for the UK. This boost to British jobs will come from the synergies and positive spillover benefits that result from businesses and social entrepreneurs building new applications and services using previously locked-up government data.

It is fascinating to see how big an effect advocacy by the likes of the Open Knowledge Foundation and the Free Our Data campaign has had on UK government policy. Of course, it helps if similar initiatives are under way in the USA – see the Wired interview with the US government’s first-ever chief information officer, Vivek Kundra.

Knowledge Management Marketplace, University of Bath 17th June 2010

The University of Bath and the UK Council for Electronic Business (UKCeB) are hosting the second Knowledge Management Marketplace (KMM10), taking place at the University of Bath on 17th June 2010. It focuses on knowledge management lessons learned for SMEs. A number of larger companies, including Airbus, BAE, BMT, Korteq and IBM, will also be there.

KMM10 will be of interest to:

  • Those who face issues related to knowledge management in their working day;
  • Vendors, consultants and developers who can assist in addressing such issues;
  • Researchers with interests in this area.

The marketplace is preceded by scene-setting keynotes, and followed by a panel session where issues raised throughout the day may be debated in a group setting.

Economic Impact of Research Data Sharing

Zoe Locke, Lead Technologist at the UK Technology Strategy Board, has made an interesting post, Impact of Data, on their blog, requesting any information on the economic impact of research data sharing. An extract follows:

“I am currently in Manchester attending a JISC workshop on Managing Research Data…

Yesterday, there was an interesting keynote speech from the Director of the Digital Curation Centre (DCC).  However, I noted that ‘Impact’ was the 3rd reason for why researchers should care about data curation.  I asked about the meaning of impact.  In the context of the talk, impact was about whether or not the research for which the data was used got published (and had an effect on the researcher’s career).  The DCC focuses on transferring knowledge on curation into and around the higher education sector so this seems like an appropriate definition of impact.  However, given the potential socio-economic impact of research and resultant data, not to mention the business opportunities it could create (though we don’t really know where or what these are, let alone how big they might be), I can’t help feeling that we need to widen the definition to stimulate greater sharing and exploitation of data.  If businesses could generate wealth or increase the quality of life with this data then surely it would be easier for anyone to justify footing the bill for curation…

Does anyone out there have any specific case studies of money being made or saved through the exploitation of research data (specifically that data generated in a different organisation to the one exploiting it)?”

You will need to register with the Connect Network to reply to Zoe directly, but I am happy to forward any examples readers may add as comments to this posting on the Charles Beagrie blog.

Keeping Research Data Safe 2: Final Report Published

I am pleased to announce that the final report for Keeping Research Data Safe 2 (KRDS2) is now available from the JISC website. This KRDS2 study report presents the results of a survey of available cost information, validation and further development of the KRDS activity cost model, and a new taxonomy to help assess benefits alongside costs.

KRDS2 has delivered the following:

• A survey of cost information for digital preservation, collating and making available 13 survey responses for different cost datasets;

• The KRDS activity model has been reviewed and its presentation and usability enhanced;

• Cost information for four organisations (the Archaeology Data Service; National Digital Archive of Datasets; UK Data Archive; and University of Oxford) has been analysed in depth and presented in case studies;

• A benefits framework has been produced and illustrated with two benefit case studies from the National Crystallography Service at Southampton University and the UK Data Archive at the University of Essex.

One of the key findings on the long-term costs of digital preservation for research data was that the cost of archiving activities (archival storage and preservation planning and actions) is consistently a very small proportion of the overall costs and significantly lower than the costs of acquisition/ingest or access activities for all the case studies in KRDS2. As an example, the respective activity staff costs for the Archaeology Data Service are Access (c.31%), Outreach/Acquisition/Ingest (c.55%), Archiving (c.15%). This confirms and supports a preliminary finding in KRDS1.

A range of supplementary materials in support of this report has also been made available on the KRDS project website. These include the ULCC Excel Cost Spreadsheet for the NDAD service together with a Guide to Interpreting and Using the NDAD Cost Spreadsheet. The NDAD Cost Spreadsheet has previously been used as an exercise in digital preservation training events and may be particularly useful in training covering digital preservation costs. The accompanying Guide is intended for those wishing to understand and experiment with the spreadsheet.

US National Science Foundation to mandate research data management plans

During the May meeting of the National Science Board, National Science Foundation (NSF) officials announced a change in the implementation of the existing policy on sharing research data. In particular, in or around October 2010, NSF is planning to require that all proposals include a data management plan in the form of a two-page supplementary document. The research community will be informed of the specifics of the anticipated changes and the agency’s expectations.

The changes are designed to address trends and needs in the modern era of data-driven science. Ed Seidel, acting assistant director for NSF’s Mathematical and Physical Sciences directorate, acknowledged that each discipline has its own culture about data-sharing, and said that NSF wants to avoid a one-size-fits-all approach to the issue. But for all disciplines, the data management plans will be subject to peer review, and the new approach will allow flexibility at the directorate and division levels to tailor implementation as appropriate.

Full details can be found in the NSF press release.

Data Management Plans are also required by a growing number of research funders in the UK. The Digital Curation Centre provides a useful overview of current UK funder requirements for data management and sharing plans and a Data Management Plan Content Checklist.

Ensuring Perpetual Access – German National Hosting Strategy for electronic resources – Study now available

I am pleased to announce that our study Ensuring Perpetual Access: establishing a federated strategy on perpetual access and hosting of electronic resources for Germany is now available.

Concepts and Properties of Archives and Hosting in the Strategy and their Relationships ©Charles Beagrie Ltd 2009. Creative Commons Attribution-Share Alike 3.0. Key: solid colour represents core properties and fading colour represents weaker properties of archives and hosting services.

The study was commissioned by the Alliance of German Science Organisations to help develop a strategy to address the challenges of perpetual access and hosting of electronic resources. In undertaking the study we were requested to focus on commercial e-journals and retro-digitised material.

Although developed for Germany, the study contains substantial discussion and recommendations on the issues of perpetual access, archiving, and sustainability of hosting and access services for these materials, which will be of interest to an international audience.

Contents include:

  • Discussion, definition, and glossary of terms;
  • Review of relevant international activity;
  • Review of current and future desired position in Germany;
  • Gap analysis;
  • A series of use cases;
  • Scenarios, potential solutions, and recommendations.

Model used for discussion of the Federated Strategy on Perpetual Access and Hosting of Electronic Resources for Germany ©Charles Beagrie Ltd 2009. Creative Commons Attribution-Share Alike 3.0

The members of the Alliance of German Science Organisations are the Alexander von Humboldt Foundation, the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), the Fraunhofer-Gesellschaft, the German Academic Exchange Service (DAAD), the German Rectors’ Conference (Hochschulrektorenkonferenz – HRK), the Helmholtz Association, the Leibniz Association, the Max Planck Society, and the Wissenschaftsrat (German Council of Science and Humanities). For further information on the Alliance Hosting Working Group that steered the study see:

English webpage:

http://www.allianzinitiative.de/en/core_activities/national_hosting_strategy/working_group/

Deutsch:

http://www.allianzinitiative.de/de/handlungsfelder/nationale_hosting_strategie/arbeitsgruppe/

Elsevier and PANGAEA Data Archive Linking Agreement

An interesting press release from last week, particularly when seen in the context of previous announcements on this blog: an emerging trend of journals and publishers linking to open-access data repositories?

Extract: Amsterdam, 24 February 2010 – Elsevier, a world-leading publisher of scientific, technical and medical information products and services, announced today that the data library PANGAEA – Publishing Network for Geoscientific & Environmental Data – and Elsevier have implemented reciprocal linking between their respective content in earth system research. Research data sets deposited at PANGAEA are now automatically linked to the corresponding articles in Elsevier journals on its electronic platform ScienceDirect and vice versa. This linking functionality also provides a credit mechanism for research data sets deposited in this data library.

Dr. Hannes Grobe, data librarian of PANGAEA at the Alfred Wegener Institute for Polar and Marine Research commented, “Through this fruitful cooperation, science is better supported and the flow of data into trusted archives is promoted. The interaction of a publisher with an Open Access data repository is ideal to serve the requirements of modern research by diminishing the loss of research data. It also enables the reader of a publication to verify the scientific findings and to use the data in his own work. The Elsevier-PANGAEA cooperation consequently follows the most recent recommendations of funding bodies and international organizations, such as the OECD, about access to research data from public funding.”

“Our goal is to continuously improve user experiences, and this is one of the ways we make this happen” added Dr. Christiane Barranguet, executive publisher at Elsevier. “This is the beginning of a new way of managing, preserving and sharing data from earth system research. It also highlights the value ScienceDirect can deliver on its platform by giving researchers the papers they need and helping them put those papers in context, delivering unique value to user.”

Working with the scientific community to preserve scientific research data is also an objective of the Elsevier Content Innovation programme. Through this agreement and development Elsevier supports long-term storage, wide availability and preservation of large research data sets.

Final report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access

The Final Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access is now available. The report, Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information, describes its work as follows:

“…questions remain about what digital information we should preserve, who is responsible for preserving, and who will pay.

The Blue Ribbon Task Force on Sustainable Digital Preservation and Access investigated these questions from an economic perspective. In this report, we identify problems intrinsic to all preserved digital materials, and propose actions that stakeholders can take to meet these challenges to sustainability. We developed action agendas that are targeted to major stakeholder groups and to domain-specific preservation strategies.

The Task Force focused its inquiry on materials that are of long-term public interest, looking at four content domains with diverse preservation profiles:

  • Scholarly discourse: the published output of scholarly inquiry
  • Research data: the primary inputs into research, as well as the first-order results of that research
  • Commercially owned cultural content: culturally significant digital content that is owned by a private entity and is under copyright protection; and
  • Collectively produced Web content: Web content that is created interactively, the result of collaboration and contributions by consumers.”

I have not had a chance to look at the report in detail but hope to add a short commentary to the blog in due course.

Results of Digital Preservation Costs Survey now available

I am pleased to announce that the findings from the Keeping Research Data Safe 2 (KRDS2) survey of digital preservation cost information are now available on the KRDS2 project webpage.

One of the core aims of the KRDS2 project was to identify potential sources of cost information for preservation of digital research data and to conduct a survey of them. Between September and November 2009 we made an open invitation via email lists, the project blog and the project webpage for others to contact us and contribute to the data survey if they had research datasets and associated cost information that they believed might be of interest to the study.

Thirteen survey responses were received: 11 of these were from UK-based collections, and 2 were from mainland Europe. Two further potential contributions from the USA were unfortunately not available in time to be included.

The responses covered a broad area of research including the arts and humanities, social sciences, and physical and biological sciences and research data archives or cultural heritage collections. Each survey response is approximately 6-8 pages in length.

A summary analysis is available, together with the individual completed responses to the data survey, which provide more detail.

We have also made the revised versions of the KRDS2 activity model available to download.

We aim to release the KRDS2 report via JISC in March following peer review and final editing. Further supplementary materials from KRDS2 will also be placed on the project webpage in March.

We have also recently undertaken a major website redesign and made additions, should you wish to browse other information on the website.

New Charles Beagrie Projects for 2009/2010

We are starting up and partnering in a number of new and interesting consultancy projects which run into 2010, as follows:

Dryad is an emerging digital repository for supplementary data underlying published works in ecology, evolution, and related fields being developed by a consortium of the National Evolutionary Synthesis Center (NESCent) in the US and relevant scientific societies and academic journals. Its goals are to:

  • preserve all the underlying data reported in a paper at the time of publication, when there is the greatest incentive and ability for authors to share their data. This is particularly important in the case of data for which a specialized repository does not exist.
  • lower the burden of data sharing by providing one-stop data-deposition via handshaking with specialized repositories.
  • assign globally unique identifiers to datasets, thus enabling data citations.
  • allow end-users to perform sophisticated searches over data (not only by publication, but also by taxon, geography, geological age, biological concept, etc).
  • allow journals and societies to pool their resources for one shared repository.
  • enable bidirectional search and retrieval with data repositories from related disciplines.

The strategic priorities for Dryad emerged from a May 2007 workshop on “Data Preservation, Sharing, and Discovery: Challenges for Small Science in the Digital Era”, at which a variety of stakeholder journals and societies were represented.

I am pleased to announce that Charles Beagrie Limited will be working with the Dryad project team to develop a business and sustainability plan for the Dryad repository. Neil Beagrie and Julia Chruszcz will lead the consultancy with research support from Peter Williams. Further information on Dryad, the partners and the latest developments can be found on the Dryad website.

I2S2 - The Infrastructure for Integration in Structural Sciences (I2S2) Project is funded under the Research Data Management Infrastructure strand of the JISC’s Managing Research Data Programme, with a duration of 18 months (Oct 2009 to March 2011). It will identify requirements for a data-driven research infrastructure in “Structural Science”, focussing on the domain of Chemistry, but with a view towards inter-disciplinary application.

Two research data management pilots will examine the business processes of research, and highlight the benefits of an integrated approach. Both pilots will address traversing administrative boundaries between institutions and national facilities, in addition to issues of scale (local laboratory to national facilities, DIAMOND synchrotron and ISIS respectively).

A key component of the infrastructure will be a harmonised Integrated Information Model to include all stages of the Data Life Cycle. A “before and after” cost-benefit analysis will be performed using the Keeping Research Data Safe (KRDS2) model, which will be extended to address specific requirements in I2S2. We are looking forward to working with UKOLN (University of Bath and DCC), the Universities of Southampton and Cambridge, and the Science and Technology Facilities Council (STFC) in the project.
