Digital Curation

Forthcoming presentation: the LIBER Webinar – Research Data – What To Keep?

Monday, March 2, 2020 2:00 PM – 3:00 PM (Amsterdam time)

What to keep in terms of research data has been a recognized issue for some time but research data management (and, in particular, appraisal and selection) has become a more significant focus in recent years. Researchers, librarians, data curators, and policy makers all need to answer the question, what research data should be kept? We can’t keep it all, because that would be too expensive and time-consuming, but we have to keep data that is irreplaceable and unique in its value for future research, to enable it to be reused and validated, to enable peer review to be informed, and to enable there to be trust in research findings. Types of data needing to be retained also vary and may include related materials such as software and documentation. How much and what is enough?

In this webinar, organized by LIBER’s Research Data Management Working Group, we’ll dive into the topic of what to keep with expert Neil Beagrie, author of a recent Jisc-funded report on this topic. Neil is director of consultancy at Charles Beagrie. He is an experienced senior consultant and an internationally recognized expert with extensive experience in research data management and digital preservation. He was awarded the 2014 Archival Technology Medal by the Society of Motion Picture and Television Engineers (SMPTE) for his long-term contributions to digital preservation.

Attendees will learn about:

  • Existing practice and guidance for appraisal and review for research data and related materials;
  • Generic/disciplinary/sub-disciplinary differences in defining research data e.g. practice-based research in the arts and humanities;
  • Research integrity and data sharing as strategic drivers for research data management;
  • Differences in levels of curation and considerations of value and cost.

The webinar will be hosted by Dr Birgit Schmidt and Rob Grim. Birgit is Head of Knowledge Commons at Göttingen State and University Library and Chair of LIBER’s Research Infrastructure Steering Committee. Rob is an Economics (Data) Librarian at Erasmus University Rotterdam and Co-chair of LIBER’s Research Data Management Working Group.

Registration via https://www.anymeeting.com/AccountManager/RegEv.aspx?PIID=EE52DD8980463C

Datanomics: the value of research data


Twenty years ago format obsolescence was seen as the greatest long-term threat to digital information. Experience to date has arguably shown that funding and organisational challenges are more significant threats. I hope this presentation helps those grappling with these challenges and shows some key advances in how to use knowledge of costs, benefits and value to support long-term sustainability of digital data and services.

These are the slides from my keynote presentation to the joint Digital Preservation Coalition / Jisc workshop on Digital Assets and Digital Liabilities – the Value of Data held in Glasgow in February 2018. The slides summarise work over the last decade in the key areas of exploring costs, benefits and value for data. The slides posted here have additional slide notes and references to new publications since the workshop and some modifications such as removal of animations. One day I hope to have time to synthesise this presentation in an accessible way as a more extensive article, but I hope this slide deck on SlideShare at https://www.slideshare.net/Nbeagrie is a useful interim resource.


New “What To Keep” research data report published by Jisc

What to Keep

“What To Keep?”, a new research data report by Charles Beagrie Ltd, has just been published by Jisc. You can access the full report directly at: https://repository.jisc.ac.uk/7262/

What to keep in terms of research data has been a recognised issue for some time but research data management and in particular appraisal and selection (i.e. “what to keep and why”) has become a more significant focus in recent years as volumes and diversity of data have grown, and as the available infrastructure for ‘keeping’ has become more diverse.

The purpose of the What to Keep report is to provide new insights that will be useful to institutions, research funders, researchers, publishers, and Jisc on what research data to keep and why, the current position, and suggestions for improvement.

The analysis of emerging themes and mappings is available as a set of tables. Seven mini case studies illustrate in more detail the approaches and rationale for what to keep for different repositories, stakeholders and disciplinary areas.

The report provides insights on how “what to keep” decisions can be guided and supported, and the ten study recommendations, together with potential implementations for them, provide practical suggestions for future development.

What to Keep Recommendations

European Open Science Cloud


Charles Beagrie Ltd have been providing additional expert resource in Open Science and Open Scholarship to Jisc, a partner in the EOSCpilot project funded by the EC’s Horizon 2020 Research & Innovation programme. The EOSC – European Open Science Cloud – aims to create a trusted environment for hosting and processing research data to support EU science.

We helped to support the finalisation of draft policy recommendations aimed at encouraging implementation and take-up of the EOSC. This involved supporting consultation on the draft policy recommendations, and helping to prioritise and develop them in more detail, to produce a coherent policy proposition.

We look forward to seeing the final public recommendations and future development of EOSC.

Research Data: What to Keep?

Charles Beagrie Ltd has started a new research data study for Jisc and UK institutions.

Jisc is working to develop shared infrastructure, influence policy and provide guidance to support institutions with the growing need for robust research data management. There is a wide range of needs and existing provision for creation, collection, storage and preservation, and reuse of data within UK Higher Education.

What research data should be kept?

Researchers, data curators and policy makers all need to answer the question, what research data should be kept? We can’t keep it all, because that would be too expensive and time-consuming. However, we have to keep data that is irreplaceable and unique in its value for future research; to enable it to be reused and validated; to enable peer review to be informed; and to enable there to be trust in research findings. Types of data needing to be retained vary and may include related materials such as software and documentation. But how much and what is enough? Obviously, there is no single answer to that: it depends on many factors, but what are those factors, and how should we weight them? These remain difficult and open questions, but this year Jisc is working with us to take a step toward answering them.

How can we identify what to keep?

We are setting out to explore the question: what is the optimal data to keep from research projects conducted at UK institutions? Over the course of the rest of 2018, our project will work with a small number of research areas to find out. What conditions, such as openness or timescales, might be ideal? We will seek the views of researchers (as data creators and data users), research funders, ethics professionals, archivists, research data managers, peer reviewers, other research users, and others on these questions. We will dig into the reasons for their views, and into whether research data is currently kept in line with those views, or not.

Why are we carrying out this investigation now?

This work comes at a critical time in the evolution of research data management and sharing. At the policy level, the recommendations from the UK Open Research Data Taskforce are expected shortly. These may take into account both the recommendations to Government of the 2017 report by Dame Wendy Hall and Jérôme Pesenti into the future of the UK artificial intelligence industry and the recent Government announcements around this, where research data can be a key input into AI tools. The availability of research data is also a matter of concern to those interested in research integrity and reproducibility. Relevant infrastructure investments include both the Jisc research data shared service and the increasing activity around the European Open Science Cloud.

Both policy and infrastructure investments need better information about the extent and nature of the research data that needs to be kept, under what conditions, and for how long. Our 2018 project will not provide all this information, but it will explore current practices and take the next step.

The Costs of Inaction: advocating for digital preservation



Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

Traditionally the major challenge in digital preservation has been seen to be technology obsolescence. However, arguably the organisational challenges, particularly funding (and advocacy for funding), have proved to be much more significant over time.

In recent years an increasing number of community efforts have focussed on helping organisations to identify benefits and write a business case for digital preservation. The Keeping Research Data Safe (KRDS) Digital Preservation Benefits Analysis Tools and the Digital Preservation Business Case Toolkit are good examples.

Most organisations require a business case for major funding projects. This will outline what resources are required, what the resource will be used to achieve, and how this new investment will benefit the organisation.

It is seen as good practice to have actively considered a number of alternative options to the preferred one that you have put forward. One of the options normally recommended for inclusion in a business case is that of doing nothing and its consequences (“the costs of inaction”).

This blog suggests how research on “the costs of inaction” might contribute successfully to advocacy and business cases for digital preservation.

Although it will be of value to all repositories, it will be particularly pertinent for new and emerging repositories. New and emerging services may face particular challenges because digital preservation is a long-term activity: collections are usually appreciating assets – returns can increase over time as collections reach a critical mass and user awareness of them grows.

For these repositories it is always helpful to consider “the cost of inaction” and the counter-factual position if no repository exists. There are already hidden opportunity costs and negative consequences involved in doing nothing and they can provide a benchmark against which the value of and funding case for new or emerging services can be assessed.

Counter-factual evidence is difficult to gather and this remains an understudied approach. There are a few great examples for other preservation domains (I am a big fan of AVPreserve’s Cost of Inaction Calculator for Audio Visual archives and its promotional video) and this video is worth watching to understand underlying principles even if you are not an AV repository.


Another relevant example is from the domain of data archives. A recently completed CESSDA-SaW Cost-Benefit Advocacy Toolkit produced on behalf of the Consortium of European Social Science Data Archives looks at how we might evaluate immature repositories or the case for creating new ones. The Toolkit emphasises the potential importance of thinking counter-factually. What would happen if there was no repository?

There are a small number of studies that have looked at quantifying some of the hidden and opportunity costs, particularly for data archived with individual researchers as opposed to being preserved in a long-term repository.



Illustration Charles Beagrie Ltd ©2017. CC-BY licensed

The CESSDA SaW Return on Investment Factsheet in the toolkit pulled together what evidence we have on the counter-factuals for data repositories. These studies are all partial and narrowly focussed. However, they variously consider what happens when research data is archived by individual researchers.

The reported findings are summarised in the table below in terms of total data loss, partial data loss, access (data requests fulfilled), and delay (the elapsed time until requests are fulfilled). The loss of data, loss of access, delays and inefficiencies are in many ways the flip-side of the high efficiencies seen for users of data archives. They are discussed in fuller detail in the Factsheet.



Illustration from Cessda-SaW Return on Investment Factsheet Charles Beagrie Ltd ©2017. CC-BY licensed

These reported metrics are from studies of different disciplines and study dates (and perhaps have differing levels of certainty – absolute loss is a very difficult metric to gather evidence for). However, they contrast sharply with the excellent preservation record, very high fulfilment rates, and rapid online access rates of public data archives in the social sciences. The public data archives also are appreciating as opposed to depreciating assets with improving rather than decreasing trends in value over time.

These studies and results provide an indicator of how valuable more data on the cost of inaction would be to the digital preservation community generally. They also provide pointers to the methodologies that could be used. Feedback from focus groups with key stakeholders of data archives undertaken during production of the toolkit (see the CESSDA SaW D4.9 Cost-benefit Advocacy Toolkit Deliverable Report) certainly suggests how effective data on the costs of inaction could be as a central part of our advocacy and in business cases for digital preservation. It is research I would like to see extended and if you are aware of other examples please let me know.

This blog post was first published on the DPC website as part of the first ever International Digital Preservation Day on 30th November 2017.

CESSDA SaW Final Conference in Dublin

The final conference of the CESSDA SaW project was held in Dublin, Ireland on 19th October 2017 and summarised the project results in strengthening and widening the European infrastructure of social science data archives. Organized by the Irish Social Science Data Archive (ISSDA) and CESSDA ERIC, the event was very successful, hosting representatives from 28 countries. CESSDA members, non-members and aspiring members were brought together for a presentation of the outcomes of a two-year project which has helped to enlarge the consortium and strengthen its members.

It has been an extremely productive and collaborative project with many valuable and interesting outputs. Charles Beagrie Ltd has led on the development of the cost-benefit advocacy toolkit (released in April 2017) in CESSDA-SaW and we covered this in a previous blog post – but there are many other project outputs now available that will be of interest to the research data management community.

There is a fuller report, presentations and photos from the conference available here.

The Cost-Benefit Advocacy Toolkit: useful tools for research data and digital preservation

We are pleased to announce that the Cost-Benefit Advocacy Toolkit has been published by the Consortium of European Social Science Data Archives (CESSDA) and is available for you to use.

The Toolkit will be of interest to a wide audience in research data management and digital preservation.

It was developed within the CESSDA SaW project, which aims to strengthen and widen the CESSDA network.

You can access the Toolkit and download any components from here.

The Toolkit comprises:

  • A User Guide;
  • Three Factsheets (Benefits, Costs, and Return on Investment);
  • Four Case Studies from Social Science Data Archives (ADP in Slovenia, FSD in Finland, LiDA in Lithuania, and UKDS in the UK);
  • Two Worksheets (the Archive Development Canvas, and the Benefits Summary for a Data Archive);
  • A Deliverable Report describing how the toolkit was developed.

In addition, the Toolkit describes and links to a number of pre-existing external tools and relevant studies.

The major use for the Toolkit will be supporting funding and business cases but elements are likely to be relevant in advocacy to other groups or in supporting broader operational tasks.

Some feedback on the draft Toolkit from attendees at our International Digital Curation Conference 2017 workshop earlier this year included:

“This was one of the most relevant and important workshops I have ever attended in my 14 years of professional experience in this library profession. Since I am interacting with senior stakeholders (e.g. assistant vice-presidents, Deans, Chairs, & associate Deans etc.), cost-benefit and ROI are very important to the development of research data services.”

“The worksheets are really useful, and very relevant to be used at an institutional level.”

“Highly relevant and good content.”

The CESSDA SaW Project is funded by the EU Horizon 2020 Research and Innovation Programme under agreement No. 674939.

The development of the Toolkit was led by Charles Beagrie Ltd, with support from the Slovenian Social Science Data Archive (ADP), the Finnish Social Science Data Archive (FSD), the Lithuanian Social Science Data Archive (LiDA), the University of Tartu in Estonia (UTARTU), and the UK Data Service (UKDS).

You can find out more about CESSDA SaW here.

IDCC Conference Workshop Feb 2017

Demonstrating the Value and Impact of Research Data Services

Monday pm 20th February 2017

Workshop organisers: Neil Beagrie (Charles Beagrie Ltd), Mike Priddy (DANS), and the Consortium of European Social Science Data Archives (CESSDA).

Description: At this half-day workshop, attendees will learn from Neil Beagrie and Mike Priddy about how to apply the Cost-Benefit Advocacy Toolkit, the Capability Development Model, and the Archive Development Canvas (a variant of the Business Model Canvas) developed by the CESSDA Strengthening and Widening Project (CESSDA-SaW). Although the CESSDA-SaW project work focuses on the social sciences, core elements are multi-disciplinary and relevant to a wide range of organisations at IDCC involved in development, funding, and advocacy for research data infrastructures and open access for data.

The workshop is free to attend but places are limited so early booking is advised.

CESSDA-SaW is a project funded by the Horizon 2020 programme. Its principal objective is to develop, in a coherent and deliberate way, the maturity of data archive services that are part of, or aspiring to join, the CESSDA community of social science data archives. The vision is a comprehensive, distributed and integrated social science data research infrastructure that facilitates access to social science data resources for researchers regardless of the location of either researcher or data. As part of the project, we have been developing the Cost-Benefit Advocacy Toolkit, the Capability Development Model, and the Archive Development Canvas to assist data archive services across Europe.

The broad outline for the workshop will be as follows:

  • Brief introduction to the CESSDA-SaW project
  • Presentation and discussion of the Cost-Benefit Advocacy Toolkit
  • Presentation and discussion of the Capability Development Model
  • Panel presentation and discussion – Bringing it together: The Archive Development Canvas
  • Breakout groups with hands-on opportunities to use and discuss the tools we have presented

The expected learning outcomes from the workshop are that all attendees will:

  • Understand the purpose of CESSDA-SaW, the Toolkit, Capability Development Model, and the Archive Development Canvas;
  • Understand what is specific to social science, to different funding regimes, or maturity of services;
  • Know the main findings from the desk research on the Toolkit and key lessons learnt;
  • Understand economic approaches such as Return on Investment, other key arguments for Value, how it has been calculated, and why the counter-factual and “cost of inaction” are important;
  • Understand how to use the Capability Development Model to undertake a self-assessment;
  • Know what outputs will be available from CESSDA-SaW and how they might use them.

To register for the workshop see http://www.dcc.ac.uk/events/idcc17/workshops

If you are too late to book, I will maintain a short reserve list. Please contact me if you wish to be added to the list. Should anyone drop out and a place become available it will be offered to the reserves.

Presentation on the Value and Impact of Social Science Data Archives and the CESSDA SaW Toolkit

A set of 38 slides is now available on SlideShare. They were used for the Focus Group Cost-Benefit Funding Advocacy Program (Task 4.6) session at the CESSDA SaW Workshop in The Hague on 16/17 June 2016.

This was an interactive focus group repeated over two parallel sessions. It was aimed at European social science data archive staff with responsibility for bidding for funding or promotion and advocacy of the archive to key stakeholders. The presentation covers some of the key ideas on how the CESSDA SaW funding advocacy toolkit will be structured, its components, and key facts and approaches it will include.

We expect the cost-benefit funding advocacy toolkit under development to support the negotiation with ministries and funding organisations across Europe.

The results of the toolkit user requirements survey with responses from 24 European social science archives were presented and discussed, together with suggested approaches and content for the toolkit. 22 people attended the two sessions overall, representing a mix of countries at different stages on the development path for social science archives (none, new/emerging, mature). There was strong interest and support for the emerging toolkit together with open discussion of how it can be applied in the specific political and administrative context of different European countries.

The slide set presented here is an extended version including a number of hidden background/reference slides not used in the presentation. The focus group is one of a series guiding further development and adoption of the toolkit. Each session in the series is given to either (a) social science data archive staff or (b) their key stakeholders (senior management in their universities, research councils and academies, funding ministries, national statistics offices, research users and depositors).

CESSDA is the Consortium of European Social Science Data Archives. The CESSDA SaW project “Strengthening and widening the European infrastructure for social science data archives” is funded by the European Commission as part of its Horizon 2020 programme.
