Charles Beagrie Ltd

Research Data: What to Keep?

Charles Beagrie Ltd has started a new research data study for Jisc and UK institutions.

Jisc is working to develop shared infrastructure, influence policy and provide guidance to support institutions with the growing need for robust research data management. There is a wide range of needs and existing provision for the creation, collection, storage, preservation and reuse of data within UK Higher Education.

What research data should be kept?

Researchers, data curators and policy makers all need to answer the question: what research data should be kept? We can’t keep it all, because that would be too expensive and time-consuming. However, we must keep data that is irreplaceable and uniquely valuable for future research, so that it can be reused and validated, so that peer review can be informed, and so that research findings can be trusted. The types of data that need to be retained vary and may include related materials such as software and documentation. But how much is enough, and of what? There is obviously no single answer: it depends on many factors. But what are those factors, and how should we weight them? These remain difficult and open questions, but this year Jisc is working with us to take a step towards answering them.

How can we identify what to keep?

We are setting out to explore what the optimal data to keep from research projects conducted at UK institutions actually is. Over the rest of 2018, our project will work with a small number of research areas to find out. What conditions, such as openness or timescales, might be ideal? We will seek the views of researchers (as data creators and data users), research funders, ethics professionals, archivists, research data managers, peer reviewers, other research users, and others on these questions. We will dig into the reasons for their views, and into whether or not research data is currently kept in line with those views.

Why are we carrying out this investigation now?

This work comes at a critical time in the evolution of research data management and sharing. At the policy level, the recommendations from the UK Open Research Data Taskforce are expected shortly. These may take into account both the recommendations to Government in the 2017 report by Dame Wendy Hall and Jérôme Pesenti on the future of the UK artificial intelligence industry and the Government’s recent announcements in response, since research data can be a key input into AI tools. The availability of research data is also a matter of concern to those interested in research integrity and reproducibility. Relevant infrastructure investments include both the Jisc research data shared service and the increasing activity around the European Open Science Cloud.

Both policy and infrastructure investments need better information about the extent and nature of the research data that needs to be kept, under what conditions, and for how long. Our 2018 project will not provide all this information, but it will explore current practices and take the next step.

The Costs of Inaction: advocating for digital preservation



Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

Traditionally, the major challenge in digital preservation has been seen as technology obsolescence. However, the organisational challenges, particularly funding (and advocacy for funding), have arguably proved much more significant over time.

In recent years an increasing number of community efforts have focussed on helping organisations to identify benefits and write a business case for digital preservation. The Keeping Research Data Safe (KRDS) Digital Preservation Benefits Analysis Tools and the Digital Preservation Business Case Toolkit are good examples.

Most organisations require a business case for major funding projects. This will outline what resources are required, what they will be used to achieve, and how the new investment will benefit the organisation.

It is seen as good practice to have actively considered a number of alternative options to the preferred one that you have put forward. One of the options normally recommended for inclusion in a business case is that of doing nothing and its consequences (“the costs of inaction”).

This blog suggests how research on “the costs of inaction” might contribute successfully to advocacy and business cases for digital preservation.

Although this will be of value to all repositories, it will be particularly pertinent for new and emerging ones. New and emerging services may face particular challenges because digital preservation is a long-term activity: collections are usually appreciating assets, and returns can increase over time as collections reach a critical mass and user awareness of them grows.

For these repositories it is always helpful to consider “the cost of inaction” and the counter-factual position in which no repository exists. Doing nothing already carries hidden opportunity costs and negative consequences, and these can provide a benchmark against which the value of, and funding case for, new or emerging services can be assessed.

Counter-factual evidence is difficult to gather and this remains an understudied approach. There are a few great examples from other preservation domains: I am a big fan of AVPreserve’s Cost of Inaction Calculator for Audio Visual archives and its promotional video, which is worth watching to understand the underlying principles even if you are not an AV repository.


Another relevant example is from the domain of data archives. The recently completed CESSDA SaW Cost-Benefit Advocacy Toolkit, produced on behalf of the Consortium of European Social Science Data Archives, looks at how we might evaluate immature repositories or the case for creating new ones. The Toolkit emphasises the potential importance of thinking counter-factually. What would happen if there were no repository?

There are a small number of studies that have looked at quantifying some of the hidden and opportunity costs, particularly for data archived with individual researchers as opposed to being preserved in a long-term repository.



Illustration Charles Beagrie Ltd ©2017. CC-BY licensed

The CESSDA SaW Return on Investment Factsheet in the toolkit pulled together what evidence we have on the counter-factuals for data repositories. These studies are all partial and narrowly focussed. However, they variously consider what happens when research data is archived by individual researchers.

The reported findings are summarised in the table below in terms of total data loss, partial data loss, access (data requests fulfilled), and delay (the elapsed time until requests are fulfilled). The loss of data, loss of access, delays and inefficiencies are in many ways the flip-side of the high efficiencies seen for users of data archives. They are discussed in fuller detail in the Factsheet.



Illustration from CESSDA SaW Return on Investment Factsheet Charles Beagrie Ltd ©2017. CC-BY licensed

These reported metrics come from studies of different disciplines and dates (and perhaps have differing levels of certainty, since absolute loss is a very difficult metric to gather evidence for). However, they contrast sharply with the excellent preservation record, very high fulfilment rates, and rapid online access of public data archives in the social sciences. The public data archives are also appreciating rather than depreciating assets, with value improving rather than declining over time.

These studies and results provide an indicator of how valuable more data on the cost of inaction would be to the digital preservation community generally. They also provide pointers to the methodologies that could be used. Feedback from focus groups with key stakeholders of data archives undertaken during production of the toolkit (see the CESSDA SaW D4.9 Cost-benefit Advocacy Toolkit Deliverable Report) certainly suggests how effective data on the costs of inaction could be as a central part of our advocacy and in business cases for digital preservation. It is research I would like to see extended and if you are aware of other examples please let me know.

This blog post was first published on the DPC website as part of the first ever International Digital Preservation Day on 30th November 2017.

Public Release of New PDF/A Technology Watch Report

The Digital Preservation Coalition (DPC) and Charles Beagrie Ltd have released to the public Preservation with PDF/A by Betsy Fanning, the latest in their series of Technology Watch Reports. This is now the 14th Technology Watch Report produced over the last 5 years by Charles Beagrie Ltd and the DPC. It provides a comprehensive review of the PDF/A standard and its use.

An update to the original Technology Watch Report, Preserving the Data Explosion: Using PDF, published in 2008, the report begins with a history of the PDF/A standard and its development, before examining conformance levels, validation methods and the considerations to be made when choosing to use PDF/A for long-term preservation.

“Conformance to the standard is not a simple ‘yes/no’ binary state, in part because there are now four variants of PDF/A,” explains author Betsy Fanning. “One question that is often asked is: ‘When should I use PDF/A, and which version should I use?’ This report attempts to answer that question and to provide some guidance about the strengths, weaknesses, opportunities and threats associated with each.”

Preservation with PDF/A examines each of the four variants and lays out the conditions under which it might be beneficial to use PDF/A-3 rather than PDF/A-1, and vice versa, before presenting a range of practical considerations to make the most effective use of the file format.

Neil Beagrie, managing editor of the Technology Watch Report series on behalf of the DPC, added “the choice of file format is a component of a wider technical and organizational infrastructure which comprises a comprehensive digital preservation solution. This report will make interesting reading for anyone putting together their digital preservation strategy.”

Note the new style cover design!

Read ‘Preservation with PDF/A’ now

15th anniversary for Charles Beagrie Ltd

Today is the 15th anniversary of the founding of Charles Beagrie Ltd by myself and Daphne. Our thanks to our associates, partners, and clients for making the last 15 years enjoyable and productive ones stretching across many different disciplines and sectors!

Digital Preservation Handbook wins IRMS Innovation of the Year Award

The Information and Records Management Society (IRMS) has recognised the re-imagined and revised 2nd edition of the Digital Preservation Handbook as its Innovation of the Year.

Speaking after the Awards ceremony, IRMS Chair Scott Sammons praised the Handbook, saying “This fantastic resource has had such positive feedback from our members. It takes the traditional idea of an information handbook and repackages it to offer essentially useful information in a way that is simple, easy to understand and easy to act upon. It ticks all the boxes.”

The 2nd edition of the Digital Preservation Handbook provides an authoritative and practical guide to the complex topic of digital preservation. The Digital Preservation Coalition has hosted and maintained the Digital Preservation Handbook since 2002. Supported by a group of external funders, the new edition of the handbook was developed by an expert community of international authors, under the editorship of Neil Beagrie of Charles Beagrie Ltd, in a series of innovative ‘booksprints’, ensuring it spoke to as wide an audience as possible whilst retaining a deep understanding of the topics covered.

Neil noted “The online DP Handbook first went live in May 2002. This award is a wonderful way to recognise the ambition and vision of the DPC in instigating this revision, the innovation and effort involved in the Handbook’s re-design and re-launch last year, and the Handbook’s longstanding contribution to the profession and digital preservation practice. Thanks to all who made the second edition so successful: William and staff at the DPC, the funding sponsors, contributors (content, booksprints, peer review, and advisory board), Daphne at Charles Beagrie Ltd for design, layout and proof-reading, and Digital Bevaring for wonderful images.”

No longer simply a handbook, the 2nd edition is a responsive website providing free-of-charge open access to case studies, videos and peer-reviewed online content that captures the state of the art in managing data for the long term. It includes interactive functions that allow readers to add comments and suggest examples and updates, while a completely new section called ‘Getting Started in Digital Preservation’ supports the DPC’s programme of introductory workshops.

Sharon McMeekin, a member of the editorial board for the DPC, says “this is the award that matters most to us. It is a resource created by the digital preservation community for the digital preservation community. We couldn’t be more thrilled that it has been recognised as the great resource it is by the IRMS and its members.”

The 2nd edition of the Handbook was developed and delivered by a research consortium of the Digital Preservation Coalition (DPC) and Charles Beagrie Ltd. The DPC helps members to deliver resilient long-term access to digital content and services, helping them to derive enduring value from digital collections. The Coalition also raises awareness of the attendant strategic, cultural and technological challenges and supports members through advocacy, workforce development, capacity-building and partnership.

The Cost-Benefit Advocacy Toolkit: useful tools for research data and digital preservation

We are pleased to announce that the Cost-Benefit Advocacy Toolkit has been published by the Consortium of European Social Science Data Archives (CESSDA) and is available for you to use.

The Toolkit will be of interest to a wide audience in research data management and digital preservation.

It was developed within the CESSDA SaW project, which aims to strengthen and widen the CESSDA network.

You can access the Toolkit and download any components from here.
The Toolkit comprises:

  • A User Guide;
  • Three Factsheets (Benefits, Costs, and Return on Investment);
  • Four Case Studies from Social Science Data Archives (ADP in Slovenia, FSD in Finland, LiDA in Lithuania, and UKDS in the UK);
  • Two Worksheets (the Archive Development Canvas, and the Benefits Summary for a Data Archive);
  • A Deliverable Report describing how the toolkit was developed.

In addition, the Toolkit describes and links to a number of pre-existing external tools and relevant studies.

The major use for the Toolkit will be supporting funding and business cases, but elements are likely to be relevant in advocacy to other groups or in supporting broader operational tasks.

Some feedback on the draft Toolkit from attendees at our International Digital Curation Conference 2017 workshop earlier this year included:

“This was one of the most relevant and important workshops I have ever attended in my 14 years of professional experience in this library profession. Since I am interacting with senior stakeholders (e.g. assistant vice-presidents, Deans, Chairs, & associate Deans etc.), cost-benefit and ROI are very important to the development of research data services.”

“The worksheets are really useful, and very relevant to be used at an institutional level.”

“Highly relevant and good content.”

The CESSDA SaW Project is funded by the EU Horizon 2020 Research and Innovation Programme under grant agreement No. 674939.

The development of the Toolkit was led by Charles Beagrie Ltd, with support from the Slovenian Social Science Data Archive (ADP), the Finnish Social Science Data Archive (FSD), the Lithuanian Social Science Data Archive (LiDA), the University of Tartu in Estonia (UTARTU), and the UK Data Service (UKDS).

You can find out more about CESSDA SaW here.

Reflections on the 2nd Digital Preservation Handbook Book Sprint 18-19 May 2015

Another rewarding but exhausting couple of days! Earlier this week we completed a two-day book sprint in Kew, focussing on developing more new content for the next edition of the Digital Preservation Handbook, which is being funded by The National Archives, the British Library, and Jisc. Really pleased with the outputs and progress we made.

This is now the second book sprint we have held and we have been able to build on the sterling work at the first sprint held in October last year.

A group of nine people, Neil Beagrie (Charles Beagrie Ltd), Glenn Cumisky (British Museum), Matt Faber (Jisc), Stephen Grace (University of East London), Alex Green (The National Archives), William Kilbride (DPC), Gareth Knight (London School of Hygiene & Tropical Medicine), Sharon McMeekin (DPC), and Paul Wheatley (DPC), met up over two days to progress content for the new “Getting Started” and “Organisational Activities” sections of the Handbook (as identified in the Draft Outline of the 2nd Edition of the Digital Preservation Handbook). We also progressed some sub-sections of “Technical Solutions and Tools” left over from Book Sprint 1. The venue for the sprint was kindly provided by The National Archives in their Kew building.

We completed draft sections for:

Getting Started

Creating digital materials

Acquisition and appraisal

Retention and review

Preservation

Metadata and documentation

Access

Information Security

Persistent Identifiers

We covered more topics than in the first sprint, so we were occasionally thinly spread. As a cautionary note, we may need to review our draft content carefully to ensure the final outputs have the breadth and depth of perspective we aim for. What I have read so far has been terrific, although inevitably it will need some more content and final polishing.

The revision has been guided by user feedback and consultation (see Report on the Preparatory User Consultation on the 2nd Edition of the Digital Preservation Handbook), which, in short, called for the Handbook text to be practical, concise, and accessible, with more detail available in the case studies and further reading.

We used a different tool from book sprint 1 and successfully adopted Google Docs for our collaborative writing.

A two-day book sprint was very intense, but few could have spared more time away from the workplace, and a tight deadline helped everyone focus on the tasks in hand.

We followed a process of scoping contents for a specific section, brainstorming key points for inclusion, writing, and then review.

Participants were also able to see the substantial emerging Handbook content already in the DPC content management system, together with the excellent illustrations re-used with permission from digitalbevaring.dk. In addition, the Google Docs were pre-populated with any relevant text from the previous Handbook, marked in red so it was easily identifiable for review, retention, deletion, amendment or addition/replacement as needed. They were also pre-populated with all the case studies and external resources relevant to those sections that had been identified during desk research for the new edition of the Handbook.

The after-work drinks at the Tap on the Line and group dinner at Café Mamma were enjoyed by all and allowed everyone to relax and socialise outside the event itself. Next time I will try to remember to take photos for the report!

In June the draft text will be the focus for detailed editorial review, additions, arrangement, proof-reading and input to the DPC content management system. Based on our experience of the first book sprint, that will be at least a two-month process, after which we will look for peer review to be completed by around the end of September.

It is great to see so much more of the new Handbook there in preliminary form after the sprint. With the contents of the first sprint, supplementary work, and its peer review, there is now substantial draft content emerging for the 2nd edition of the Handbook.

Invitation to comment: New Edition of Digital Preservation Handbook

We are scoping and planning for a new edition of the online Digital Preservation Handbook and would be very grateful if you could contribute your needs and views to this work.

The Digital Preservation Handbook, written by Neil Beagrie and Maggie Jones, is hosted by the Digital Preservation Coalition (DPC), which makes the Handbook freely available as an online resource. The Handbook provides an internationally authoritative and practical online guide that is heavily used for continuous professional development, for university students, and for training in digital preservation.

The National Archives is working together with other stakeholders, including Jisc and the British Library, to support the Digital Preservation Coalition in updating and revamping the Handbook. It is anticipated that its revision will be modular and undertaken over a two-year period. We request your input via a short online survey.

There are a maximum of 13 questions in total and the survey should take around 10 minutes of your time to complete.

The online questionnaire is accessible at: https://www.surveymonkey.com/s/DPHandbook and the survey will close on Wednesday 16th June.

Thank you in advance for your participation. Your input will make a significant contribution to the scoping of this important online resource and the scheduling of modules for publication.

William Kilbride (Executive Director, Digital Preservation Coalition)

Neil Beagrie (lead author and editor)

Work starting on a New Edition of the Digital Preservation Handbook

We are delighted to announce that The National Archives is working with the Digital Preservation Coalition (DPC), Charles Beagrie Ltd, Jisc and the British Library to update and revamp a key online resource for managing digital resources over time, the online edition of the Digital Preservation Handbook.

The Handbook, authored by Neil Beagrie and Maggie Jones, was first published in 2001 in a print edition by the British Library with support from Resource: The Council for Museums, Archives and Libraries (whose functions have subsequently transferred to The National Archives and the Arts Council) and Jisc. The online edition was launched in 2002 on the Digital Preservation Coalition website. It remains heavily used by archivists and other information professionals.

The National Archives, the Digital Preservation Coalition and ourselves will work with expert partners over the next two years to develop the new-look Handbook as an interactive online resource.

‘I’m delighted to be working with The National Archives on this important project’, said William Kilbride of the DPC. ‘The original handbook remains very popular so we have been loath to take it down, but we’ve been aware for a while that it was becoming increasingly out of date. Our experience shows that there is a real demand for concise and practical advice on preservation so I am confident that this new edition will be immediately popular’.

The project to deliver the resource is a joint venture between The National Archives, the DPC and Neil Beagrie (Charles Beagrie Ltd), one of the original authors of the Handbook, with further contributions from Jisc, which was one of the initial co-funders, and the British Library, which published the original handbook.

‘I’m looking forward to starting this important revision’, said Neil Beagrie. ‘It’s not just a few updates to the text: we will be basing the new handbook on an extensive process of consultation to make sure that the new edition measures up to people’s real and emerging needs, and to make sure that it highlights good practice. We aim to make sure it binds together other sources of advice (including the many excellent reports in the DPC Technology Watch series) and that it provides authoritative and concise advice for topics that are not supported by other resources.’

The online element will ensure the Handbook can be easily updated over time, incorporating case studies and a view from current practitioners to ensure it is relevant to a wide audience, from beginners to those with more specialist needs. We hope the Handbook will help individuals from a wide range of organisations adopt a step-by-step approach to addressing their digital resource management needs.

Coming soon: May publication and webinar dates for TNA Cloud Storage and Digital Preservation Guidance

We are pleased to announce that our recent work on the TNA Cloud Storage and Digital Preservation Guidance and five accompanying case studies will be published and released on the TNA website next week.

To accompany the release of the Guidance, TNA will be hosting a free webinar with the authors (Neil Beagrie, Andrew Charlesworth, and Paul Miller) and Emma Markiewicz from TNA between 12.30 and 13.30 on Tuesday 13th May.

The webinar will have a short presentation on the Guidance and will also provide an opportunity for you to put any questions or burning issues you may have to us and TNA.

Registration for the webinar is now open at:

https://attendee.gotowebinar.com/register/6768962274937737985

To avoid disappointment, please register well in advance as numbers will be limited. After registering, you will receive a confirmation email containing information about joining the webinar.

You are welcome to submit questions in advance for the webinar via the comments field below or via email to neil@beagrie.com.
