Digital Curation

Data Storage: Top Five Trends for 2012 from IBM (Data Preservation and Data Curation are up there!)

A very interesting presentation on data storage, IBM and Storage: Top Five Trends for 2012, from Steve Wojtowecz, vice president of storage software development at IBM, is on eWeek. Wojtowecz outlined five storage trends that will emerge in 2012: Data Preservation, Data Curation, Storage Analytics, Mass Storage in the Entertainment and Healthcare industries, and Data Records Management (“Data Hoarders”). All are major topics of interest to this blog, with data preservation, data curation and even digital lives getting a mention. The article suggests: “As storage becomes a key business driver in 2012, IBM officials said the industry will see new breakthroughs in storage research and business models coming from sectors such as entertainment and health care”. Worth a look.

New Projects for 2011-2013

It is a busy time of year with very little time to update the blog, but a short update on current and future projects for 2011-2013 may be of interest:

Economic Evaluation of Research Data Infrastructure – a study for the Economic and Social Research Council in the UK. This is being conducted jointly by Charles Beagrie Ltd with Prof John Houghton of the Centre for Strategic Economic Studies at Victoria University and is looking at the economic impact of the Economic and Social Data Service in the UK. Such studies on the impact of research data services are rare and we have the opportunity to test some experimental approaches. Already we have interesting data and I think this is going to be a very significant study. We are about half-way through – having started mid-July 2011, we will finish in January 2012.

Smart Research Framework (SRF) and Biomedical Research Infrastructure Software Service kit (BRISSkit). We are junior partners in two of the four Research Data Management projects in the JISC University Modernisation Fund shared services programme. In both we are supporting their work on developing cost/benefit and return on investment cases. Both are great projects so I would encourage you to take a look. They will complete in the first half of next year.

Research 360 – just starting up at the University of Bath and will run until March 2013. The Project addresses the long-tail of high quality small science characterised by applied research and faculty-industry partnerships. We will contribute to building on and applying the I2S2/KRDS Benefits Toolkit with a focus on faculty research data drivers for the Research Excellence Framework (REF).

DPC Technology Watch Series – work is also progressing for the five titles in the new DPC Technology Watch Series. I’m really enjoying working as series editor with William Kilbride at the DPC and the authors, and keeping up to date on cutting-edge developments. Look out for the first release in the New Year (or from December if you are a DPC member).


More UK Government Funding for e-research HPC and Data Archiving

The UK Government announced this week, in addition to the ring-fenced science budget, ‘earmarked’ capital funding of £145m for High Performance Computing and e-infrastructure, subject to approval of the full business case being developed by the Research Councils.
Universities and Science Minister David Willetts said:
“Significantly improving computing infrastructure is vital to driving growth and giving businesses the confidence to invest in the UK. It has the potential to significantly improve the design and manufacturing process, encouraging innovation across a whole range of sectors.
The investment will also be of enormous benefit to our world-class research base. It will enable universities to carry out highly sophisticated research and archive more data, keeping us at the very leading edge of science.”

Benefits from and Sustainability for Research Data Infrastructure

I’m pleased to announce the release today of the Report ‘Benefits from the Infrastructure Projects in the JISC Managing Research Data Programme’ prepared by Charles Beagrie Ltd for the JISC.

JISC’s Managing Research Data programme has invested nearly £2M in a strand of eight Research Data Management Infrastructure (RDMI) projects to provide the UK Higher Education sector with examples of good research data management.

The eight projects studied in the report have described a wide range of key benefits from investments in research data infrastructure including:

Ability to cite shared data (Admiral Project, University of Oxford);

Integrated thinking around research data management (IDMB Project, University of Southampton);

Enhanced data sharing and discovery (FISHnet Project, Freshwater Biological Association and King’s College London);

Research efficiency, rapid access to data (I2S2 Project, Universities of Bath/Cambridge/Southampton, Charles Beagrie and the Science and Technology Facilities Council);

Clear and accessible guidance (Incremental Project, Universities of Cambridge and Glasgow);

Improving data management plans, policies and institutional settings (MaDAM Project, University of Manchester);

Cost Savings through Centralisation and Virtualisation (Sudamih Project, University of Oxford).

Our report provides an analysis and synthesis of all the benefits and metrics identified by the eight RDMI projects in their benefits case studies, the benefits and enhancements that accrued to existing tools and methodologies from them, and the emerging business cases (as of June 2011) for sustainability being built by the RDMI projects. A brief overview is available on the JISC webpage with the report itself.

KRDS Digital Preservation Benefits Analysis Toolkit and KRDS Updates now available

The KRDS-I2S2 Digital Preservation Benefits Analysis Project is pleased to announce the release of the KRDS Digital Preservation Benefits Analysis Toolkit. Development of the toolkit has been funded by JISC. The worksheets, guidance documentation and exemplar test cases can be downloaded from the project website.

The Toolkit consists of two tools: the KRDS Benefits Framework (Tool 1); and the Value-chain and Benefits Impact tool (Tool 2). Each tool consists of a detailed guide and worksheet(s). Both tools have drawn on partner case studies and previous work on benefits and impact for digital curation/preservation. This experience has provided a series of common examples of generic benefits that are employed in both tools for users to modify or add to as required.

The KRDS Benefits Framework (Tool 1) is the “entry-level” tool, requiring less experience and effort to implement, and can be used as a stand-alone tool in many tasks. It can also be the starting point for, and provide input to, the use of the Value-chain and Impact analysis.

The Value-chain and Benefits Impact analysis (Tool 2) is the more advanced tool in the Toolkit and requires more experience and effort to implement. It is likely to be most useful in a smaller sub-set of longer-term and intensive activities such as evaluation and strategic planning.

The combined Toolkit provides a very flexible set of tools, worksheets, and lists of examples of generic benefits and potential metrics. These are available for use in different combinations appropriate to needs and level of expertise.

Guides for the toolkit and each individual tool and case studies of completed examples of the worksheets provide documentation and support for your own implementation.

In addition we have updated the KRDS Factsheet (new version 2 July 2011) and the KRDS User Guide (new version 2 July 2011) on the KRDS web site. The benefits toolkit is also linked from there. For future reference please bookmark the KRDS web site as all the latest KRDS tools and materials and updates are/will be accessible from that access point.

Report and Presentations from the JISC Digital Curation/Preservation Benefits Tools Project Dissemination Workshop

There was a very successful end of project dissemination workshop and lively discussion last week on implementing the toolkit with funders and other attendees. A full report of the workshop and links to the presentations are provided below. The Benefits Analysis Toolkit will be released on 31 July from the project web site and the KRDS web site.

Tools Background

This is a six-month project funded by JISC, testing, developing and documenting a toolkit consisting of two evolving tools, the KRDS Benefits Framework and the Value Chain and Benefits Impact tool. The Benefits Framework is the entry-level tool and the Value Chain and Benefits Impact tool is more advanced, with a narrower range of applicable activities. Any benefit from digital curation should fit within the Framework and can be reworded and adapted to fit with the local application. From the funders’ perspective the easily tailored benefits offer a consistent and powerful way of stimulating thinking. The toolkit’s official release date is July 31st.

Welcome and Project Background (Liz Lyon, UKOLN) [Presentation]

The Toolset (Neil Beagrie, Charles Beagrie Ltd) [Presentation]

Case Studies

Dipak Kalra (Centre for Health Informatics and Multi-professional Education (CHIME) at UCL)

The toolkit was used in an MRC data support service investigation to understand how data sharing takes place. He presented results via a ‘virtual study’ that took all six studies into account to be more comprehensive. Generic benefits were taken from the tool and given a localised expression etc. He summarised that the tool should work for these kinds of studies though some parts are more applicable. Working through a toolkit could be of value for studies and particularly useful for putting forward a case for funding or prioritising resource utilisation within a study. Completing the spreadsheet and working out weightings might be nicely undertaken in a team workshop.

[Presentation]

Catherine Hardman (Archaeology Data Service)

In this case the toolkit was used from the point of view of a repository (more a macro level than micro), for looking in particular at issues of cost in the lifecycle. Archives often have to help justify costs/effort associated with digital preservation even if they are well established. The toolkit can be used to address a range of audiences and at different levels of complexity: in individual projects or within project teams to boost cases for support. The value chain can help with identifying different values for different audiences. Quantification of impact can help in a number of ways: in research bid terms it helps justify resources; in archive preparation terms it helps with selection and retention decisions. The tool can be used as a light touch to help persuade stakeholders of benefits or for deeper insight into project planning decisions.

[Presentation]

Monica Duke (SageCite Project)

Here the tools were used to assess the benefits of data citation, an undertaking with a project perspective based on an organisation whose main business is science. It showed direct benefits as well as indirect ones such as better discovery of network models and better access. The Benefits Framework was easy to apply and helped to articulate benefits, although an intermediary may be required to facilitate the process.

[Presentation]

Matthew Woollard (UK Data Archive (UKDA) at University of Essex)

The tools were put into practice at the UK Data Archive and used to emphasise benefits to stakeholders. They helped to prioritise internal activities, justify costs to stakeholders and give an understanding of the service impact. They showed where value added is needed, where value is added, and who can benefit and when. The framework for activities seems to be where it will be of most use. It is important to note that generic benefits may have impacts on more than one stakeholder.

[Presentation]

Discussion

Q: What is the ongoing support for the tool?

  • It will be present on the project website with a persistent web archive copy. There is a commitment to make it continuously available and it may be updated in future in light of future projects and applications. There is extensive documentation and if the need were felt for more support there is the potential for consultancy and assistance from Charles Beagrie Ltd as required.

Q: Do you see it incorporated within an outline data management plan?

  • I can definitely see an advantage in the benefits framework. You can also use the value chain in a data heavy project, possibly when sitting down as a project group.

Q: Are we going to get too many statements on value, many of which are blatantly obvious rather than generically just true? If expressions are generic it would be better to cut them out.

  • The tool is for focussing the mind, and the generic examples are only a starting point for what should be customised, specific statements of value. In terms of presentation, in the user guide we present an alternative version of the completed Framework with more specific examples and level of detail for the benefits. The user should have the ability to select those points of greatest impact for specific stakeholders and develop them, i.e. not presenting a generic benefit.
  • We are interested in ensuring researchers can do research and explaining value to the government and other stakeholders. If you’re promoting data sharing benefits then also promoting them to an internal audience is important and powerful for motivation.
  • Research Councils can be deluged with metrics: it is better to have a few, simple and powerfully chosen. Case studies are incredibly powerful though not sufficient on their own. You spend little time discussing them compared to the time taken to create them. A case study should actually illustrate a metric.
  • The Benefits Framework looks helpful in learning and preparation; in evaluation it should then be less necessary.
  • Part of what we are doing is upping the game with studies and funders. Funders will need to respond proactively. There is a need for a more forceful tool, such as a planning tool allowing the user to take up to three actions and work up a two or three year action plan, but it is premature to deliver it now. This is a possible direction of travel in the future.

Q: Homogenisation? What does it mean for funders when there is a long checklist of benefits? If established where will it lead us? Funders will have to look closely and make sure they are used carefully to draw out where we want benefits to accrue. Who is this making life easier for?

  • Hopefully for researchers. They often have a box to fill in anyway but generally it is not well structured. The framework is fast to use, not a heavyweight commitment. We would hope it could make filling in of a benefits statement richer without much extra work as it provides a more advanced starting point for brainstorming the benefits.
  • This has to be looked at against administrative burden. If it enables the user to identify and realise benefits they otherwise would not have realised then it has an advantage. If it isn’t repaid by better realisation of benefits then it deserves to fail.
  • The process of using the tool can have valuable results in itself.
  • Many benefits feed into other benefits. The funder should ask for requirements, it’s not necessary to show everything. This is a platform that everyone can use in a way that benefits them.
  • There is value in prioritisation and communication. If we can work with researchers to highlight the key impacts of what they’re doing then the tool when simply done is of real value.

Q: We are talking about potential benefits: they haven’t actually been achieved. I can see the theoretical value but am worried these benefits could be three or four years ahead of what we can actually achieve.

  • The time element of the framework does bring that in. We’re trying to think not just of the long term but how to get there and any benefits along the way. The element is there but you must have some degree of caution with how you apply this tool in the same way as any other.

Q: You mentioned OCLC is a partner in the project. Are they involved because of their interest in cataloguing and metadata?

  • Our partner is the research division within OCLC which has a broad range of interests within digital library research. Brian Lavoie who has had an important input to this project and KRDS is a research scientist there. As an economist he has taken a close interest in the economics and benefits of digital preservation and this has been an important theme within OCLC Research – hence their interest and active participation in the project.

Q: I’m not sure who’s going to use the tool. What audience are you promoting it to? Will it mean a generic standard will be adopted?

  • Different sectors or disciplines are different. There aren’t homogenous states so there will always be bespoke relevant next steps in working from the tools.
  • Examples of all the common benefits listed are not seen in every project. There should always be a degree of selection so you wouldn’t end up with homogenous benefits in every case. Generic benefits should also be customised and expressed in ways specific to a particular project or service.
  • What you may have to do is demonstrate your benefits to a wider audience not just a primary beneficiary so sometimes the wider list of benefits is also helpful. It is good to put in front of workers to demonstrate why things are done in a certain way. The audience is not as wide as we would like it to be but the value to those who can use it is great.
  • The case studies presented are tailored towards particular projects/services but it can be adapted without too much additional tailoring. There may be elements which need to be tweaked.
  • I think there is value beyond the digital preservation community particularly for the Framework and maybe other versions could be needed tailored to those other audiences.
  • Arguments for further funding are always made on the science; informatics communities to some extent are disenfranchised. Anything we can do that supports honesty and helps to get discussion going within studies by linking benefits to science and data management must be good. If the Framework can accommodate different perspectives of benefits and allows them to be joined up in the story then we should try it out with more people.
  • There can be reciprocal benefits or benefits with clear knock on effects to each other. Actions may give benefits to the user, which give benefits to the creator.
  • Could there be eventual development of a web/matrix of benefits? Not one-to-one or even two-way but a network with flow going around it.

Q: Good ideas, unless heavily marketed, don’t take off. Even if there is a benefit to a tool it won’t be realised unless people know to use it. Are there steps funders would advise to encourage researchers and services to be pro-active in seizing benefits and using the tools?

  • Will people be persuaded to use the tool to compete? You only compete if a competition is created.
  • Once certain good policies are floating around everyone uses them to tick the box whether they are applicable or not.

Q: Will presentations from the workshop be available later?

  • Yes we intend to make them available later and a short write-up of the day and key areas of discussion.

 

Update on the KRDS Digital Curation/Preservation Benefits Toolkit

I’m busy preparing with project colleagues for the dissemination workshop for the JISC Digital Preservation Benefits Analysis Tools project in London on Tuesday.

The Toolkit is nearing its final version and we are adding case studies and worked examples ready for online release at the end of this month.

For those looking for an early taster (and not attending the workshop), here is a quick preview:

The Toolkit consists of two tools: the KRDS Benefits Framework (now in public version 3); and the Value-chain and Benefits Impact tool (now in public version 2). Each tool consists of a more detailed guide and worksheet(s). Both tools have drawn on partner case studies and previous work on benefits and impact for digital curation/preservation. This experience has provided a series of common examples of generic benefits that are employed in both tools for users to modify or add to as required.

It is designed for use by a wide audience including funders, researchers and project staff, and personnel in university central services, data archives and repositories.

I think the project has moved the usability of the KRDS Benefits Framework and the Value-chain and Impact tool on immensely from their early research project roots. Hopefully both existing and new users will find the new Toolkit a big improvement and valuable in their day to day work.

We will announce its release when finalised at the end of July via various email lists and this blog.

 

The Benefits of Research Data Management

Projects from the JISC Managing Research Data Programme were involved in a Parallel Session at the annual JISC Conference on Tuesday this week.

Entitled ‘The benefits of more effective research data management in UK Universities’, the session explained how projects have been developing ‘Benefits Case Studies’ with support from Charles Beagrie Ltd to provide evidence of the positive effects of improvements which they have engineered. The case studies provide significant indications of improved research efficiency through more effective research data management. The case studies will be synthesised in a report by Neil Beagrie due for release in May.

Presentations from the parallel session are available online at:
http://www.jisc.ac.uk/events/2011/03/jisc11/programme/1researchdata.aspx

They are best perused in the following order:

Simon Hodson, JISCMRD, Introduction
Neil Beagrie, Cost-Benefits and Business Cases Support Role
Manjula Patel and Neil Beagrie, I2S2 Project, UKOLN, University of Bath
June Finch, MaDAM Project, University of Manchester
Jonathan Tedds, HALOGEN Project, University of Leicester

A Researcher-Centric Version of the KRDS Activity Model: the I2S2 Project

The Keeping Research Data Safe (KRDS) project has produced a widely used KRDS Activity Model for costing digital preservation of research data. KRDS has developed from relatively small-scale incremental projects and we recognised that there were still significant areas for future work, such as the recently published (Dec 2010) KRDS User Guide. The KRDS2 final report published earlier last year outlined a number of key recommendations for future development including:

  • “Examine further development of the pre-archive phase of the KRDS2 activity model and produce versions of the model from a researcher’s perspective.”

This suggested work has now been addressed by one of the outputs from the Infrastructure for Integration in Structural Sciences (I2S2) Project funded under the Research Data Management Infrastructure strand of the JISC’s Managing Research Data Programme.

I2S2 has been using KRDS as a basis for costing and benefits analysis. One of the outputs has been an “Idealised Scientific Research Data Lifecycle Model”, which seeks to extend and adapt from a “researcher perspective”, the Keeping Research Data Safe (KRDS) Activity Model, providing a model which reflects “research data management” or the digital preservation lifecycle in its broadest interpretation. It adapts KRDS from an archive-centric to a researcher-centric view by:

  • Defining and emphasising more of the activities in the research (KRDS “Pre-Archive” ) phase where research data is created;
  • Adding a “Publication” set of activities;
  • Concatenating the KRDS “Archive” phase activities in the centre of the model for simplification and presentational purposes;
  • Adding some specific local research administration activities;
  • In addition for the purposes of the project, it adds some selective detail of information flows and information objects between the activities.

This is the current version (Dec 2010) of the I2S2 Idealised Model.

Note this is an idealised model and several activities such as peer review or conduct experiment may have multiple instances or repetitions. “Documentation, Metadata, and Storage” may also be undertaken as researcher activities independent of the archive in other instances and in the KRDS activity model. It also represents a project view as of December 2010 and may be subject to further changes.

A PPT version of the I2S2 model incorporating relevant notes is available on the I2S2 project website.

The I2S2 project aims to understand and identify the requirements for a data-driven research infrastructure in the Structural Sciences. The work is focused on the exemplar domain of Chemistry, but with a view towards inter-disciplinary application. Current work inter alia includes developing a set of tools and approaches to identify and provide indicators and metrics for the benefits arising from I2S2. This will extend work and the tools available for implementing the KRDS Benefits Taxonomy.

The partners in I2S2 are UKOLN (University of Bath), the Digital Curation Centre, University of Southampton, University of Cambridge, Science & Technology Facilities Council, and Charles Beagrie Ltd.

Digital Forensics and Cultural Heritage

Ever since a Digital Lives seminar at the British Library earlier this year previewed some of the work, I’ve been looking forward to the publication of this CLIR report on digital forensics and the cultural heritage.

The Council on Library and Information Resources (CLIR) has now published the report, Digital Forensics and Born-Digital Content in Cultural Heritage Collections, by Matthew G. Kirschenbaum, Richard Ovenden, and Gabriela Redwine.

Digital forensics was once specialised to fields of law enforcement, computer security, and national defence, but because most records today are born digital, libraries, archives, and other collecting institutions increasingly receive computer storage media (and sometimes entire computers) as part of their acquisition of “papers.” Staff at these institutions face challenges such as accessing and preserving legacy formats, recovering data, ensuring authenticity, and maintaining trust. The methods and tools that forensics experts have developed can be useful in meeting these challenges. For example, the same forensics software that indexes a criminal suspect’s hard drive allows the archivist to prepare a comprehensive manifest of the electronic files a donor has turned over for accession.
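To give a feel for what such a manifest involves, here is a minimal Python sketch. It is not taken from the report and is no substitute for real forensic software (which also handles disk imaging, write-blocking and deleted-file recovery); it simply walks a folder and records each file's relative path, size and SHA-256 checksum. The function name build_manifest and the choice of SHA-256 are my own assumptions for illustration.

```python
import hashlib
import os


def build_manifest(root):
    """Return a sorted list of (relative_path, size_bytes, sha256_hex)
    for every regular file found under the directory `root`.

    Hypothetical helper for illustration only; not from the CLIR report.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full_path, "rb") as handle:
                # Read in chunks so large files need not fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            rel_path = os.path.relpath(full_path, root)
            size = os.path.getsize(full_path)
            entries.append((rel_path, size, digest.hexdigest()))
    # Sort so the manifest is stable from one run to the next.
    return sorted(entries)
```

Recording a checksum alongside each path means a later run over the same material can be compared against the manifest to confirm nothing has changed, which is one simple way of supporting the authenticity concerns mentioned above.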

The report introduces the field of digital forensics in the cultural heritage sector and explores some points of convergence between the interests of those charged with collecting and maintaining born-digital cultural heritage materials and those charged with collecting and maintaining legal evidence.

Digital Forensics and Born-Digital Content in Cultural Heritage Collections is now available electronically at http://www.clir.org/pubs/abstract/pub149abst.html. Print copies will be available in January for ordering through CLIR’s Web site, for $25 per copy plus shipping and handling.

I’ve downloaded the electronic edition but have yet to read it (that’s part of my Xmas reading sorted), but if the seminar is anything to go by it will be a great contribution to the emerging field of personal digital collections and the curation of digital heritage.
