Digital Curation

NY Times article: Digital Archivists in Demand

Readers of the blog may be interested in the article Digital Archivists in Demand, which appeared in the Fresh Starts column of the business section of the New York Times on Saturday, in both the print and online editions. Fresh Starts is a monthly column covering emerging jobs and job trends.

The piece focusses on careers for digital asset managers, digital archivists and digital preservation officers, and on how demand for them is expanding. It features Jacob Nadal, the preservation officer at the University of California, Los Angeles, and Victoria McCargar, a preservation consultant in Los Angeles and a lecturer at U.C.L.A. and San José State University.

Vicky McCargar estimates that 20,000 people work in the field today — plus others in related areas — and she expects that to triple over the next decade, assuming that economic conditions stabilise before long.

US rates of pay for digital archivists are also cited in the article. Digital asset managers at public facilities would do well to make $70,000 a year. Salaries for their corporate counterparts are generally higher. Consultants who can make recommendations on systems can make $150 an hour. Those who manage such systems in the commercial sector once they’re up and running make from the $70,000s up to $100,000 a year.

Despite the higher pay in the corporate world, Jacob Nadal outlines the case for working in the public sector: “Public-sector institutions just strike me as far, far cooler. They have better collections, obviously, and they are innovative, connected and challenging in ways that seem more substantial to me.”

It is good to see that mainstream newspapers are beginning to see digital archiving as an emerging career path. I have given short seminars on digital preservation and curation to students on the Information Studies courses at UCL over the last couple of years. I always emphasise to them that not only is it an intellectually challenging field but also a very good career option for those with a traditional archive or library training and an interest in electronic information.

Stewardship of Research Data in Canada: A Gap Analysis

I have previously blogged (see Research Data Canada) on work by the Canadian Research Data Strategy Working Group.

Its report “Stewardship of Research Data in Canada: A Gap Analysis” is now available. Using the data lifecycle as a framework, the report examines Canada’s current state versus an ‘ideal state’ based on existing international best practices across 10 indicators, including policies, funding, roles and responsibilities, standards, data repositories, skills and training, accessibility, and preservation.

The analysis reveals significant barriers to the access and preservation of research data – barriers that could have a serious impact on the future of Canadian research and innovation if not addressed. For example, large amounts of data are being lost because of the woefully inadequate number of trusted data repositories in Canada.

The report summarises gaps for Canadian research data across the data lifecycle as follows:

Data Production

  • Priority is on immediate use, rather than potential for long-term exploitation.
  • Limited funding mechanisms to prepare data appropriately for later use.
  • Few research institutions require data management plans.
  • No national organization that can advise and assist with application of data standards.

Data Dissemination

  • Lack of policies governing the standards applied to ensure data dissemination.
  • Researchers unwilling to share data because of the time and expertise required.
  • Some policies require that certain types of data be destroyed after a research project is over.

Long-term Management of Data

  • Lack of coverage and capacity of data repositories.
  • Preservation activities in repositories are not comprehensive.
  • Limited funding for data repositories in Canada.
  • Few incentives for researchers to deposit data into archives.

Discovery and Repurposing

  • Most data rests on the hard drives of researchers and is inaccessible to others.
  • Pay-per-view and licensed access mechanisms are common where data are available.
  • Many researchers are reluctant to enable access to their data because they feel it is their intellectual property.

The gap analysis will be extremely familiar to many – reflecting difficulties recognised and responded to in many different countries such as the USA (DataNet), Australia (ANDS), and the UK (UKRDS feasibility study). It is pleasing to see the report cite the UK and USA as two countries that are seen internationally to be leading responses to these challenges.

It is reported that in the last several months, the Canadian Research Data Strategy Working Group has also made progress on a number of other fronts. Three Task Groups have been established to support efforts in addressing the gaps identified in the analysis. The Task Groups are:

1. Policies, funding and research;

2. Infrastructure and services; and

3. Capacity (skills, training, and reward systems). The Capacity Task Group is currently developing a workshop on data management for researchers, which it hopes to begin offering in 2009.

The next steps for the Working Group are to develop an action plan and an engagement strategy to involve senior leaders from the various institutions represented on the Working Group.

Google pulls its research datasets service

Early in 2008 there was a lot of excitement around the announcement that Google was about to launch a free service for hosting research datasets, as noted twelve months ago in our blog posting Google to host research datasets.

Less widely reported so far – and I had missed it until I saw it in the Open Access News – was the report by Wired that Google has withdrawn the proposed service, first known as Palimpsest (and later renamed Google Research Datasets).

Unfortunately, the proposed service seems to have fallen prey to the credit crunch. This throws into stark relief the issue of sustainable funding for long-term dataset services and the challenge of providing it in the current commercial environment. For further information and comment see the Wired blog Google shutters its Science Data Service.

Return from DCC – thoughts on ethics

I came back from another DCC international conference in Edinburgh (1–3 December) and almost immediately succumbed to flu – so this is a late post. Fortunately others, including Kevin Ashley in the ulcc da blog and Chris Rusbridge in the digital curation blog, have given quite detailed reports of many of the excellent sessions and presentations.

I just wanted to pick up on one aspect that struck me from the keynote Genomic Medicine in the Digital Age by Prof David Porteous, and which has also been commented on in Mags McGeever’s post Healthy Consent on the DCC Blawg: namely, ethical consent and research data.

Prof Porteous’ talk focussed on his work in Generation Scotland, and ethical issues around the process of “open consent” (an interesting long-term variant of informed consent) formed part of this. A particular bone of contention was the stance taken by the chairman of the National Information Governance Board for Health and Social Care on research data – see the Guardian report of his views.

Prof Porteous is the most recent speaker voicing a concern I’ve now heard expressed by many different researchers – someone really should take up that offer of a “cup of tea and a wee chat” with the chairman, to put across the long-term damage to health research which is the other side of this argument.

The Economics of Digital Preservation: Blue Ribbon Task Force Interim report

I was pleased to see that the international Blue Ribbon Task Force has issued its Interim Report on the economic issues for digital preservation brought on by the data deluge of the Information Age, and that the interim report draws on the research undertaken by the LIFE and Keeping Research Data Safe studies.

The following press release appears on the UC San Diego website:

A blue ribbon task force, commissioned late last year to identify sustainable economic models to provide access to the ever-growing amount of digital information in the public interest, has issued its interim report. The report calls the current situation urgent, and details systemic pitfalls in developing economic models for sustainable access to digital data.

There is no time to waste, according to the new report from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access, launched by the National Science Foundation and the Andrew W. Mellon Foundation in partnership with the Library of Congress, the Joint Information Systems Committee of the United Kingdom, the Council on Library and Information Resources, and the National Archives and Records Administration.

A recent study by the International Data Corporation (IDC) said that in 2007, the amount of digital data began to exceed the amount of storage to retain it, and will continue to grow faster than storage capacity from here on. The IDC study predicts that by 2011, our ‘digital universe’ – consisting of digitally-based text, video, images, music, etc. – will be 10 times the size it was in 2006.

Although not all of this data should be preserved, digital data within the public interest – digital official and historical documents, research data sets, YouTube videos of presidential addresses, etc. – must be retained to maintain an accurate and complete ‘digital record’ of our society. Such digital information is now part of what is known as cyberinfrastructure, an organized aggregate of computers, networks, data, storage, software systems, and the experts who run them that is vital to our life and work in the Information Age.

‘NSF and other organizations, both national and international, are funding research programs to address these technical and cyberinfrastructure issues,’ said Lucy Nowell, Program Director for the Office of Cyberinfrastructure at the National Science Foundation. ‘This is the only group I know of that is chartered to help us understand the economic issues surrounding sustainable repositories and identify candidate solutions.’

While storage and technological issues have been at the forefront of the discussion on digital information, relatively little focus has been on the economic aspect of preserving vast amounts of digital data fundamental to the modern world.

‘The long-term accessibility and use of valuable digital materials requires digital preservation activities that are economically sustainable – in other words, provisioned with sufficient funding and other resources on an ongoing basis to achieve their long-term goals,’ said Brian Lavoie, a co-chair of the task force and a research scientist with OCLC, an international library service and research organization headquartered in Dublin, Ohio. ‘Economically sustainable digital preservation is a necessary condition for securing the long-term future of our scholarly and cultural record.’

‘Access to data tomorrow requires decisions concerning preservation today,’ said Fran Berman, director of the San Diego Supercomputer Center at the University of California San Diego, and also a co-chair on the task force. ‘The Blue Ribbon Task Force’s interim report represents a year of testimony and investigation into the economic models supporting current practice in digital preservation and access across sectors.’

The interim report traces the contours of economically sustainable digital preservation, and identifies and explains the necessary conditions for achieving economic sustainability. The report also synthesizes current thinking on this topic, including testimony from 16 leading experts in digital preservation representing a variety of domains. In reviewing this synthesis, the task force identified a series of systemic challenges that create barriers to long-term, economically viable solutions. Some of these challenges include:

  • Inadequacy of funding models to address long-term access and preservation needs. Funding models for efforts that incorporate digital access and preservation are often not persistent: they may be ‘one time’ efforts subsequently abandoned as more critical short-term priorities emerge.
  • Confusion and/or lack of alignment between stakeholders, roles, and responsibilities with respect to digital access and preservation. Often, those who create and use digital information are not responsible for serving as stewards to support preservation and access. Consequently, the costs may not be shared, which can lead to inadequate economic models for sustainability.
  • Inadequate institutional, enterprise, and/or community incentives to support the collaboration needed to reinforce sustainable economic models. Digital preservation and access require long-range planning and support, as well as agreement on formats, standards and use models, and hardware/software compatibility.
  • Complacency that current practices are ‘good enough.’ The urgency of developing sustainable economic models for digital information is not uniformly appreciated. There is general agreement that leadership and competitiveness, if not institutional survival, in the Information Age depend on the persistent availability of digital information, making preservation of that information an urgent priority.
  • Fear that digital access and preservation is too big to take on. There is general agreement that in its entirety, digital preservation is a big problem, incorporating technical, economic, regulatory, policy, social, and other aspects. But it is not insurmountable. Digital access and preservation may be as manageable as including a ‘data bill’ as an explicit and fixed part of an institution’s business model. Success depends on making sustainable digital access and preservation a persistent ‘line item’ on the part of stakeholders.

Continuing its work for a second and final year, the Blue Ribbon Task Force on Sustainable Digital Preservation and Access will issue its final report in late 2009 proposing practical recommendations for sustainable economic models to support access and preservation for digital data in the public interest.

To view the complete BRTF-SDPA Interim Report, click here.

For a complete list of BRTF-SDPA members, click here.

UK Research Data Service Study – International Conference

I am pleased to forward the announcement that the final report for the UK Research Data Service (UKRDS) Feasibility Study Project has been submitted, and that an international conference on the UKRDS Feasibility Study will be held at The Royal Society, London, on Thursday, 26 February 2009.

Booking for this international conference of senior policymakers, funders, scientists, IT managers, librarians and data service providers has now opened. Attendance at the conference is free, but places are limited, so early booking is advised.

The UKRDS feasibility study was commissioned to explore a range of models for the provision of a national infrastructure for digital research data management. It has brought together key UK stakeholders, including the Research Councils, JISC, HEFCE, British Library, Research Information Network, Wellcome Trust, researchers, and university IT and library managers, and it builds on the work of the UK’s Office of Science and Innovation e-infrastructure group. It also takes into account international developments in this area.

The UKRDS final report is due to be released soon and makes important recommendations for investment in this key part of the UK national e-infrastructure.

The study has been funded by HEFCE as part of its Shared Services programme, with additional support from JISC, Research Libraries UK (RLUK) and the Russell Group IT Directors (RUGIT). It has been led by the London School of Economics, with Serco Consulting as lead consultants supported by Charles Beagrie Limited and Grant Thornton as sub-contractors.

and the Top Five are…

I find it difficult to gauge the impact of different JISC studies other than anecdotally, and as an author of JISC-funded reports I often wonder what the take-up has been, so I was intrigued to see a brief new section in the latest Autumn 2008 issue 23 of JISC Inform devoted to the Top five publications…

I understand from colleagues that this represents a snapshot of the top five monthly downloads when Inform went to print (i.e. October 2008). Downloads probably peak during the first few months after publication, so I have added month of publication as an additional factor/caveat to the rankings, which were as follows:

Top five publications…

  1. What is Web 2.0? TechWatch report (March 2008)
  2. Great expectations of ICT: JISC briefing paper (June 2008)
  3. Keeping research data safe: Charles Beagrie report (May 2008)
  4. Shibboleth – connecting people and resources: JISC briefing paper (March 2006)
  5. Information behaviour of the researcher of the future (‘Google Generation report’): CIBER report (January 2008).

JISC is quite a large specialist publisher: 28 JISC Reports and 24 JISC Briefing Papers have been published so far in 2008 alone, so there is stiff competition to get into the listings, and I was chuffed to see Keeping Research Data Safe at No. 3.

It was even nicer to hear that the listings had a new Number 1 in November: the Digital Preservation Policies Study (October 2008) was the runaway no. 1 with over 2,500 downloads.

Christmas must have come early this year 🙂

Keeping the Records of Science Accessible: Can We Afford It?

The Alliance for Permanent Access has just completed its annual conference (Budapest, 4 November): this year the theme was the economics of archiving scientific data.

The Alliance’s international membership includes strategic partners from the research community, libraries, publishers, and digital preservation organisations. Participants called upon the Alliance to act as an umbrella organisation to secure sustainable funding for permanent access in Europe.

A comprehensive conference report (complete with photographs conveying the atmosphere!), together with the PowerPoint presentations, abstracts and authors’ biographies, is now available online.

German Science Priority Initiative – Digital Information and e-infrastructure

I have been tracking national research initiatives in Australia, Canada, the UK and the USA in various blog posts over previous months. Another potentially very important national initiative, from Germany, can now be added to the list.

An alliance of scientific organisations in Germany, which includes all the major players such as the Deutsche Forschungsgemeinschaft (DFG, the German Research Foundation), the Fraunhofer Society, the Helmholtz Association of German Research Centres, and the Max Planck Society, has signed a joint national e-infrastructure policy initiative with six priority areas focusing on:

  • National licensing of e-journals;
  • Open Access;
  • National hosting strategy for preservation of e-journals;
  • Preservation and re-use of primary research data;
  • Virtual research environments; and
  • Legal frameworks (focusing on copyright law and equalising VAT treatment on print and electronic publications).

The Alliance agreed to coordinate the activities of the individual partner organisations and to expand on the ideal of the innovative information environment by means of a Joint Priority Initiative from 2008 to 2012 with the following goals:

  • to guarantee the broadest possible access to digital publications, digital data and other source materials;
  • to utilise digital media to create the ideal conditions for the distribution and reception of publications related to German research;
  • to ensure the long-term availability of the digital media and contents that have been acquired from around the world, and their integration into the digital research environment;
  • to support collaborative research by means of innovative information technologies.

Further information on the initiative is now available to download as a PDF in English, or you can brush up your language skills (as I did, or at least tried to) and read it in the original German 🙂

Interim Report – UK Research Data Feasibility Study

I have previously blogged on UKRDS, the major consultancy work the company has been undertaking with the lead partner Serco Consulting over the last six months on a UK Research Data Service feasibility study for the Higher Education Funding Council for England (HEFCE).

The interim report of the study has just been released. The report analyses the current situation in the UK with a detailed review of relevant literature and funders’ policies, and data drawn from four major case study universities (Bristol, Leeds, Leicester, and Oxford). It describes the emerging trends of local data repositories and national facilities in the UK and also looks internationally at Australia, the US and the EU. Finally, it presents possible ways forward for UKRDS. Preliminary findings from a UKRDS survey of over 700 UK researchers are presented in an appendix. The study has now moved into its second phase, building on the interim report and developing the business case.

Luis Martinez-Uribe, Digital Repositories Research Co-ordinator at Oxford University, has written about the interim report on his blog: “I highly recommend everyone with an interest in research data management to have a look at this report as not only does it capture the current state of affairs in the UK and elsewhere but also offers possible ways forward.”
