October 2008

Digital Universe: forecasting worldwide information growth to 2011

I was at the Oxford Institutional and National Services for Research Data Management Workshop yesterday.

One of the many facts and updates that caught my eye was a reference in Natasha Balac’s presentation to the IDC’s Expanding Digital Universe forecasts of worldwide information growth (which seem to have been inspired by earlier work on How Much Information done at Berkeley in 2000 and 2003).

A bit of further research today located the first forecast to 2010 (published in 2007) and the latest updated forecast taking this to 2011 (published this year).

There is a lot for anyone interested in information management in these forecasts: I was particularly taken by the fact that 2007 was the “crossover year” when, for the first time, information generated worldwide exceeded available data storage (think of all those digital camera or mobile phone images, or experimental research and observational data). From now on, it is projected, we will never have enough storage space for the digital information we produce. Of course no-one should want to keep everything, but from now on selecting what we keep digitally will be a necessity.

There is also some thoughtful information on your “digital shadow” – the information on you generated every day by Web and financial transactions, CCTV, etc. – as well as on personal information management (stuff generated directly by individuals).

It is best to read both forecasts in sequence as the first report is the most extensive: there is even discussion of digital preservation in the first forecast to 2010. Enjoy.

ALPSP Survey – long-term preservation strategies for e-journals

I have been reading through the report of a recent (July 2008) survey investigating preservation strategies amongst members of the Association of Learned and Professional Society Publishers (ALPSP). It makes interesting reading and overall is a very worthwhile report. The report is available as a free pdf download from the ALPSP website.

The responses came from 68 publishers out of a total ALPSP membership of 240 (just over 23%), so the results need to be treated with some caution: the respondents may be less representative than a true random sample.

Key Findings were:

1. The majority of ALPSP publishers who responded to the survey believe long-term preservation to be a critical issue: 91% either agreed or strongly agreed with the statement “Long-term preservation is an issue which urgently needs to be addressed within the industry.” 9% were neutral; no-one disagreed.

2. ALPSP publishers are strongly motivated to engage with preservation because of its critical importance to their customers, with over 90% of respondents citing this as a major motivating factor: a heartening response for those in the library community.

3. Although 68% of publishers reported understanding of preservation issues within their organisation to be either ‘good’ or ‘reasonable’, the survey also revealed a wide range of concerns suggesting an overall lack of confidence, at least for the present. The survey revealed a strong desire amongst almost all publishers for the development of ‘best practice’ and industry standards.

4. There is some confusion surrounding the nature and extent of publisher participation in long-term preservation schemes, with high numbers of respondents declaring their organisation to be participating in one or more initiatives and yet the schemes themselves reporting substantially lower numbers presently taking part.

5. Publisher views on who should take responsibility for long-term preservation also reveal some interesting contradictions: despite presently supporting a range of preservation schemes, a significant majority of publishers indicated they would in fact prefer other groups and institutions to take this responsibility on. National libraries in particular were a popular choice.

6. Finally, the survey revealed most publishers are clear about the distinction between ensuring long-term access and ensuring long-term preservation, with the majority believing they have clear responsibility for long-term access. A worryingly high number however admit to either not trusting their present strategy or not currently having any strategy to deliver here.

Issues which particularly struck me were:

Key finding 4 – the high number of publishers (77%) who thought they were participating in one or more preservation schemes but in fact were/had been involved in time-limited trials which had lapsed, etc. The reality check showed the need to clarify which schemes publishers are truly and fully supporting.

Key findings 5 and 6 – there is still a lack of clarity and understanding of digital preservation in terms of continuing/perpetual access (archiving guarantees and ongoing access rights of subscribers to paid content) and legal deposit (public good archiving for the long-term with limited access rights for non-subscribers). The two can overlap in some services being offered by national libraries, and both are “digital preservation”, but the different user groups and rights mean it is helpful to distinguish them.

Perhaps a final key finding that could be added is that there is a significant and urgent opportunity to work with publishers on developing digital preservation strategies and practice. Whilst a majority of ALPSP publishers in the survey feel they have a responsibility for long-term (continuing/perpetual) access, a substantial number do not have strategies in place to support this. The report suggests a strong need for an industry-wide working group, perhaps modelled on project COUNTER or project TRANSFER, through which publishers, librarians, preservation organisations and intermediaries can map out the road ahead for digital preservation. The urgency is underlined by the fact that 75% of the respondents concurred with the survey statement that “It is inevitable that, at some point in the future, access to some scholarly e-journals will be permanently lost due to a lack of preservation strategy”.

Web Archiving: POWR to Digital Continuity

The iPRES 2008 conference finished on Tuesday 30th September and has been fairly comprehensively reported by Chris Rusbridge in the digital curation blog.

I would like to pull out a couple of papers which were particular highlights for me in the programme: namely two Web-archiving projects, the JISC-funded POWR project and The National Archives (TNA)’s Web Continuity Project.

POWR was the subject of an excellent presentation and paper by Brian Kelly which you can access on the UKOLN website. It gave examples of making the case for Web archiving within universities and its findings fitted extremely well with our own iPRES 2008 presentation of the Charles Beagrie work for JISC on Digital Preservation Policies and their implementation. Partners in the POWR project are UKOLN and ULCC and you can find further information and a draft Handbook on the POWR Project Blog.

The Web Continuity Project was another impressive iPRES paper from TNA (it could well be a hot betting favourite to complete a future TNA hat-trick of Digital Preservation Awards). The presentation pointed to the problems being encountered in Hansard (the official record of debates in the UK Parliament). Action had been requested by Jack Straw, Leader of the House of Commons, when it was discovered that 60% of links in Hansard to UK Government electronic documents or webpages on websites for the period 1997 to 2006 were broken. The Web Continuity Project is an imaginative response from TNA, using re-direction of failed searches to relevant pages in the TNA’s Web Archive of Government websites. The service developed by the project will go live in November.
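
For readers who like to see the idea in code, here is a minimal sketch of the general re-direction approach, not TNA's actual implementation. The archive address, the function name and the example URLs are all hypothetical and chosen purely for illustration: when a requested page is no longer on the live site, the visitor is sent to an archived copy rather than shown a "not found" error.

```python
# Hypothetical archive base URL, for illustration only (not TNA's real address).
ARCHIVE_BASE = "https://webarchive.example.org/"


def resolve(url, live_pages):
    """Return an (HTTP status, location) pair for a requested URL.

    live_pages is the set of URLs that still exist on the live website.
    A page that has disappeared is answered with a redirect (302) to the
    archived copy instead of a 404 error.
    """
    if url in live_pages:
        # The page still exists, so serve it as normal.
        return 200, url
    # The page is gone: point the visitor at the archive instead.
    return 302, ARCHIVE_BASE + url


if __name__ == "__main__":
    live = {"http://www.example.gov.uk/current-report"}
    print(resolve("http://www.example.gov.uk/current-report", live))
    print(resolve("http://www.example.gov.uk/old-report-1999", live))
```

A real service would also need to check that the archive actually holds a copy of the missing page before redirecting; the sketch leaves that step out.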

Guardian Newspaper Editorial: In Praise of…Preserving Digital Memories

The Guardian Newspaper on Tuesday this week (30th September) devoted an editorial to digital preservation and the iPRES 2008 conference in London. The editorial “In praise of…preserving digital memories” is also available on the online version of the paper.

A short extract follows:

“It ought to be reassuring that while governments are living a day-to-day existence trying to prevent a global financial implosion, some people are thinking centuries ahead. The British Library is hosting a conference of more than 250 experts from 33 countries to work out ways of preserving for future generations the huge amounts of data we store online. Since practically everything we write or watch these days is in digital form – from newspapers or state documents, to the minutes of the banking crisis or the latest edition of Grand Theft Auto – this is a task of mind-boggling proportions…. This won’t cure the banking crisis, but it will enable people in future centuries to understand it better. If all goes well, we will have the capacity to preserve as many of our memories, personal and national, as we want. Only time will tell whether these will last as long as the Rosetta Stone.”

With thanks to Najla Rettberg for drawing the editorial to my attention.