25 October 2013

Data citation at eResearch Australasia 2013

Data citation was a hot topic at this year's eResearch Australasia conference in Brisbane. With more institutions now having services in place for storing and sharing data, attention is turning to encouraging re-use of data and, in particular, to ensuring that researchers receive credit through scholarly attribution practices for their work producing data assets.

Griffith's Data Citation Project had a poster accepted for the conference, entitled Infrastructure, impact and outreach: Griffith University's Approach to Data Citation.

We also contributed to a half-day workshop facilitated by ANDS and including speakers from the Terrestrial Ecosystems Network (TERN), the Australian Antarctic Data Centre, and CSIRO. Our section of the workshop was entitled Encouraging use of data citations: Experiences building a culture of data citation at Griffith University. We focused on enhancements to discovery interfaces and smart use of metadata, as well as discussing some challenges in fostering a culture of data citation at an institutional level when there are many factors outside of any organisation's control (e.g. journal policies, research quality and rewards systems, style guides and reference management system functionality).

In addition to the workshop, Steve McEachern (Deputy Director of the Australian Data Archive) presented on Data Citation and Sharing in Australian Social Science – How, When and Why? ADA is conducting a survey of data citation attitudes and behaviours in the Australian social science community, results of which should be released in 2014.

12 June 2013

Data citation project wrap-up

Our data citation project officially finished at the end of May. To celebrate and to meet our goal of sharing our experiences as widely as possible, the project team presented an hour-long webinar on 4 June. More than 30 of our colleagues from around Australia attended, and we were pleased to receive some positive feedback.

An edited version of the webinar is now available on the Australian National Data Service's YouTube channel. If you didn't get the chance to attend in person, we'd love to hear your feedback and any questions about our work.

While our project found the currently available tools and methods for data metrics to be immature, Griffith now has some of the necessary building blocks in place to take advantage of new developments as they evolve. These building blocks include prototype technical infrastructure (e.g. scripts for minting DOIs), a draft policy framework for managing DOIs, and some suggestions for citation-related enhancements to our content repositories and discovery services.

We used this project to explore formal and less formal ways of measuring impact. Part of the project involved evaluating the new Thomson Reuters Data Citation Index (DCI). Some of our subject librarians assisted with a trial of this new product during April 2013. To summarise the findings:
  • The Data Citation Index is a good fit for the suite of Web of Knowledge products that Thomson Reuters offers. This is a positive start, in that data can be seen as just another product in the citation databases.
  • However, the DCI is still an immature product. Issues identified include the quality of the data (which is dependent on journal policies and discipline conventions that are not yet well evolved to meet these needs), the limited coverage of disciplines, and the small number of Australian repositories that have been harvested. 
  • For Griffith, currently the cost of the DCI outweighs the benefits, but we should re-evaluate this regularly as costs change (e.g. if a national site licence were to be negotiated) and as the content improves and expands. 
Evaluation of altmetrics tools like ImpactStory was also in scope. Again, we found that the available tools and methods are still very much prototypes, and as our DOIs for data are not yet being routinely cited they are difficult to track! Nevertheless, this is a promising area for future efforts.

Our project aimed to raise awareness of data citation and impact with data collection owners. This was an important part of the project but also the most challenging. When communicating with researchers, we've identified that our credibility will be improved if we are:
  • aware of disciplinary differences in citation practices
  • honest about the still small and partial evidence base for the citation advantages associated with open data, and 
  • realistic about the lack of rewards for researchers for sharing data and having it cited by others. 

Promoting a culture of data citation will be a long-term ongoing process. Here at Griffith, we are now at the point where we have infrastructure in place for data to be deposited and for DOIs to be minted, and procedures that ensure these processes are understood. Griffith’s new best practice guidelines (to be released soon) will incorporate data citation as part of a holistic view of data management, and over time we would hope that information resources and training courses will reflect data citation practices better than they do now.

There are still many external drivers that are as important as, if not more important than, these institutional efforts. These factors include:
  • the scope of current qualitative and quantitative assessments of research quality
  • the policies and guidelines of scholarly journal publishers, and
  • the poor support for data citation in commonly used style manuals and bibliographic management software.

In light of these external factors, one of our final lessons learned was about being realistic about what can be achieved by a single institution. If we want to realise the benefits of data citation fully, collective action will be needed on many fronts.

We'd like to thank ANDS for funding some of the work described here, and for providing information and opportunities for discussion with other institutions embarking on similar projects. While the Data Citation Infrastructure Establishment Program has now formally ended at Griffith, our work in the areas of infrastructure, impact and outreach will continue. We look forward to contributing more as the broader ANDS partner community works together to develop data citation solutions and services.

21 March 2013

Landing pages, or citation springboards?

Have you noticed how we talk about a DOI going to a 'landing page'? It's a strange way to describe a page that should not be an endpoint but rather a springboard - somewhere people may land briefly but that is really designed to re-launch scholarly content into the workflows and tools that are in the scholar's space, rather than ours.

We are currently investigating a number of things that we hope will improve the discoverability, citability and re-use of our scholarly content. This includes data collections, the focus of this current project, but could be applied equally to all sorts of other outputs including journal articles, theses and technical reports.

Embedded metadata

We will embed some or all of the following types of metadata into the landing pages for each of our scholarly objects, including data collections.

If I understand it correctly, these types of embedded metadata will help search engines crawl and index our scholarly content more effectively.

It will also make it easier for a user of our services to import a citation for one of our scholarly outputs to a web-based reference manager like Zotero or Mendeley.
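As a rough illustration of what embedded metadata can look like, here is a minimal Python sketch that renders a record as `<meta>` tags for the `<head>` of a landing page. It assumes the Highwire Press style `citation_*` tag names that scholarly indexers commonly read, plus one Dublin Core style identifier; the post does not say which schemes Griffith will adopt, and the record, its field names and the DOI are made up for the example.

```python
from html import escape


def citation_meta_tags(record):
    """Render a dataset record as <meta> tags for a landing page <head>."""
    tags = [("citation_title", record["title"])]
    # One tag per author keeps individual names unambiguous.
    tags += [("citation_author", name) for name in record["authors"]]
    tags += [
        ("citation_publication_date", str(record["year"])),
        ("citation_doi", record["doi"]),
        ("DC.identifier", "https://doi.org/" + record["doi"]),
    ]
    # escape() protects against quotes and ampersands in titles and names.
    return "\n".join(
        '<meta name="{}" content="{}">'.format(escape(name), escape(value))
        for name, value in tags
    )


# Hypothetical record, for illustration only.
example = {
    "title": "Example rainfall & runoff data",
    "authors": ["Smith, A.", "Jones, B."],
    "year": 2013,
    "doi": "10.5072/example-1",
}
print(citation_meta_tags(example))
```

Generating the tags from the same record that drives the visible page helps keep the machine-readable and human-readable versions of the citation in step.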

Downloadable metadata

Importing citations to offline reference managers

Importing citations from subscription databases into EndNote or similar products is a common work practice for many researchers, and something that many students are trained to do from their undergraduate days. It is less commonly offered by institutional repositories, but is definitely worth thinking about.
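One common way to offer this is a downloadable RIS file, a plain-text format that most desktop reference managers can import. The sketch below is only an assumption about how such an export might be produced; the record fields, publisher name and DOI are hypothetical, and a real export would need to be checked against the target tools.

```python
def to_ris(record):
    """Serialise a dataset record as an RIS entry (two-letter tag, two spaces, hyphen, space)."""
    lines = ["TY  - DATA"]  # DATA is the RIS reference type for a data file
    lines += ["AU  - " + author for author in record["authors"]]
    lines += [
        "TI  - " + record["title"],
        "PY  - " + str(record["year"]),
        "PB  - " + record["publisher"],
        "DO  - " + record["doi"],
        "UR  - https://doi.org/" + record["doi"],
        "ER  - ",  # end-of-record marker
    ]
    return "\n".join(lines) + "\n"


# Hypothetical record, for illustration only.
example = {
    "title": "Example rainfall data",
    "authors": ["Smith, A.", "Jones, B."],
    "year": 2013,
    "publisher": "Example University",
    "doi": "10.5072/example-1",
}
print(to_ris(example))
```

Serving this text from a "Download citation" link on the landing page would give users the same import experience they are used to from subscription databases.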

Packaging metadata with downloaded objects

Just as software downloads often come with a readme.txt file, some repositories now package metadata with the object at the time of download. The Merritt repository run by the University of California Curation Center (UC3) takes this approach.
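To make the idea concrete, here is a small Python sketch that bundles data files with a README and a machine-readable metadata file in a single zip archive. It is not a description of how Merritt works; the archive layout, file names and record fields are all assumptions made for the example.

```python
import io
import json
import zipfile


def package_with_metadata(files, record):
    """Return zip bytes holding the data files plus README.txt and metadata.json.

    files maps a file name to its content as bytes.
    """
    readme = (
        "Title: {title}\n"
        "DOI: https://doi.org/{doi}\n"
        "Please cite this data collection when you re-use it.\n"
    ).format(**record)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr("data/" + name, content)
        # The metadata travels with the data, so it survives being copied around.
        archive.writestr("README.txt", readme)
        archive.writestr("metadata.json", json.dumps(record, indent=2))
    return buffer.getvalue()


# Hypothetical collection, for illustration only.
blob = package_with_metadata(
    {"rainfall.csv": b"year,mm\n2012,1100\n"},
    {"title": "Example rainfall data", "doi": "10.5072/example-1"},
)
print(len(blob), "bytes")
```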

Good old cut-and-paste

Finally, let's not forget that having a prominent well-formed citation statement on your landing page lets users simply cut-and-paste into a document, a note-taking tool, or a virtual stickynote.
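A citation statement like this can be generated from the same metadata record rather than typed by hand. The sketch below assumes a "Creator (Year): Title. Publisher. Identifier" pattern, loosely modelled on the format DataCite recommends; the post does not specify the format Griffith will use, and the record is hypothetical.

```python
def citation_statement(record):
    """Build a one-line, cut-and-paste citation from a dataset record."""
    authors = "; ".join(record["authors"])
    return "{} ({}): {}. {}. https://doi.org/{}".format(
        authors,
        record["year"],
        record["title"],
        record["publisher"],
        record["doi"],
    )


# Hypothetical record, for illustration only.
example = {
    "title": "Example rainfall data",
    "authors": ["Smith, A.", "Jones, B."],
    "year": 2013,
    "publisher": "Example University",
    "doi": "10.5072/example-1",
}
print(citation_statement(example))
```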

We're at an early stage of investigating these strategies; the approach we will take will depend not just on technical feasibility and user demands, but also on resourcing and policy decisions. Hopefully we can report back on some real life experience with at least some of these strategies soon. While they seem to be small things individually, we think that collectively they could go a long way towards turning our landing pages into springboards!

14 March 2013

Communicating with data depositors about DOIs and citation

We've recently been looking at the way that Dryad promotes data citation through their notifications to researchers following a deposit.

As part of their community outreach, Dryad have kindly provided the text of the notifications in their submission workflow (see this presentation, for example), which would otherwise have been invisible except to depositors. It has been great to see a working example of this kind of communication and to think how we might implement something similar here at Griffith, in the first instance as a manual process (by direct email to the researcher) and in future as an automatic part of the self-deposit process.

Our current draft text is as follows:

Thank you for your recent submission to the Griffith University Data Repository titled: [insert the collection title here] 
Digital Object Identifier (DOI) 
Your data has received a unique identifier called a DOI. Including the DOI in published articles will make readers aware that the data files are available, and enable their access and citation.  
The DOI can be presented as follows: 
Data deposited in the Griffith University Data Repository: [insert the DOI here] 
Many journals specify a particular location for links to data in repositories, or have a section on data accessibility. You can also provide your data DOI in the text, just before the References. 
You can also use the DOI when showcasing your data through channels other than formal publications, such as press releases, social media updates, and as part of your CV.  
Your data will be presented to users of the repository with the following citation statement:  
[insert the citation statement here as it will appear on the display page] 
The repository also enables users to download this citation into common bibliographic management tools like EndNote, Zotero and Mendeley.  
Please let us know if you have any questions or concerns.  
The Griffith University Data Repository team

We have very closely followed the Dryad text for the first part of our draft email, but have chosen to supplement this in a few ways:

  1. We also suggest they use the DOI for less formal communications, including social media, because we are as interested in altmetrics for data as we are in data citations. 
  2. We suggest putting the DOI in documents like CVs. Conversations with our subject librarians have indicated that many researchers do not include DOIs (or even URLs in many cases) in their own publication lists. While the inclusion of DOIs on internal paperwork may not appear very significant, with more and more CV-type information now being posted online by researchers in LinkedIn, Academia.edu and other social services, over time this could make a difference (both in terms of metrics and, hopefully, in terms of cultural change).
  3. We may include a cut-and-paste-able version of the citation in the email, which will also appear on the landing page for the data collection. 
  4. We'll indicate the availability of the citation for download into some common bibliographic tools (more on these in a future blog post).
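If the notification does become an automatic part of the deposit process, the placeholders in the draft could be filled from the repository record. The following is only a sketch using Python's standard `string.Template`: the message text is an abbreviated version of the draft above, and the function name, parameter names and sample values are hypothetical.

```python
from string import Template

# Abbreviated version of the draft email; $title, $doi and $citation are
# the placeholders to be filled in for each deposit.
NOTIFICATION = Template(
    "Thank you for your recent submission to the Griffith University "
    "Data Repository titled: $title\n\n"
    "Your data has received a unique identifier called a DOI.\n"
    "The DOI can be presented as follows:\n"
    "Data deposited in the Griffith University Data Repository: $doi\n\n"
    "Your data will be presented to users of the repository with the "
    "following citation statement:\n$citation\n\n"
    "The Griffith University Data Repository team\n"
)


def build_notification(title, doi, citation):
    """Fill in the draft email for one deposited collection."""
    return NOTIFICATION.substitute(title=title, doi=doi, citation=citation)


# Hypothetical values, for illustration only.
print(build_notification(
    "Example rainfall data",
    "10.5072/example-1",
    "Smith, A. (2013): Example rainfall data. https://doi.org/10.5072/example-1",
))
```

Because `substitute` raises an error when a placeholder has no value, an incomplete record would fail loudly rather than send a researcher an email with a gap in it.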

Several days ago we minted 14 DOIs for new collections that have arisen from a collaborative project to gather the outputs of the Urban Water Security Research Alliance. When these collections go live, we'll be sending out an email along the lines of the one above.

It will be interesting to see what kind of response we get to the email, and to track the use of these DOIs in future through formal indexes and altmetrics tools.

4 January 2013

Summary of data citation work Oct-Dec 2012

As 2013 kicks off, it seems a good time to reflect on the progress to date of our data citation project. 

Following a well-attended data citation session in November, four of our subject librarians (about a third of the staff from this team) volunteered to get involved in the project. I met with two of these librarians before Christmas to talk about citation patterns and impact motivations in their disciplines (creative arts and humanities, and health). These meetings were very informative, and have led to some current work investigating the visibility of data citation within the common style guides used at Griffith (APA, MLA and Vancouver, for starters). 

We are also looking at our current information guides and training programs on referencing and bibliographic management tools, with a view to seeing where data citation might be able to fit in.  As part of this investigation we came across the Data Citation LibGuide at Purdue University, which is a nice model for how to provide an introduction to data citation in a format that would be familiar to many university staff and students. Michael Witt from Purdue has kindly given us permission to repurpose this content, so a data citation LibGuide may well be something we'll produce before the end of the project. 

With regard to the bibliometrics and altmetrics components of the project, our vendor relations team in the library helped us to establish the annual cost of an institutional licence for the Data Citation Index and to organise a short trial of this product, which we'll undertake in March or April. We've given ANDS some feedback about the requirement for a national approach to licensing the DCI, possibly as an extension to the existing Universities Australia consortial licence for Web of Knowledge. Our first step with our altmetrics activity has been to investigate the Open Researcher and Contributor ID (ORCID) self-registration process, which should facilitate our use of ImpactStory when the time comes.

Part of our work has been our own professional development - getting ourselves up to speed with what's happening in this fast-moving space. We've attended the data citation related presentations at the eResearch Conference (which I've blogged about here), as well as a Data Citation Index webinar run by Thomson Reuters. A professional development highlight was the ANDS data citation webinar series, which featured international speakers from Dryad, the UK Data Archive and ImpactStory. If you missed these talks, you can catch up with them via the ANDS YouTube channel. We've been very grateful for the chance to find out what others are doing through these events, and hope to return the favour by sharing the Griffith experience through an ANDS webinar later in the year. 

We still have plenty that we are trying to achieve in the next few months. Key infrastructure activities include: ensuring that DOI minting and auto-generated citation displays are included in roadmaps for upcoming repository projects; sharing a PHP script developed locally for minting DOIs; and evaluating other tools for DOI minting and maintenance that have been produced elsewhere. There is a great deal of thinking yet to be done on methodologies for reporting from the Data Citation Index, ImpactStory and any other tools we decide to investigate, and we'd be happy to hear from other organisations about methods that they have developed.