Dataset Level Metrics

This Interest Group, initiated and led by Rebecca Lawrence (F1000Research) and Sunje Dallmeier-Tiessen (CERN), is charged with convening discussion among related initiatives and interested stakeholders (information providers, managers and users, ranging from policy makers to data and database providers to publishers) around the challenge of dataset-level metrics. The interest group is also charged with initiating projects that deliver specific outputs to support database- and vendor-neutral interoperability of information about research data between repositories, publishers, academic administrators, funding agencies and researchers. Projects launched by this interest group may be limited to delivering common agreements on terminology and vocabularies, or they may also deliver selected proofs of concept, with volunteers (platforms, publishers, funders and institutes) offering their facilities to implement suggested metrics and assess their usage, meaning and impact. Resourcing requirements and contributions will be determined as part of the creation and scoping of each project. Such projects will collaborate closely with related projects and initiatives identified by the interest group (for example, the NISO Alternative Assessment Metrics (Altmetrics) Initiative, a group working on metrics for alternative output types).

Background

There is growing interest in enhanced metrics for data and software publications, reflected in a number of initiatives, for example by PLOS and the California Digital Library (CDL). The objective is to enable all stakeholders to better recognise the contribution and impact of producing good-quality research data.

The production of high-quality research data underpins most scientific research. Over the past few years there has been much progress in developing a range of new metrics to better recognise the impact of research articles, notably article-level metrics (ALMs) and altmetrics.

During this time, governments, funders, institutions and ultimately researchers have also increasingly recognised the value and importance of sharing the research data underpinning new research findings, and of sharing them in a way that enables others to replicate, reanalyse and reuse them. However, there has been little progress to date in developing metrics that work well specifically for data. Article-level metrics typically translate poorly to datasets, where patterns of use are very different; moreover, the wide range of data types, continually updated datasets, and datasets produced by large consortia are not particularly amenable to existing metrics.
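To make the versioning challenge concrete, the sketch below shows one way a repository might roll up usage counts across the versions of a continually updated dataset, so that both per-version and whole-dataset figures can be reported. It is a minimal illustration only; the `VersionUsage` fields and the choice of counts are hypothetical assumptions, not an agreed standard.

```python
from dataclasses import dataclass


@dataclass
class VersionUsage:
    """Usage counts for one version of a dataset (all fields are illustrative)."""
    version: str    # version label, e.g. "1.2"
    views: int      # landing-page views for this version
    downloads: int  # file downloads for this version


def rollup_usage(versions: list[VersionUsage]) -> dict:
    """Aggregate usage across all versions of a continually updated dataset.

    Article-level metrics assume a single fixed object; a dataset accumulates
    usage across many versions, so a dataset-level metric has to decide
    whether to report per-version counts, whole-dataset counts, or (as here)
    both.
    """
    return {
        "per_version": {
            v.version: {"views": v.views, "downloads": v.downloads}
            for v in versions
        },
        "total": {
            "views": sum(v.views for v in versions),
            "downloads": sum(v.downloads for v in versions),
        },
    }


if __name__ == "__main__":
    history = [
        VersionUsage(version="1.0", views=120, downloads=34),
        VersionUsage(version="1.1", views=80, downloads=21),
        VersionUsage(version="2.0", views=310, downloads=97),
    ]
    print(rollup_usage(history))
```

Reporting both granularities matters because a download of version 1.0 is still usage of the dataset even after version 2.0 supersedes it.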

Many researchers are also concerned about sharing their data. Among the biggest concerns is the fear of not being able to extract all the major findings from the data before someone else benefits from them, potentially making a significant discovery with data they did not produce. Because assessment of researchers’ output for future grants and career progression tends to focus on articles rather than data, some researchers worry that they will lose much of the benefit of their hard work in producing the data. Researchers are also often concerned about the sometimes extensive amount of time required to get the data (and associated metadata) into a form that enables others to genuinely reanalyse and reuse it.

Both of these concerns could be at least partly alleviated by better tools for measuring the impact and importance of generating a new dataset, which academic administrators and funding agencies could then use as part of their assessment processes. This may shift the risk/benefit balance for researchers. The purpose of this interest group is therefore to recommend metrics that publishers and data repositories can implement and measure, and that academic administrators and funding agencies can then use in their assessments.

Potential use cases for such metrics include:

  • As a publisher wanting to display metrics on research data published as part of articles, so that authors can know the ‘impact’ of the data they have generated, I need agreed, detailed standards that research funders and research administrators will recognise;
  • As a data repository or community platform wanting to provide data submitters and data users with additional information about how their data have been used and about the impact of particular datasets, I need to know what to show. Which metrics are suitable for cross-disciplinary or discipline-specific repositories? How do I obtain them, display them and, possibly, export them? I need recommendations that address these challenges across disciplines and stakeholders; they should be standardised, but leave flexibility to reflect disciplinary practices (see the illustrative sketch after this list);
  • As a research funder wanting to evaluate results of the funding we provide, I need standardised recording of metrics that can provide information on the ‘impact’ that a new dataset has had on scientific progress or on the broader needs of society. This can contribute towards evaluation of the output from the research funding we provide, as well as provide valuable information towards our assessment of new grant applications;
  • As a university administrator, I need standardised recording of metrics that can provide information on the ‘impact’ that a new research dataset has had on scientific progress or on the broader needs of society. This can help contribute towards decisions on recruitment of new faculty and promotion, as well as provide information that may be used in institutional reviews that impact institution-level funding.
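As a concrete illustration of the repository use case above, the sketch below shows what a standardised but extensible dataset-level metrics record could look like: a small core of cross-disciplinary counts plus an open field for discipline-specific measures, serialised to JSON for export. Every field name here is a hypothetical placeholder, not a recommendation of this group.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetricsRecord:
    """One exportable dataset-level metrics record (all fields are placeholders)."""
    dataset_id: str       # persistent identifier, e.g. a DOI
    repository: str       # repository or platform reporting the metrics
    views: int = 0        # landing-page views
    downloads: int = 0    # file downloads
    citations: int = 0    # formal data citations, where trackable
    # Open-ended slot so disciplinary communities can report their own measures.
    discipline_specific: dict = field(default_factory=dict)


def export_record(record: DatasetMetricsRecord) -> str:
    """Serialise a record to JSON so funders and administrators can ingest it uniformly."""
    return json.dumps(asdict(record), indent=2)


if __name__ == "__main__":
    record = DatasetMetricsRecord(
        dataset_id="doi:10.5072/example",  # 10.5072 is a DOI prefix reserved for testing
        repository="example-repository",
        views=1500,
        downloads=230,
        citations=4,
        discipline_specific={"genome_assemblies_reused": 12},
    )
    print(export_record(record))
```

Keeping the core fields fixed while leaving the discipline-specific extension open-ended is one way to reconcile the standardisation that funders and administrators need with the flexibility that disciplinary practice demands.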

Participants

  • David Baker, CASRAI (Co-chair)
  • Alex Ball, DCC, UKOLN
  • Sarah Callaghan, STFC
  • Manuel Corpas, TGAC
  • Jon Corson-Rikert, Cornell University
  • Sunje Dallmeier-Tiessen, CERN (Co-chair)
  • Victor de la Torre, Spanish National Bioinformatics Institute
  • Kevin Dolby, Wellcome Trust
  • Scott Edmunds, GigaScience, GigaDB, BGI
  • Holly Falk-Krzesinski, Elsevier
  • Alejandra Gonzalez-Beltran, University of Oxford e-Research Centre; BioSharing
  • Mark Hahnel, Figshare
  • Patricia Herterich, CERN
  • Zeina Jamaleddine, Sidra Medical & Research Center, Qatar Foundation
  • Daniel S Katz, University of Chicago & Argonne National Laboratory
  • Amye Kenall, BioMed Central
  • Rebecca Lawrence, F1000 Research Ltd (Co-chair)
  • Jo McEntyre, Europe PMC, EBI
  • Philippe Rocca-Serra, University of Oxford e-Research Centre; BioSharing
  • Susanna Sansone, University of Oxford e-Research Centre
  • Mike Taylor, Elsevier Labs
  • Wendy White, Southampton University

Communications

In addition to this webpage, this group has the following communications channels.

Working Groups

The following working groups have been convened from this interest group.