Communicating with Data

Data has become another genre of modern scholarship as reliance on research computing and informatics grows and as virtual and team research becomes the new norm in many disciplines. Data in this context takes many forms, from digitized film collections for critical analysis, to digitized text collections for linguistic analysis, to databases that document the careful study of historical artifacts. Such data is often created as a standalone, local resource but later becomes a community resource or even an international reference collection, like the Linguistic Data Consortium’s speech corpus or the Perseus Digital Library in Classics. Unlike print-based research communication, digital data requires additional infrastructure to use: software to process, analyze, visualize, and document it. Research data also has unfamiliar and often confusing intellectual property attributes, in that data is often factual in nature and so does not fall under copyright or other intellectual property rights, at least in the U.S. And since international law varies widely on this issue, international research is becoming fraught with potential IP conflicts.

As data becomes a new currency of research, researchers struggle with how to manage, publish, and archive their data, and institutions do not yet understand how to define policies for, support, leverage or sustain data as they do traditional research products like books or articles. And the fact that research funding organizations of all types are beginning to require more sophisticated data management and sharing practices from their researchers just adds to the need for greater understanding of how to fold these data products into our system of scholarly communication and credit. If data is to be a creditable output of scholarship in the research community, scholars and institutions that make large investments in its creation and management need to capture the credit for that investment. It is highly desirable to develop mechanisms to capture and credit data as a measurable aspect of research productivity and impact, similar to the publication metrics currently used for the promotion and tenure system and for institutional assessment and ranking. This is where data-as-publication relates to some of the emerging pathologies of traditional publishing described earlier.

This research has three major themes across the three years:

Year 1: Data Curation. How research data is managed, preserved, integrated and visualized affects its potential for scholarly reuse, whether to verify research results or to conduct new research using old data in new ways. We will investigate the range of individual and institutional roles and responsibilities for data management, publication and preservation, integration and visualization across relevant entities (e.g. academic, IT, research, library). We will also investigate desirable infrastructure to support data as a new mode of scholarly communication of equal importance to traditional publications, in support of Year 2.

Year 2: Data Governance. Researchers, universities, funders and the public face myriad issues of data quality, authenticity, reproducibility, ownership, control, access rights, reuse and preservation (see Smith, Report 2011). Beliefs often diverge from legal or institutional realities, and where data has economic value and questionable legal protection we lack scholarly norms of behavior and policy. Do researchers “own” their data, or do their institutions, funders, students, or the public have equal rights? Should researchers be required to release their data under open terms of access, to ensure reproducibility and leverage the public’s investment?

Year 3: Data Publishing. The third year will focus on the technical, social/cultural and legal issues and challenges around the concept of “data papers”, or formal data publications as part of the scholarly reward and recognition system of equal weight to traditional research articles (see Smith, Conference Proceedings, 2011). Working across university and publishing organizations, we will study whether the “publishing paradigm” is the right one for including data in the system of academic credit, or whether novel “altmetrics” are needed to recognize how research is evolving. An international conference will examine these issues and develop new thinking about how data fits into the record of scholarship.

