By Rob Mitchum // September 15, 2014
For all of its technological advances, science remains distinctly old-fashioned in how it assigns credit. For centuries, the order of authors on a publication has conferred some degree of precedence — though the structure varies from field to field — and the importance of a paper is still typically measured by the number of times it is cited by others. As science modernizes, this system misses several aspects of how research is performed and how knowledge spreads, inspiring several attempts to create “alternative metrics” that better describe the impact of a scientific finding or individual. One new proposal from CI Senior Fellow Daniel S. Katz adapts a rule of mathematical logic to make it easier for creators of software and data to receive due credit.
Today, if you write a piece of scientific software or share a useful data set, your contribution is typically recognized deep within the text of a scientific paper. In some cases, a citation will be given to the source of the tool or data, but more often it will be briefly mentioned in the methods or the acknowledgements section, hidden from most common scientific metrics. As such, the person who wrote the code or shared the data in the first place does not receive credit commensurate with their scientific impact, diminishing their accomplishments when applying for grants or employment and disincentivizing future work.
Katz, currently serving as program director in the Division of Advanced Cyberinfrastructure for the National Science Foundation, observed this disconnect and started thinking about better ways to assign credit for these “non-traditional” research contributions.
“Part of the job that I have at NSF is to make sure that we are funding and creating the software needed for NSF-funded scientists and engineers to do their work,” Katz said. “Since we can’t directly pay for all the software that’s needed, how can we try to make sure that software is available even if we’re not paying people to develop it? How do we provide the right incentives to make this happen?”
His idea, “transitive credit,” seeks to extend the chain of scientific credit beyond the one-degree link offered by traditional citations, to acknowledge indirect contributions. In mathematics, transitive properties describe the relationship between two objects that aren’t directly connected; in a simple example, if A is greater than B, and B is greater than C, then one can infer that A is greater than C. In Katz’s proposed system, that concept is adapted to pass back scientific credit; if researchers create software that is used for a research paper, which is then in turn cited by additional papers, some of that credit will roll back to the original software developers.
To make this system work, Katz proposes a more quantitative approach to author listings that goes beyond an opaquely ordered list of names, or even beyond people alone. “Complete credit” for a scientific paper or product, he argues, should include not only all of the human contributors but also the research objects that made the work possible, including software, data, and publications. Under this approach, the creators of a paper or product assign a weight to each “contriponent” (Katz’s portmanteau) that reflects its proportional contribution to the work. So for a new piece of scientific software, the lead developer who did the most work might receive 50% of the credit, two collaborators might receive 20% each, and a previous software package critical to the new creation might receive 10%.
Putting a number on the contributions of each person and product enables the transitive credit system to work. When the authors of a research paper use the above software, they can assign it a portion of the paper’s credit, which will transfer back to the original developers (and to the developers whose work they themselves built upon). Those pieces of credit can then be aggregated into a new measure of scientific impact more inclusive than current metrics such as citation count and h-index.
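To make the arithmetic concrete, here is a minimal sketch in Python of how credit might flow backward through such weights. It is an illustration, not Katz’s implementation. The software weights (50%, 20%, 20%, 10%) come from the example above; the paper’s weights, the single developer of the earlier package, and every name in the code are hypothetical. The sketch also assumes the chain of products contains no cycles.

```python
import math
from collections import defaultdict

# Hypothetical credit maps: each product lists its contriponents and weights.
# Only the "new_software" weights come from the article's example; the rest
# are invented for illustration.
CREDIT_MAPS = {
    "new_software": {"lead_dev": 0.5, "collab_a": 0.2, "collab_b": 0.2, "prior_package": 0.1},
    "prior_package": {"prior_dev": 1.0},
    "paper": {"author_1": 0.6, "author_2": 0.3, "new_software": 0.1},
}


def validate(credit_maps):
    """Check that the weights for every product add up to 100%."""
    for product, weights in credit_maps.items():
        total = sum(weights.values())
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"weights for {product!r} sum to {total}, not 1.0")


def transitive_credit(product, credit_maps, share=1.0, totals=None):
    """Pass `share` of credit for `product` back to the people behind it.

    A contriponent that has its own credit map is treated as a product and
    its share is split again; anything else is treated as a person.
    Assumes the credit maps contain no cycles.
    """
    if totals is None:
        totals = defaultdict(float)
    for contriponent, weight in credit_maps[product].items():
        if contriponent in credit_maps:
            transitive_credit(contriponent, credit_maps, share * weight, totals)
        else:
            totals[contriponent] += share * weight
    return dict(totals)


validate(CREDIT_MAPS)
paper_credit = transitive_credit("paper", CREDIT_MAPS)
for name, credit in sorted(paper_credit.items()):
    print(f"{name}: {credit:.2%}")
```

In this hypothetical case, the software’s lead developer ends up with 5% of the credit for a paper they did not write (10% of the paper’s credit times 50% of the software’s), and the developer of the earlier package receives 1%. Summing such shares across many papers would give the kind of aggregate measure described above.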
“The metrics that we have today don’t really capture some of the things we think are important,” Katz said. “Right now, we’re effectively downgrading indirect contributions because we don’t measure them, though these are the fundamental elements that make new science possible. Under the transitive system, all those pieces could be added up and credited.”
But how would such a system be implemented? In a second paper, Katz and GitHub’s Arfon Smith suggest that a data format called JSON-LD could be the solution. JSON is a popular format readable by both humans and machines; JSON-LD (the “LD” stands for “linked data”) is a variant that offers additional context, useful for describing objects such as software. In their paper, Katz and Smith propose embedding the contriponents and credit weights for a research paper or product in its metadata using JSON-LD, which could then be machine-read to create a “creditmap” of scholarly works.
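As a rough illustration of what such metadata could look like, the Python sketch below writes a credit map as a JSON-LD-style document and reads it back. The “@context” and “@id” keys are standard JSON-LD keywords, but the vocabulary URL, the term names (“contriponents,” “contributor,” “weight”), and the identifiers are all assumptions made for this example, not the schema Katz and Smith propose. The code also treats the document as ordinary JSON rather than running it through a JSON-LD processor.

```python
import json

# Hypothetical vocabulary; the example.org URLs and term names are
# placeholders, not the schema from the Katz and Smith paper.
CONTEXT = {
    "credit": "https://example.org/credit#",
    "contriponents": "credit:contriponents",
    "contributor": {"@id": "credit:contributor", "@type": "@id"},
    "weight": "credit:weight",
}


def to_jsonld(product_id, weights):
    """Describe one product's credit map as a JSON-LD-style dictionary."""
    return {
        "@context": CONTEXT,
        "@id": product_id,
        "contriponents": [
            {"contributor": contributor_id, "weight": weight}
            for contributor_id, weight in weights.items()
        ],
    }


def read_credit_map(document):
    """Recover the contributor -> weight mapping from such a document."""
    return {
        entry["contributor"]: entry["weight"]
        for entry in document["contriponents"]
    }


# Weights from the article's software example; identifiers are invented.
software_weights = {
    "https://example.org/people/lead_dev": 0.5,
    "https://example.org/people/collab_a": 0.2,
    "https://example.org/people/collab_b": 0.2,
    "https://example.org/software/prior_package": 0.1,
}

encoded = json.dumps(
    to_jsonld("https://example.org/software/new_software", software_weights),
    indent=2,
)
print(encoded)

# A crawler building a "creditmap" could parse each document back into weights.
decoded = read_credit_map(json.loads(encoded))
```

Because each contriponent is identified by a URL, a machine reader could in principle follow the link for the earlier software package, fetch its own credit metadata, and continue the chain.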
Eventually, you might imagine a number for each researcher, developer, dataset, or software that reflects his, her, or its contribution to scientific progress, stimulating and rewarding collaboration. “Providing a credit mechanism would both remove the negative force and create a positive force, creating an incentive for sharing. This would impact recognition and status, hiring and promotion, and funding agency decisions,” Katz writes.
Katz’s proposal will be discussed at the 2nd Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE) workshop in November, and he hopes it will stir further conversation about how to better reward and acknowledge scientific contributions that slip between the cracks in the current system.
“I think the question is do other people think this idea is worth pursuing in a more widespread fashion,” Katz said. “If others agree that this is a problem, and agree that this is a useful way of going towards a solution, we can come up with other alternatives and compare them. Eventually, we can move towards a solution that is widely accepted.”