Case Study: A Peer-Reviewed Value Proposition

The Problem

The client runs a website where academics can share their research and discover research shared by others. Launched in 2008, the site currently has nearly 30 million users who have uploaded over 8 million documents.

Over the past few years, staff had been hearing from a growing number of users who reported a bump in citations after posting their articles to the site. Citations are a key metric for academics; for example, they play a large role in tenure decisions. If posting to the site really was associated with increased citations, it was an important value proposition for the company.

Many companies would be satisfied with anecdotal testimonials like those the company was hearing. But its staff wanted to go further. Could they examine their product’s effectiveness using the same principles they championed as a company: open, rigorous research?

So they came to us with an idea: perform a rigorous statistical analysis to determine whether articles posted to the site received more citations than similar articles not on the site. Furthermore, they wanted to submit the study to a peer-reviewed journal and make all of the data and code available for anyone to review. We would provide them with an objective, third-party analysis to see if there really was an effect, using rigorous statistical techniques and solid software design suitable for sharing and replication.

The Process

Polynumeral worked closely with staff over the course of a year to perform the study: helping collect data, design and run analyses, and write up results for public and academic consumption.

  • We designed strategies to control for selection biases and avoid correlation-causation fallacies. To do this, we identified a number of possible alternatives to explain why papers on the site might seem to have more citations. We then came up with data collection and modeling strategies to control for those possibilities.
  • We did an extensive literature review of previous research in the field of citation analysis.
  • We designed, coded, and ran a variety of statistical models on the data, and interpreted the results.
  • We helped design, run, and analyze a large-scale, crowdsourced data collection effort. We helped design ways to ensure accuracy through survey design and post-collection statistical audits.
  • We helped write blog posts and an academic paper describing the analysis and results.
  • We helped respond to public comments on the study, as well as comments from academic peer reviewers, ultimately leading to its publication in a top-tier peer-reviewed journal.
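
To illustrate what "controlling for" an alternative explanation can mean in practice, here is a minimal Python sketch. It is not the model used in the study: the records, the choice of publication year as the only control, and the function names are all assumptions made for illustration. It compares a naive citation ratio with one computed within publication-year strata, since older papers have had more time to accumulate citations.

```python
from collections import defaultdict
from math import exp, log
from statistics import mean

# Hypothetical toy records: (on_site, publication_year, citations).
# These numbers are made up for illustration and are not the study's data.
PAPERS = [
    (True, 2009, 12), (True, 2009, 18), (False, 2009, 10),
    (True, 2012, 4), (False, 2012, 3), (False, 2012, 2), (False, 2012, 4),
]


def naive_ratio(papers):
    """Ratio of mean citations, on-site vs. off-site, ignoring all covariates."""
    on = [c for s, _, c in papers if s]
    off = [c for s, _, c in papers if not s]
    return mean(on) / mean(off)


def stratified_ratio(papers):
    """Size-weighted geometric mean of the within-year citation ratios."""
    strata = defaultdict(lambda: {True: [], False: []})
    for on_site, year, citations in papers:
        strata[year][on_site].append(citations)

    weighted_logs, total = 0.0, 0
    for groups in strata.values():
        if not groups[True] or not groups[False]:
            continue  # a stratum without both groups offers no comparison
        n = len(groups[True]) + len(groups[False])
        weighted_logs += n * log(mean(groups[True]) / mean(groups[False]))
        total += n
    return exp(weighted_logs / total)


print(f"naive ratio:      {naive_ratio(PAPERS):.2f}")
print(f"stratified ratio: {stratified_ratio(PAPERS):.2f}")
```

In this toy data the on-site papers skew older, so the naive ratio overstates the difference, and comparing like with like within each year shrinks it. A real analysis would control for many more factors at once, typically with a regression model rather than simple stratification.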

The Outcome

Our study found that, after controlling for a variety of factors, papers posted to the site had about 70% more citations after five years than similar papers not on the site. The paper, along with the underlying code and data, is publicly available.

The study provided the company with a compelling value proposition for its users, one backed by credible and publicly available research. The company invited its then-21 million users to read the study and review the data and analysis. The study also received press coverage and has been accepted for publication in PLoS ONE, the premier peer-reviewed open access journal and one of the 25 most-cited English-language academic journals.

In fact, our work is such an important part of the company’s strategy that it’s the first thing users see when they visit the site.
