Patrick Vandewalle, Jelena Kovačević, and Martin Vetterli
In 2009, to assess the state of reproducibility practices in signal processing, we performed a study of all 134 papers published in IEEE Transactions on Image Processing in 2004. We asked two or three reviewers per paper to assess its reproducibility using a short list of questions, split into three main parts: the reproducibility of the 1) algorithm, 2) code, and 3) data.
Each question [except 3(b)] was scored 0, 0.5, or 1, or marked N/A (not applicable). For question 3(b), we considered the size of the data set acceptable if the number of items (typically the number of images) was above four (an admittedly arbitrary threshold). The results of this study are summarized in the table below and align well with the smaller-scale experiments described previously.
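As a concrete illustration of how such scores can be aggregated, here is a minimal sketch in Python; the data layout and function names are our own hypothetical choices for illustration, not the actual scoring tool used in the study:

# Minimal sketch of the scoring scheme described above.
# Reviewer scores are 0, 0.5, or 1; None stands for N/A and is excluded.

def average_score(scores):
    """Average the valid (non-N/A) scores for one question; None if all are N/A."""
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else None

def score_question_3b(num_items, threshold=4):
    """Question 3(b): the data set size is acceptable if it exceeds the threshold."""
    return 1.0 if num_items > threshold else 0.0

# Hypothetical example: two reviewers scored a question; a third marked it N/A.
print(average_score([1.0, 0.5, None]))   # -> 0.75
print(score_question_3b(num_items=6))    # -> 1.0 (six images is above four)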
Making research reproducible lowers the barrier to entry for a publication and therefore also increases its potential impact. Readers can dive into the work and build on its results much more easily when the code and data are available.
As discussed previously, the first advantage of reproducible research we noticed is a gain in our own efficiency. It is much easier to pick up a reproducible piece of work again later, because everything is well organized and documented. We also received positive feedback from colleagues and students who downloaded the code and appreciated its availability. This enabled and simplified several collaborations and provides a source of easily reusable demonstration material for students and visitors.
Recent studies have shown that papers that are freely available online are cited significantly more often than those that are not. We also reexamined the results of our reproducibility study described above and related them to the number of citations each paper received, as reported on Web of Science. The figures below display the scores on questions 2(b) and 3(c) (on the online availability of code and data, respectively) versus the number of citations. Although no clear one-to-one relation can be derived, papers with a high number of citations (the right part of the plots) typically also score high on both questions. For papers with a low number of citations (which are far more numerous and appear in the left part of the plots), this is most often not the case. There were, of course, exceptions: papers scoring high on these questions despite a low number of citations. This is to be expected, as the online availability of code and data (in whatever form) is no guarantee of a quality publication.
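For readers who wish to repeat this kind of analysis on their own data, a minimal sketch of the score-versus-citations plot might look as follows; the file name, column names, and use of matplotlib are assumptions for illustration, not the tooling used in the study:

# Minimal sketch: plot an availability score against citation counts.
# Hypothetical CSV layout: one row per paper, with columns
# 'citations', 'score_2b', 'score_3c'; the string "N/A" marks inapplicable questions.
import csv
import matplotlib.pyplot as plt

citations, scores_2b = [], []
with open("reproducibility_scores.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["score_2b"] != "N/A":          # skip papers where 2(b) did not apply
            citations.append(int(row["citations"]))
            scores_2b.append(float(row["score_2b"]))

plt.scatter(citations, scores_2b, alpha=0.6)
plt.xlabel("Number of citations (Web of Science)")
plt.ylabel("Score on question 2(b): code available online")
plt.show()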
For brevity, we plot only these two results here (they are also the most illustrative); the full set of figures is available online. In summary, computational papers whose code and data are not available online have a low chance of being highly cited.