10 Comparison to other studies
In-progress
TODO: Add page comparing our findings to other studies. This could include:
- Studies that have gone through this same process of finding barriers in reproductions
- Studies that recommend how to share code (although it might be simpler to just start with the first bullet and then decide whether to do both)
Do we also compare the evaluation results, or do we just use them in the context of barriers? (e.g. Schwander et al. (2021) and Zhang, Lhachimi, and Rogowski (2020) for reporting… Laurinavichyute, Yadav, and Vasishth (2022) for code…)
Not sure if we want to compare the actual proportions reproduced… or maybe we do, but it is important to bear in mind what is being compared and what each study defines as success, e.g. some include studies that have not shared their code.
Starting points:
- Krafczyk et al. (2021) - their results are recommendations based on their reproduction experiences, which were:
  - completion of their reproductions can be seen in their Figure 1
  - be clear about the links between the article, code and data (e.g. which code uses which data, and which parts of the code produced each part of the article)
  - include scripts for each aspect of the article, make the scripts needed easy to locate, and ensure the scripts include the required parameters and are clearly labeled
  - be clear about the hardware needed, e.g. whether a large amount of computing resources would be required; at a minimum report the hardware used, and ideally include a “small test case that can be run by users with conventional hardware”
  - list software dependencies and versions (see the environment sketch after this list)
  - use seeds, and report the seed you used
  - make all code and data available with an appropriate license
  - include a master script that runs all computations in the publication (see the master-script sketch after this list)
  - use the same terminology in the code and the article
  - use version control and specify e.g. the commit hash that identifies the version used
  - avoid hard-coding parameters
  - design scripts in a way that allows people to easily change parameters and run them again
  - avoid hard-coding file paths
  - provide a script that checks whether users’ results match the original (within expected deviation) (see the comparison sketch after this list)
  - if comparing against competing methods, include information on how those were implemented and tested
  - use a build system for C/C++ code
  - provide scripts to make the figures and tables
- Wood, Müller, and Brown (2018)
  - complete data: 27 comparable results, 5 minor differences, 0 major differences
  - incomplete data: 10 comparable, 4 minor differences, 1 major difference
  - the main issue was the code and data not being shared
- Schwander et al. (2021)
  - reproduction success for 3 out of 4 models
  - facilitators:
    - “Model structure and possible state transitions were presented in a state transition diagram”
    - “Overview of input parameters was provided in table format”
  - hurdles:
    - “PSAs were performed” (probabilistic sensitivity analyses)
    - “Relevant PSA values for PSA result reproduction were provided (type of distribution and either mean and standard deviation or distribution parameters were provided)”
    - “Clinical event simulation results were provided (which are very helpful to guide potential assumptions to be made for rebuilding the model and which provide an additional means of testing the fit of the replication)”
    - “Relevant details on the underlying life tables were provided (including year of data)”
    - “Several self-created regression equations were introduced but without details on how to apply/solve the provided regressions correctly”
- Laurinavichyute, Yadav, and Vasishth (2022)
- Konkol, Kray, and Pfeiffer (2019)
- Hardwicke et al. (2021)
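To make a few of the Krafczyk et al. (2021) recommendations above more concrete, a minimal master-script sketch in Python is shown below. It is not taken from any of the studies discussed here; the file names, parameters and `run_analysis` placeholder are all assumptions, purely for illustration. It takes parameters and paths as command-line arguments rather than hard-coding them, sets a single seed, and records that seed alongside the outputs.

```python
"""Hypothetical master script (run_all.py): runs every computation in the article.

Sketch only - the parameters, paths and analysis function are placeholders.
"""
import argparse
import json
import random
from pathlib import Path


def run_analysis(data_path: Path, n_iterations: int, rng: random.Random) -> dict:
    """Placeholder for the actual model/analysis code."""
    # A real package would load data_path and run the model here;
    # this placeholder just returns something deterministic given the seed.
    return {"mean_cost": rng.uniform(1000, 2000), "n_iterations": n_iterations}


def main() -> None:
    parser = argparse.ArgumentParser(description="Reproduce all results in the article.")
    # Parameters and paths are command-line arguments with defaults,
    # not values hard-coded inside the analysis scripts.
    parser.add_argument("--data-dir", type=Path, default=Path("data"))
    parser.add_argument("--output-dir", type=Path, default=Path("outputs"))
    parser.add_argument("--n-iterations", type=int, default=1000)
    parser.add_argument("--seed", type=int, default=42, help="Seed reported in the article.")
    args = parser.parse_args()

    rng = random.Random(args.seed)  # seed set once, at the top level
    args.output_dir.mkdir(parents=True, exist_ok=True)

    results = run_analysis(args.data_dir / "inputs.csv", args.n_iterations, rng)
    results["seed"] = args.seed  # report the seed alongside the outputs

    with open(args.output_dir / "results.json", "w") as f:
        json.dump(results, f, indent=2)
    print(f"Wrote {args.output_dir / 'results.json'}")


if __name__ == "__main__":
    main()
```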
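Listing software dependencies and identifying the exact code version could also be automated. The environment sketch below (again an assumption, not code from any study) writes the Python and package versions plus the current git commit hash to a JSON file, using only the standard library; the package list passed in is illustrative.

```python
"""Hypothetical helper (record_environment.py): log package versions and the git commit.

Sketch only - the package list and output file name are assumptions.
"""
import json
import platform
import subprocess
from importlib import metadata


def record_environment(packages: list[str], path: str = "environment.json") -> dict:
    """Write Python, package and git-commit versions to a JSON file."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    # Commit hash identifying the exact code version used (requires a git checkout).
    try:
        info["git_commit"] = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        info["git_commit"] = "unknown"
    with open(path, "w") as f:
        json.dump(info, f, indent=2)
    return info


if __name__ == "__main__":
    print(record_environment(["numpy", "pandas"]))
```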
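Finally, the recommendation to provide a script that checks whether a user’s results match the originals within an expected deviation might look something like the comparison sketch below; the file names, result keys and 1% relative tolerance are illustrative assumptions.

```python
"""Hypothetical check (compare_results.py): compare reproduced results to the originals.

Sketch only - the file names, keys and tolerance are assumptions.
"""
import json
import sys


def compare(original_path: str, reproduced_path: str, rel_tol: float = 0.01) -> bool:
    """Return True if every numeric result matches within the relative tolerance."""
    with open(original_path) as f:
        original = json.load(f)
    with open(reproduced_path) as f:
        reproduced = json.load(f)

    ok = True
    for key, expected in original.items():
        if not isinstance(expected, (int, float)):
            continue  # only compare numeric results
        actual = reproduced.get(key)
        if actual is None:
            print(f"MISSING  {key}")
            ok = False
            continue
        deviation = abs(actual - expected) / abs(expected) if expected else abs(actual)
        status = "OK" if deviation <= rel_tol else "MISMATCH"
        if status == "MISMATCH":
            ok = False
        print(f"{status:8s} {key}: original={expected}, reproduced={actual}")
    return ok


if __name__ == "__main__":
    # e.g. python compare_results.py original_results.json outputs/results.json
    sys.exit(0 if compare(sys.argv[1], sys.argv[2]) else 1)
```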
Hardwicke, Tom E., Manuel Bohn, Kyle MacDonald, Emily Hembacher, Michèle B. Nuijten, Benjamin N. Peloquin, Benjamin E. deMayo, Bria Long, Erica J. Yoon, and Michael C. Frank. 2021. “Analytic Reproducibility in Articles Receiving Open Data Badges at the Journal Psychological Science: An Observational Study.” Royal Society Open Science 8 (1): 201494. https://doi.org/10.1098/rsos.201494.
Konkol, Markus, Christian Kray, and Max Pfeiffer. 2019. “Computational Reproducibility in Geoscientific Papers: Insights from a Series of Studies with Geoscientists and a Reproduction Study.” International Journal of Geographical Information Science 33 (2): 408–29. https://doi.org/10.1080/13658816.2018.1508687.
Krafczyk, M. S., A. Shi, A. Bhaskar, D. Marinov, and V. Stodden. 2021. “Learning from Reproducing Computational Results: Introducing Three Principles and the Reproduction Package.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 379 (2197): 20200069. https://doi.org/10.1098/rsta.2020.0069.
Schwander, Björn, Mark Nuijten, Silvia Evers, and Mickaël Hiligsmann. 2021. “Replication of Published Health Economic Obesity Models: Assessment of Facilitators, Hurdles and Reproduction Success.” Pharmacoeconomics 39 (4): 433–46. https://doi.org/10.1007/s40273-021-01008-7.
Wood, Benjamin D. K., Rui Müller, and Annette N. Brown. 2018. “Push Button Replication: Is Impact Evaluation Evidence for International Development Verifiable?” PLOS ONE 13 (12): e0209416. https://doi.org/10.1371/journal.pone.0209416.
Zhang, Xiange, Stefan K. Lhachimi, and Wolf H. Rogowski. 2020. “Reporting Quality of Discrete Event Simulations in Healthcare—Results From a Generic Reporting Checklist.” Value in Health 23 (4): 506–14. https://doi.org/10.1016/j.jval.2020.01.005.