7 Evaluation of the repository
The code and related research artefacts in the original code repositories were evaluated against:
- The criteria of badges related to reproducibility from various organisations and journals.
- Recommendations from the STARS framework for the sharing of code and associated materials from discrete-event simulation models (Monks, Harper, and Mustafee (2024)).
There was often a lot of overlap in criteria between the journal badges. Hence, a list of unique criteria was produced. The repositories were evaluated against these criteria and then, depending on which criteria they met, against the badges themselves.
Caveat: Please note that these criteria are based on available information about each badge online. Moreover, we focus only on reproduction of the discrete-event simulation, and not on other aspects of the article. We cannot guarantee that the badges below would have been awarded in practice by these journals.
Consider: Which criteria from the guidelines are people struggling to meet?
7.1 Summary
The results are summarised across four areas: unique badge criteria, badges, essential components of the STARS framework, and optional components of the STARS framework. Full results are given in the tables below.
7.2 Journal badges
Key:
- S: Shoaib and Ramamohan (2021) - link to evaluation
- Hu: Huang et al. (2019) - link to evaluation
- L: Lim et al. (2020) - link to evaluation
- K: Kim et al. (2021) - link to evaluation
- A: Anagnostou et al. (2022) - link to evaluation
- J: Johnson et al. (2021) - link to evaluation
- He: Hernandez et al. (2015) - link to evaluation
- W: Wood et al. (2021) - link to evaluation
In this section and below, the criteria for each study are marked as either being fully met (✅), partially met (🟡), not met (❌) or not applicable (N/A).
Unique criteria:
Item | S | Hu | L | K | A | J | He | W | |
---|---|---|---|---|---|---|---|---|---|
Criteria related to how artefacts are shared | |||||||||
Artefacts are archived in a repository that is: (a) public (b) guarantees persistence (c) gives a unique identifier (e.g. DOI) | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | |
Open licence | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | |
Criteria related to what artefacts are shared | |||||||||
Complete (all relevant artefacts available) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | |
Artefacts relevant to paper | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
Criteria related to the structure and documentation of the artefacts | |||||||||
Documents (a) how code is used (b) how it relates to article (c) software, systems, packages and versions | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | |
Documents (a) inventory of artefacts (b) sufficient description for artefacts to be exercised | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | |
Artefacts are carefully documented and well-structured to the extent that reuse and repurposing is facilitated, adhering to norms and standards | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | |
README file with step-by-step instructions to run analysis | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | |
Dependencies (e.g. package versions) stated | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | |
Clear how output of analysis corresponds to article | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | |
Criteria related to running and reproducing results | |||||||||
Scripts can be successfully executed | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
Reproduced results (assuming (a) acceptably similar (b) reasonable time frame (c) only minor troubleshooting) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
Badges:
The badges are grouped into three categories:
- “Open objects” badges: These badges relate to research artefacts being made openly available.
- “Object review” badges: These badges relate to the research artefacts being reviewed against criteria of the badge issuer.
- “Reproduced” badges: These badges relate to an independent party regenerating the results of the article using the author-provided objects.
Item | S | Hu | L | K | A | J | He | W | |
---|---|---|---|---|---|---|---|---|---|
“Open objects” badges | |||||||||
ACM “Artifacts Available” • Artefacts are archived in a repository that is: (a) public (b) guarantees persistence (c) gives a unique identifier (e.g. DOI) | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | |
NISO “Open Research Objects (ORO)” • Artefacts are archived in a repository that is: (a) public (b) guarantees persistence (c) gives a unique identifier (e.g. DOI) • Open licence | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | |
NISO “Open Research Objects - All (ORO-A)” • Artefacts are archived in a repository that is: (a) public (b) guarantees persistence (c) gives a unique identifier (e.g. DOI) • Open licence • Complete (all relevant artefacts available) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | |
COS “Open Code” • Artefacts are archived in a repository that is: (a) public (b) guarantees persistence (c) gives a unique identifier (e.g. DOI) • Open licence • Documents (a) how code is used (b) how it relates to article (c) software, systems, packages and versions | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | |
IEEE “Code Available” • Complete (all relevant artefacts available) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | |
“Object review” badges | |||||||||
ACM “Artifacts Evaluated - Functional” • Documents (a) inventory of artefacts (b) sufficient description for artefacts to be exercised • Artefacts relevant to paper • Complete (all relevant artefacts available) • Scripts can be successfully executed | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | |
ACM “Artifacts Evaluated - Reusable” • Documents (a) inventory of artefacts (b) sufficient description for artefacts to be exercised • Artefacts relevant to paper • Complete (all relevant artefacts available) • Scripts can be successfully executed • Artefacts are carefully documented and well-structured to the extent that reuse and repurposing is facilitated, adhering to norms and standards | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | |
IEEE “Code Reviewed” • Complete (all relevant artefacts available) • Scripts can be successfully executed | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | |
“Reproduced” badges | |||||||||
ACM “Results Reproduced” • Reproduced results (assuming (a) acceptably similar (b) reasonable time frame (c) only minor troubleshooting) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | |
NISO “Results Reproduced (ROR-R)” • Reproduced results (assuming (a) acceptably similar (b) reasonable time frame (c) only minor troubleshooting) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | |
IEEE “Code Reproducible” • Reproduced results (assuming (a) acceptably similar (b) reasonable time frame (c) only minor troubleshooting) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | |
Psychological Science “Computational Reproducibility” • Reproduced results (assuming (a) acceptably similar (b) reasonable time frame (c) only minor troubleshooting) • README file with step-by-step instructions to run analysis • Dependencies (e.g. package versions) stated • Clear how output of analysis corresponds to article | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | |
7.3 STARS framework
Key:
- S: Shoaib and Ramamohan (2021) - link to evaluation
- Hu: Huang et al. (2019) - link to evaluation
- L: Lim et al. (2020) - link to evaluation
- K: Kim et al. (2021) - link to evaluation
- A: Anagnostou et al. (2022) - link to evaluation
- J: Johnson et al. (2021) - link to evaluation
- He: Hernandez et al. (2015) - link to evaluation
- W: Wood et al. (2021) - link to evaluation
Item | S | Hu | L | K | A | J | He | W | |
---|---|---|---|---|---|---|---|---|---|
Essential components | |||||||||
Open licence: Free and open-source software (FOSS) licence (e.g. MIT, GNU General Public Licence (GPL)) | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | |
Dependency management: Specify software libraries, version numbers and sources (e.g. dependency management tools like virtualenv, conda, poetry; see the sketch after this table) | ❌ | ❌ | ❌ | 🟡 | ✅ | 🟡 | ❌ | ❌ | |
FOSS model: Coded in FOSS language (e.g. R, Julia, Python) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
Minimum documentation: Minimal instructions (e.g. in README) that overview (a) what model does, (b) how to install and run model to obtain results, and (c) how to vary parameters to run new experiments | ❌ | ❌ | ❌ | ✅ | ✅ | 🟡 | ❌ | ❌ | |
ORCID: ORCID for each study author | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | |
Citation information: Instructions on how to cite the research artefact (e.g. CITATION.cff file; see the sketch after this table) | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | |
Remote code repository: Code available in a remote code repository (e.g. GitHub, GitLab, BitBucket) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
Open science archive: Code stored in an open science archive with FORCE11-compliant citation and guaranteed persistence of digital artefacts (e.g. Figshare, Zenodo, the Open Science Framework (OSF), and the Computational Modeling in the Social and Ecological Sciences Network (CoMSES Net)) | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | |
Optional components | |||||||||
Enhanced documentation: Open and high-quality documentation on how the model is implemented and works (e.g. via notebooks and markdown files, brought together using software like Quarto and Jupyter Book). Suggested content includes: • Plain English summary of project and model • Clarifying licence • Citation instructions • Contribution instructions • Model installation instructions • Structured code walk through of model • Documentation of modelling cycle using TRACE • Annotated simulation reporting guidelines • Clear description of model validation including its intended purpose | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | |
Documentation hosting: Host documentation (e.g. with GitHub pages, GitLab pages, BitBucket Cloud, Quarto Pub) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | |
Online coding environment: Provide an online environment where users can run and change code (e.g. BinderHub, Google Colaboratory, Deepnote) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | |
Model interface: Provide web application interface to the model so it is accessible to less technical simulation users | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | |
Web app hosting: Host web app online (e.g. Streamlit Community Cloud, ShinyApps hosting) | ❌ | ✅ | ❌ | ❌ | 🟡 | ❌ | ❌ | ❌ | |
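To illustrate what the dependency management criterion looks for in practice, below is a minimal sketch of a conda `environment.yml`. The environment name, package list and version pins are hypothetical placeholders, not the dependencies of any of the evaluated repositories.

```yaml
# Hypothetical sketch of a conda environment file (environment.yml).
# Environment name, packages and version pins are illustrative placeholders only.
name: des-model
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy=1.26.4
  - pandas=2.1.4
  - simpy=4.1.1
```

A reader could recreate the environment with `conda env create -f environment.yml`; this kind of explicit, versioned specification is what the criterion (and the equivalent journal badge criterion on stated dependencies) rewards.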
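Likewise, the citation information criterion can be met by placing a `CITATION.cff` file at the repository root. The sketch below uses placeholder author, repository and version details.

```yaml
# Hypothetical CITATION.cff sketch; all metadata values are placeholders.
cff-version: 1.2.0
message: "If you use this model, please cite it as below."
title: "Example discrete-event simulation model"
authors:
  - family-names: "Surname"
    given-names: "Given"
    orcid: "https://orcid.org/0000-0000-0000-0000"
repository-code: "https://github.com/example/des-model"
license: MIT
version: 1.0.0
date-released: "2024-01-01"
```

Platforms such as GitHub detect this file and offer a “Cite this repository” prompt, and the file can also record author ORCIDs and the licence, complementing those components above.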
7.4 Timings
- Shoaib and Ramamohan (2021) - 30m
- Huang et al. (2019) - 17m
- Lim et al. (2020) - 18m
- Kim et al. (2021) - 18m
- Anagnostou et al. (2022) - 19m
- Johnson et al. (2021) - 20m
- Hernandez et al. (2015) - 13m
- Wood et al. (2021) - 14m
Revisiting and redoing the badge evaluation took only 2-3 minutes per study, so the original evaluation timings above have been retained.
7.5 Badge sources
National Information Standards Organisation (NISO) (NISO Reproducibility Badging and Definitions Working Group (2021))
- “Open Research Objects (ORO)”
- “Open Research Objects - All (ORO-A)”
- “Results Reproduced (ROR-R)”
Association for Computing Machinery (ACM) (Association for Computing Machinery (ACM) (2020))
- “Artifacts Available”
- “Artifacts Evaluated - Functional”
- “Artifacts Evaluated - Reusable”
- “Results Reproduced”
Center for Open Science (COS) (Blohowiak et al. (2023))
- “Open Code”
Institute of Electrical and Electronics Engineers (IEEE) (Institute of Electrical and Electronics Engineers (IEEE) (2024))
- “Code Available”
- “Code Reviewed”
- “Code Reproducible”
Psychological Science (Hardwicke and Vazire (2024) and Association for Psychological Science (APS) (2024))
- “Computational Reproducibility”