6 Reflections from reproductions
This page describes the facilitators and barriers encountered in each reproduction, presented as a series of recommendations. These are grouped into two themes:
- Recommendations to support reproduction
- Recommendations to facilitate troubleshooting
Within each section, I have created a table which evaluates whether the facilitators were fully met (✅), partially met (🟡), not met (❌) or not applicable (N/A) for each study.
Links to each study:
- Shoaib and Ramamohan (2021) - link to reflections
- Huang et al. (2019) - link to reflections
- Lim et al. (2020) - link to reflections
- Kim et al. (2021) - link to reflections
- Anagnostou et al. (2022) - link to reflections
- Johnson et al. (2021) - link to reflections
- Hernandez et al. (2015) - link to reflections
- Wood et al. (2021) - link to reflections
6.1 Summary
This is a summary of all the items that were evaluated (with ✅🟡❌) below.
It excludes the "Other recommendations" section. It also excludes a few criteria which weren't felt particularly relevant/suitable to mark as met or not: "optimise model run time" and "avoid large file sizes".
6.2 Recommendations to support reproduction
6.2.1 Set-up
First, whether or not each study did this (i.e. provided releases/versions of the repository):
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Then, considering how "applicable" this feels for each study (although you could argue that those marked as not applicable would become applicable with the addition of any new commits to the repository):
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
N/A | N/A | N/A | ❌ | N/A | ❌ | N/A | N/A |
Shoaib and Ramamohan (2021), Hernandez et al. (2015): N/A. Not met, but not an issue, as the last commit (with the exception of any related to the licence following our communication) was prior to publication, so marked as not applicable.
Huang et al. (2019), Lim et al. (2020), Anagnostou et al. (2022), Wood et al. (2021): N/A. Only 1 commit (excluding repo creation), or only commits on one day.
Kim et al. (2021): Not met. Had commits to their GitHub repository after the publication date - though these largely appear related to a different model in the same repository, or for a fix.
Johnson et al. (2021): Not met. Had commits to their GitHub repository after the publication date, and it wasn't clear which version aligned with the publication. However, the most recent commits add clear README instructions to the repository. We decided to use the latest version of the repository, but it would have been beneficial to have releases/versions/a change log to outline the commit history in relation to the publication and any subsequent changes.
Packages and versions
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 |
Packages
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | 🟡 | ✅ | ✅ | 🟡 | 🟡 | ❌ |
Shoaib and Ramamohan (2021): Not met. This became a time-consuming issue, as it took a while to identify a dependency that was needed for the code to work (`greenlet`, based on reading the documentation for `salabim`), and a while longer to realise I had installed a third-party package when the package I needed (`statistics`) was in the standard library.
Huang et al. (2019): Not met. However, this was fairly easily resolved based on the imports in the `.R` script, and then on extra packages suggested by RStudio when I tried and failed to run the script.
Lim et al. (2020): Partially met. The only packages needed (`numpy` and `pandas`) are mentioned in the paper (although only listed as imports in the script).
Kim et al. (2021): Fully met. Provides commands to install the required packages at the start of the scripts, which I could then easily base renv on, as it detects them automatically.
Anagnostou et al. (2022): Fully met. Provides a `requirements.txt`.
Johnson et al. (2021): Partially met. The `DESCRIPTION` file accompanying the `epicR` package contained some but not all dependencies.
Hernandez et al. (2015): Partially met. Some (but not all) of the required packages were listed in the paper. Of particular note, the code depended on having a local package, `myutils/`, which was another GitHub repository from the author. This was not mentioned anywhere, so I had to notice myself that it was needed.
Wood et al. (2021): Not met. However, easily resolved based on the imports in the `.R` script.
Reflections:
- The `import` statements can be sufficient to indicate all the packages required, but this is not always the case if there are "hidden"/unmentioned dependencies that don't get imported.
- Tom: Given that import statements are not always enough, I would argue more is needed.
- Tom: Useful to know that `renv` "detects" dependencies listed as imports in scripts.
- There are various options for listing the packages (e.g. comprehensive import statements, installation lines in the script, environment files, a package DESCRIPTION file) - see the example after this list.
- Ideally, mention these in the repository, not just the paper.
- If there are local dependencies (e.g. other GitHub repositories), make sure to (a) mention and link to these repositories, so it is clear they are also required, and (b) include licence/s in those repositories also, so they can be used.
- This was a common issue.
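For example, an environment file makes both the direct and the "hidden" dependencies explicit. This is only an illustrative sketch - the version pins below are placeholders, not taken from any of the studies (though greenlet as an unimported dependency of salabim is the case described above):

```
# requirements.txt - example only; pin every package the code needs,
# including dependencies that are never imported directly
salabim==24.0.2      # version pins here are illustrative placeholders
greenlet==3.0.3      # "hidden" dependency needed by salabim but not imported in the scripts
numpy==1.26.4
pandas==2.2.2
```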
Versions
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 |
Shoaib and Ramamohan (2021): Not met. I had to backdate the package versions as the model didn’t work with the latest.
Huang et al. (2019): Not met. I initially tried to create an environment with R and package versions that were prior to the article publication date. However, I had great difficulties implementing this with R, and never managed to successfully do this. This was related to:
- The difficulty of switching between R versions
- Problems in finding an available source for specific package versions for specific versions of R
Lim et al. (2020): Partially met. Provides the major Python version, but I chose the minor version and the package versions based on the article publication date.
Kim et al. (2021): Partially met. States the version of R but not of the packages. Due to prior issues with backdating R, I used the latest versions. There were no issues using the latest versions of R and packages, but if there had been, it would be important to know what versions had previously been used and worked.
Anagnostou et al. (2022): Partially met (depending on how strict you are being). The Python version was stated in the paper, and the SimPy version was stated in the complementary app repository (although neither were mentioned in the model repository itself). The requirements file just contains one entry - `simpy` - with no version.
Johnson et al. (2021): Partially met. The R version is given in the paper. The `DESCRIPTION` file contains minimum versions for some but not all packages.
Hernandez et al. (2015): Partially met. Versions of Python, R and some (but not all) packages are given in the paper. Some versions weren't very specific (e.g. Python 2.7 vs. something specific like 2.7.12).
Wood et al. (2021): Partially met. States the version of R but not of the packages. Due to prior issues with backdating R, I used the latest versions. There were no issues using the latest versions of R and packages, but if there had been, it would be important to know what versions had previously been used and worked.
Reflections:
- Models will sometimes work with the latest versions of packages, but likewise, you will sometimes need to backdate as it no longer works with the latest
- For Python, it was very easy to "backdate" the Python and package versions. However, I found this very difficult to do in R, and ended up always using the latest versions.
- Versions are sometimes provided elsewhere (e.g. in the paper, in other repositories), but it would be handy for them to be in the model repository itself.
- Handy to provide specific versions too, particularly when there can be reasonably large changes between minor versions.
- This was a very common issue.
- Tom: For R, we are sort of moving towards a pre-built container for an R reproducible pipeline
- Response: Though it is worth being aware that it is possible to successfully reproduce without backdating - I didn't run into issues with this for the R models (though that doesn't mean you wouldn't). It is a sort of "characteristic" of R that it is supposed to be less changeable than Python in this regard, though obviously there are no guarantees. And it is not a fair comparison, as I didn't try to run the Python models without backdating.
6.2.2 Running the model
Provide code for the scenarios
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | ❌ | ❌ | N/A | 🟡 | ❌ | ✅ |
Shoaib and Ramamohan (2021): Not met. There were several instances where it took quite a while to understand how and where to modify the code in order to run scenarios (e.g. no arrivals, transferring admin work, reducing doctor intervention in deliveries).
Huang et al. (2019): Not met. I set up a notebook to programmatically run the model scenarios. It took a lot of work to modify and write code that could run the scenarios, and I often made mistakes in my interpretation of how to implement them, which could have been avoided if code for those scenarios had been provided.
Lim et al. (2020): Not met. Several parameters or scenarios were not incorporated in the code and had to be added (e.g. adding conditional logic to skip or change which code is run, removing hard-coding, adding parameters to existing functions).
Kim et al. (2021): Not met. Took a lot of work to change the model from a for loop to a function, to set all parameters as inputs (some were hard coded), and to add conditional logic for scenarios where required.
Anagnostou et al. (2022): Not applicable. No scenarios.
Johnson et al. (2021): Partially met. Has all base case scenarios, but not sensitivity analysis.
Hernandez et al. (2015): Not met. Took a while to figure out how to implement scenarios.
Wood et al. (2021): Fully met.
Reflections:
- This was a common issue.
- It was time-consuming and tricky to resolve (a minimal sketch of one approach follows this list).
- Tom: This is a headline. Also, links to the importance of reproducible analytical pipelines (RAP) for simulation.
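As a minimal sketch of what providing scenario code could look like (the parameter names and the run_model() function below are hypothetical placeholders, not taken from any of the studies), scenarios can be expressed as parameter overrides applied to a base case and run in a loop:

```python
# Illustrative sketch: scenarios defined as parameter overrides on a base case.
# run_model() and the parameter names are hypothetical placeholders.

BASE_PARAMS = {"n_doctors": 2, "transfer_admin_work": False, "n_replications": 100}

SCENARIOS = {
    "base_case": {},
    "extra_doctor": {"n_doctors": 3},
    "transfer_admin": {"transfer_admin_work": True},
}


def run_all_scenarios(run_model):
    """Run every scenario, returning results keyed by scenario name."""
    results = {}
    for name, overrides in SCENARIOS.items():
        params = {**BASE_PARAMS, **overrides}  # overrides replace base-case values
        results[name] = run_model(**params)
    return results
```

Providing something along these lines also documents exactly which parameters each scenario changes, removing the guesswork described above.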
Ensure the provided code runs the base case as in the paper (with correct parameters)
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
🟡 | ❌ | 🟡 | ✅ | ✅ | ✅ | ❌ | ✅ |
Shoaib and Ramamohan (2021): Partially met. Script is set with parameters for base configuration 1, with the exception of number of replications.
Huang et al. (2019): Not met. The baseline model in the script did not match the baseline model (or any scenario) in the paper, so had to modify parameters.
Lim et al. (2020): Partially met. The included parameters were correct, but the baseline scenario involved varying staff strength to 2, and the provided code only varied it between 4 and 6. I had to add some code to enable it to run with staff strength 2 (as an error occurred if you tried to set that).
Kim et al. (2021): Fully met.
Anagnostou et al. (2022): Fully met.
Johnson et al. (2021): Fully met. Base case parameters all correct.
Hernandez et al. (2015): Not met. As agreed with the author, this is likely the primary reason for the discrepancy in these results - they are very close, and we see similar patterns, but not reproduced. Unfortunately, several parameters were wrong, and although we changed those we spotted, we anticipate there could be others we hadn’t spotted that might explain the remaining discrepancies.
Wood et al. (2021): Fully met.
Reflections:
- At least provide a script that can run the baseline model as in the paper (even if not providing the scenarios)
- This can introduce difficulties - when some parameters are wrong, you rely on the paper to check which parameters are correct or not, but if the paper doesn’t mention every single parameter (which is reasonably likely, as this includes those not varied by scenarios), then you aren’t able to be sure that the model you are running is correct.
- This can make a really big difference, and be the likely cause of managing to reproduce everything vs. nothing, if it impacts all aspects of the results.
- Tom: I think this comes back to minimum verification as well. I think the “at least for one scenario” idea of yours is excellent.
Control randomness with seeds
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
Shoaib and Ramamohan (2021): Not met. The lack of seeds wasn't actually a barrier to the reproduction, though, due to the number of replications. I later added seeds so my results could be reproduced, and found that the ease of setting seeds with salabim was a great facilitator for this work. I only had to change one or two lines of code to get consistent results between runs (unlike other simulation software such as SimPy, where you have to consider the use of seeds by different sampling functions). Moreover, by default, salabim would have set a seed (although this was overridden by the original authors to enable them to run replications).
Huang et al. (2019): Not met. It would have been beneficial to include seeds, as there was a fair amount of variability; with seeds, I could have been sure that my results did not differ from the original simply due to randomness.
Lim et al. (2020): Not met. The results obtained looked very similar to the original article, with minimal differences that I felt to be within the expected variation from the model stochasticity. However, if seeds had been present, we would have been able to say with certainty. I did not feel I needed to add seeds during the reproduction to get the same results.
Kim et al. (2021): Fully met. Included a seed, although I didn't get identical results, as I had to reduce the number of people in the simulation.
Anagnostou et al. (2022): Fully met. The authors included a random seed, so the results I got were identical to the original (no need for any subjectivity in deciding whether they were similar enough, as I could perfectly reproduce them).
Johnson et al. (2021): Fully met. At the start of the script, the authors call `set.seed(333)`.
Hernandez et al. (2015): Fully met. This ensured consistent results between runs of the script, which was really helpful.
Wood et al. (2021): Fully met. Sets seed based on replication number.
Reflections:
- Depending on your model and the outputs/type of output you are looking at, the lack of seeds can have varying impacts on the appearance of your results, and can make the subjective judgement of whether results are consistent harder (if discrepancies could be attributed to not having consistent seeds or not).
- It can be really quite simple to include seeds.
- Over half of the studies did include seed control in their code.
- Tom: There seems little argument against doing this. Worth noting that commercial software does this for you, which possibly explains why authors didn't do it themselves if that was their background (lack of knowledge?).
- Tom: Note SimPy is independent of any sampling mechanism. We could just use Python's random module and set a single seed if needed (although you lose CRN), and we can set up our models so that we only need to set a single seed (see the sketch below).
- Tom: A key part of STARS 2.0 for reproducibility
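A minimal sketch of what seed control could look like in Python (the sampling and parameter values are illustrative, not from any of the studies):

```python
# Illustrative sketch: one reproducible seed per replication.
import numpy as np


def run_replication(seed, n_patients=1000):
    """Hypothetical single replication using a seeded random number generator."""
    rng = np.random.default_rng(seed)
    inter_arrival_times = rng.exponential(scale=5.0, size=n_patients)
    return inter_arrival_times.mean()


def run_replications(n_reps=10, base_seed=42):
    # Re-running this always gives identical results, while each
    # replication still uses a different random stream.
    return [run_replication(seed=base_seed + rep) for rep in range(n_reps)]
```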
6.2.3 Outputs
Calculate and provide all required outputs
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ |
Shoaib and Ramamohan (2021): Not met. Had to add some outputs and calculations (e.g. proportion of childbirth cases referred, standard deviation)
Huang et al. (2019): Not met. It has a complicated output (standardised density of patients in queue) that I was never certain I had calculated correctly. Although the model outputs the columns required to calculate it, due to its complexity I feel this was not met, as it is a whole new output in its own right (and not just something simple like a mean).
Lim et al. (2020): Not met. The model script provided was only set up to provide results from days 7, 14 and 21. The figures require daily results, so I needed to modify the code to output that.
Kim et al. (2021): Not met. Had to write code to find aorta sizes of people with AAA-related deaths.
Anagnostou et al. (2022): Fully met. Although worth noting this only had one scenario/version of model and one output to reproduce.
Johnson et al. (2021): Not met. It has an output that is in "per 1000" and, although the model outputs the columns required to calculate this, I found it very tricky to work out which columns to use and how to transform them to get this output, so I feel this is not met (it is a separate output rather than something simple like a mean, and it was so tricky to work out).
Hernandez et al. (2015): Fully met.
Wood et al. (2021): Fully met.
Reflections:
- Calculate and provide all the outputs required (a minimal sketch follows this list).
- I appreciate this can be a bit "ambiguous" (e.g. if it's just plotting a mean or a simple calculation, I didn't consider that here). However, combined with other criteria, we do want authors to provide code to calculate outputs, so we would want them to provide that anyway.
- Tom: This is a headline. I suspect we can find supporting citations elsewhere from other fields. It's a reporting guideline thing too, but in natural language things can still get very ambiguous! Would be good to make that point as well, I think.
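As a minimal sketch (with hypothetical column names), providing the final calculation alongside the raw columns removes this ambiguity - for example, for a "per 1,000" style output:

```python
# Illustrative sketch: calculate a reported output ("per 1,000") from raw result columns.
import pandas as pd


def add_rate_per_1000(results: pd.DataFrame) -> pd.DataFrame:
    """Add an events-per-1,000-population column (column names are placeholders)."""
    out = results.copy()
    out["events_per_1000"] = out["events"] / out["population"] * 1000
    return out
```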
Summary
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | 🟡 | ❌ | ❌ | 🟡 | 🟡 | ✅ |
Provide code to process results into tables
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | N/A | 🟡 | ❌ | N/A | ❌ | ❌ | ✅ |
Shoaib and Ramamohan (2021): Not met.
Huang et al. (2019): Not applicable. No tables in scope.
Lim et al. (2020): Partially met. It outputs the results in a similar structure to the paper (like a section of a table). However, it doesn't have the full code to produce any of the tables outright, so additional processing was still required.
Kim et al. (2021): Not met. Had to write code to generate tables, which included correctly implementing the calculation of excess outcomes (e.g. deaths), scaling to population size, and identifying which columns provide the operation outcomes.
Anagnostou et al. (2022): Not applicable. No tables in scope.
Johnson et al. (2021): Not met. Had to write code to generate tables, which took me a while as I got confused over things like which tables, columns and scenarios to use.
Hernandez et al. (2015): Not met.
Wood et al. (2021): Fully met.
Reflections:
- It can take a bit of time to do this processing, and it can be tricky/confusing to do correctly, so very handy for it to be provided.
- Common issue.
Provide code to process results into figures
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | ❌ | ❌ | ❌ | 🟡 | 🟡 | ✅ |
Shoaib and Ramamohan (2021): Not met.
Huang et al. (2019): Not met. Had to write code from scratch. For one of the figures, it would have been handy to be informed that the plot was produced by a simmer function (as I didn't initially realise this). It also took a bit of time to work out how to transform the figure axes, as this was not mentioned in the paper (and no code was provided for these). It was also unclear and a bit tricky to work out how to standardise the density in the figures (since it is only described in the text, and no formula/calculations are provided there or in the code).
Lim et al. (2020), Kim et al. (2021) and Anagnostou et al. (2022): Not met. However, the simplicity and repetition of the figures was handy.
Johnson et al. (2021): Partially met. For Figure 3, most of the required code for the figure was provided, which was super helpful. However, this wasn't complete, and for all other figures I had to write the code from scratch.
Hernandez et al. (2015): Partially met. Provides a few example `ggplot`s, but these are not all of the plots, do not exactly match the article, and do not include any of the pre-processing required before plotting, so they could only serve as a starting point (though that was still really handy).
Wood et al. (2021): Fully met. Figures match article, with one minor exception that I had to add smoothing to the lines on one of the figures.
Reflections:
- It can take a bit of time to do this processing, particularly if the figure involves any transformations (and less so if the figure is simple), so it is very handy for it to be provided (a minimal sketch follows this list).
- Also, handy if the full code can be provided for all figures (although partial code is more helpful than none at all).
- Common issue.
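A minimal sketch of providing full figure code (the column names, labels and axis transformation are hypothetical examples, not from any specific study):

```python
# Illustrative sketch: complete code for one figure, including any axis transformation.
import matplotlib.pyplot as plt
import pandas as pd


def plot_queue_density(results: pd.DataFrame, path="fig_queue_density.png"):
    fig, ax = plt.subplots()
    ax.plot(results["day"], results["standardised_density"])
    ax.set_yscale("log")  # state transformations explicitly rather than leaving them implicit
    ax.set_xlabel("Day")
    ax.set_ylabel("Standardised density of patients in queue")
    ax.grid(True)  # grid lines make it easier to read values off the figure
    fig.savefig(path, dpi=300, bbox_inches="tight")
    plt.close(fig)
```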
Provide code to calculate in-text results
By "in-text results", I am referring to results that are mentioned in the text but not included in (and cannot be deduced from) any of the tables or figures.
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | N/A | ❌ | N/A | N/A | N/A | N/A |
Shoaib and Ramamohan (2021), Huang et al. (2019), Kim et al. (2021): Not met.
Lim et al. (2020), Anagnostou et al. (2022), Johnson et al. (2021), Hernandez et al. (2015), Wood et al. (2021): Not applicable (no in-text results).
Reflections:
- Provide code to calculate in-text results
- Universal issue, for those with in-text results not otherwise captured in tables and figures
6.3 Recommendations to support troubleshooting and reuse
6.3.1 Design
Separate the model code from any application code
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
N/A | ❌ | N/A | N/A | ✅ | N/A | N/A | N/A |
For studies where this is relevant (i.e. with applications):
- Huang et al. (2019): Not met. The model code was provided within the code for a web application, but the paper was not focused on this application, and instead on specific model scenarios. I had to extract the model code and transform it into a format that was “runnable” as an R script/notebook.
- Anagnostou et al. (2022): Fully met. Code for the web app is in a separate repository to the model code.
For other studies, regarding whether the model code was in a “runnable” format and not embedded within anything else…
- Shoaib and Ramamohan (2021): Provided as a single `.py` file which ran the model with the function `main()`.
- Lim et al. (2020): Provided as a single `.py` file which ran the model with a for loop.
- Kim et al. (2021): Has separate `.R` scripts for each scenario which ran the model by calling functions from elsewhere in the repository.
- Johnson et al. (2021): Model provided as a package (which is an R interface for the C++ model).
- Hernandez et al. (2015): The model (Python code) can be run from `main.py`.
- Wood et al. (2021): Provided as a single `.R` file which ran the model with a for loop.
Reflections:
- If you are presenting the results of a model, then provide the code for that model in a “runnable” format.
- This was an uncommon issue.
Don’t hard code parameters that you will want to change for scenario analyses
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | 🟡 | ❌ | ✅ | N/A | ✅ | 🟡 | ✅ |
Shoaib and Ramamohan (2021): Not met. Some parameters sat "outside" of the model within the `main()` function (and hence were more "changeable", even if not "changeable" inputs to that function, being changed directly in the script). However, many of the other parameters were hard-coded within the model itself. It took time to spot where these were and correctly adjust them to be modifiable inputs.
Huang et al. (2019): Partially met. Pretty much all of the parameters that we wanted to change were not hard coded and were instead inputs to the model function `simulate_nav()`. However, I did need to add an `exclusive_use` scenario which conditionally changed `ir_resources`, but that is the only exception. I also added `ed_triage` as a changeable input but didn't end up needing that to reproduce any results (it was just part of troubleshooting).
Lim et al. (2020): Not met. Some parameters were not hard coded within the model, but lots of them were.
Kim et al. (2021): Fully met. All model parameters could be varied from “outside” the model code itself, as they were provided as changeable inputs to the model.
Anagnostou et al. (2022): N/A as no scenarios.
Johnson et al. (2021): Fully met. All model parameters could be varied from “outside” the model code itself, as they were provided as changeable inputs to the model.
Hernandez et al. (2015): Partially met. Did not hard code runs, population, generations, and percent pre-screened. However, did hard code other parameters, like the bi-objective vs. tri-objective model and the bounding. Also, it was pretty tricky to change percent pre-screened, as it assumed you provided a `.txt` file for each percentage.
Wood et al. (2021): Fully met. All model parameters for the scenarios/sensitivity analysis could be varied from “outside” the model code itself.
Reflections:
- It can be quite difficult to change parameters that are hard coded into the model. Ideally, all the parameters that a user might want to change should be easily changeable and not hard coded (a minimal sketch follows this list).
- This is a relatively common issue.
- There is overlap between this and whether the code for scenarios is provided (typically, the code for a scenario conditionally changes parameter values, which is facilitated by not hard coding the parameters, so you only need to change the values from "outside" the model code rather than making changes to the model functions themselves). Hence, these have been included as two separate reflections.
- Important to note that we evaluate this in the context of reproduction - and have not checked for hard-coded parameters outside the specified scenario analyses, but that someone may wish to alter if reusing the model for a different analysis/context/purpose.
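A minimal sketch of avoiding hard-coded parameters (the parameter names are illustrative): gather everything a user might want to change into one object that is passed into the model, so scenarios never require edits to the model logic itself.

```python
# Illustrative sketch: parameters as changeable inputs rather than literals inside the model.
from dataclasses import dataclass


@dataclass
class Params:
    n_doctors: int = 2
    mean_consult_time: float = 2.5
    warm_up_days: int = 30


def run_model(params: Params):
    """Hypothetical model entry point - it only reads values from `params`."""
    ...  # model logic uses params.n_doctors etc., never a hard-coded literal


# A scenario is then just a different Params object:
run_model(Params(n_doctors=3))
```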
Avoid large amounts of code duplication
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
Shoaib and Ramamohan (2021): Not met. The model often contained very similar blocks of code before or after warm-up.
Huang et al. (2019): Fully met.
Lim et al. (2020): Fully met.
Kim et al. (2021): Not met. There was a lot of duplication when running each scenario (e.g. repeated calls to `Eventsandcosts`, and repeatedly defining the same parameters). This meant that, if changing a parameter that you want to be consistent between all the scripts (e.g. number of persons), you had to change each of the scripts one by one.
Anagnostou et al. (2022): Fully met.
Johnson et al. (2021): Not met. There was a lot of duplication when running each scenario. This meant that, when amending these for the sensitivity analysis, I would need to change the same parameter 12 times within a script and, for changes to all scenarios, 12 times in each of 14 duplicate scripts. Hence, it was simpler to write an R script to do this than to change it directly, but for the base case, I had to make sure I carefully changed everything in both files.
Hernandez et al. (2015): Fully met.
Wood et al. (2021): Fully met.
Reflections: Large amounts of code duplication are non-ideal as they can:
- Make code less readable
- Make it trickier to change universal parameters
- Increase the likelihood of introducing mistakes
- Make it trickier to set up scenarios/sensitivity analyses
6.3.2 Clarity
Include sufficient code comments
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | ✅ | 🟡 | 🟡 | ❌ | 🟡 | ❌ |
Shoaib and Ramamohan (2021) and Huang et al. (2019): Not met. Would have benefitted from more comments, as it took some time to ensure I had correctly understood the code, particularly where lots of abbreviations were used.
Lim et al. (2020): Fully met. There were lots of comments in the code (including doc-string-style comments at the start of functions) that aided understanding of how it worked.
Kim et al. (2021): Partially met. Didn’t have any particular issues in working out the code. There are sufficient comments in the scenario scripts and at the start of the model scripts, although within the model scripts, there were sometimes quite dense sections of code that would likely benefit from some additional comments.
Anagnostou et al. (2022): Partially met. Didn’t have to delve into the code much, so can’t speak from experience as to whether the comments were sufficient. From looking through the model code, several scripts have lots of comments and docstrings for each function, but some do not.
Johnson et al. (2021): Not met. Very few comments in the `Case_Detection_Results...Rmd` files, which were the code files provided.
Hernandez et al. (2015): Partially met. There are some comments and doc-strings, but not comprehensively.
Wood et al. (2021): Not met. Very few comments, so for the small bit of the code that I did delve into, it took a bit of working out what the different variables referred to.
Reflections:
- With increasing code complexity, the inclusion of sufficient comments becomes increasingly important, as it can otherwise be quite time consuming to figure out how to fix and change sections of code
- Define abbreviations used within the code
- Good to have consistent comments and docstrings throughout (i.e. on all scripts, on not just some of them)
- Common issue
- Tom: I guess this one isn't strictly necessary for reproducibility. The main issue was that the studies required a fair bit of manual work to get them to reproduce the results, due to the mixed issues listed above. This is sort of a "failsafe option" for reproducibility, or perhaps more relevant for reuse/adaptation.
Make the output tables clear and unambiguous
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
Shoaib and Ramamohan (2021): Not met. There were two alternative results spreadsheets with some duplicate metrics but sometimes differing results between them, which made it a bit confusing to work out what to use.
Huang et al. (2019), Lim et al. (2020), and Anagnostou et al. (2022): Fully met. Didn’t experience issues interpreting the contents of the output table/s.
Kim et al. (2021): Not met. It took me a little while to work out which surgery columns I needed, and to realise I needed to combine two of them. This required looking at which inputs generated them, and referring to an input data dictionary.
Johnson et al. (2021): Not met. I made mistakes and got confused figuring out which results tables I needed, which columns to use, and which scenarios to use from the tables.
Hernandez et al. (2015): Fully met. Straightforward with key information provided.
Wood et al. (2021): Fully met. I didn’t need to work with the output tables, but from looking at them now, they make sense.
Reflections:
- Don’t provide alternative results for the same metrics
- Make it clear what each column/category in the results table means, if it might not be immediately clear.
- Make differences between separate results tables clear.
- Tom: In a RAP-for-simulation world, we have an environment + model + script that gets you to the exact results table you see in the paper, and this isn't a problem (although it is more time consuming to set up).
Provide instructions for running the code
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | ❌ | 🟡 | ✅ | ✅ | ❌ | ❌ |
Shoaib and Ramamohan (2021): Not met. No instructions, although is just a single script that you run.
Huang et al. (2019): Not met. The model was not provided in a runnable form but, regardless, there were no instructions for running it as it is provided (as a web application) - i.e. no information on how to get that running.
Lim et al. (2020): Not met. No instructions, although is just a single script that you run.
Kim et al. (2021): Partially met. README tells you which folder has the scripts you need, although nothing further. Although all you need to do is run them.
Anagnostou et al. (2022): Fully met. Clear README with instructions on how to run the model was really helpful.
Johnson et al. (2021): Fully met. README has mini description of model and clear instructions on how to install and run the model.
Hernandez et al. (2015): Not met.
Wood et al. (2021): Not met. No instructions, although it was fairly self-explanatory (a single script, `master.R`, to run, then processing scripts named after items in the article, e.g. `fig7.R`).
Reflections:
- Even if as simple as running a script, include instructions on how to do so
- In simpler projects (e.g. single script), this can be less of a problem.
- Common issue
- Tom: Evidence for STARS essential component of minimum documentation.
Summary (whether both run time and machine specification were stated)
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
🟡 | ❌ | 🟡 | ❌ | ❌ | 🟡 | 🟡 | 🟡 |
State run time
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
✅ | ❌ | ❌ | ❌ | ❌ | ✅ | 🟡 | 🟡 |
Shoaib and Ramamohan (2021): Fully met. Run time stated in paper (but not repository).
Huang et al. (2019): Not met.
Lim et al. (2020): Not met.
Kim et al. (2021): Not met. A prior paper describing the model development mentions the run time, but not the current paper or repository, so this is easily missed.
Anagnostou et al. (2022): Not met! Although it only took seconds, you could argue that stating this is still important in case some error made it look like the model was running continuously (e.g. stuck in a loop) - and as it helps someone identify whether they are able to run it on their machine.
Johnson et al. (2021): Fully met. In the README, they state that the run time with 100 million agents is 16 hours, which was very handy to know, as I then just got stuck in, running with fewer agents while troubleshooting.
Hernandez et al. (2015): Partially met. Some of the run times are mentioned in the paper, but not all, although this did help indicate that we should anticipate the other scenarios to similarly take hours to run.
Wood et al. (2021): Partially met. In the paper, they state that it takes less than five minutes for each scenario, but this feels like half the picture, given the total run time was 48 hours.
Reflections:
- For long-running models with no statement, it can take a while to realise that it's not an error in the code or anything, but actually just a long run time! It is also hard to know how long to expect, and whether it is within the capabilities of your machine, and so on.
- Ideally, include a statement of run time in the repository as well as the paper (a minimal sketch for recording this follows this list).
- Ideally, include the run time of all components of the analysis (e.g. all scenarios).
- Common issue.
- Tom: This supports the inclusion of section 5.4 in the STRESS-DES guidelines
- Response: But I think it is also important that this is in the repository itself, and not just the paper.
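A minimal sketch of recording the run time so that it can be stated in the repository (e.g. pasted into the README) as well as the paper:

```python
# Illustrative sketch: measure and report the run time of a full run.
import time


def timed_run(run_fn, **kwargs):
    """Run `run_fn` (a hypothetical model/analysis function) and print the elapsed time."""
    start = time.perf_counter()
    result = run_fn(**kwargs)
    elapsed_min = (time.perf_counter() - start) / 60
    print(f"Run time: {elapsed_min:.1f} minutes")  # e.g. copy this into the README
    return result
```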
State machine specification
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Regarding the one that did describe it:
- Lim et al. (2020): Describe in article (“desktop computer (Intel Core i5 3.5 GHz, 8 GB RAM)”)
Computationally expensive models
Regarding whether models were computationally expensive…
- Shoaib and Ramamohan (2021), Huang et al. (2019), and Lim et al. (2020), Anagnostou et al. (2022): No issues
- Kim et al. (2021): Unable to run on my machine (`serial` took too long to run (I would have to leave the laptop on for many, many hours, which isn't feasible), and `parallel` was too computationally expensive and crashed the machine (with the original number of people)). This is not mentioned in the repository or paper, but only referred to in a prior publication. It would've been handy if it had included suggestions like reducing the number of people and so on (which is what I had to do to feasibly run it).
- Johnson et al. (2021): It becomes more computationally expensive if you try to run lots at once in simultaneous terminals. I didn't try running one on my local machine with the full parameters, due to the long run time making it infeasible, but knowing my system specs, it should have been able to handle it if I had.
- Hernandez et al. (2015): This had long run times, but I don't know if it was computationally expensive or not - I just know that I didn't run into any issues (but I didn't record memory usage, so it's possible a lower-specced machine might).
- Wood et al. (2021): Not applicable. As stated in their prior paper, the model is constrained by processing time, not computer memory.
Reflections:
- Some models are so computationally expensive that it may simply be impossible to run them in a feasible length of time without a high-powered machine.
- Handy to mention memory requirements so someone with lower spec machine can ensure they would be able to run it.
- If a model is computationally expensive, it would be good to provide suggested alternatives that allow it to be run on lower spec machines
- Not a common problem - only relevant to computationally expensive models
- Tom: Agree it makes sense to report this, and is captured in reporting guidelines like STRESS-DES.
6.3.3 Functionality
Optimise model run time
The run time of models had a big impact on how easy it was to reproduce results, as longer run times meant it was tricky (or even impossible) to run the model in the first place, or tricky to re-run it. The studies where I made adjustments were:
- Shoaib and Ramamohan (2021): Added parallel processing and ran fewer replications.
- Huang et al. (2019): No changes made.
- Lim et al. (2020): Added parallel processing.
- Kim et al. (2021): Reduced number of people in simulation, and switched from serial to the provided parallel option.
- Anagnostou et al. (2022): Model was super quick which made it really easy to run and re-run each time
- Johnson et al. (2021): Experimented with using a smaller number of agents for troubleshooting (although ultimately had to run with the full number to reproduce results), and ran the scripts in parallel by opening separate terminals simultaneously. Note: the long run time also meant it took longer to do this reproduction - although we excluded computation time from our timings, it meant that, e.g., when I made a mistake in coding the scenario analysis and had to re-run, I had to wait another day or two for that to finish before I could resume.
- Hernandez et al. (2015): Added parallel processing, did not run one of the scenarios (it was very long, and I hadn't managed to reproduce other parts of the same figure regardless), and experimented with reducing the parameters for the evolutionary algorithm (but, in the end, ran with the full parameters, though lower ones were helpful while working through and troubleshooting).
- Wood et al. (2021): No changes made but, unlike the other reproductions, I didn't try running smaller amounts - I just set it to run as-is over the weekend.
In one of the studies, there was a minor error which needed fixing, which we anticipate was likely present because the long run time meant the model wasn't all re-run in sequence at the end.
Reflections:
- Reduce model run time if possible, as it makes the model easier to work with and facilitates doing full re-runs of all scenarios (which can be important with code changes, etc.). A sketch of parallelising replications follows this list.
- Relatedly, it is good practice to re-run all scripts before finishing up, as then you can spot any errors like the one mentioned for Kim et al. (2021)
- Common issue (to varying degrees - from taking 20 minutes up to several hours or even days).
- Tom: Long run times are inevitable for some models, but this does suggest that some extra work to build confidence that the model is working as expected is beneficial, like one or a small set of verification scenarios that are quick to run.
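A minimal sketch of one way to parallelise replications (assuming a top-level run_replication(seed) function, such as the one sketched in the seeds section; multiprocessing requires it to be defined at module level and called from under a main guard):

```python
# Illustrative sketch: run replications in parallel to reduce wall-clock time.
from multiprocessing import Pool


def run_parallel(run_replication, n_reps=100, base_seed=42, processes=4):
    """Map one seeded replication to each worker process."""
    seeds = [base_seed + rep for rep in range(n_reps)]
    with Pool(processes=processes) as pool:  # call from under `if __name__ == "__main__":`
        return pool.map(run_replication, seeds)
```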
Save model outputs to files
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ |
Shoaib and Ramamohan (2021): Fully met. Outputs to `.xlsx` files.
Huang et al. (2019), Lim et al. (2020), and Kim et al. (2021): Not met. Outputs to dataframe/s.
Anagnostou et al. (2022): Fully met. Outputs to `OUT_STATS.csv`. Note: although not needed for the reproduction itself, when I tried to amend the name and location of the CSV file output by the model (for use in tests), this was very tricky to do, as it was hard coded into the scripts and difficult to amend due to how the model is run and set up.
Johnson et al. (2021): Not met. Outputs to dataframe/s that are not saved as files (although they can be seen within the kept `.md` file from knitting).
Hernandez et al. (2015): Fully met. Outputs to `.txt` files.
Wood et al. (2021): Fully met. Outputs to `.csv` files.
Reflections:
- Common issue
- Particularly important if the model run time is even slightly long (even just minutes, and even more so as it becomes many minutes or hours), so you don't have to re-run the model each time to get the results.
- Set this up in such a way that it is easy to change the name and location of the output file (see the sketch below).
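A minimal sketch of saving results with a configurable output location (file and folder names are placeholders):

```python
# Illustrative sketch: save results to file, with the path as an argument
# rather than hard-coded, so it is easy to redirect (e.g. for tests).
from pathlib import Path

import pandas as pd


def save_results(results: pd.DataFrame, path="outputs/results.csv"):
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    results.to_csv(path, index=False)
```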
Avoid outputting large numbers of unused files
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
For Hernandez et al. (2015), the default behaviour of the script was to output lots of files from each round (so you could easily have 90, 100, 200+ files), which were then not used in the analysis (as it just depended on an aggregate results file). Although these individual files might be useful during quality control, as the default behaviour of the script it could easily leave the repository quite busy/littered with files.
- Tom reflected that this is more of a general housekeeping issue. He agrees and says it's OK to do this, but perhaps they need a run mode that does not produce these verification files.
Avoid large file sizes
I have not evaluated this as a criterion, as a large file size is not inherently a bad thing, and might be difficult to avoid. However, when files are very large, this can make things trickier, such as requiring compression and the use of GitHub Large File Storage (LFS) for tracking, which has limits on the free tier.
Regarding file sizes in each study:
- Shoaib and Ramamohan (2021): Not relevant (results files <35 kB)
- Huang et al. (2019): Provided code didn't save results to file. When I saved them to file, these were large, so I compressed them to `.csv.gz`, which made them small enough that GitHub was still happy (26 MB).
- Lim et al. (2020): Provided code didn't save results to file. When I saved them to file, these were small, so not relevant (results files <60 kB).
- Kim et al. (2021): Provided code didn't save results to file. When I saved them to file, these were small, so not relevant (results files <1 kB).
- Anagnostou et al. (2022): Not relevant (results file 34 kB).
- Johnson et al. (2021): Provided code didn't save results to file. When I saved them to `.csv` files, these were small, so not relevant (results files <3.6 kB).
- Hernandez et al. (2015): Not relevant (aggregate results files <10 kB).
- Wood et al. (2021): Aggregate results files are small (327 kB), but raw results files are very large (2.38 GB), and even when compressed to `.csv.gz` (128 MB) they require use of GitHub LFS.
Reflections:
- For most studies, this was not relevant, with outputs relatively small.
- I only really found this to be an issue when files exceeded GitHub's thresholds. GitHub gives a warning for files over 50 MB and blocks files over 100 MB, requiring you to use GitHub LFS. It recommends that repositories are ideally <1 GB and definitely <5 GB. GitHub LFS has limits on storage and bandwidth use on the free tier (1 GB of each).
- Reducing file size isn't the only solution. In cases where you have large files, a good option can be storing them elsewhere and then pulling them from there into your workflow - for example, storing them on Zenodo and fetching them using pooch (see the sketch below).
- Tom: The approach with Zenodo seems sensible. There's a simple workflow for submitting to a journal with this approach as well. Plus scenarios/scripts can link to the raw data files on Zenodo for comparison.
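A minimal sketch of that approach, assuming the large raw results had been deposited on Zenodo (the URL and hash below are placeholders, not a real deposit):

```python
# Illustrative sketch: fetch large raw results from Zenodo on demand with pooch,
# rather than committing them to the repository.
import pooch

raw_results_path = pooch.retrieve(
    url="https://zenodo.org/records/0000000/files/raw_results.csv.gz",  # placeholder URL
    known_hash=None,  # in practice, pin the file's checksum here
)
```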
6.4 Other recommendations
System dependencies. There can also be system dependencies, which will vary between systems, and may not be obvious if researchers already have them installed. We identified these when setting up the Docker environments (which act like "fresh installs"):
- Shoaib and Ramamohan (2021), Lim et al. (2020), Anagnostou et al. (2022) - no dependencies
- Huang et al. (2019), Kim et al. (2021), Johnson et al. (2021) and Wood et al. (2021) - libcurl4-openssl-dev, libssl-dev, libxml2-dev, libglpk-dev, libicu-dev - as well as `tk` for Johnson et al. (2021)
- Hernandez et al. (2015) - wget, build-essential, libssl-dev, libffi-dev, libbz2-dev, libreadline-dev, libsqlite3-dev, zlib1g-dev, libncurses5-dev, libgdbm-dev, libnss3-dev, tk-dev, liblzma-dev, lzma, ca-certificates, curl, git
Although it would be unreasonable to expect authors to be aware of and list all system dependencies, this does show the benefit of creating something like Docker in identifying them and making a note of them within the Docker files.
This issue was specific to (a) the R studies, and (b) the study with an unsupported version of Python, which required building Python from source in the Docker file.
Design the model to be re-run programmatically with different parameters
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Shoaib and Ramamohan (2021): Not met. The model is set up as classes and run using a function. However, it is not designed to allow any variation in inputs. Everything uses default inputs, and it is designed in such a way that, if you wish to vary model parameters, you need to change them directly in the script itself.
Huang et al. (2019): Fully met. Model was set up as a function, with many of the required parameters already set as “changeable” inputs to that function.
Lim et al. (2020): Fully met. The model is created from a series of functions and run with a for loop that iterates through different parameters. As such, the model is able to be run programmatically (within that for loop, which varied e.g. staff per shift and so on and re-ran the model).
Kim et al. (2021): Fully met. Each scenario is an R script which states different parameters and then calls functions to run model.
Anagnostou et al. (2022): Fully met. Change inputs in the input `.csv` files.
Johnson et al. (2021): Fully met. Creates a list of inputs which are then used by a `run()` function.
Hernandez et al. (2015): Fully met. Model created from classes, which accept some inputs and can run the model.
Wood et al. (2021): Fully met. Changes inputs to run all scenarios from a single `.R` file.
Reflections:
- Design model so that you can re-run it with different parameters without needing to make changes to the model code itself.
- This allows you to run multiple versions of the model with the same script.
- It also reduces the likelihood of missing errors (e.g. if miss changing an input parameter somewhere, or input the wrong parameters and don’t realise).
- This was an uncommon issue.
- Note, this just refers to the basic set-up; other items, like not hard coding parameters, are also very important in this context.
Use relative file paths
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
✅ | N/A | N/A | ✅ | ✅ | N/A | ✅ | ✅ |
Shoaib and Ramamohan (2021): Fully met. Just provides file path, so file is saved into current/run directory.
Huang et al. (2019): Not applicable. All inputs defined within script. Outputs were not saved to file/s.
Lim et al. (2020): Not applicable. All inputs defined within script. Outputs were not saved to file/s.
Kim et al. (2021): Fully met. Uses relative file paths for sourcing model and input parameters (gets current directory, then navigates from there).
Anagnostou et al. (2022): Fully met. Uses relative imports of local code files.
Johnson et al. (2021): Not applicable. All inputs are defined within the script. Outputs are not specifically saved to a file (just that the `.md` and image files were automatically saved when the `.Rmd` file was knit). `epicR` is a package import.
Hernandez et al. (2015): Fully met. Creates folder in current working directory based on date/time to store results.
Wood et al. (2021): Fully met. Although I then changed things a bit as I reorganised the repository and prefer not to work with `setwd()`, these were set up in such a way that it would be really easy to correct the file path, just by setting the working directory at the start of the script.
Reflections:
- This was not an issue for any of the studies - but it is included to note that this was a "facilitator", as I would have needed to amend the paths if they were not relative (and Tom noted that this is a common problem he runs into elsewhere). A minimal sketch follows below.
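A minimal sketch of building paths relative to the script's location (folder names are illustrative), so the code runs regardless of where it is called from:

```python
# Illustrative sketch: paths relative to this script, not the caller's working directory.
from pathlib import Path

BASE_DIR = Path(__file__).parent
INPUT_FILE = BASE_DIR / "inputs" / "parameters.csv"
OUTPUT_DIR = BASE_DIR / "outputs"
```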
Provide all required parameters
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ |
Shoaib and Ramamohan (2021): Not met. Some parameters that could not be calculated were not provided - i.e. what consultation boundaries to use when the mean length of doctor consultation was 2.5 minutes.
Huang et al. (2019): Not met. In this case, patient arrivals and resource numbers were listed in the paper, and there were several discrepancies between these and the provided code. However, many of the model parameters, like length of appointment, were not mentioned in the paper, so it was not possible to confirm whether or not those were correct. Hence, marked as not met, as the presence of discrepancies for several other parameters puts these into doubt.
Lim et al. (2020): Not met. For Figure 5, I had to guess the value for `staff_per_shift`.
Kim et al. (2021): Fully met.
Anagnostou et al. (2022): Fully met.
Johnson et al. (2021): Fully met. Could determine appropriate parameters for sensitivity analysis from figures in article.
Hernandez et al. (2015): Not met. The results are strongly affected by the bounding set, but this was not mentioned in the paper or repository, and I had to look at the numbers in the results and the GitHub commit history to estimate the appropriate bounds to use.
Wood et al. (2021): Fully met.
Reflections:
- Provide all required parameters
- Tom: Evidence to support STRESS-DES 3.3
Clearly state parameters in the paper (e.g. in a table)
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ❌ | 🟡 | N/A | N/A | ✅ | 🟡 | N/A |
Shoaib and Ramamohan (2021): Not met. Although there was a scenario table, this did not include all the parameters I would need to change. It was more challenging to identify parameters that were only described in the body of the article. There were also some discrepancies in parameters between the main text of the article, and the tables and figures. Some scenarios were quite ambiguous/unclear from their description in the text, and I initially misunderstood the required parameters for the scenarios.
Huang et al. (2019): Not met. As described above, paper didn’t adequately describe all parameters.
Lim et al. (2020): Partially met. Nearly all parameters are in the paper's table, and others are described in the article. However, it didn't provide the `staff_per_shift` for Figure 5.
Kim et al. (2021) and Anagnostou et al. (2022): Not applicable. All provided.
Johnson et al. (2021): Fully met. All parameters clearly in the two figures presenting the sensitivity analysis, and didn’t have to look elsewhere beyond that.
Hernandez et al. (2015): Partially met. Most parameters are relatively easily identified from the text or figure legends (though it would be easier if they were provided in a table or similar). The parameter for the bounding was not provided in the paper.
Wood et al. (2021): Not applicable. All provided.
Reflections:
- Provide parameters in a table (including for each scenario), as it can be difficult/ambiguous to interpret them from the text, and hard to spot them too.
- Tom: The ambiguity of the natural language for scenarios was an important finding.
- Be sure to mention every parameter that gets changed (e.g. for Lim et al. (2020), there wasn't a default `staff_per_shift` across all scenarios, but it was not stated for the scenario, so I had to guess it).
- Tom: Evidence to support STRESS-DES 3.3
Provide calculations for pre-processed parameters
Shoaib and Ramamohan (2021) | Huang et al. (2019) | Lim et al. (2020) | Kim et al. (2021) | Anagnostou et al. (2022) | Johnson et al. (2021) | Hernandez et al. (2015) | Wood et al. (2021) |
---|---|---|---|---|---|---|---|
❌ | ✅ | N/A | N/A | N/A | N/A | N/A | N/A |
Shoaib and Ramamohan (2021): Not met. It was unclear how to estimate inter-arrival time.
Huang et al. (2019): Fully met. The calculations for inter-arrival times were provided in the code, and the inputs to those calculations were the numbers of arrivals as reported in the paper, making it easy to compare those parameters and check whether the numbers were correct.
Lim et al. (2020): Not applicable. The parameter not provided is not one that you would calculate.
Kim et al. (2021) and Anagnostou et al. (2022): Not applicable. All provided.
Johnson et al. (2021): Not applicable. No processing of parameters required.
Hernandez et al. (2015): Not applicable.
Wood et al. (2021): Not applicable. All provided.
Reflections:
- If you are going to mention the "pre-processed" values at all, then it's important to include the calculation (ideally in the code, as that is the clearest demonstration of exactly what you did). A minimal sketch follows below.
- Tom: This is a very good point for RAP.
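A minimal sketch of including such a calculation in code (the figures used are illustrative, not taken from any of the studies):

```python
# Illustrative sketch: derive the mean inter-arrival time from a reported arrival count,
# so the "pre-processed" parameter can be traced back to the values in the paper.
def mean_interarrival_time(n_arrivals_per_day, minutes_open_per_day=480):
    """Mean inter-arrival time in minutes, from daily arrivals over opening hours."""
    return minutes_open_per_day / n_arrivals_per_day


# e.g. 60 arrivals over an 8-hour (480-minute) day -> one arrival every 8 minutes on average
print(mean_interarrival_time(60))
```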
Grid lines. Include tick marks/grid lines on figures, so it is easier to read across and judge whether a result is above or below a certain Y value.
Data dictionaries. Anagnostou et al. (2022): Included data dictionary for input parameters. Although I didn’t need this, this would have been great if I needed to change the input parameters at all.
Unsupported versions. Hernandez et al. (2015): Due to the age of the work:
- Some packages were no longer available on Conda and had to be installed from PyPI
- The version of Python was no longer supported, which meant:
  - Not supported by Jupyter Lab or Jupyter Notebook (so no `.ipynb` files, or Jupyter Lab on Docker)
  - Not supported by VSCode (so had to use "tricks" to run it, involving using a pre-release version of Python on VSCode)
  - Had to create the Docker image from scratch (i.e. couldn't start from e.g. `miniconda3`)
However, this is a slightly unavoidable problem unless you continue to maintain your code (which is ideal but not always feasible in a research environment). Realistically, if reusing this code for a new purpose, you would upgrade it to supported versions.
Tom: This is interesting - and you wonder if it would still be possible (given the “tricks” I followed) in another 10 years time.
Original results files. Hernandez et al. (2015): Included some original results files, which was invaluable in identifying some of the parameters in the code that needed to be fixed.
Classes. Hernandez et al. (2015): Structured code into classes, which was nice to work with/amend.