Table of contents

  • 8.1 Summary
  • 8.2 STRESS-DES
  • 8.3 DES checklist derived from ISPOR-SDM
  • 8.4 Timings
  • 8.5 Use of reporting guidelines
  • 8.6 Uses a previously reported model
  • 8.7 References

8  Evaluation of the article

This page shares the results from the evaluation of the journal articles against criteria from two discrete-event simulation study reporting guidelines:

  • Monks et al. (2019) - STRESS-DES: Strengthening The Reporting of Empirical Simulation Studies (Discrete-Event Simulation) (Version 1.0).
  • Zhang, Lhachimi, and Rogowski (2020) - The generic reporting checklist for healthcare-related discrete-event simulation studies derived from the International Society for Pharmacoeconomics and Outcomes Research and the Society for Medical Decision Making (ISPOR-SDM) Modeling Good Research Practices Task Force reports.

Consider: which criteria from the guidelines are people struggling to meet?

8.1 Summary

STRESS-DES:

Reflections

Of the applicable criteria, all studies fully met at least 60% (and partially met many of the rest) (score_fully).

Looking at the suggested quality score (score), we see no relationship between reporting quality and reproduction success. This remains the case when using the scoring systems from other papers (score_schwander and score_zhang).

With score and score_fully leading to similar conclusions, I think it is sensible to present score_fully, as it is the simplest to interpret. A sketch of how these scores relate to the criteria counts is given below.
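
The four scores in the tables below appear to follow simple formulas based on the counts of criteria. The following is a minimal sketch of my reconstruction (not the original analysis code), with the kind of rank-correlation check that sits behind the "no relationship" observation above:

```python
import pandas as pd

def add_scores(df):
    """Add the four reporting quality scores (as percentages) to a table of
    criteria counts (columns: fully, partially, not, na).
    Reconstruction; assumed definitions:
      score           = (fully + 0.5 * partially) / applicable criteria
      score_fully     = fully / applicable criteria
      score_schwander = (fully + 0.5 * partially) / all criteria (incl. N/A)
      score_zhang     = fully / all criteria (incl. N/A)
    """
    applicable = df["fully"] + df["partially"] + df["not"]
    total = applicable + df["na"]
    df["score"] = (df["fully"] + 0.5 * df["partially"]) / applicable * 100
    df["score_fully"] = df["fully"] / applicable * 100
    df["score_schwander"] = (df["fully"] + 0.5 * df["partially"]) / total * 100
    df["score_zhang"] = df["fully"] / total * 100
    return df

# Checking for a relationship between reporting quality and reproduction success,
# e.g. via a rank correlation (counts_df is assumed to also have a "reproduce" column):
# print(add_scores(counts_df)[["score_fully", "reproduce"]].corr(method="spearman"))
```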

| study | fully | partially | not | na | reproduce (%) | score | score_fully | score_schwander | score_zhang |
|---|---|---|---|---|---|---|---|---|---|
| Shoaib and Ramamohan 2021 (16/17) | 17 | 6 | 1 | 0 | 94.1 | 83.3 | 70.8 | 83.3 | 70.8 |
| Huang et al. 2019 (3/8) | 14 | 5 | 4 | 1 | 37.5 | 71.7 | 60.9 | 68.8 | 58.3 |
| Lim et al. 2020 (9/9) | 15 | 3 | 3 | 3 | 100.0 | 78.6 | 71.4 | 68.8 | 62.5 |
| Kim et al. 2021 (10/10) | 15 | 5 | 2 | 2 | 100.0 | 79.5 | 68.2 | 72.9 | 62.5 |
| Anagnostou et al. 2022 (1/1) | 14 | 2 | 5 | 3 | 100.0 | 71.4 | 66.7 | 62.5 | 58.3 |
| Johnson et al. 2021 (4/5) | 16 | 1 | 2 | 5 | 80.0 | 86.8 | 84.2 | 68.8 | 66.7 |
| Hernandez et al. 2015 (1/8) | 18 | 2 | 3 | 1 | 12.5 | 82.6 | 78.3 | 79.2 | 75.0 |
| Wood et al. 2021 (5/5) | 22 | 2 | 0 | 0 | 100.0 | 95.8 | 91.7 | 95.8 | 91.7 |
| Mean | 16.4 | 3.3 | 2.5 | 1.9 | 78.0 | 81.2 | 74.0 | 75.0 | 68.2 |

All scores are percentages, rounded to one decimal place.

DES checklist derived from ISPOR-SDM:

Reflections

The proportion of applicable criteria that were fully met was lower for this checklist (score_fully).

Otherwise, the conclusions are the same as for STRESS-DES.

| study | fully | partially | not | na | reproduce (%) | score | score_fully | score_schwander | score_zhang |
|---|---|---|---|---|---|---|---|---|---|
| Shoaib and Ramamohan 2021 (16/17) | 11 | 2 | 2 | 3 | 94.1 | 80.0 | 73.3 | 66.7 | 61.1 |
| Huang et al. 2019 (3/8) | 7 | 2 | 7 | 2 | 37.5 | 50.0 | 43.8 | 44.4 | 38.9 |
| Lim et al. 2020 (9/9) | 12 | 0 | 4 | 2 | 100.0 | 75.0 | 75.0 | 66.7 | 66.7 |
| Kim et al. 2021 (10/10) | 12 | 0 | 5 | 1 | 100.0 | 70.6 | 70.6 | 66.7 | 66.7 |
| Anagnostou et al. 2022 (1/1) | 8 | 3 | 4 | 3 | 100.0 | 63.3 | 53.3 | 52.8 | 44.4 |
| Johnson et al. 2021 (4/5) | 15 | 0 | 2 | 1 | 80.0 | 88.2 | 88.2 | 83.3 | 83.3 |
| Hernandez et al. 2015 (1/8) | 10 | 0 | 7 | 1 | 12.5 | 58.8 | 58.8 | 55.6 | 55.6 |
| Wood et al. 2021 (5/5) | 13 | 0 | 4 | 1 | 100.0 | 76.5 | 76.5 | 72.2 | 72.2 |
| Mean | 11.0 | 0.9 | 4.4 | 1.8 | 78.0 | 70.3 | 67.4 | 63.5 | 61.1 |

All scores are percentages, rounded to one decimal place.
Table with proportion of applicable reporting criteria that were fully met

This is part of a table used in the journal article:

| study | reproduction | stress | generic |
|---|---|---|---|
| Kim et al. 2021 (10/10) | 100.0% (10/10) | 68% | 71% |
| Lim et al. 2020 (9/9) | 100.0% (9/9) | 71% | 75% |
| Wood et al. 2021 (5/5) | 100.0% (5/5) | 92% | 76% |
| Anagnostou et al. 2022 (1/1) | 100.0% (1/1) | 67% | 53% |
| Shoaib and Ramamohan 2021 (16/17) | 94.1% (16/17) | 71% | 73% |
| Johnson et al. 2021 (4/5) | 80.0% (4/5) | 84% | 88% |
| Huang et al. 2019 (3/8) | 37.5% (3/8) | 61% | 44% |
| Hernandez et al. 2015 (1/8) | 12.5% (1/8) | 78% | 59% |
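
The stress and generic columns here are the score_fully values from the two tables above, rounded to whole percentages. A minimal sketch (my reconstruction, not the original code) of how this table could be assembled:

```python
import pandas as pd

def combined_table(stress_df, generic_df):
    """Combine reproduction success with the proportion of applicable criteria
    fully met (score_fully) under each checklist, sorted by reproduction success.
    `stress_df` and `generic_df` are assumed to be the two summary tables above,
    indexed by study."""
    out = pd.DataFrame({
        "reproduction": stress_df["reproduce"],
        "stress": stress_df["score_fully"].round(0).astype(int),
        "generic": generic_df["score_fully"].round(0).astype(int),
    })
    return out.sort_values("reproduction", ascending=False)
```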

8.2 STRESS-DES

Key:

  • S: Shoaib and Ramamohan (2021) - link to evaluation
  • Hu: Huang et al. (2019) - link to evaluation
  • L: Lim et al. (2020) - link to evaluation
  • K: Kim et al. (2021) - link to evaluation
  • A: Anagnostou et al. (2022) - link to evaluation
  • J: Johnson et al. (2021) - link to evaluation
  • He: Hernandez et al. (2015) - link to evaluation
  • W: Wood et al. (2021) - link to evaluation

In this section and below, the criteria for each study are marked as either being fully met (✅), partially met (🟡), not met (❌) or not applicable (N/A).

Item S Hu L K A J He W
Objectives
1.1 Purpose of the model
Explain the background and objectives for the model
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
1.2 Model outputs
Define all quantitative performance measures that are reported, using equations where necessary. Specify how and when they are calculated during the model run along with how any measures of error such as confidence intervals are calculated.
🟡 ✅ ✅ ✅ ✅ ✅ ✅ ✅
1.3 Experimentation aims
If the model has been used for experimentation, state the objectives that it was used to investigate.
(A) Scenario based analysis – Provide a name and description for each scenario, providing a rationale for the choice of scenarios and ensure that item 2.3 (below) is completed.
(B) Design of experiments – Provide details of the overall design of the experiments with reference to performance measures and their parameters (provide further details in data below).
(C) Simulation Optimisation – (if appropriate) Provide full details of what is to be optimised, the parameters that were included and the algorithm(s) that was be used. Where possible provide a citation of the algorithm(s).
✅ ✅ ✅ ✅ N/A ✅ ✅ ✅
Logic
2.1 Base model overview diagram
Describe the base model using appropriate diagrams and description. This could include one or more process flow, activity cycle or equivalent diagrams sufficient to describe the model to readers. Avoid complicated diagrams in the main text. The goal is to describe the breadth and depth of the model with respect to the system being studied.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
2.2 Base model logic
Give details of the base model logic. Give additional model logic details sufficient to communicate to the reader how the model works.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
2.3 Scenario logic
Give details of the logical difference between the base case model and scenarios (if any). This could be incorporated as text or where differences are substantial could be incorporated in the same manner as 2.2.
✅ ✅ ✅ ✅ N/A ✅ ✅ ✅
2.4 Algorithms
Provide further detail on any algorithms in the model that (for example) mimic complex or manual processes in the real world (i.e. scheduling of arrivals/ appointments/ operations/ maintenance, operation of a conveyor system, machine breakdowns, etc.). Sufficient detail should be included (or referred to in other published work) for the algorithms to be reproducible. Pseudo-code may be used to describe an algorithm.
✅ 🟡 ✅ 🟡 ✅ ✅ ✅ ✅
2.5.1 Components - entities
Give details of all entities within the simulation including a description of their role in the model and a description of all their attributes.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
2.5.2 Components - activities
Describe the activities that entities engage in within the model. Provide details of entity routing into and out of the activity.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
2.5.3 Components - resources
List all the resources included within the model and which activities make use of them.
✅ ✅ N/A N/A ✅ N/A ✅ ✅
2.5.4 Components - queues
Give details of the assumed queuing discipline used in the model (e.g. First in First Out, Last in First Out, prioritisation, etc.). Where one or more queues have a different discipline from the rest, provide a list of queues, indicating the queuing discipline used for each. If reneging, balking or jockeying occur, etc., provide details of the rules. Detail any delays or capacity constraints on the queues.
✅ ✅ N/A N/A ✅ N/A ✅ ✅
2.5.5 Components - entry/exit points
Give details of the model boundaries i.e. all arrival and exit points of entities. Detail the arrival mechanism (e.g. ‘thinning’ to mimic a non-homogenous Poisson process or balking)
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
Data
3.1 Data sources
List and detail all data sources. Sources may include:
• Interviews with stakeholders,
• Samples of routinely collected data,
• Prospectively collected samples for the purpose of the simulation study,
• Public domain data published in either academic or organisational literature. Provide, where possible, the link and DOI to the data or reference to published literature.
All data source descriptions should include details of the sample size, sample date ranges and use within the study.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
3.2 Pre-processing
Provide details of any data manipulation that has taken place before its use in the simulation, e.g. interpolation to account for missing data or the removal of outliers.
✅ N/A N/A ✅ N/A N/A N/A ✅
3.3 Input parameters
List all input variables in the model. Provide a description of their use and include parameter values. For stochastic inputs provide details of any continuous, discrete or empirical distributions used along with all associated parameters. Give details of all time dependent parameters and correlation.
Clearly state:
• Base case data
• Data use in experimentation, where different from the base case.
• Where optimisation or design of experiments has been used, state the range of values that parameters can take.
• Where theoretical distributions are used, state how these were selected and prioritised above other candidate distributions.
🟡 🟡 ✅ 🟡 ✅ ✅ ✅ ✅
3.4 Assumptions
Where data or knowledge of the real system is unavailable what assumptions are included in the model? This might include parameter values, distributions or routing logic within the model.
✅ ❌ ✅ ✅ ❌ ✅ ✅ ✅
Experimentation
4.1 Initialisation
Report if the system modelled is terminating or non-terminating. State if a warm-up period has been used, its length and the analysis method used to select it. For terminating systems state the stopping condition.
State what if any initial model conditions have been included, e.g., pre-loaded queues and activities. Report whether initialisation of these variables is deterministic or stochastic.
🟡 ❌ ❌ 🟡 ❌ ✅ ❌ ✅
4.2 Run length
Detail the run length of the simulation model and time units.
✅ ✅ ✅ ✅ 🟡 ✅ ✅ ✅
4.3 Estimation approach
State the method used to account for the stochasticity: For example, two common methods are multiple replications or batch means. Where multiple replications have been used, state the number of replications and for batch means, indicate the batch length and whether the batch means procedure is standard, spaced or overlapping. For both procedures provide a justification for the methods used and the number of replications/size of batches.
🟡 🟡 🟡 ✅ ✅ N/A ✅ ✅
Implementation
5.1 Software or programming language
State the operating system and version and build number.
State the name, version and build number of commercial or open source DES software that the model is implemented in.
State the name and version of general-purpose programming languages used (e.g. Python 3.5).
Where frameworks and libraries have been used provide all details including version numbers.
🟡 🟡 🟡 🟡 🟡 🟡 🟡 🟡
5.2 Random sampling
State the algorithm used to generate random samples in the software/programming language used e.g. Mersenne Twister.
If common random numbers are used, state how seeds (or random number streams) are distributed among sampling processes.
❌ ❌ ❌ ❌ ❌ N/A ❌ ✅
5.3 Model execution
State the event processing mechanism used e.g. three phase, event, activity, process interaction.
Note that in some commercial software the event processing mechanism may not be published. In these cases authors should adhere to item 5.1 software recommendations.
State all priority rules included if entities/activities compete for resources.
If the model is parallel, distributed and/or use grid or cloud computing, etc., state and preferably reference the technology used. For parallel and distributed simulations the time management algorithms used. If the HLA is used then state the version of the standard, which run-time infrastructure (and version), and any supporting documents (FOMs, etc.)
🟡 ❌ ❌ ❌ ❌ ❌ ❌ ✅
5.4 System specification
State the model run time and specification of hardware used. This is particularly important for large scale models that require substantial computing power. For parallel, distributed and/or use grid or cloud computing, etc. state the details of all systems used in the implementation (processors, network, etc.)
✅ ❌ 🟡 🟡 ❌ ❌ 🟡 🟡
Code access
6.1 Computer model sharing statement
Describe how someone could obtain the model described in the paper, the simulation software and any other associated software (or hardware) needed to reproduce the results. Provide, where possible, the link and DOIs to these.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
Reflections

I find this chart a really helpful visualisation for seeing which criteria are commonly met, which are often not applicable, and which are often not met.

Below, I have reflected on the impact that fulfilling each criterion had on the reproduction. I have also identified where I feel my evaluation was a bit more lax. Indeed, the subjective nature of this evaluation is a limitation, although I did (a) get consensus on uncertainties and unmet criteria, and (b) revisit each evaluation alongside the others when writing this section, which helped me identify and address a few decisions that had been inconsistent between studies.

1.1 Purpose of the model
Explain the background and objectives for the model.

  • Pretty basic requirement that all meet (unsurprisingly).

1.2 Model outputs
Define all quantitative performance measures that are reported, using equations where necessary. Specify how and when they are calculated during the model run along with how any measures of error such as confidence intervals are calculated.

  • One paper was rated as “partially met” as many of the outcomes used were not defined.
  • For those marked fully met, this was mainly based on them mentioning or being fairly clear about what the outcomes were; I wasn't super strict - i.e. I didn't require equations, or a specification of how and when the outcomes were calculated.
  • Whilst this is important, the raw/basic outcomes themselves were generally pretty straightforward, and issues with outputs related more to:
    • Which output to use from the results table (if unclear names, or similar names)
    • How to apply any required transformations

1.3 Experimentation aims
If the model has been used for experimentation, state the objectives that it was used to investigate.
(A) Scenario based analysis – Provide a name and description for each scenario, providing a rationale for the choice of scenarios and ensure that item 2.3 (below) is completed.
(B) Design of experiments – Provide details of the overall design of the experiments with reference to performance measures and their parameters (provide further details in data below).
(C) Simulation Optimisation – (if appropriate) Provide full details of what is to be optimised, the parameters that were included and the algorithm(s) that was be used. Where possible provide a citation of the algorithm(s).

  • In all seven papers with scenarios, these were described. They didn't necessarily provide a "name" for each scenario, nor a "rationale" for the choice of scenarios, but they did describe them.
  • Although the description of a scenario can feel clear from the paper, when the scenario code was not provided it could be quite tricky and time-consuming to work out how to change the code appropriately in order to implement the scenario.

2.1 Base model overview diagram
Describe the base model using appropriate diagrams and description. This could include one or more process flow, activity cycle or equivalent diagrams sufficient to describe the model to readers. Avoid complicated diagrams in the main text. The goal is to describe the breadth and depth of the model with respect to the system being studied.

  • All papers included a diagram, which was great.

2.2 Base model logic
Give details of the base model logic. Give additional model logic details sufficient to communicate to the reader how the model works.

  • Pretty basic requirement that all meet (unsurprisingly).

2.3 Scenario logic
Give details of the logical difference between the base case model and scenarios (if any). This could be incorporated as text or where differences are substantial could be incorporated in the same manner as 2.2.

  • Overlaps with 1.3: describing the experimentation aims under 1.3 implicitly covers this item.

2.4 Algorithms
Provide further detail on any algorithms in the model that (for example) mimic complex or manual processes in the real world (i.e. scheduling of arrivals/ appointments/ operations/ maintenance, operation of a conveyor system, machine breakdowns, etc.). Sufficient detail should be included (or referred to in other published work) for the algorithms to be reproducible. Pseudo-code may be used to describe an algorithm.

  • Those marked partially met described some but not all of the algorithms. However, it is worth noting that it would have been hard to confirm that all relevant algorithms were described if I had not already identified them myself in the more complex models.

2.5.1 Components - entities
Give details of all entities within the simulation including a description of their role in the model and a description of all their attributes.

  • Pretty basic requirement that all meet (unsurprisingly, as implicit in description of model / logic)

2.5.2 Components - activities
Describe the activities that entities engage in within the model. Provide details of entity routing into and out of the activity.

  • Pretty basic requirement that all meet (unsurprisingly, as implicit in description of model / logic)
  • Didn’t necessarily require explicit description of routing.

2.5.3 Components - resources
List all the resources included within the model and which activities make use of them.

  • Generally seem to be mentioned when included, particularly as often form part of output (e.g. resource utilisation)
  • When not mentioned (and based on known structure of model), assume not relevant.

2.5.4 Components - queues
Give details of the assumed queuing discipline used in the model (e.g. First in First Out, Last in First Out, prioritisation, etc.). Where one or more queues have a different discipline from the rest, provide a list of queues, indicating the queuing discipline used for each. If reneging, balking or jockeying occur, etc., provide details of the rules. Detail any delays or capacity constraints on the queues.

  • As for 2.5.3

2.5.5 Components - entry/exit points
Give details of the model boundaries i.e. all arrival and exit points of entities. Detail the arrival mechanism (e.g. ‘thinning’ to mimic a non-homogenous Poisson process or balking).

  • Generally fairly implicit
  • Overlaps with 2.4 algorithms, and hence I didn't necessarily mark this down if the arrival mechanism was not described in full detail (see the sketch of the "thinning" approach below).
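
As an aside, the "thinning" approach named in this criterion is a standard way of sampling arrivals from a non-homogeneous Poisson process. A minimal illustrative sketch (my own, not taken from any of the studies):

```python
import numpy as np

def thinned_arrivals(rate_fn, rate_max, t_end, rng=None):
    """Sample arrival times on [0, t_end) for a non-homogeneous Poisson
    process with time-varying rate rate_fn(t) <= rate_max, via thinning."""
    rng = rng or np.random.default_rng()
    arrivals, t = [], 0.0
    while True:
        # Candidate arrival from a homogeneous process at the bounding rate
        t += rng.exponential(1.0 / rate_max)
        if t >= t_end:
            return arrivals
        # Accept the candidate with probability rate(t) / rate_max
        if rng.random() < rate_fn(t) / rate_max:
            arrivals.append(t)

# e.g. arrivals over 720 minutes with a time-varying arrival rate
times = thinned_arrivals(lambda t: 2 + np.sin(t / 2), rate_max=3, t_end=720)
```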

3.1 Data sources
List and detail all data sources. Sources may include:
• Interviews with stakeholders,
• Samples of routinely collected data,
• Prospectively collected samples for the purpose of the simulation study,
• Public domain data published in either academic or organisational literature. Provide, where possible, the link and DOI to the data or reference to published literature.
All data source descriptions should include details of the sample size, sample date ranges and use within the study.

  • All meet.

3.2 Pre-processing
Provide details of any data manipulation that has taken place before its use in the simulation, e.g. interpolation to account for missing data or the removal of outliers.

  • Often had to assume that none occurred if none described.

3.3 Input parameters
List all input variables in the model. Provide a description of their use and include parameter values. For stochastic inputs provide details of any continuous, discrete or empirical distributions used along with all associated parameters. Give details of all time dependent parameters and correlation.
Clearly state:
• Base case data
• Data use in experimentation, where different from the base case.
• Where optimisation or design of experiments has been used, state the range of values that parameters can take.
• Where theoretical distributions are used, state how these were selected and prioritised above other candidate distributions.

  • This was very important, as it allowed me to check that the parameters in the code were correct - in several cases, the provided code did not include the base case parameters as described. When a parameter was missing from the paper, it was not possible to check it.

3.4 Assumptions
Where data or knowledge of the real system is unavailable what assumptions are included in the model? This might include parameter values, distributions or routing logic within the model.

  • Although not relevant for reproduction, would be very relevant for reuse and validity

4.1 Initialisation
Report if the system modelled is terminating or non-terminating. State if a warm-up period has been used, its length and the analysis method used to select it. For terminating systems state the stopping condition.
State what if any initial model conditions have been included, e.g., pre-loaded queues and activities. Report whether initialisation of these variables is deterministic or stochastic.

  • Often not reported, though it would be handy, as this is a pretty basic/fundamental aspect of the model that would help readers better understand how the model works.

4.2 Run length
Detail the run length of the simulation model and time units.

  • Important to mention, for the same reasons as 3.3.

4.3 Estimation approach
State the method used to account for the stochasticity: For example, two common methods are multiple replications or batch means. Where multiple replications have been used, state the number of replications and for batch means, indicate the batch length and whether the batch means procedure is standard, spaced or overlapping. For both procedures provide a justification for the methods used and the number of replications/size of batches.

  • The description of this criterion is quite confusing - I generally just focused on identifying whether multiple replications or one long run was used, and then whether the numbers used were justified (see the sketch below for a common way of justifying the number of replications).
  • Partially met cases are those with replications whose number is not justified.
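
For context, one common way to justify the number of replications is the confidence-interval method: keep adding replications until the confidence interval around the mean of a key output is sufficiently narrow. A minimal illustrative sketch (my own, not from any of the studies):

```python
import numpy as np
from scipy import stats

def ci_halfwidth_pct(results, alpha=0.05):
    """Half-width of the (1 - alpha) confidence interval for the mean of
    replication results, expressed as a percentage of the mean."""
    results = np.asarray(results, dtype=float)
    n = len(results)
    sem = np.std(results, ddof=1) / np.sqrt(n)
    halfwidth = stats.t.ppf(1 - alpha / 2, df=n - 1) * sem
    return 100 * halfwidth / np.mean(results)

# e.g. justify 50 replications by showing the 95% CI half-width of the mean
# waiting time is below 5% of the mean:
# assert ci_halfwidth_pct(mean_wait_per_replication) < 5
```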

5.1 Software or programming language
State the operating system and version and build number.
State the name, version and build number of commercial or open source DES software that the model is implemented in.
State the name and version of general-purpose programming languages used (e.g. Python 3.5).
Where frameworks and libraries have been used provide all details including version numbers.

  • Articles often mentioned the programming language, but not the operating system or version numbers.
  • When versions were given, this was pretty handy for cases where no versions were given in the repository itself - although ideally, the versions would simply be recorded in the repository (see the sketch below for capturing them automatically).
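
As an illustration (my own, not from any of the studies), this information can be captured programmatically rather than relying on authors to transcribe it:

```python
import platform
import sys
from importlib.metadata import version

def environment_report(packages):
    """Return the details STRESS-DES item 5.1 asks for: operating system,
    language version, and versions of the libraries used."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    lines += [f"{pkg}: {version(pkg)}" for pkg in packages]
    return "\n".join(lines)

# e.g. print(environment_report(["simpy", "numpy", "pandas"]))
```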

5.2 Random sampling
State the algorithm used to generate random samples in the software/programming language used e.g. Mersenne Twister.
If common random numbers are used, state how seeds (or random number streams) are distributed among sampling processes.

  • Frequently not described, despite some of the studies having implemented and used seeds.
  • I initially addressed this incorrectly in one of my evaluations, accidentally adding notes on the sampling of arrivals rather than specifically on random number generation. I have noted this to be clear that it was possible to misinterpret (although that may just be a mistake of my own, and not one others would necessarily make). A sketch of how this item could be reported is given below.
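
For a Python model, this item can be met in a couple of sentences plus explicit seed management. A minimal illustrative sketch (my own, not from any of the studies): NumPy's default generator uses the PCG64 bit generator, and spawned seed sequences give independent streams per sampling process.

```python
import numpy as np

def make_streams(base_seed, n_streams):
    """Create independent random number streams, one per sampling process,
    so that seed distribution can be stated explicitly in the paper."""
    seed_seq = np.random.SeedSequence(base_seed)
    return [np.random.default_rng(child) for child in seed_seq.spawn(n_streams)]

# e.g. separate streams for arrival and service time sampling
arrival_rng, service_rng = make_streams(base_seed=42, n_streams=2)
interarrival = arrival_rng.exponential(scale=5.0)  # mean of 5 time units
```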

5.3 Model execution
State the event processing mechanism used e.g. three phase, event, activity, process interaction.
Note that in some commercial software the event processing mechanism may not be published. In these cases authors should adhere to item 5.1 software recommendations.
State all priority rules included if entities/activities compete for resources.
If the model is parallel, distributed and/or use grid or cloud computing, etc., state and preferably reference the technology used. For parallel and distributed simulations the time management algorithms used. If the HLA is used then state the version of the standard, which run-time infrastructure (and version), and any supporting documents (FOMs, etc.)

  • The event processing mechanism was often not mentioned.
  • This is a very long list of requirements for a single item, and I found it a bit difficult to evaluate, e.g. to check against all of the criteria.

5.4 System specification
State the model run time and specification of hardware used. This is particularly important for large scale models that require substantial computing power. For parallel, distributed and/or use grid or cloud computing, etc. state the details of all systems used in the implementation (processors, network, etc.)

  • Model run time was really important to mention, but was often not given. It is particularly important for models with longer run times, so readers know what to expect - both for reuse (short run times may be necessary in some contexts) and, here, for troubleshooting, where it helped to know whether I should try smaller numbers first while getting the model working (see the sketch below).
  • Overlaps with 5.3 (parallel execution) and 5.1 (operating system).
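
As with item 5.1, this can be captured rather than recalled; a minimal illustrative sketch (my own, not from any of the studies):

```python
import platform
import time

start = time.perf_counter()
# run_model(...)  # hypothetical call to the simulation model being reported
elapsed = time.perf_counter() - start

print(f"Run time: {elapsed:.1f} s on {platform.processor() or platform.machine()}, "
      f"{platform.platform()}")
```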

6.1 Computer model sharing statement
Describe how someone could obtain the model described in the paper, the simulation software and any other associated software (or hardware) needed to reproduce the results. Provide, where possible, the link and DOIs to these.

  • Meeting this criterion was inevitable, given our selection bias: we only chose papers that linked to code.

8.3 DES checklist derived from ISPOR-SDM

Key:

  • S: Shoaib and Ramamohan (2021) - link to evaluation
  • Hu: Huang et al. (2019) - link to evaluation
  • L: Lim et al. (2020) - link to evaluation
  • K: Kim et al. (2021) - link to evaluation
  • A: Anagnostou et al. (2022) - link to evaluation
  • J: Johnson et al. (2021) - link to evaluation
  • He: Hernandez et al. (2015) - link to evaluation
  • W: Wood et al. (2021) - link to evaluation
Item S Hu L K A J He W
Model conceptualisation
1 Is the focused health-related decision problem clarified?
…the decision problem under investigation was defined. DES studies included different types of decision problems, eg, those listed in previously developed taxonomies.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
2 Is the modeled healthcare setting/health condition clarified?
…the physical context/scope (eg, a certain healthcare unit or a broader system) or disease spectrum simulated was described.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
3 Is the model structure described?
…the model’s conceptual structure was described in the form of either graphical or text presentation.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
4 Is the time horizon given?
…the time period covered by the simulation was reported.
✅ ✅ ✅ ✅ ❌ ✅ ✅ ✅
5 Are all simulated strategies/scenarios specified?
…the comparators under test were described in terms of their components, corresponding variations, etc
✅ ✅ ✅ ✅ N/A ✅ ✅ ✅
6 Is the target population described?
…the entities simulated and their main attributes were characterized.
✅ ❌ ✅ ✅ 🟡 ✅ ✅ ✅
Parameterisation and uncertainty assessment
7 Are data sources informing parameter estimations provided?
…the sources of all data used to inform model inputs were reported.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
8 Are the parameters used to populate model frameworks specified?
…all relevant parameters fed into model frameworks were disclosed.
🟡 🟡 ✅ ✅ ✅ ✅ ✅ ✅
9 Are model uncertainties discussed?
…the uncertainty surrounding parameter estimations and adopted statistical methods (eg, 95% confidence intervals or possibility distributions) were reported.
🟡 ❌ ❌ ❌ ✅ N/A ✅ ✅
10 Are sensitivity analyses performed and reported?
…the robustness of model outputs to input uncertainties was examined, for example via deterministic (based on parameters’ plausible ranges) or probabilistic (based on a priori-defined probability distributions) sensitivity analyses, or both.
✅ ❌ ✅ ❌ N/A ✅ ❌ ✅
Validation
11 Is face validity evaluated and reported?
…it was reported that the model was subjected to the examination on how well model designs correspond to the reality and intuitions. It was assumed that this type of validation should be conducted by external evaluators with no stake in the study.
❌ ❌ ❌ ❌ ❌ ✅ ❌ ❌
12 Is cross validation performed and reported
…comparison across similar modeling studies which deal with the same decision problem was undertaken.
N/A ❌ ❌ ✅ ❌ ✅ ❌ ❌
13 Is external validation performed and reported?
…the modeler(s) examined how well the model’s results match the empirical data of an actual event modeled.
N/A N/A N/A ✅ ❌ ✅ ❌ ❌
14 Is predictive validation performed or attempted?
…the modeler(s) examined the consistency of a model’s predictions of a future event and the actual outcomes in the future. If this was not undertaken, it was assessed whether the reasons were discussed.
N/A N/A N/A N/A N/A ❌ N/A N/A
Generalisability and stakeholder involvement
15 Is the model generalizability issue discussed?
…the modeler(s) discussed the potential of the resulting model for being applicable to other settings/populations (single/multiple application).
✅ ✅ ✅ ❌ 🟡 ✅ ❌ ✅
16 Are decision makers or other stakeholders involved in modeling?
…the modeler(s) reported in which part throughout the modeling process decision makers and other stakeholders (eg, subject experts) were engaged.
❌ ❌ ❌ ❌ ✅ ❌ ❌ ❌
17 Is the source of funding stated?
…the sponsorship of the study was indicated.
✅ ❌ ✅ ✅ ✅ ✅ ❌ ✅
18 Are model limitations discussed?
…limitations of the assessed model, especially limitations of interest to decision makers, were discussed.
✅ 🟡 ✅ ✅ 🟡 ✅ ✅ ✅
Reflections

These guidelines have quite a different focus from STRESS-DES. STRESS-DES is very focused on what modelling work was performed and how. These guidelines do cover that, but place quite a lot of focus on the validity of the work and on good practice (e.g. stating funding, involving stakeholders). This matters, because different checklists capture different things - and although the typical practice is to use a single checklist, referring to more than one could be beneficial.

1 Is the focused health-related decision problem clarified?
…the decision problem under investigation was defined. DES studies included different types of decision problems, eg, those listed in previously developed taxonomies.

  • Pretty basic requirement that all meet (unsurprisingly).

2 Is the modeled healthcare setting/health condition clarified?
…the physical context/scope (eg, a certain healthcare unit or a broader system) or disease spectrum simulated was described.

  • Pretty basic requirement that all meet (unsurprisingly).

3 Is the model structure described?
…the model’s conceptual structure was described in the form of either graphical or text presentation.

  • Pretty basic requirement that all meet (unsurprisingly).

4 Is the time horizon given?
…the time period covered by the simulation was reported.

  • There was only one study where this was not stated (Anagnostou et al. (2022)). This didn't impact the reproduction itself: for that study I was able to run the code and get essentially the required results with minimal troubleshooting.

5 Are all simulated strategies/scenarios specified?
…the comparators under test were described in terms of their components, corresponding variations, etc

  • All papers with scenarios described them. As reflected for STRESS-DES, although the description of a scenario can feel clear from the paper, when the scenario code was not provided it could be quite tricky and time-consuming to work out how to change the code appropriately in order to implement the scenario.

6 Is the target population described?
…the entities simulated and their main attributes were characterized.

  • Fulfilment didn’t impact the reproduction as this is more about interpretation/validity/etc.

7 Are data sources informing parameter estimations provided?
…the sources of all data used to inform model inputs were reported.

  • All met, although it's worth noting that I didn't check that every single parameter's data source was stated (and indeed, it's likely some were not) - simply whether I could identify that at least some were.

8 Are the parameters used to populate model frameworks specified?
…all relevant parameters fed into model frameworks were disclosed.

  • This was quite important for the reproduction, as it allows us to check that the parameters in the code are correct. Where I identified discrepancies between the article and the code, any parameter not given in the article could not be verified, as there was nothing to compare it against.

9 Are model uncertainties discussed?
…the uncertainty surrounding parameter estimations and adopted statistical methods (eg, 95% confidence intervals or possibility distributions) were reported.

  • Some don't (though it's worth noting this has no impact on the reproduction, beyond these being additional values/lines in the plots we are trying to reproduce).

10 Are sensitivity analyses performed and reported?
…the robustness of model outputs to input uncertainties was examined, for example via deterministic (based on parameters’ plausible ranges) or probabilistic (based on a priori-defined probability distributions) sensitivity analyses, or both.

  • Some performed and reported these, some mentioned them without presenting results, and some didn't mention them at all. One explained why they were not relevant.

11 Is face validity evaluated and reported?
…it was reported that the model was subjected to the examination on how well model designs correspond to the reality and intuitions. It was assumed that this type of validation should be conducted by external evaluators with no stake in the study.

  • Rare (only one completed) - but didn’t impact reproduction as this is more related to validity

12 Is cross validation performed and reported
…comparison across similar modeling studies which deal with the same decision problem was undertaken.

  • Rare (only two completed) - but didn’t impact reproduction as this is more related to validity

13 Is external validation performed and reported?
…the modeler(s) examined how well the model’s results match the empirical data of an actual event modeled.

  • Rare (only two completed) - although some argue not applicable - but didn’t impact reproduction as this is more related to validity

14 Is predictive validation performed or attempted?
…the modeler(s) examined the consistency of a model’s predictions of a future event and the actual outcomes in the future. If this was not undertaken, it was assessed whether the reasons were discussed.

  • Generally not applicable - but didn’t impact reproduction as this is more related to validity

15 Is the model generalizability issue discussed?
…the modeler(s) discussed the potential of the resulting model for being applicable to other settings/populations (single/multiple application).

  • More than half did discuss this - but didn’t impact reproduction as this is more related to reuse

16 Are decision makers or other stakeholders involved in modeling?
…the modeler(s) reported in which part throughout the modeling process decision makers and other stakeholders (eg, subject experts) were engaged.

  • Rare (only one mentioned) - but didn’t impact reproduction as this is more related to validity

17 Is the source of funding stated?
…the sponsorship of the study was indicated.

  • Good practice and usually a journal requirement, although some did not include it - but this didn't impact the reproduction.

18 Are model limitations discussed?
…limitations of the assessed model, especially limitations of interest to decision makers, were discussed.

  • Most discuss limitations (though I didn't really assess how comprehensive these were, except in two cases where the limitations were only very general or hinted at) - but this didn't impact the reproduction.

8.4 Timings

  • Shoaib and Ramamohan (2021) - 1h 56m
  • Huang et al. (2019) - 1h 28m
  • Lim et al. (2020) - 1h 12m
  • Kim et al. (2021) - 2h 12m
  • Anagnostou et al. (2022) - 53m
  • Johnson et al. (2021) - 1h 32m
  • Hernandez et al. (2015) - 1h 11m
  • Wood et al. (2021) - 1h 24m
Reflections

It sometimes took quite a while to find all of this information in the articles - and we acknowledge there is a chance that some of it was provided somewhere in an article but missed. For both of these reasons, we see value in attaching a completed reporting checklist, clearly laying out key information about the model and study.

8.5 Use of reporting guidelines

Regarding whether each study mentioned using reporting guidelines:

  • Shoaib and Ramamohan (2021) - ❌
  • Huang et al. (2019) - ❌
  • Lim et al. (2020) - ❌
  • Kim et al. (2021) - ❌
  • Anagnostou et al. (2022) - ❌
  • Johnson et al. (2021) - ✅ Consolidated Health Economic Evaluation Reporting Standards (CHEERS) - Husereau et al. (2013)
  • Hernandez et al. (2015) - ❌
  • Wood et al. (2021) - ✅ STRESS-DES: Strengthening The Reporting of Empirical Simulation Studies (Discrete-Event Simulation) - Monks et al. (2019)

Although this is only a small sample, it is interesting to note that the two studies that used reporting guidelines had the two highest proportions of fully met criteria under both reporting guidelines.

8.6 Uses a previously reported model

Regarding whether each study was using a previously reported model:

  • Shoaib and Ramamohan (2021) - No
  • Huang et al. (2019) - No
  • Lim et al. (2020) - No
  • Kim et al. (2021) - Yes - previously described by Glover et al. (2018) and Thompson et al. (2018)
  • Anagnostou et al. (2022) - No
  • Johnson et al. (2021) - Yes - EPIC model previously described by Sadatsafavi et al. (2019)
  • Hernandez et al. (2015) - No
  • Wood et al. (2021) - Yes - previously described by Wood et al. (2020)

Again, this is a small sample, and the pattern here is weaker. The two studies that used reporting guidelines are also two of the studies that reused previously reported models; the third such study (Kim et al. 2021) reused a previously reported model but did not use a reporting guideline, and had a lower proportion of fully met criteria.

8.7 References

Anagnostou, Anastasia, Derek Groen, Simon J. E. Taylor, Diana Suleimenova, Nura Abubakar, Arindam Saha, Kate Mintram, et al. 2022. “FACS-CHARM: A Hybrid Agent-Based and Discrete-Event Simulation Approach for Covid-19 Management at Regional Level.” In 2022 Winter Simulation Conference (WSC), 1223–34. https://doi.org/10.1109/WSC57314.2022.10015462.
Glover, Matthew J., Edmund Jones, Katya L. Masconi, Michael J. Sweeting, Simon G. Thompson, Janet T. Powell, Pinar Ulug, and Matthew J. Bown. 2018. “Discrete Event Simulation for Decision Modeling in Health Care: Lessons from Abdominal Aortic Aneurysm Screening.” Medical Decision Making 38 (4): 439–51. https://doi.org/10.1177/0272989X17753380.
Hernandez, Ivan, Jose E. Ramirez-Marquez, David Starr, Ryan McKay, Seth Guthartz, Matt Motherwell, and Jessica Barcellona. 2015. “Optimal Staffing Strategies for Points of Dispensing.” Computers & Industrial Engineering 83 (May): 172–83. https://doi.org/10.1016/j.cie.2015.02.015.
Huang, Shiwei, Julian Maingard, Hong Kuan Kok, Christen D. Barras, Vincent Thijs, Ronil V. Chandra, Duncan Mark Brooks, and Hamed Asadi. 2019. “Optimizing Resources for Endovascular Clot Retrieval for Acute Ischemic Stroke, a Discrete Event Simulation.” Frontiers in Neurology 10 (June). https://doi.org/10.3389/fneur.2019.00653.
Husereau, Don, Michael Drummond, Stavros Petrou, Chris Carswell, David Moher, Dan Greenberg, Federico Augustovski, Andrew H. Briggs, Josephine Mauskopf, and Elizabeth Loder. 2013. “Consolidated Health Economic Evaluation Reporting Standards (CHEERS) Statement.” Value in Health 16 (2): e1–5. https://doi.org/10.1016/j.jval.2013.02.010.
Johnson, Kate M., Mohsen Sadatsafavi, Amin Adibi, Larry Lynd, Mark Harrison, Hamid Tavakoli, Don D. Sin, and Stirling Bryan. 2021. “Cost Effectiveness of Case Detection Strategies for the Early Detection of COPD.” Applied Health Economics and Health Policy 19 (2): 203–15. https://doi.org/10.1007/s40258-020-00616-2.
Kim, Lois G., Michael J. Sweeting, Morag Armer, Jo Jacomelli, Akhtar Nasim, and Seamus C. Harrison. 2021. “Modelling the Impact of Changes to Abdominal Aortic Aneurysm Screening and Treatment Services in England During the COVID-19 Pandemic.” PLOS ONE 16 (6): e0253327. https://doi.org/10.1371/journal.pone.0253327.
Lim, Chun Yee, Mary Kathryn Bohn, Giuseppe Lippi, Maurizio Ferrari, Tze Ping Loh, Kwok-Yung Yuen, Khosrow Adeli, and Andrea Rita Horvath. 2020. “Staff Rostering, Split Team Arrangement, Social Distancing (Physical Distancing) and Use of Personal Protective Equipment to Minimize Risk of Workplace Transmission During the COVID-19 Pandemic: A Simulation Study.” Clinical Biochemistry 86 (December): 15–22. https://doi.org/10.1016/j.clinbiochem.2020.09.003.
Monks, Thomas, Christine S. M. Currie, Bhakti Stephan Onggo, Stewart Robinson, Martin Kunc, and Simon J. E. Taylor. 2019. “Strengthening the Reporting of Empirical Simulation Studies: Introducing the STRESS Guidelines.” Journal of Simulation 13 (1): 55–67. https://doi.org/10.1080/17477778.2018.1442155.
Sadatsafavi, Mohsen, Shahzad Ghanbarian, Amin Adibi, Kate Johnson, J. Mark FitzGerald, William Flanagan, Stirling Bryan, and Don Sin. 2019. “Development and Validation of the Evaluation Platform in COPD (EPIC): A Population-Based Outcomes Model of COPD for Canada.” Medical Decision Making 39 (2): 152–67. https://doi.org/10.1177/0272989X18824098.
Shoaib, Mohd, and Varun Ramamohan. 2021. “Simulation Modelling and Analysis of Primary Health Centre Operations.” arXiv, June. https://doi.org/10.48550/arXiv.2104.12492.
Thompson, Simon G, Matthew J Bown, Matthew J Glover, Edmund Jones, Katya L Masconi, Jonathan A Michaels, Janet T Powell, Pinar Ulug, and Michael J Sweeting. 2018. “Screening Women Aged 65 Years or over for Abdominal Aortic Aneurysm: A Modelling Study and Health Economic Evaluation.” Health Technology Assessment 22 (43): 1–142. https://doi.org/10.3310/hta22430.
Wood, Richard M., Christopher J. McWilliams, Matthew J. Thomas, Christopher P. Bourdeaux, and Christos Vasilakis. 2020. “COVID-19 Scenario Modelling for the Mitigation of Capacity-Dependent Deaths in Intensive Care.” Health Care Management Science 23 (3): 315–24. https://doi.org/10.1007/s10729-020-09511-7.
Wood, Richard M., Adrian C. Pratt, Charlie Kenward, Christopher J. McWilliams, Ross D. Booton, Matthew J. Thomas, Christopher P. Bourdeaux, and Christos Vasilakis. 2021. “The Value of Triage During Periods of Intense COVID-19 Demand: Simulation Modeling Study.” Medical Decision Making 41 (4): 393–407. https://doi.org/10.1177/0272989X21994035.
Zhang, Xiange, Stefan K. Lhachimi, and Wolf H. Rogowski. 2020. “Reporting Quality of Discrete Event Simulations in Healthcare—Results From a Generic Reporting Checklist.” Value in Health 23 (4): 506–14. https://doi.org/10.1016/j.jval.2020.01.005.