Day 2

Categories: read, scope, reproduce

Author: Amy Heather

Published: July 29, 2024

Note: Read the article, decided on scope, then began running the model. Total time used: 4h 3m (10.1%).

09.38-09.43: High entropy secret

Uploading the code to the repository triggered a GitGuardian notification for the upload of a high entropy secret. Opening the GitGuardian dashboard, I found that the source of the issue was an API key flagged on line 376 of original_study/AAA_DES_model/input/NAAASP_Men_2020-05-11/DES_Data_Input_NAAASP_Men_30years_time_horizon_2020-05-11.R:

# SOURCE: Love-Koh 2015 (https://reader.elsevier.com/reader/sd/pii/S1098301515018471?token=AB11057F469D57EBE990778CCDC883E9D35D3CFD83AD143352F3AD497E822198F957DBCE2EA86AE6DC16F2F6CE350409)

This is not a concern - the flagged “secret” is just a token in an Elsevier reader URL, and the link does not work with that token for others.

The DOI for this article is https://doi.org/10.1016/j.jval.2015.03.1784.

09.53-09.55: Correct file paths

Corrected the file paths in study_publication.qmd.

10.05-10.15, 10.29-10.45: Read journal article

Uses a previously developed and validated model described in:

  • Glover MJ, Jones E, Masconi KL, Sweeting MJ, Thompson SG. Discrete Event Simulation for Decision Modeling in Health Care: Lessons from Abdominal Aortic Aneurysm Screening. Med Decis Making. 2018; 38(4):439–51. https://doi.org/10.1177/0272989X17753380 PMID: 31665967
  • Thompson SG, Bown MJ, Glover MJ, Jones E, Masconi KL, Michaels JA, et al. Screening women aged 65 years or over for abdominal aortic aneurysm: a modelling study and health economic evaluation. Health Technol Assess. 2018; 22(43):1–142. https://doi.org/10.3310/hta22430 PMID: 30132754

Some notes on scope (beyond obvious tables/figures):

  • The in-text result for Surveillance cohort: Scan suspension (re: breakdown of deaths by type) is not captured in tables/figures
  • Supplementary figures 1 and 2 are more like methods than results? They are referenced in the methods.
  • In-text results for Surveillance cohort: Drop-out are captured in Table 3 and Figure 4
  • Also captured are the in-text results for Surveillance cohort: Threshold for surgery
  • Supplementary figure 3 and supplementary table 2 are definitely in scope

10.46-11.02: Define scope for reproduction

I described my interpretation of scope within scope.qmd.

11.09-11.13: Discussed scope with Tom

He looked it over and was happy with the scope. He then did another pass over the paper, but confirmed no further items.

11.26-11.32: Archive scope

Updated CHANGELOG.md and CITATION.cff, set up sync with Zenodo, then created a GitHub release.

12.50-12.56: Look over code

README:

  • Directs to NAASP_DES_Example_Script.R, which provides a walk-through example of running the model.
  • Notes that the repository contains code for two papers - we are interested in NAAASP_COVID (rather than SWAN)

I copied the items relevant to NAAASP COVID into reproduction/. As the README states that results are saved to output/, I created a placeholder output/ folder.

The models/ folder seems to contain scripts to run each of the scenarios.

12.57-13.04: Set up environment

Based on my experience trying to backdate R and packages in my previous reproduction (Huang et al. 2019), I decided that this time I would start with the latest R and packages, and then backdate if it's not working.

This means using R 4.4.1 (paper mentions 3.6.3).

Each script lists the packages to be installed at the start (though not their versions). I created a DESCRIPTION file with these packages:

Title: Computational reproducibility assessment of Kim et al. 2019
Depends: 
    R
Imports:
    Rcpp,
    expm,
    msm,
    foreach,
    iterators,
    doParallel

Ran:

renv::init(bare = TRUE)  # initialise an empty renv project (no automatic snapshot)
renv::install()          # install the packages listed in DESCRIPTION
renv::snapshot()         # record the installed versions in renv.lock

13.05-13.13, 13.48-13.55, 14.07-14.21, 14.53-15.08: Trying to run model

Tried running run_aaamodel_65yo_scen0.R. Had error:

this.dir <- dirname(parent.frame(2)$ofile)
Error in dirname(parent.frame(2)$ofile) : 
  a character vector argument expected

Realised this has a comment above stating:

# Set the working directory to the root directory of AAA_DES_model (can only run this if sourcing the file)
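The cause is that ofile is a local variable inside source() itself, which parent.frame(2) reaches only while a file is being sourced; run any other way, the lookup returns NULL and dirname(NULL) throws exactly the error above. As an aside, a hedged sketch of an alternative that also works under Rscript (my assumption, not the authors' code) would be to parse --file= from commandArgs():

# Find and move to the script's own directory when run via Rscript
args <- commandArgs(trailingOnly = FALSE)
script_path <- sub("^--file=", "", grep("^--file=", args, value = TRUE))
if (length(script_path) == 1) setwd(dirname(script_path))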

So I instead tried running it with Source - this worked, up until this line:

> v1other
postSurgeryInitialPeriod = 0.08213552 
timeToMonitoringFollowingOpenSurgery = 0.1149897 
timeBetweenMonitoringFollowingEvarSurgery = 1 
zeroGrowthDiameterThreshold = 2 
baselineDiameters = data.frame with 2 columns:
  size = 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 ...
  weight = 2.86e-06 5.71e-06 4.43e-05 0.000351429 0.001205714 0.005032857 0.017921429 0.045698572 ...
prevalenceThreshold = 3 
aortaDiameterThresholds = list: =Error in cat(names(element)[i], "=", element[[i]], " ", sep = "") : 
  argument 3 (type 'list') cannot be handled by 'cat'

v1other was created when we called source("input/NAAASP_Men_2020-05-11/DES_Data_Input_NAAASP_Men_30years_time_horizon_2020-05-11.R"). Looking at that script, the thresholds are set as v1other$aortaDiameterThresholds <- list(c(3.0, 4.5, 5.5)). Viewing the object with View(v1other) rather than printing it, I can see that the thresholds are stored; it's just a list within a list.

I tried commenting out the print statement in run_aaamodel..., but this just led to a later error for the same variable when it is used. Instead, I altered the assignment of this variable in the sourced script to c(3.0, 4.5, 5.5) (removing the list()).
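To make the mismatch concrete, a minimal standalone sketch (threshold values from the input script; v1other is recreated here purely for illustration):

v1other <- list()
v1other$aortaDiameterThresholds <- list(c(3.0, 4.5, 5.5))   # list wrapping a vector

cat(v1other$aortaDiameterThresholds[[1]], "\n")             # fine: prints 3 4.5 5.5
try(cat(v1other$aortaDiameterThresholds, "\n"))             # errors: 'list' cannot be handled by 'cat'

v1other$aortaDiameterThresholds <- c(3.0, 4.5, 5.5)         # the workaround: plain vector
cat(v1other$aortaDiameterThresholds, "\n")                  # fine again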

This then ran without issue. As it ran, I could see it sets a random seed (set.seed(3210)). As usual, I paused timing while the model ran.

The run spent a long while in processPersons() (scen0.invite <- processPersons(v0, v1other, v2, personData.screen)). At 13.48 (with the model running since 13.13), I could see these would likely be slow models to run, so I logged into the remote computer, cloned the GitHub repository there, and at 13.55 started running the next script, run_aaamodel_65yo_scen1.R. To do so, after cloning, I ran:

cd stars-reproduce-kim-2021/reproduction
Rscript -e "renv::restore()"
Rscript -e "source('models/NAAASP_COVID_modelling/run_aaamodel_65yo_scen1.R')"
Reflection

Including the expected run time in the repository would be handy. I am assuming this run is working fine and simply slow, but that rests on the assumption that these models take a long time to run (which I'm not yet sure is the case).

In this case, as I later found out that they used HPC, it would have been beneficial to mention that too!

I checked the run_aaamodel... scripts and realised that none of them were using parallel processing (each sets v0$method <- "serial").

Looking at the DES_Model code, I can see that processPersons() allows three possible values for v0$method (a sketch of this kind of dispatch follows the list):

  • “serial”
  • “parallel” (which uses parallel)
  • “foreach” (which uses doParallel)
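As a hedged illustration of what that kind of dispatch can look like (a sketch under my own assumptions - runPersons and simulateOne are hypothetical stand-ins, not the repository's code):

library(doParallel)  # also attaches foreach and parallel

simulateOne <- function(i) i^2  # hypothetical per-person simulation

runPersons <- function(v0) {
  ids <- seq_len(v0$numberOfPersons)
  switch(v0$method,
         serial = lapply(ids, simulateOne),
         parallel = parallel::mclapply(ids, simulateOne),
         foreach = {
           cl <- makeCluster(2)           # arbitrary two-worker cluster
           registerDoParallel(cl)
           on.exit(stopCluster(cl))
           foreach(i = ids) %dopar% simulateOne(i)
         },
         stop("v0$method '", v0$method, "' has not been implemented"))
}

str(runPersons(list(method = "serial", numberOfPersons = 3)))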

The function psa() accepts the same options, but also includes “parallelBatches” alongside “foreach”. However, both of those just return an error that the method has not been implemented for psa(). Hence, psa() can only run with “serial” or “parallel”; likewise for psaAboveDiagnosisThreshold().

At 14.16 (having let the model run since 13.55, for 21 minutes), I cancelled the run on the remote computer and altered the scenario script to set v0$method <- "parallel". I set that running at 14.21. At this moment, the first scenario was still running on my machine (since 13.13, so for approx. 1h 10m).

At 14.39, I noticed the parallel model said Killed. In case I had accidentally done this myself, I set it to rerun. However, at 14.52, it showed as Killed again. From Googling, I can see this message occurs when you run out of memory - which might be why all the scripts were set to run serially, if it's too intensive to run in parallel. I switched it back to “serial”.

Meanwhile, the model still running on my machine had now been going for 1h 42m. Having looked in the repository and article for any indication of run time or a requirement for HPC, I had yet to find anything. I then looked at the article first describing the model (https://doi.org/10.1177/0272989X17753380), in which they state the run time:

However, the computational requirements of the DES were extensive, given the number of individuals needed to reduce sampling variation to an acceptable level and characterizing uncertainty through PSA. Run time was in the region of 24 h to run the model with 500,000 patients and 1,000 PSA iterations, even with parallelization and the use of a high-powered computer.

In this case, “each scenario model is run for 10 million hypothetical individuals randomly drawn with replacement from the distribution of the relevant population. The models are run for a period of 30 years”. However, as there is no mention of performing probabilistic sensitivity analysis iterations in the article, we could expect this to be quicker than that (although note it is 10m patients instead of 500k).

Discussed with Tom and agreed to:

    1. Try running it with very few people, just to see whether the model works and whether we could get similar results with fewer people in the model
    2. Try running it serially on the remote computer overnight

15.09-15.42: Running the model with fewer people

DES_Input_Definitions states that the number of people to run through the DES is defined by v0$numberOfPersons (default 1000). I can see in run_aaamodel_65yo_scen0.R that it has been set to v0$numberOfPersons <- 1e7. I replaced this with v0$numberOfPersons <- 1000. The run then finished almost immediately, but hit an error:

> TableOfCounts(scen0.invite, v1other)
Error in "screen" %in% eventHistory$events && !("nonvisualization" %in%  : 
  'length = 3' in coercion to 'logical(1)'

TableOfCounts() is from DES_Model.R. Working through each line of the function, I found that the error occurs at scre_dropout=countDropouts(result, v1other).

In this function, personsInfo (the result, i.e. scen0.invite) is used. It loops through the event histories for each person (for (i in 1:length(personsInfo$eventHistories))) and checks, for each person, whether:

  • Their events include “screen” ("screen" %in% eventHistory$events)
  • Their events do not include “nonvisualization” (!("nonvisualization" %in% eventHistory$events))
  • The measured size at the screen is greater than or equal to v1other$aortaDiameterThresholds[[1]]

The error occurs at that final comparison - and that is the item we had to change earlier to get the model to run. I tried setting it back to a list within a list (v1other$aortaDiameterThresholds <- list(c(3.0, 4.5, 5.5))). However, I then realised what had caused the issue: run_aaamodel_65yo_scen0.R wraps it in a list() again before running the model:

## Change v1other$aortaDiameterThresholds to be a list (new syntax)
v1other$aortaDiameterThresholds <- list(v1other$aortaDiameterThresholds)

If I comment out that line, an error occurs. Hence, it seems the model requires a list within a list, but TableOfCounts() requires just a list. Looking across the repository code, the reason becomes apparent in models/NAASP_COVID_modelling/run_aaamodel_surv_scen3.R, where multiple aortaDiameterThresholds are provided.

As such, I set about modifying the later processing functions so that they can use this list() format (see the sketch after this list):

  • input/NAAASP_Men_2020-05-11/DES_Data_Input_NAAASP_Men_30years_time_horizon_2020-05-11.R: returned this to list(c()), then changed run_aaamodel_65yo_scen0.R so it does not wrap an additional list() over the top.
  • DES_Model.R: countDropouts(): >= v1other$aortaDiameterThresholds[[1]] to >= v1other$aortaDiameterThresholds[[1]][1]
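A minimal sketch of why the comparison breaks and why the [1] fix works (standalone values for illustration; since R 4.3, && errors on conditions of length > 1 rather than silently using the first element):

thresholds <- list(c(3.0, 4.5, 5.5))        # list-of-vectors format
measured <- 4.0                             # hypothetical measured aortic size

measured >= thresholds[[1]]                 # length-3 logical: TRUE FALSE FALSE
try(TRUE && measured >= thresholds[[1]])    # R >= 4.3: 'length = 3' in coercion to 'logical(1)'
TRUE && measured >= thresholds[[1]][1]      # single TRUE: the applied fix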

The whole script then ran successfully!

Reflection

These errors are likely the result of having modified code but not re-run everything from scratch (unsurprising given the anticipated model run times).

15.43-16.00, 16.03-16.43, 16.45-16.47, 16.53-16.56: Inspecting outcome from a model run

That R script ran the status quo (I0) scenario for invited 65-year-olds. The parameters from Table 1 are below, along with their locations in the data, as given in DES_Input_Definitions.xlsx:

  • Attendance 75% - v2$probOfAttendScreen
  • Drop-out rate/annum: 6% - v2$rateOfDropoutFromMonitoring
  • Threshold for surgery: 5.5cm - v1other$aortaDiameterThresholds[[1]][3] (“These are the aortic cut-points where surveillance intervals change. Note, the first cut-point must always relate to the aortic size where individuals enter the AAA surveillance programme (e.g. 3.0cm). The last cut-point must always relate to the aortic size where surgery is to be considered (e.g. 5.5cm)”)

Looking at the next R script, run_aaamodel_65yo_scen1.R, I can see that they repeat the model multiple times to get results from varying parameters of that scenario. This script is scenario 1 which, from Table 1, delays invitation by 3 months to 5 years.

I switched to trying that script, but again first setting:

  • # v1other$aortaDiameterThresholds <- list(v1other$aortaDiameterThresholds)
  • v0$numberOfPersons <- 1000

This ran eight “scenarios” within the script. The output dataframe scen1summaryi showed the results from each of those eight variants. I added a line to the script to save this to CSV (sketched below). However, the number dead was 7 or 8 across all scenarios, so we likely need to increase the number of people in the model to start getting meaningful results.
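The added line was along these lines (my sketch; the exact path is an assumption, inferred from the file read below):

# Save the eight-variant summary to CSV
write.csv(scen1summaryi, "output_65yo_scen1_n1000.csv", row.names = FALSE)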

import pandas as pd

res = pd.read_csv('output_65yo_scen1_n1000.csv')
res[['delayscr', 'aaadead']].sort_values(by='delayscr')
delayscr aaadead
7 0.00 7
0 0.25 7
1 0.50 7
2 1.00 7
3 2.00 7
4 3.00 7
5 4.00 8
6 5.00 8

I increased it to 10,000 patients. For each of the eight runs in the script, it took about 27 seconds (so under four minutes in total). Here we start to see more change between each scenario, and it becomes more feasible to look at the results.

import pandas as pd

res = pd.read_csv('output_65yo_scen1_n10000.csv')
res[['delayscr', 'aaadead', 'emeropen']].sort_values(by='delayscr')
delayscr aaadead emeropen
7 0.00 96 33
0 0.25 96 33
1 0.50 98 32
2 1.00 96 31
3 2.00 98 31
4 3.00 98 32
5 4.00 101 32
6 5.00 103 32

Each result here relates to Table 2 (i.e. excess AAA deaths and excess emergency operations). To calculate the excess, given that the results appear to increase from the first (0 delay) scenario, I'm assuming it is the change relative to the 0 scenario.

I created an .Rmd file to start processing the results. This required adding rmarkdown to the renv; I also added dplyr for processing the results. Looking at excess deaths, we can see the anticipated pattern, although the numbers for emergency operations are still so small that they fluctuate, and excess emergency operations go in the wrong direction.
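A hedged sketch of that calculation (not the exact .Rmd code; column and file names are taken from the CSVs shown above):

library(dplyr)

# Excess deaths/operations relative to the zero-delay scenario
res <- read.csv("output_65yo_scen1_n10000.csv") |>
  arrange(delayscr) |>
  mutate(excess_death = aaadead - first(aaadead),
         excess_op = emeropen - first(emeropen))

write.csv(select(res, delayscr, excess_death, excess_op), "tab2.csv", row.names = FALSE)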

import pandas as pd

pd.read_csv('tab2.csv')
delayscr excess_death excess_op
0 0.00 0 0
1 0.25 0 0
2 0.50 2 -1
3 1.00 0 -2
4 2.00 2 -2
5 3.00 2 -1
6 4.00 5 -1
7 5.00 7 -1

I tried upping it to 100,000 people, and recorded the time with different settings:

  • “serial” - 5 minutes 7 seconds
  • “parallel” - 2 minutes 12 seconds - so an estimated 17 and a half minutes in total for the eight runs
  • “foreach” - not possible, Error in processPersons(v0, v1other, v2, personData.screen) : v0$randomSeed does not yet work with v0$method="foreach"

For each of these, about 45 seconds is results processing at the end (e.g. Eventsandcosts()).

Whilst these ran, I looked through the repository to try to spot whether they had functions to generate the plots from the article. I found plotting functions in DES_Model.R, shiny_output_functions.R and NAAASP_DES_Example_Script.R, though none were relevant to the article, so it appears I'll need to write the code to do that processing.

Timings

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 32

# Times from today
times = [
    ('09.38', '09.43'),
    ('09.53', '09.55'),
    ('10.05', '10.15'),
    ('10.29', '10.45'),
    ('10.46', '11.02'),
    ('11.09', '11.13'),
    ('11.26', '11.32'),
    ('12.50', '12.56'),
    ('12.57', '13.04'),
    ('13.05', '13.13'), 
    ('13.48', '13.55'),
    ('14.07', '14.21'),
    ('14.53', '15.08'),
    ('15.09', '15.42'),
    ('15.43', '16.00'),
    ('16.03', '16.43'),
    ('16.45', '16.47'),
    ('16.53', '16.56')]

calculate_times(used_to_date, times)
Time spent today: 211m, or 3h 31m
Total used to date: 243m, or 4h 3m
Time remaining: 2157m, or 35h 57m
Used 10.1% of 40 hours max