Amy Heather
July 30, 2024
Reproduced Figure 1 and Table 2. Total time used: 8h 59m (22.5%).
Timings:
I realised I had forgotten to check the GitHub repository dates vs. the paper dates. The article was published in June 2021. There are three commits after this point, affecting:

- `debug=F` and some error handling in `processPersons()` from `DES_Model.R`
- `v1other$aortaDiameterThresholds <- c(3.0, 4.5, 5.5)`, changed to `v1other$aortaDiameterThresholds <- list(c(3.0, 4.5, 5.5))`, in `input/NAAASP_Men_2020-05-11/DES_Data_Input_NAAASP_Men_30years_time_horizon_2020-05-11.R`
- The `.xlsx` file for 2020-05-11, `Auxillary_Functions.R` and `DES_Model.R`
Original:
The following directories are included:
/models – R scripts that run the DES models for AAA screening
/models/Example – An example script to run the DES is contained here
/functions – Contains the DES model code
/input – Contains the input parameters and .csv files needed to run the DES models
/input/NAAASP_Men_2020-05-11 – Updated parameters for AAA screening in men, updated as of 11/05/2020
/output – Directory where Rdata output files are saved
New:
The following directories are included:
/models – R scripts that run the DES models for AAA screening
/models/Example – An example script to run the DES is contained here
/models/SWAN – Screening Women for Abdominal Aortic Aneurysm (SWAN) model scripts. See our Lancet publication and HTA report for further details
/models/NAAASP_COVID_modelling – Scripts for modelling the impact of changes to Abdominal Aortic Aneurysm screening and treatment services in England during the COVID-19 pandemic. See our PLOS ONE publication for further details
/functions – Contains the DES model code
/input – Contains the input parameters and .csv files needed to run the DES models
/input/SWAN – Screening Women for Abdominal Aortic Aneurysm (SWAN) input parameters. See our Lancet publication and HTA report for further details
/input/NAAASP_COVID_modelling – Input parameters for modelling the impact of changes to Abdominal Aortic Aneurysm screening and treatment services in England during the COVID-19 pandemic. See our PLOS ONE publication for further details
/input/NAAASP_Men_2020-05-11 – Updated parameters for AAA screening in men, updated as of 11/05/2020
/output – Directory where Rdata output files are saved
Given how far I have progressed with this, I won't go back to using the version as of publication, but we should bear this in mind later.
Ran the full script `run_aaamodel_65yo_scen1.R` with 100,000 people, run in parallel. However, looking at the results, this is evidently still too few people.
|   | delayscr | excess_death | excess_op |
|---|---|---|---|
| 0 | 0.00 | 0 | 0 |
| 1 | 0.25 | -6 | -3 |
| 2 | 0.50 | -5 | 3 |
| 3 | 1.00 | -6 | -1 |
| 4 | 2.00 | -1 | -3 |
| 5 | 3.00 | 2 | 3 |
| 6 | 4.00 | 25 | 7 |
| 7 | 5.00 | 41 | 7 |
I pulled the changes to the files and renv onto the remote computer, and then ran it with 1,000,000 people. To open and change the file:
nano 'models/NAAASP_COVID_modelling/run_aaamodel_65yo_scen1.R'
To run:
Rscript -e "source('models/NAAASP_COVID_modelling/run_aaamodel_65yo_scen1.R')"
Per run this took:
Looking at the convergence plots (supplementary figures 1 and 2), 1 million people is getting closer to convergence, but 2 million even more so.
Looking at the results, I'm a little uncertain about the numbers we are getting. Looking at Table 2's caption, I'm thinking that perhaps I may need to scale these (as, using the raw numbers from the simulation, we wouldn't expect these to match the table). The caption mentions the population that the results reflect (279,798).
I tried scaling the results so that they reflect the deaths expected if the population were 279,798 (rather than 1 million). I scaled the results from `aaadead` and `emerevar` (e.g. `round(279798*(aaadead/1000000))`). The number of deaths looks similar to the expected value from Table 2 (2564), but the number of emergency operations is very different.
I was wondering whether this should be compared against the result from the status quo script rather than from when delay is 0 (or whether that should just be the same thing anyway!).
I repeated this for the results from scenario 0 (status quo):
library(dplyr)

# Scale deaths from the status quo (scenario 0) run to a population of 279,798
scale_dead_s0 <- y65_s0 %>%
  select(aaadead) %>%
  mutate(scaled_dead = round(279798*(aaadead/1000000)))
write.csv(scale_dead_s0, "../../logbook/posts/2024_07_30/dead_s0.csv", row.names=FALSE)
It came out exactly the same, so there seems to be no issue with using the delay-0 result from the run of `scen1`.
Given that the number of emergency operations looked very different, I was suspicious that I might not be using the correct column for this. It seems clear that deaths and ruptures are `aaadead` and `rupt` respectively, but I was less certain for surgeries (`elecopen` vs `elecevar`; `emeropen` vs `emerevar`).
I tried applying the scaling to every column of row 1, to see which came out most similar.
# Transpose row 1 of the results and scale each count to a population of 279,798
as.data.frame(t(head(y65_s1, 1))) %>%
  rename(result = 1) %>%
  mutate(scaled = round(279798*(result/n_person)))
Based on the names, there are a few candidates for emergency operations: `emerevar`, `emeropen`, `reintemerevar` and `reintemeropen`. I'm not certain what the difference is between these, but starting with their scaled values:

- `emerevar` - 242
- `emeropen` - 790
- `reintemerevar` - 935
- `reintemeropen` - 183

The expected number is 1041, so `reintemerevar` does appear closest.
|   | result | scaled |
|---|---|---|
| n | 1000000.00 | 279798 |
| delayscr | 0.25 | 0 |
| inv | 997073.00 | 278979 |
| scr | 748352.00 | 209387 |
| reinv | 135543.00 | 37925 |
| nonatt | 248721.00 | 69592 |
| monitor | 265786.00 | 74366 |
| dropout | 9445.00 | 2643 |
| oppdet | 28438.00 | 7957 |
| consult | 25555.00 | 7150 |
| elecevar | 10721.00 | 3000 |
| elecopen | 4739.00 | 1326 |
| rupt | 10093.00 | 2824 |
| emerevar | 865.00 | 242 |
| emeropen | 2824.00 | 790 |
| reintelecevar | 3340.00 | 935 |
| reintemerevar | 3340.00 | 935 |
| reintemeropen | 655.00 | 183 |
| aaadead | 9200.00 | 2574 |
| nonaaadead | 909861.00 | 254577 |
I also then looked into the repository for any written explanation of each variable, or the code behind it, to help confirm which is appropriate. Looking across the repository, these columns are created within each of the scenario scripts. From `run_aaamodel_65yo_scen1.R`:
elecevar<-Eventsandcosts(scen1.invite)[14,2]
elecopen<-Eventsandcosts(scen1.invite)[15,2]
emerevar<-Eventsandcosts(scen1.invite)[17,2]
emeropen<-Eventsandcosts(scen1.invite)[18,2]
reintelecevar<-Eventsandcosts(scen1.invite)[21,2]
reintemerevar<-Eventsandcosts(scen1.invite)[21,2]
Looking directly at the output of `Eventsandcosts(scen1.invite)`, I can see that these match up with:

- `electiveSurgeryEvar`
- `electiveSurgeryOpen`
- `emergencySurgeryEvar`
- `emergencySurgeryOpen`
- `reinterventionAfterElectiveEvar`

I'm not certain why `reintemerevar` uses row 21 and not 22, as 22 is `reinterventionAfterEmergencyEvar`, but this appears to be the same in each of the scenario scripts.
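As a quick way of checking the row-to-event mapping, the relevant slice of the events table can be printed directly. This is just a sketch; it assumes `scen1.invite` is still in memory from running the scenario script, and that the first column of the `Eventsandcosts()` output holds the event names.

```r
# Print rows 14-22 of the events/costs table to see which event name sits at
# each index used in the scenario scripts above.
events <- Eventsandcosts(scen1.invite)
events[14:22, ]
```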
These are assigned from costs, and `v2$costs` is in the data dictionary, `DES_Input_Definitions.xlsx`:

- `electiveSurgeryEvar`: Elective EVAR repair
- `electiveSurgeryOpen`: Elective Open repair
- `emergencySurgeryEvar`: Emergency EVAR repair
- `emergencySurgeryOpen`: Emergency Open repair
- `reinterventionAfterElectiveEvar`: Re-intervention after elective EVAR

Having looked at this, I am presuming that the number of emergency operations would be given by combining `emergencySurgeryEvar` (17) (`emerevar`) and `emergencySurgeryOpen` (18) (`emeropen`).
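A minimal sketch of that calculation, assuming (as above) that `y65_s1` holds the raw counts from the 1 million person run and that results are scaled to the invited population of 279,798; the new column names `emer_total` and `scaled_emer` are just illustrative.

```r
library(dplyr)

# Combine the two emergency surgery counts, then scale from the simulated
# 1,000,000 people to the invited population of 279,798.
y65_s1 %>%
  mutate(emer_total = emerevar + emeropen,
         scaled_emer = round(279798 * (emer_total / 1000000))) %>%
  select(delayscr, emerevar, emeropen, emer_total, scaled_emer)
```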
Once I did that, the numbers looked much closer to the expected 1041 from the paper, with 1034 from 1 million people, and 1028 from 2 million people.
I'm seeing some negatives (whereas the paper does not, and has several 0s). Since the result is "excess" deaths, I wondered whether I should only count values over 0 (setting negative numbers to 0), so I altered it accordingly.
I also trimmed down the rows displayed for Table 2, to match the article.
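A hedged sketch of these two adjustments. The object `tab2`, the column names `excess_dead`/`excess_emer`, and the exact delay values kept are assumptions based on the processed tables shown later, not the exact code in my script.

```r
library(dplyr)

# Clip negative "excess" values to zero, and keep only the delay rows that
# appear in the article's Table 2 (6m to 60m).
tab2_display <- tab2 %>%
  mutate(excess_dead = pmax(excess_dead, 0),
         excess_emer = pmax(excess_emer, 0)) %>%
  filter(delayscr %in% c(0.5, 1, 2, 3, 4, 5))
```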
I ran `run_aaamodel_65yo_scen1.R` again but with 2 million people, as that looks like a more appropriate figure from the convergence plots, but I wasn't certain: (a) whether the remote machine could manage a run of that size; (b) how long it would take; and (c) whether the results would differ much from 1 million.
RE: (a) - it did manage it. While running, I did note that we are now getting very large objects from each run (e.g. `processPersons is about to return an object of size 4.5 Gb`).
RE: (b) - the time for one run with 2 million people was 11 minutes 20 seconds. For all eight runs, it reported a total of "1.01512100968096" (which is odd - normally it gives me a result in minutes). If we assume this is hours, it's just over an hour; my estimate based on the time for one run would be 1 hour 30 minutes. I have changed the timing statement to use `difftime` with units specified, to avoid this issue in future.
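A sketch of the timing change (variable names are illustrative): `difftime()` with explicit units always reports the elapsed time in minutes, rather than whatever unit R picks when subtracting two times.

```r
# Record elapsed time in minutes regardless of run length, instead of relying
# on the default units chosen when subtracting two times directly.
start_time <- Sys.time()
# ... run the eight scenario runs ...
end_time <- Sys.time()
elapsed_mins <- difftime(end_time, start_time, units = "mins")
print(elapsed_mins)
```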
RE: (c) - the results are pretty similar. I renamed the output files so we had results for both 1 million and 2 million people, then applied the same function to create Table 2 for each.
import pandas as pd
display(pd.read_csv('tab2_correct_1m.csv'))
display(pd.read_csv('tab2_correct_2m.csv'))
|   | months | excess_dead | excess_emer |
|---|---|---|---|
| 0 | 6m | 0 | 3 |
| 1 | 12m | 0 | 0 |
| 2 | 24m | 0 | 1 |
| 3 | 36m | 21 | 14 |
| 4 | 48m | 56 | 35 |
| 5 | 60m | 108 | 56 |
|   | months | excess_dead | excess_emer |
|---|---|---|---|
| 0 | 6m | 0 | 0 |
| 1 | 12m | 0 | 0 |
| 2 | 24m | 0 | 0 |
| 3 | 36m | 20 | 11 |
| 4 | 48m | 50 | 31 |
| 5 | 60m | 105 | 60 |
Given their similarity, I will continue running the simulation with 1 million people for the time being, since running with 2 million took much longer and was very computationally expensive.
Scenario 2 has a delay to invitation of 6 months, and attendance of 45% to 75%. This is required for Table 2, which also displays results for attendance of 65%, 55% and 45% (compared against 75%).
I set this up to run on the remote computer, with the change as before:

- Wrapping `v1other$aortaDiameterThresholds` in `list()`
Timings:
Midway through creating Figure 1, I set the remote computer to run another scenario.
Amended the script as I did for 65yo scen2, with an additional fix too - correcting the path to the model script `DES_Model.R` - and then ran it on the remote machine.
Timings:
This runtime reaffirmed for me the use of 1,000,000 people (rather than 2,000,000).
While that ran - and with others set to run during this time too (hence the time jumps) - I set about creating Figure 1 from the results of scenario 1 with 1 million people.
Added `ggplot2` (to make plots) and `tidyr` (to reshape/melt the dataframe) to renv.
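A sketch of how the packages were added (the standard renv workflow; the exact commands were not recorded in the log):

```r
# Install the plotting and reshaping packages into the project library, then
# record them in renv.lock.
renv::install(c("ggplot2", "tidyr"))
renv::snapshot()
```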
Wrote code to create the figure, and was satisfied that Figure 1 was reproduced at 14.29.
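A minimal sketch of the figure code, assuming the processed results sit in a dataframe like `tab2` with columns `delayscr`, `excess_dead` and `excess_emer`; the exact column names, labels and styling in my script may differ.

```r
library(ggplot2)
library(tidyr)

# Reshape to long format so excess deaths and excess emergency operations can
# be drawn as separate series, then plot against the delay to screening.
fig1_data <- pivot_longer(tab2,
                          cols = c(excess_dead, excess_emer),
                          names_to = "outcome",
                          values_to = "count")

ggplot(fig1_data, aes(x = delayscr, y = count, colour = outcome)) +
  geom_line() +
  geom_point() +
  labs(x = "Delay to screening (years)",
       y = "Excess events (scaled to 279,798 men)") +
  theme_minimal()
```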
import sys
sys.path.append('../')
from timings import calculate_times
# Minutes used prior to today
used_to_date = 243
# Times from today
times = [
('09.10', '09.11'),
('09.20', '09.22'),
('09.30', '09.40'),
('09.52', '10.44'),
('10.51', '10.56'),
('11.02', '11.56'),
('12.59', '13.27'),
('13.29', '13.39'),
('13.40', '13.44'),
('13.51', '13.59'),
('14.00', '14.06'),
('14.07', '14.29')]
calculate_times(used_to_date, times)
Time spent today: 202m, or 3h 22m
Total used to date: 445m, or 7h 25m
Time remaining: 1955m, or 32h 35m
Used 18.5% of 40 hours max
Now that we have the function to process scenario 1 correctly, I went back and re-processed the results from runs with fewer people, just to reaffirm whether the chosen number is appropriate (or whether we could use fewer).
I downloaded the results from my GitHub commit history, renaming the files to indicate the number of people.
From the variants I'd tried, it is apparent that none below 1 million were appropriate. It is possible that there is an appropriate in-between value I haven't tried (e.g. 500,000), but for now, I will stick with 1 million.
import pandas as pd
display(pd.read_csv('65y_s1_tab2_1k.csv'))
display(pd.read_csv('65y_s1_tab2_10k.csv'))
display(pd.read_csv('65y_s1_tab2_100k.csv'))
display(pd.read_csv('65y_s1_tab2_1m.csv'))
display(pd.read_csv('65y_s1_tab2_2m.csv'))
|   | months | excess_dead_emer |
|---|---|---|
| 0 | 6m | 0 (NA) |
| 1 | 12m | 0 (NA) |
| 2 | 24m | 0 (NA) |
| 3 | 36m | 0 (NA) |
| 4 | 48m | 279 (NA) |
| 5 | 60m | 279 (NA) |

|   | months | excess_dead_emer |
|---|---|---|
| 0 | 6m | 56 (0) |
| 1 | 12m | 0 (0) |
| 2 | 24m | 56 (0) |
| 3 | 36m | 56 (0) |
| 4 | 48m | 140 (0) |
| 5 | 60m | 196 (0) |

|   | months | excess_dead_emer |
|---|---|---|
| 0 | 6m | 0 (6) |
| 1 | 12m | 0 (0) |
| 2 | 24m | 0 (0) |
| 3 | 36m | 5 (6) |
| 4 | 48m | 70 (25) |
| 5 | 60m | 114 (25) |

|   | months | excess_dead_emer |
|---|---|---|
| 0 | 6m | 0 (3) |
| 1 | 12m | 0 (0) |
| 2 | 24m | 0 (1) |
| 3 | 36m | 21 (14) |
| 4 | 48m | 56 (35) |
| 5 | 60m | 108 (56) |

|   | months | excess_dead_emer |
|---|---|---|
| 0 | 6m | 0 (0) |
| 1 | 12m | 0 (0) |
| 2 | 24m | 0 (0) |
| 3 | 36m | 20 (11) |
| 4 | 48m | 50 (31) |
| 5 | 60m | 105 (60) |
Modified the function so it could be used to process both scenarios, then combined the results into a single table.
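A hedged sketch of the shape of that processing function. The function name, arguments and baseline handling are illustrative (the real comparator is the scenario's own no-delay / 75% attendance run), not the exact code in my script.

```r
library(dplyr)

# Scale raw simulation counts to the invited population of 279,798 men and
# compute excess events relative to baseline counts supplied by the caller.
process_scenario <- function(results, base_dead, base_emer,
                             n_person = 1e6, pop = 279798) {
  results %>%
    mutate(dead = round(pop * (aaadead / n_person)),
           emer = round(pop * ((emerevar + emeropen) / n_person)),
           excess_dead = pmax(dead - base_dead, 0),
           excess_emer = pmax(emer - base_emer, 0))
}

# The same function can then be applied to scenario 1 (screening delays) and
# scenario 2 (reduced attendance), and the outputs combined with bind_rows().
```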
I was satisfied this was reproduced (within the variation expected from running with fewer people) at 15.59.
import sys
sys.path.append('../')
from timings import calculate_times
# Minutes used prior to today
used_to_date = 243
# Times from today
times = [
('09.10', '09.11'),
('09.20', '09.22'),
('09.30', '09.40'),
('09.52', '10.44'),
('10.51', '10.56'),
('11.02', '11.56'),
('12.59', '13.27'),
('13.29', '13.39'),
('13.40', '13.44'),
('13.51', '13.59'),
('14.00', '14.06'),
('14.07', '14.29'),
('14.42', '14.49'),
('15.00', '15.59')]
calculate_times(used_to_date, times)
Time spent today: 268m, or 4h 28m
Total used to date: 511m, or 8h 31m
Time remaining: 1889m, or 31h 29m
Used 21.3% of 40 hours max
Fixed the surv scenarios 2 to 4e (although not yet run, as there is potentially insufficient time for them to complete before the end of the work day). As above, the fixes include:

- Wrapping `v1other$aortaDiameterThresholds` in `list()`
- Correcting the path to the model script `DES_Model.R`
I haven't modified the provided structure of separate scripts, but this structure does make it challenging when you want to change a parameter across the runs, as you have to carefully change each of the scripts. It would be easier if they were controlled by a single set of shared parameters, as sketched below.
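For example, a shared parameters file could be sourced at the top of each scenario script. This is a hypothetical restructuring, not something in the current repository; the file name and list entries are illustrative.

```r
# shared_parameters.R (hypothetical): a single place to define values that are
# currently edited separately in each scenario script.
shared <- list(
  n_person = 1000000,                               # population size per run
  aortaDiameterThresholds = list(c(3.0, 4.5, 5.5))  # surveillance thresholds
)

# Each run_aaamodel_*.R script would then source() this file and read from
# `shared`, so a change only needs to be made in one place.
```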
Created a function based on the Figure 1 code, so the same processing can be reused to generate the dataframe for the figure.
import sys
sys.path.append('../')
from timings import calculate_times
# Minutes used prior to today
used_to_date = 243
# Times from today
times = [
('09.10', '09.11'),
('09.20', '09.22'),
('09.30', '09.40'),
('09.52', '10.44'),
('10.51', '10.56'),
('11.02', '11.56'),
('12.59', '13.27'),
('13.29', '13.39'),
('13.40', '13.44'),
('13.51', '13.59'),
('14.00', '14.06'),
('14.07', '14.29'),
('14.42', '14.49'),
('15.00', '15.59'),
('16.12', '16.27'),
('16.42', '16.55')]
calculate_times(used_to_date, times)
Time spent today: 296m, or 4h 56m
Total used to date: 539m, or 8h 59m
Time remaining: 1861m, or 31h 1m
Used 22.5% of 40 hours max