Day 3

reproduce
Author

Amy Heather

Published

July 30, 2024

Note

Reproduced Figure 1 and Table 2. Total time used: 8h 59m (22.5%).

The timings below relate to working on Table 2 and Figure 1.

Timings:

  • 09.10-09.11
  • 09.20-09.22
  • 09.30-09.40
  • 09.52-10.44
  • 10.51-10.56
  • 11.02-11.56
  • 12.59-13.27

GitHub repository dates

I realised I had forgotten to check the GitHub repository dates versus the paper dates. The article was published in June 2021. There are three commits after this point:

  • 19 July 2021 - “Adding SWAN models to repository” - relevant changes:
    • Adding debug=F and some error handling in processPersons() from DES_Model.R
    • Changing from v1other$aortaDiameterThresholds <- c(3.0, 4.5, 5.5) to v1other$aortaDiameterThresholds <- list(c(3.0, 4.5, 5.5)) in input/NAAASP_Men_2020-05-11/DES_Data_Input_NAAASP_Men_30years_time_horizon_2020-05-11.R
    • Removing an associated .xlsx file for 2020-05-11
  • 19 July 2021 - “Updating README.md”
    • More detail in README (see below)
  • 12 February 2024 - “Fixed checks with vectorised elements, which was causing errors in R v4.3”
    • Add .gitignore
    • Modification in Auxillary_Functions.R
    • Formatting in DES_Model.R

Original:

The following directories are included:

/models – R scripts that run the DES models for AAA screening

/models/Example – An example script to run the DES is contained here

/functions – Contains the DES model code

/input – Contains the input parameters and .csv files needed to run the DES models

/input/NAAASP_Men_2020-05-11 – Updated parameters for AAA screening in men, updated as of 11/05/2020

/output – Directory where Rdata output files are saved

New:

The following directories are included:

  • /models – R scripts that run the DES models for AAA screening

  • /models/Example – An example script to run the DES is contained here

  • /models/SWAN – Screening Women for Abdominal Aortic Aneurysm (SWAN) model scripts. See our Lancet publication and HTA report for further details

  • /models/NAAASP_COVID_modelling – Scripts for modelling the impact of changes to Abdominal Aortic Aneurysm screening and treatment services in England during the COVID-19 pandemic. See our PLOS ONE publication for further details

  • /functions – Contains the DES model code

  • /input – Contains the input parameters and .csv files needed to run the DES models

  • /input/SWAN – Screening Women for Abdominal Aortic Aneurysm (SWAN) input parameters. See our Lancet publication and HTA report for further details

  • /input/NAAASP_COVID_modelling – Input parameters for modelling the impact of changes to Abdominal Aortic Aneurysm screening and treatment services in England during the COVID-19 pandemic. See our PLOS ONE publication for further details

  • /input/NAAASP_Men_2020-05-11 – Updated parameters for AAA screening in men, updated as of 11/05/2020

  • /output – Directory where Rdata output files are saved

Reflection

Given how far I have progressed with this, I won’t go back to using the version as of publication, but we should bear this in mind later.

Incrementing number of people for 65yo scen1

Ran the full script run_aaamodel_65yo_scen1.R with 100,000 people and parallel processing. However, looking at the results, this is evidently still too few people.

import pandas as pd

pd.read_csv('tab2_100k.csv')
delayscr excess_death excess_op
0 0.00 0 0
1 0.25 -6 -3
2 0.50 -5 3
3 1.00 -6 -1
4 2.00 -1 -3
5 3.00 2 3
6 4.00 25 7
7 5.00 41 7

I pulled the changes to the files and renv onto the remote computer, and then ran it with 1,000,000 people. To open and edit the file:

nano 'models/NAAASP_COVID_modelling/run_aaamodel_65yo_scen1.R'

To run:

Rscript -e "source('models/NAAASP_COVID_modelling/run_aaamodel_65yo_scen1.R')"

Per run this took:

  • 3 minutes 46 seconds on remote computer - so estimated 30 minutes in total.
  • 13 minutes 44 seconds on local computer - so would be an estimated 1 hour 50 minutes in total

Looking at the convergence plots (supplementary figures 1 and 2), 1 million people is getting closer to convergence - but 2 million even more so.

import pandas as pd

pd.read_csv('tab2_1m.csv')
delayscr aaadead emeropen excess_death excess_op
0 0.00 9226 2830 0 0
1 0.25 9200 2824 -26 -6
2 0.50 9194 2843 -32 13
3 1.00 9199 2830 -27 0
4 2.00 9210 2836 -16 6
5 3.00 9298 2878 72 48
6 4.00 9423 2933 197 103
7 5.00 9612 2993 386 163

Scaling the numbers

Looking at the results, I’m a little uncertain about the numbers we are getting. Based on Table 2’s caption, I think I may need to scale these, as the raw numbers from a simulation of 1 million people won’t be on the same scale as a table based on the national cohort. The caption mentions:

  • National male 65 year old cohort for England: n = 279,798
  • Expected AAA deaths over 30y in status quo = 2564
  • Expected emergency operations over 30y in status quo = 1041

I tried scaling the results so that they reflect the numbers expected if the population were 279,798 (rather than 1 million). I scaled the results from aaadead and emerevar (e.g. round(279798*(aaadead/1000000))). The number of deaths looks similar to the expected value from Table 2 (2564), but the number of emergency operations is very different.
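As a rough sketch of this (assuming the scenario 1 results are in a dataframe y65_s1, as used further below - this is not the exact code used):

library(dplyr)

# Scale raw counts from the simulated 1,000,000 people to the
# national cohort of 279,798 (dataframe and column names assumed)
scaled <- y65_s1 %>%
  select(delayscr, aaadead, emerevar) %>%
  mutate(scaled_dead = round(279798*(aaadead/1000000)),
         scaled_emer = round(279798*(emerevar/1000000)))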

import pandas as pd

pd.read_csv('scaled_1m.csv')
delayscr aaadead scaled_dead scaled_emer
0 0.00 9226 2581 261
1 0.25 9200 2574 261
2 0.50 9194 2572 261
3 1.00 9199 2574 261
4 2.00 9210 2577 261
5 3.00 9298 2602 261
6 4.00 9423 2637 261
7 5.00 9612 2689 261

Scenario 0

I was wondering whether this should be compared against the result from the status quo script rather than from when delay is 0 (or whether that should just be the same thing anyway!).

I repeated this for the results from scenario 0 (status quo):

library(dplyr)
# Scale the status quo AAA deaths from the simulated 1,000,000 people
# to the national cohort of 279,798
scale_dead_s0 <- y65_s0 %>%
  select(aaadead) %>%
  mutate(scaled_dead = round(279798*(aaadead/1000000)))
write.csv(scale_dead_s0, "../../logbook/posts/2024_07_30/dead_s0.csv", row.names=FALSE)

It came out exactly the same, so it seems there is no issue with using the delay-0 result from the scen1 run.

import pandas as pd

pd.read_csv('dead_s0.csv')
aaadead scaled_dead
0 9226 2581

Confirming which column gives number of operations

Given that the number of emergency operations looked very different, I was suspicious that I might not be using the correct column for this. It seems clear that deaths and ruptures are aaadead and rupt respectively, but I was less certain for surgeries (elecopen vs elecevar; emeropen vs emerevar).

I tried applying scaling to every column from row 1, to see which came out most similar.

library(dplyr)
# Transpose the first row of results, then scale each count to the national cohort
as.data.frame(t(head(y65_s1, 1))) %>%
  rename(result = 1) %>%
  mutate(scaled = round(279798*(result/n_person)))

Based on the names, there are a few candidates for emergency operations: emerevar, emeropen, reintemerevar and reintemeropen. I’m not certain what the difference is between these, but starting with their scaled values:

  • emerevar - 242
  • emeropen - 790
  • reintemerevar - 935
  • reintemeropen - 183

The expected number is 1041, so reintemerevar does appear closest.

import pandas as pd

pd.read_csv('scale_all_first_row.csv', index_col=0)
result scaled
n 1000000.00 279798
delayscr 0.25 0
inv 997073.00 278979
scr 748352.00 209387
reinv 135543.00 37925
nonatt 248721.00 69592
monitor 265786.00 74366
dropout 9445.00 2643
oppdet 28438.00 7957
consult 25555.00 7150
elecevar 10721.00 3000
elecopen 4739.00 1326
rupt 10093.00 2824
emerevar 865.00 242
emeropen 2824.00 790
reintelecevar 3340.00 935
reintemerevar 3340.00 935
reintemeropen 655.00 183
aaadead 9200.00 2574
nonaaadead 909861.00 254577

I then also looked in the repository for any written explanation of each variable, or the code behind it, to help confirm which is appropriate. Looking across the repository, these columns are created within each of the scenario scripts. From run_aaamodel_65yo_scen1.R:

elecevar<-Eventsandcosts(scen1.invite)[14,2]
elecopen<-Eventsandcosts(scen1.invite)[15,2]

emerevar<-Eventsandcosts(scen1.invite)[17,2]
emeropen<-Eventsandcosts(scen1.invite)[18,2]

reintelecevar<-Eventsandcosts(scen1.invite)[21,2]
reintemerevar<-Eventsandcosts(scen1.invite)[21,2]

Looking directly at the output of Eventsandcosts(scen1.invite) I can see that these match up with:

  • 14 - electiveSurgeryEvar
  • 15 - electiveSurgeryOpen
  • 17 - emergencySurgeryEvar
  • 18 - emergencySurgeryOpen
  • 21 - reinterventionAfterElectiveEvar

I’m not certain why reintemerevar uses 21 and not 22, as 22 is reinterventionAfterEmergencyEvar, but this appears to be the same in each of the scenario scripts.

These are assigned from costs, and v2$costs is in the data dictionary, DES_Input_Definitions.xlsx:

  • electiveSurgeryEvar: Elective EVAR repair
  • electiveSurgeryOpen: Elective Open repair
  • emergencySurgeryEvar: Emergency EVAR repair
  • emergencySurgeryOpen: Emergency Open repair
  • reinterventionAfterElectiveEvar: Re-intervention after elective EVAR

Having looked at this, I am presuming that the number of emergency operations would be given by combining emergencySurgeryEvar (17) (emerevar) and emergencySurgeryOpen (18) (emeropen).

Once I did that, the numbers looked much closer to the expected 1041 from the paper, with 1034 from 1 million people, and 1028 from 2 million people.
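As a rough sketch of that calculation (using the column names from the results above; not the exact code used):

# Emergency operations = emergency EVAR + emergency open repairs,
# scaled from the simulated 1,000,000 people to the national cohort
scaled_emer <- round(279798*((y65_s1$emerevar + y65_s1$emeropen)/1000000))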

Excess deaths

I’m seeing some negative values (whereas the paper does not, and instead has several 0s). I’m wondering if, as the result is “excess” deaths, perhaps I should only be counting when it is over 0 (setting negative numbers to 0), so I altered it accordingly.
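A minimal sketch of that adjustment, reusing the hypothetical scaled columns from the earlier sketch (the same approach applies to the emergency operations column):

# Excess relative to no delay (delayscr == 0), with negatives set to 0
baseline_dead <- scaled$scaled_dead[scaled$delayscr == 0]
scaled$excess_dead <- pmax(0, scaled$scaled_dead - baseline_dead)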

I also trimmed down the rows displayed for Table 2, to match the article.

Two million people

I ran run_aaamodel_65yo_scen1.R again but with 2 million people, as that looks like a more appropriate figure from the convergence plots, but I wasn’t certain about:

    a. whether the remote computer would manage it (the process was killed when run with 10 million)
    b. how long it would take
    c. how much of a difference it would make

RE: (a) - it did manage it. While running, I did note that we are now getting very large objects from each run (e.g. “processPersons is about to return an object of size 4.5 Gb”).

RE: (b) - the time for one run with 2 million people was 11 minutes 20 seconds. For all eight runs, it took a total of “1.01512100968096” (which is weird - normally it gives me a result in minutes). If we assume this is hours, it’s just over an hour; my estimate based on the time for one run would be about 1 hour 30 minutes. I have changed the timing statement to use difftime with the units specified, to avoid this issue in future.
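As a sketch of that change (not the exact script code): subtracting two times with “-” lets R choose its own units (seconds, minutes or hours), which would explain the bare number above, whereas difftime with the units specified is unambiguous:

start_time <- Sys.time()
# ... run the model ...
end_time <- Sys.time()

# Report elapsed time in minutes regardless of how long the run took
difftime(end_time, start_time, units = "mins")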

RE: (c) - the results are pretty similar. I renamed the output files so we had results from both 1 million and 2 million people, then applied the same function to create Table 2, and the tables were pretty similar.

import pandas as pd

display(pd.read_csv('tab2_correct_1m.csv'))
display(pd.read_csv('tab2_correct_2m.csv'))
months excess_dead excess_emer
0 6m 0 3
1 12m 0 0
2 24m 0 1
3 36m 21 14
4 48m 56 35
5 60m 108 56
months excess_dead excess_emer
0 6m 0 0
1 12m 0 0
2 24m 0 0
3 36m 20 11
4 48m 50 31
5 60m 105 60

Given their similarity, I will continue running the simulation with 1 million people for the time being, since running with 2 million took much longer and was very computationally expensive.

13.29-13.39: Running 65 year old scenario 2

Scenario 2 has a delay to invitation of 6 months, and attendance of 45 to 75%. This is required for Table 2, which also displays results from attendance of 65%, 55% and 45% (compared against 75%).

I set this up to run on the remote computer, with the changes as before, of:

  • Using parallel
  • 1 million
  • Saving the results to csv
  • Commenting out list(v1other$aortaDiameterThresholds)
  • Timings for one run and all runs

Timings:

  • One run: 3.952 minutes = 237 seconds = 3 minutes 57 seconds
  • Full run: 19.123 minutes = 1147 seconds = 19 minutes 7 seconds

14.00-14.06: Running surv scenario 1

Midway through creating Figure 1, I set the remote computer to run another scenario.

I amended the script as I did for 65yo scen2, with one additional fix - correcting the path used to source the model script DES_Model.R - and then ran it on the remote machine.

Timings:

  • One run: 13.368 minutes = 802 seconds = 13 minutes 22 seconds
  • Full run: 70.183 minutes = 4211 seconds = 1 hour 10 minutes 11 seconds

This runtime reaffirmed for me the use of 1,000,000 people (rather than 2,000,000).

13.40-13.44, 13.51-13.59, 14.07-14.29: Creating Figure 1

Whilst that ran - and whilst setting others to run during this too (hence the time jumps) - I set about creating Figure 1 from the scenario 1 results with 1 million people.

Added ggplot2 (to make plots) and tidyr (to reshape the dataframe) to renv.

Wrote code to create the figure, and was satisfied this was reproduced at 14.29.
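As a rough sketch of the approach (the object tab2 and its column names are assumptions based on tab2_correct_1m.csv shown earlier; this is not necessarily the exact figure code or the paper’s layout):

library(ggplot2)
library(tidyr)

# Reshape to long format: one row per (delay, outcome) pair
fig1_df <- pivot_longer(tab2,
                        cols = c(excess_dead, excess_emer),
                        names_to = "outcome", values_to = "count")

# Plot excess deaths and excess emergency operations by delay
ggplot(fig1_df, aes(x = months, y = count, fill = outcome)) +
  geom_col(position = "dodge") +
  labs(x = "Delay to screening invitation", y = "Excess events")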

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 243

# Times from today
times = [
    ('09.10', '09.11'),
    ('09.20', '09.22'),
    ('09.30', '09.40'),
    ('09.52', '10.44'),
    ('10.51', '10.56'),
    ('11.02', '11.56'),
    ('12.59', '13.27'),
    ('13.29', '13.39'),
    ('13.40', '13.44'),
    ('13.51', '13.59'),
    ('14.00', '14.06'),
    ('14.07', '14.29')]

calculate_times(used_to_date, times)
Time spent today: 202m, or 3h 22m
Total used to date: 445m, or 7h 25m
Time remaining: 1955m, or 32h 35m
Used 18.5% of 40 hours max

14.42-14.49: Checking scenario 1 simulation table 2 results

Now that we have a function to process scenario 1 correctly, I went back and re-processed the results from runs with fewer people, just to reaffirm whether the chosen number is appropriate (or whether we could use fewer).

I downloaded the results from my GitHub commit history, renaming the files to indicate the number of people.

With each of the variants I’d tried, it is apparent that none below 1 million were appropriate. It is possible that there could have been an appropriate in-between value that I hadn’t tried (e.g. 500,000), but for now, I will stick with 1 million.
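As a rough sketch of that re-processing (the input filenames and the process_table2 function are hypothetical stand-ins for the renamed result files and the actual processing function):

# Apply the same Table 2 processing to results saved from runs with
# different numbers of people (filenames and function are hypothetical)
sizes <- c("1k", "10k", "100k", "1m", "2m")
for (s in sizes) {
  res <- read.csv(paste0("65y_s1_", s, ".csv"))
  write.csv(process_table2(res), paste0("65y_s1_tab2_", s, ".csv"),
            row.names = FALSE)
}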

import pandas as pd

display(pd.read_csv('65y_s1_tab2_1k.csv'))
display(pd.read_csv('65y_s1_tab2_10k.csv'))
display(pd.read_csv('65y_s1_tab2_100k.csv'))
display(pd.read_csv('65y_s1_tab2_1m.csv'))
display(pd.read_csv('65y_s1_tab2_2m.csv'))
months excess_dead_emer
0 6m 0 (NA)
1 12m 0 (NA)
2 24m 0 (NA)
3 36m 0 (NA)
4 48m 279 (NA)
5 60m 279 (NA)
months excess_dead_emer
0 6m 56 (0)
1 12m 0 (0)
2 24m 56 (0)
3 36m 56 (0)
4 48m 140 (0)
5 60m 196 (0)
months excess_dead_emer
0 6m 0 (6)
1 12m 0 (0)
2 24m 0 (0)
3 36m 5 (6)
4 48m 70 (25)
5 60m 114 (25)
months excess_dead_emer
0 6m 0 (3)
1 12m 0 (0)
2 24m 0 (1)
3 36m 21 (14)
4 48m 56 (35)
5 60m 108 (56)
months excess_dead_emer
0 6m 0 (0)
1 12m 0 (0)
2 24m 0 (0)
3 36m 20 (11)
4 48m 50 (31)
5 60m 105 (60)

15.00-15.59: Table 2

Modified the function so it could be used to process both scenarios, then combined the results into a single table.
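As a rough sketch of that step (process_table2, y65_s2 and the binding direction are assumptions; the real function and table layout may differ):

# Process each scenario's results with the same function,
# then combine them into one Table 2 (hypothetical names)
tab2_s1 <- process_table2(y65_s1)
tab2_s2 <- process_table2(y65_s2)
tab2_combined <- rbind(tab2_s1, tab2_s2)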

I was satisfied this was reproduced (within the variation expected from running with fewer people) at 15.59.

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 243

# Times from today
times = [
    ('09.10', '09.11'),
    ('09.20', '09.22'),
    ('09.30', '09.40'),
    ('09.52', '10.44'),
    ('10.51', '10.56'),
    ('11.02', '11.56'),
    ('12.59', '13.27'),
    ('13.29', '13.39'),
    ('13.40', '13.44'),
    ('13.51', '13.59'),
    ('14.00', '14.06'),
    ('14.07', '14.29'),
    ('14.42', '14.49'),
    ('15.00', '15.59')]

calculate_times(used_to_date, times)
Time spent today: 268m, or 4h 28m
Total used to date: 511m, or 8h 31m
Time remaining: 1889m, or 31h 29m
Used 21.3% of 40 hours max

16.12-16.27: Fix later scenarios

Fixed surv scenarios 2 to 4e (although not yet running them, as there is potentially insufficient time for them to complete before the end of the work day). As above, the fixes include:

  • Using parallel
  • 1 million
  • Saving the results to csv
  • Commenting out list(v1other$aortaDiameterThresholds)
  • Timings for one run and all runs
  • Correcting the filename when sourcing DES_Model.R

Reflection

I haven’t modified the provided structure of separate scripts, but this structure does make it challenging when you want to change a parameter across the runs, as you have to carefully change each of the scripts. It would be easier if they were controlled by a single set of shared parameters.
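For example, a rough sketch of what that could look like (a hypothetical shared_parameters.R, sourced at the top of each scenario script, so values only need changing in one place):

# Hypothetical shared_parameters.R
shared <- list(
  n_person = 1000000,   # number of people to simulate
  use_parallel = TRUE,  # whether to run in parallel
  output_dir = "output" # where to save results
)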

16.42-16.55: Start on Figure 2

Created a function based on the Figure 1 code, so the same processing code can be reused to generate the dataframe for this figure.

Timings

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 243

# Times from today
times = [
    ('09.10', '09.11'),
    ('09.20', '09.22'),
    ('09.30', '09.40'),
    ('09.52', '10.44'),
    ('10.51', '10.56'),
    ('11.02', '11.56'),
    ('12.59', '13.27'),
    ('13.29', '13.39'),
    ('13.40', '13.44'),
    ('13.51', '13.59'),
    ('14.00', '14.06'),
    ('14.07', '14.29'),
    ('14.42', '14.49'),
    ('15.00', '15.59'),
    ('16.12', '16.27'),
    ('16.42', '16.55')]

calculate_times(used_to_date, times)
Time spent today: 296m, or 4h 56m
Total used to date: 539m, or 8h 59m
Time remaining: 1861m, or 31h 1m
Used 22.5% of 40 hours max