Day 4

reproduce
Author

Amy Heather

Published

July 31, 2024

Note

Reproduced Figures 2-4, subset of Table 3, and ran some of the other scenarios. Total time used: 12h 21m (30.9%).

09.12-09.16: Run surv scenario 2

Update renv on remote machine, and then set surv scen2 to run.

Timings:

  • First run: 14.850 minutes = 891 seconds = 14 minutes 51 seconds
  • All runs: 99.651 minutes = 5979 seconds = 1 hour 39 minutes 39 seconds

09.20-09.58: Figure 2

Fixed dimensions of Figure 1 so it doesn’t change each time I re-run with different window widths. Then resumed work on Figure 2, adapting Figure 1 code to create functions that can be used to produce both plots.

Satisfied this is reproduced at 09.58.

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 539

# Times from today
times = [
    ('09.12', '09.16'),
    ('09.20', '09.58')]

calculate_times(used_to_date, times)
Time spent today: 42m, or 0h 42m
Total used to date: 581m, or 9h 41m
Time remaining: 1819m, or 30h 19m
Used 24.2% of 40 hours max
Reflection

Nice simple output from the model makes it easier to work with, as well as it being simple and similar plots with fairly minimal manipulation.

10.09-10.27: Figure 3 and Table 3

Using surv scen 1 (as in figure legend and article), starting working on Figure 3 and part of Table 3, adapting my prior functions.

I realised the output for scenario 1 doesn’t include a result from delay 0, so I will need the outputs from the base scenario scen0 to create this plot. I have not yet generated these.

This will also need different scaling. As in the Table 3 caption:

  • National surveillance cohort March 2020: n=15376
  • Expected AAA deaths in status quo: n=2152
  • Expected emergency operations in status quo: n=745

10.56-11.02: Set surv scenario 0 to run on remote machine

First had to push and pull changes, and then update surv scenario 0 R script (as we have done for others) (parallel, 1 million, csv, etc.)

Timings:

  • 4.463 minutes = 268 seconds = 4 minutes 28 seconds

11.06-11.07: Set surv scenario 3 to run on remote machine

Timings:

  • First run: 13.303 minutes = 798 seconds = 13 minutes 18 seconds
  • Full run: 81.663 minutes = 4900 seconds = 81 minutes 40 seconds

11.11-11.24: Returning to Figure 3 and a subset of Table 3

With scenario 0 results, I could finish making the subset for Table 3, and completed Figure 3.

Satisfied Figure 3 is reproduced at 11.24.

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 539

# Times from today
times = [
    ('09.12', '09.16'),
    ('09.20', '09.58'),
    ('10.09', '10.27'),
    ('10.56', '11.02'),
    ('11.06', '11.07'),
    ('11.11', '11.24')]

calculate_times(used_to_date, times)
Time spent today: 80m, or 1h 20m
Total used to date: 619m, or 10h 19m
Time remaining: 1781m, or 29h 41m
Used 25.8% of 40 hours max

11.31-11.55, 12.43-13.00: Working on Figure 4

Surv_scen2 includes s2.1 and s2.2, as it varies the dropout rate for 1 year, or for 2 years. These are each in part of Table 3, and together in Figure 4.

13.01-13.04: Set up surv scen 4a to run

Timings:

  • First run: forgot to note down
  • All runs: 35.390 min = 2123 seconds = 35 minutes 23 seconds

13.05-13.14: Resuming Figure 4

Had to combine with scenario 0 again to give result when dropout rate is 0.

Satisfied this is reproduced at 13.14.

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 539

# Times from today
times = [
    ('09.12', '09.16'),
    ('09.20', '09.58'),
    ('10.09', '10.27'),
    ('10.56', '11.02'),
    ('11.06', '11.07'),
    ('11.11', '11.24'),
    ('11.31', '11.55'),
    ('12.43', '13.00'),
    ('13.01', '13.04'),
    ('13.05', '13.14')]

calculate_times(used_to_date, times)
Time spent today: 133m, or 2h 13m
Total used to date: 672m, or 11h 12m
Time remaining: 1728m, or 28h 48m
Used 28.0% of 40 hours max

13.22-13.29: Subset of Table 3

Created the scenario 2 sections of table 3 using my prior functions.

Reflection

Given small size of these results dataframe, could have been handy to include them along with the code.

13.32-13.40: In-text result 1

The output from the model for Figure 3/Table 3 (..._surv_scen1.R) just returns total AAA deaths and not which sub-groups these belonged to.

I couldn’t spot any code in the repository that would extra this result by these sub-groups, so set about identifying how to do this.

13.53-13.55: Set up surv scen4b to run

Push and pull, and set to run.

Reflection

The original study authors inclusion of seeds and parallel processing is great and really handy.

13.56-14.00, 14.10-14.49, 16.07-16.16: Return to in-text result 1

I ran a simple version of the model on my machine (few people) so I could inspect the scen1.surv object returned.

scen1.surv$eventHistories is a list where each item is a person from the simulation. We can get:

  • Whether they died - if “aaaDeath” is in their events (scen1.surv$eventHistories[[i]]$screening$events)
  • Their aaorta size - scen1.surv$eventHistories[[i]]$screening$initialAortaSizeAsMeasured

I wrote a function that extracts the sizes for people who have died and counts numbers in each category. I add this to ..._surv_scen1.R, initially test running with a small number of people. This worked fine, so I then set it to run on the remote machine.

Timings:

  • One run: 13.346 minutes = 801 seconds = 13 minutes 21 seconds
  • All runs: 68.854 minutes = 4131 seconds = 1 hour 8 minutes 51 seconds

I imported the results, and then realised I would also need to run this function with ..._surv_scen0.R.

Timings

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 539

# Times from today
times = [
    ('09.12', '09.16'),
    ('09.20', '09.58'),
    ('10.09', '10.27'),
    ('10.56', '11.02'),
    ('11.06', '11.07'),
    ('11.11', '11.24'),
    ('11.31', '11.55'),
    ('12.43', '13.00'),
    ('13.01', '13.04'),
    ('13.05', '13.14'),
    ('13.22', '13.29'),
    ('13.32', '13.40'),
    ('13.53', '13.55'),
    ('13.56', '14.00'),
    ('14.10', '14.49'),
    ('16.07', '16.16')]

calculate_times(used_to_date, times)
Time spent today: 202m, or 3h 22m
Total used to date: 741m, or 12h 21m
Time remaining: 1659m, or 27h 39m
Used 30.9% of 40 hours max