Day 5

reproduce
Author

Amy Heather

Published

September 4, 2024

Note

Fixed my loading of epicR so I now get model results! Experimented with agent numbers and started Table 3. Total time used: 6h 11m (15.5%)

09.19-09.41: Returning to run the model

Did a quick review of my prior logbook entries to get back up to speed. I will be trying to run Mini_Analysis.Rmd, which I created previously; it contains only scenario S1NoCD and saves the result to CSV (whilst the full script Case_Detection_Results.Rmd contains many more scenarios). I had previously reduced the number of agents (settings$n_base_agents <-) to try to run on my local machine, but this just returned 0/1/NaN, so I will now try with the full number of agents.

Case_Detection_Results.Rmd sets settings$n_base_agents <- 50e+6 but has the comment “change number of agents here: 100e+6 is used in the analysis”.

I decided to start with trying Mini_Analysis.Rmd with 50e+6. I logged into the remote computer, cloned the GitHub repository, then ran the following:

cd stars-reproduce-johnson-2021/reproduction
Rscript -e "renv::restore()"
Rscript -e "rmarkdown::render('scripts/Mini_Analysis.Rmd')"

This, however, returned an error that pandoc was required, so I ran sudo apt install pandoc and tried again. The .Rmd then rendered successfully. I initially ran with 5000 agents to make sure it worked and output correctly, and then with the original 50e+6 as mentioned.

11.29-11.34, 11.48-12.00: Reviewing results from 50e+6

This ran in 1.642896 hours (1 hour 38 minutes). The result was still wrong though:

import pandas as pd
pd.read_csv('mini_analysis_50e6.csv')
Scenario Agents PatientYears CopdPYs NCaseDetections DiagnosedPYs OverdiagnosedPYs SABA LAMA LAMALABA ... NoCOPD GOLD1 GOLD2 GOLD3 GOLD4 Cost CostpAgent QALY QALYpAgent NMB
0 S1NoCD 0 0 0 0 0 0 NaN NaN NaN ... 0 0 0 0 0 0 NaN 0 NaN NaN

1 rows × 29 columns

From the Case_Detection_Results.md in their repository, I can see that I should be getting numbers like 37195539 agents, 625946560 patient years, 71206641 CopdPYs, and so on.
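In hindsight, a failed run like this could be caught automatically before spending compute on larger agent counts. A minimal stdlib sketch (the row dict below is hypothetical, mimicking one row of the CSV; a real check would read the file):

```python
import math

# One row of the failed output, mimicking mini_analysis_50e6.csv (hypothetical layout)
row = {"Scenario": "S1NoCD", "Agents": 0, "PatientYears": 0.0,
       "CostpAgent": float("nan")}

def looks_failed(row):
    """Flag a result row whose agent count is zero or whose numeric
    values are all zero/NaN, as in the broken run above."""
    numeric = [v for v in row.values() if isinstance(v, (int, float))]
    return row["Agents"] == 0 or all(v == 0 or math.isnan(v) for v in numeric)
```

Running this straight after the model would have flagged the 50e+6 run as broken before any manual inspection.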

I’m sceptical that increasing to 100e+6 would fix that issue, as their notebooks appear to have been run with 50e+6, as I did.

I double-checked the code in Mini_Analysis.Rmd, but was satisfied that it matched theirs; the only changes were importing epicR from the local folder, adding a timer, and not having some processing of the table (which makes no difference here, as the numbers are 0 and NaN).

I’m presuming the issue is therefore in the epicR package I have, but I’m rather unsure where to start, or where the issue might be!

12.01-12.24: Basic run of epicR to resolve the “0” results issue

As a starting point, I tried following the instructions for running epicR generally (and not this specific study), to see if I could get it working and printing some results for a basic scenario.

On https://github.com/resplab/epicR, they give code for running epicR with default settings:

library(epicR)
init_session()
run()
Cget_output()
terminate_session()

And the default inputs can be viewed with:

input <- get_input()

However, I later realised this was a newer function in epicR and not in the version we have.

I loaded my local epicR package and, as I had not set warning=FALSE, realised it produced a warning:

Warning: ── Conflicts ───────────────────────────────────────────────────────────────────────────────── epicR conflicts
──
✖ `Cget_agent_events` masks `epicR::Cget_agent_events()`.
✖ `Cget_all_events` masks `epicR::Cget_all_events()`.
✖ `Cget_all_events_matrix` masks `epicR::Cget_all_events_matrix()`.
  … and more.
ℹ Did you accidentally source a file rather than using `load_all()`?
  Run `rm(list = c("Cget_agent_events", "Cget_all_events", "Cget_all_events_matrix", "Cget_event",
  "Cget_events_by_type", "Cget_inputs", "Cget_n_events", "Cget_output", "Cget_output_ex", "Cget_runtime_stats",
  "Cget_settings", "Cget_smith", "Cset_input_var", "Cset_settings_var", "get_sample_output", "mvrnormArma"))` to
  remove the conflicts.

I found that, if I ran the whole script with just the local devtools::load_all("../epicR"), it would appear to run but give results of 0. If, however, I cleared the environment and then ran library(epicR) below the load_all() statement, it would run and give actual results. I presume this relates to the conflicts message above: when I clear the environment, I remove the objects, states and data that came with load_all() but keep the package itself, so I am left with a “clean slate” with the package.

12.25-12.30, 14.08-14.53, 15.10-15.14, 15.20-15.25: Fixing my mini analysis and returning to lower numbers of agents

I removed Basic_Model.Rmd and returned to Mini_Analysis.Rmd, adding the new workflow of clearing the environment and reloading the library. I then ran it with fewer agents. This output results - hurrah!

As I’d tried before, I ran these (either on my local machine or the more high-powered remote machine):

  • 5000 (5e3)
  • 50,000 (5e4) (17 seconds local, 7.2 seconds remote)
  • 500,000 (5e5) (2.6 minutes local, 1.0 minutes remote)
  • 1 million (1e6) (4.7 minutes local, 2.0 minutes remote)
  • 5 million (5e6) (27.3 minutes local, 10.0 minutes remote)

And it was quicker than before, too (17s vs 2.4m for 5e4; 2.6m vs 18.2m for 5e5, both local).
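The remote runtimes above scale roughly linearly with agent count, so it is easy to sketch what a full 50e6 run would cost under that assumption (a rough extrapolation, not a measurement):

```python
# Remote runtimes from the runs above: (agents, seconds)
timings = [(5e4, 7.2), (5e5, 60.0), (1e6, 120.0), (5e6, 600.0)]

# Seconds per agent is roughly constant, i.e. runtime is ~linear in agents
rate = sum(seconds / agents for agents, seconds in timings) / len(timings)

# Extrapolate to the full 50e6 agents used in Case_Detection_Results.Rmd
estimate_hours = rate * 50e6 / 3600
print(f"Estimated 50e6 runtime: ~{estimate_hours:.1f} h remote")
```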

import pandas as pd

# Import results
df5e3 = pd.read_csv('mini_analysis_fixed_5e3.csv').rename(index={0: '5e3'})
df5e4 = pd.read_csv('mini_analysis_fixed_5e4.csv').rename(index={0: '5e4'})
df5e5 = pd.read_csv('mini_analysis_fixed_5e5.csv').rename(index={0: '5e5'})
df1e6 = pd.read_csv('mini_analysis_fixed_1e6.csv').rename(index={0: '1e6'})
df5e6 = pd.read_csv('mini_analysis_fixed_5e6.csv').rename(index={0: '5e6'})

# Combine into a single dataframe and show
df_s1nocd = pd.concat([df5e3, df5e4, df5e5, df1e6, df5e6])
df_s1nocd
Scenario Agents PatientYears CopdPYs NCaseDetections DiagnosedPYs OverdiagnosedPYs SABA LAMA LAMALABA ... NoCOPD GOLD1 GOLD2 GOLD3 GOLD4 Cost CostpAgent QALY QALYpAgent NMB
5e3 S1NoCD 3696 6.233997e+04 7.457940e+03 18933 1.396640e+03 1387 0.017348 0.118369 0.148212 ... 52174 3045 3113 749 178 8.072027e+06 2183.990032 4.642803e+04 12.561696 625900.832686
5e4 S1NoCD 37129 6.235741e+05 7.094660e+04 190251 1.322909e+04 13394 0.017161 0.134433 0.152504 ... 525258 29237 30306 6618 1303 7.887546e+07 2124.362691 4.649150e+05 12.521615 623956.370519
5e5 S1NoCD 371579 6.251100e+06 7.066933e+05 1907271 1.310409e+05 133904 0.017192 0.136018 0.151516 ... 5270314 287267 304912 67938 12088 7.985673e+08 2149.118458 4.660359e+06 12.542042 624952.967972
1e6 S1NoCD 743128 1.249876e+07 1.421786e+06 3814132 2.622621e+05 267345 0.017119 0.135762 0.151370 ... 10529668 577647 613813 136603 24218 1.600396e+09 2153.593806 9.317379e+06 12.538055 624749.138617
5e6 S1NoCD 3718120 6.256653e+07 7.114655e+06 19087564 1.318087e+06 1336660 0.017180 0.135601 0.151460 ... 52712576 2886521 3074656 685563 119768 7.978907e+09 2145.951883 4.663988e+07 12.543942 625051.169501

5 rows × 29 columns

I can’t calculate the ICER yet, as it needs incremental costs and QALYs, and those require scenarios S1a, S1b and S1c (which are compared against S1NoCD) to produce incrementalcosts, incrementalqaly and incrementalnmb.
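For reference, the incremental quantities are simple arithmetic once those scenarios exist. A minimal sketch, assuming a willingness-to-pay of 50,000 per QALY (an assumption, though it is consistent with NMB ≈ 50,000 × QALYpAgent − CostpAgent in the tables in this entry):

```python
WTP = 50_000  # assumed willingness-to-pay per QALY (matches NMB = WTP*QALY - Cost here)

def incremental(cost, qaly, cost_ref, qaly_ref, wtp=WTP):
    """Incremental cost, QALY, ICER and incremental NMB of a case-detection
    scenario against its no-case-detection comparator (e.g. S1a vs S1NoCD)."""
    d_cost = cost - cost_ref
    d_qaly = qaly - qaly_ref
    icer = d_cost / d_qaly if d_qaly else float("nan")
    inmb = wtp * d_qaly - d_cost
    return d_cost, d_qaly, icer, inmb

# Hypothetical per-agent values, in the same ballpark as the runs in this entry
d_cost, d_qaly, icer, inmb = incremental(2421.29, 12.5588, 2149.12, 12.5420)
```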

Comparing against S1NoCD from Case_Detection_Results.md:

  • Some columns are different as their numbers depend on number of agents:
    • Agents
    • PatientYears
    • CopdPYs
    • NCaseDetections
    • DiagnosedPYs
    • OverdiagnosedPYs
    • Mild, Moderate, Severe, VerySevere
    • NoCOPD
    • GOLD1, GOLD2, GOLD3, GOLD4
    • Cost
    • QALY
  • But some are clearly processed and so should look at these to see if same results:
    • SABA
    • LAMA
    • LAMALABA
    • ICSLAMALABA
    • MildPY, ModeratePY, SeverePY, VerySeverePY
    • CostpAgent
    • QALYpAgent
    • NMB

I filtered to those columns and added the results from Case_Detection_Results.md:

# Filter to columns not influenced by agent number
s1 = df_s1nocd[[
  'SABA', 'LAMA', 'LAMALABA', 'ICSLAMALABA', 'MildPY','ModeratePY', 'SeverePY',
  'VerySeverePY', 'CostpAgent', 'QALYpAgent', 'NMB']]

# Add original results to dataframe
original = {
  'SABA': 0.017,
  'LAMA': 0.135,
  'LAMALABA': 0.151,
  'ICSLAMALABA': 0.080,
  'MildPY': 0.217,
  'ModeratePY': 0.041,
  'SeverePY': 0.068,
  'VerySeverePY': 0.006,
  'CostpAgent': 2150.057,
  'QALYpAgent': 12.544,
  'NMB': 625073.2
}
original_row = pd.DataFrame(original, index=['original'])
s1 = pd.concat([s1, original_row])

# View dataframe, rounding to 3dp (as they present it in original)
round(s1, 3)
SABA LAMA LAMALABA ICSLAMALABA MildPY ModeratePY SeverePY VerySeverePY CostpAgent QALYpAgent NMB
5e3 0.017 0.118 0.148 0.076 0.233 0.035 0.062 0.006 2183.990 12.562 625900.833
5e4 0.017 0.134 0.153 0.081 0.212 0.039 0.067 0.006 2124.363 12.522 623956.371
5e5 0.017 0.136 0.152 0.081 0.215 0.041 0.069 0.006 2149.118 12.542 624952.968
1e6 0.017 0.136 0.151 0.080 0.216 0.041 0.068 0.006 2153.594 12.538 624749.139
5e6 0.017 0.136 0.151 0.080 0.217 0.041 0.068 0.006 2145.952 12.544 625051.170
original 0.017 0.135 0.151 0.080 0.217 0.041 0.068 0.006 2150.057 12.544 625073.200

We can see that many columns look very similar between them. However, for MildPY, CostpAgent, QALYpAgent and NMB, the 5e5 result is markedly closer to the original than 5e4.

I then added 1 million (1e6) and 5 million (5e6) (already shown above), and these looked pretty similar to 500,000 (5e5).
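To quantify "markedly better" and "pretty similar", a quick check of each run's worst relative error against the original, using the rounded summary values from the table above:

```python
# Rounded per-agent results from the table above, vs the original study's values
runs = {
    "5e4": {"CostpAgent": 2124.363, "QALYpAgent": 12.522, "NMB": 623956.371},
    "5e5": {"CostpAgent": 2149.118, "QALYpAgent": 12.542, "NMB": 624952.968},
    "5e6": {"CostpAgent": 2145.952, "QALYpAgent": 12.544, "NMB": 625051.170},
}
original = {"CostpAgent": 2150.057, "QALYpAgent": 12.544, "NMB": 625073.200}

# Worst relative error per run, across the three summary columns
worst = {
    name: max(abs(row[k] - original[k]) / abs(original[k]) for k in original)
    for name, row in runs.items()
}
for name, err in worst.items():
    print(f"{name}: worst relative error {err:.3%}")
```

On these columns the 5e4 run is over 1% out on CostpAgent, while 5e5 and above stay well under that.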

Looking at Case_Detection_Results.Rmd, there are 12 scenarios, and then another 12 for 5 years, so a total of 24.

I figured I could try running the whole thing with 500,000 to begin with, upping it to 1 million or 5 million if needed; 50 million probably isn’t necessary, as I anticipate getting similar enough results with lower numbers.
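A back-of-envelope budget supports that plan; the per-scenario minutes below assume each of the 24 scenarios takes roughly as long as the S1NoCD runs above (an assumption, and the 50e6 figure is itself extrapolated assuming linear scaling):

```python
n_scenarios = 24  # 12 scenarios plus the same 12 over 5 years

# Assumed remote minutes per scenario, from the S1NoCD runs above
per_scenario_min = {"5e5": 1.0, "1e6": 2.0, "5e6": 10.0, "50e6": 100.0}

total_hours = {k: n_scenarios * m / 60 for k, m in per_scenario_min.items()}
for agents, hours in total_hours.items():
    print(f"{agents}: ~{hours:.1f} h for all {n_scenarios} scenarios")
```

Under these assumptions the full analysis is well under an hour at 5e5 but roughly a 40-hour job at 50e6.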

15.26-15.38, 15.58-16.08, 16.18-16.53: Running the full original script

I re-copied the original script and only made the following amendments before running:

  • Altering import of epicR
  • Adding a timer
  • Reducing agent number from 50e6 (50 million) to 5e5 (500 thousand)
  • Saving results tables to csv files

I then ran it on the remote machine. It took a bit of tweaking to get saving into the outputs folder right: I initially got the path wrong, and it also wouldn’t work until I had made the “outputs” folder.

Finishing this, I got five results tables (saved to csv).

pd.read_csv('sall_5e5.csv').sort_values(by='Scenario')
Scenario Agents Cost CostpAgent QALY QALYpAgent ICER IncrementalNMB
6 S1NoCD 371579 7.985673e+08 2149.118458 4.660359e+06 12.542042 NaN 0.000000
2 S1a 371902 9.004813e+08 2421.286568 4.670660e+06 12.558846 16195.977030 568.065578
10 S1b 372140 8.821882e+08 2370.581549 4.668622e+06 12.545338 67189.317529 -56.657807
3 S1c 371919 8.780531e+08 2360.871744 4.668922e+06 12.553600 18321.203589 366.138021
7 S2NoCD 219759 5.923556e+08 2695.478273 2.696622e+06 12.270813 NaN 0.000000
1 S2a 221233 6.444787e+08 2913.121915 2.718304e+06 12.287065 13391.766206 594.958813
8 S3NoCD 172154 4.944031e+08 2871.865595 1.934054e+06 11.234440 NaN 0.000000
4 S3a 172660 5.270176e+08 3052.343287 1.941269e+06 11.243307 20354.541872 262.857493
5 S3b 172336 5.420539e+08 3145.331710 1.937179e+06 11.240712 43606.186231 40.097325
0 S3c 172314 5.350466e+08 3105.067680 1.939341e+06 11.254689 11516.850740 779.236513
9 S3d 172882 5.427277e+08 3139.295863 1.943060e+06 11.239224 55904.120380 -28.243723
pd.read_csv('ceplane_5e5.csv')
Scenario Agents PropAgents Cost CostpAgent CostpAgentExcluded CostpAgentAll QALY QALYpAgent QALYpAgentExcluded QALYpAgentAll IncrementalCosts IncrementalQALY ICERAdj ICER INMB
0 S1NoCDAvg 371714 1.000000 8.009440e+08 2154.731836 0.000000 2154.731836 4.662659e+06 12.543673 0.000000 12.543673 0.000000 0.000000 NaN NaN 0.000000
1 S1a 371902 1.000000 9.004813e+08 2421.286568 0.000000 2421.286568 4.670660e+06 12.558846 0.000000 12.558846 266.554732 0.015174 17566.697170 16195.977030 492.138634
2 S1b 372140 1.000000 8.821882e+08 2370.581549 0.000000 2370.581549 4.668622e+06 12.545338 0.000000 12.545338 215.849713 0.001665 129616.172631 67189.317529 -132.584751
3 S1c 371919 1.000000 8.780531e+08 2360.871744 0.000000 2360.871744 4.668922e+06 12.553600 0.000000 12.553600 206.139907 0.009927 20765.538289 18321.203589 290.211077
4 S2NoCD 219759 0.591205 5.923556e+08 2695.478273 1372.698365 2154.731836 2.696622e+06 12.270813 12.938285 12.543673 0.000000 0.000000 NaN NaN 0.000000
5 S2a 221233 0.595170 6.444787e+08 2913.121915 1372.698365 2289.512160 2.718304e+06 12.287065 12.938285 12.550698 134.780324 0.007026 19183.267805 13391.766206 216.516247
6 S3NoCD 172154 0.463136 4.944031e+08 2871.865595 1536.083584 2154.731836 1.934054e+06 11.234440 13.673105 12.543673 0.000000 0.000000 NaN NaN 0.000000
7 S3a 172660 0.464497 5.270176e+08 3052.343287 1536.083584 2240.381513 1.941269e+06 11.243307 13.673105 12.544471 85.649677 0.000799 107210.068856 20354.541872 -45.704885
8 S3b 172336 0.463625 5.420539e+08 3145.331710 1536.083584 2282.171665 1.937179e+06 11.240712 13.673105 12.545386 127.439829 0.001713 74374.399669 43606.186231 -41.765303
9 S3c 172314 0.463566 5.350466e+08 3105.067680 1536.083584 2263.411383 1.939341e+06 11.254689 13.673105 12.552009 108.679547 0.008337 13035.890064 11516.850740 308.167889
10 S3d 172882 0.465094 5.427277e+08 3139.295863 1536.083584 2281.728207 1.943060e+06 11.239224 13.673105 12.541121 126.996370 -0.002551 -49778.541469 55904.120380 -254.557732
pd.read_csv('clinicalresults_5e5.csv')
Scenario PropAgents ProppPatientYears ProppCopdPYs SABAAll LAMAAll LAMALABAAll ICSLAMALABAAll MildpAgentAll ModeratepAgentAll SeverepAgentAll VerySeverepAgentAll NoCOPDpPYAll GOLD1pPYAll GOLD2pPYAll GOLD3pPYAll GOLD4pPYAll DiagnosedpPYAll
0 S1NoCDAvg 1.000000 1.000000 1.000000 0.019067 0.136964 0.152488 0.081139 0.411860 0.077802 0.130921 0.011304 0.842808 0.046012 0.048997 0.010899 0.001925 0.187126
1 S1a 1.000000 1.000000 1.000000 0.026545 0.157022 0.308491 0.092651 0.399933 0.075006 0.126251 0.010954 0.841922 0.046363 0.049408 0.011021 0.001948 0.464262
2 S1b 1.000000 1.000000 1.000000 0.021459 0.149159 0.241721 0.088141 0.404149 0.077388 0.129586 0.011345 0.842348 0.046148 0.049331 0.010965 0.001861 0.333770
3 S1c 1.000000 1.000000 1.000000 0.020317 0.144204 0.214241 0.085843 0.405836 0.075828 0.127832 0.010900 0.843357 0.045779 0.048706 0.010946 0.001846 0.290201
4 S2NoCD 0.591205 0.580239 0.708846 0.019067 0.136964 0.152488 0.081139 0.411860 0.077802 0.130921 0.011304 0.842808 0.046012 0.048997 0.010899 0.001925 0.187126
5 S2a 0.595170 0.584381 0.711030 0.022459 0.150276 0.233361 0.090027 0.402526 0.077061 0.129553 0.010940 0.842892 0.046209 0.048552 0.011044 0.001959 0.315256
6 S3NoCD 0.463136 0.415130 0.587070 0.019067 0.136964 0.152488 0.081139 0.411860 0.077802 0.130921 0.011304 0.842808 0.046012 0.048997 0.010899 0.001925 0.187126
7 S3a 0.464497 0.416543 0.589477 0.020197 0.140746 0.182354 0.084633 0.411228 0.078085 0.130900 0.011011 0.842680 0.045921 0.049123 0.010895 0.002053 0.232586
8 S3b 0.463625 0.415283 0.585001 0.022672 0.148789 0.234355 0.088940 0.400451 0.076676 0.128972 0.011083 0.843023 0.045768 0.049076 0.010821 0.001914 0.318064
9 S3c 0.463566 0.415885 0.582899 0.021257 0.146624 0.211534 0.088250 0.404950 0.078611 0.131189 0.011164 0.843306 0.045817 0.048867 0.010668 0.001974 0.277774
10 S3d 0.465094 0.416844 0.590338 0.020732 0.144100 0.197982 0.085758 0.407402 0.078907 0.131801 0.011216 0.842570 0.045941 0.049306 0.010897 0.001906 0.255943

CEplot

I initially focused on sall, reworking it to be more similar to Table 3. Whilst doing this, I assumed that S1NoCD was S0, as its values were closest and it appears first (as opposed to S2NoCD or S3NoCD).

Timings

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 193

# Times from today
times = [
    ('09.19', '09.41'),
    ('11.29', '11.34'),
    ('11.48', '12.00'),
    ('12.01', '12.24'),
    ('12.25', '12.30'),
    ('14.08', '14.53'),
    ('15.10', '15.14'),
    ('15.20', '15.25'),
    ('15.26', '15.38'),
    ('15.58', '16.08'),
    ('16.18', '16.53')]

calculate_times(used_to_date, times)
Time spent today: 178m, or 2h 58m
Total used to date: 371m, or 6h 11m
Time remaining: 2029m, or 33h 49m
Used 15.5% of 40 hours max
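For reference, a minimal sketch of what a helper like calculate_times could do (the real timings.py is the project's own; this is a hypothetical reimplementation that reproduces the figures above):

```python
def calculate_times(used_to_date, times, max_hours=40):
    """Sum today's (start, end) intervals given as 'HH.MM' strings and
    report usage against a fixed budget. Hypothetical reimplementation
    of the project's timings.calculate_times helper."""
    def to_minutes(hhmm):
        h, m = hhmm.split(".")
        return int(h) * 60 + int(m)

    today = sum(to_minutes(end) - to_minutes(start) for start, end in times)
    total = used_to_date + today
    remaining = max_hours * 60 - total
    print(f"Time spent today: {today}m, or {today // 60}h {today % 60}m")
    print(f"Total used to date: {total}m, or {total // 60}h {total % 60}m")
    print(f"Time remaining: {remaining}m, or {remaining // 60}h {remaining % 60}m")
    print(f"Used {total / (max_hours * 60):.1%} of {max_hours} hours max")
    return today, total
```

Called with used_to_date=193 and today's intervals, this gives 178m today and 371m total, matching the output above.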