Day 3 – Reproducing Huang et al. 2019

Note

Set-up environment and run model. Total time used: 7h 23m (18.5%)

09.46-09.47: Set Python interpreter

Set Python interpreter (e.g. to render these from RStudio) by clicking on the project in the top right of RStudio, then selecting Project Options > Python, and selecting the quarto_huang_2019 virtual environment I’d set up.

09.48-10.21: Returning to troubleshooting R version

Continuing to look at the instructions for old R releases from yesterday:

“As of July 2023, packages for R versions below 4.0 are no longer being updated. R 3.6 packages for Ubuntu on i386 and amd64 are available for most stable Desktop releases of Ubuntu until their official end of life date. However, only the latest Long Term Support (LTS) release is fully supported. As of November 18, 2018 the supported releases are Bionic Beaver (18.04;LTS), Xenial Xerus (16.04; LTS), and Trusty Tahr (14.04; LTS).”

By running lsb_release -a, I can see that my linux version is jammy (22.04.4 LTS). Looking at the instructions from this Stackoverflow post, I’m a bit unclear as to whether I can use any of these if they’re for older versions of linux.

From this help post, I then stumbled across RStudio r-builds which has R builds that say they should install fast on Ubuntu from a .deb file and are designed to easily switch between multiple versions of R. These say they support Ubunutu 22.04. I ran:

R_VERSION=3.6.0
curl -O https://cdn.posit.co/r/ubuntu-2204/pkgs/r-${R_VERSION}_1_amd64.deb
sudo apt-get install gdebi-core
sudo gdebi r-${R_VERSION}_1_amd64.deb

I confirmed this was installed by running /opt/R/${R_VERSION}/bin/R --version.

I then followed their instructions to add R to the system path:

sudo ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R 
sudo ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript

I restarted RStudio and found I was now in R 3.6.0. I delete the renv (which was built with 4.4.1) and remade it.

renv::deactivate(clean=TRUE)
install.packages("renv")
renv::init(bare=TRUE)
renv::snapshot()

The lock file now had R 3.6.0 (previously 4.4.1) and renv 1.0.7.

10.40-11.30, 11.35-11.41: Installing the packages

I ran renv::install() but it failed with: Warning: failed to find source for 'simmer.plot 0.1.15' in package repositories. Error: failed to retrieve package 'simmer.plot@0.1.15'.

I then tried with remotes:

install.packages("remotes")
remotes::install_version("simmer", "4.2.2")

However, this failed like before. Instead, I decided a different tactic - to just download them without the specified versions. I removed the versions from DESCRIPTION and ran renv::install(). However, this stopped with an error: Error: package 'evaluate' is not available.

I then tried working through each package one by one.

renv::install("simmer") was successful.

renv::install("simmer.plot") failed with the issue of ‘evaluate’ is not available. Based on this StackOverflow post, I tried installing ‘stringi’ - but that didn’t end up helping. I tried install evaluate before and after restarting the R session but still stated as not available.

Uncertain on what else might fix this, I decided to actually just start again from the latest version of R and try installing the packages there and see if I could get it to work without backdating the packages. I closed RStudio and ran the commands as above but changed R_VERSION to 4.4.1. I also couldn’t run the commands for symbolic link as it said the files already exist. I restarted R but still 3.6.0. Looking in /opt/R/, I can see I now have 3.6.0 and 4.4.1.

Based on the prior tutorial I’d found, I tried:

export RSTUDIO_WHICH_R=/opt/R/4.4.1/bin/R
rstudio

This worked, although default when open from application bar was still set to 3.6.0. I tried changing the .profile file (nano .profile) to add export RSTUDIO_WHICH_R=/opt/R/4.4.1/bin/R but made no difference.

I tried forcing replacement of the symbolic links then reopening RStudio:

R_VERSION=4.4.1
sudo ln -snf /opt/R/${R_VERSION}/bin/R /usr/local/bin/R 
sudo ln -snf /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript

This worked! So, trying again (with DESCRIPTION file still containing no versions)…

renv::deactivate(clean=TRUE)
install.packages("renv")
renv::init(bare=TRUE)
renv::snapshot()
renv::install()
renv::snapshot()

11.41-11.46, 11.52-12.00: Try running the code

I copied over server.R and, on opening, it said that plyr and shiny were required but not installed, so I add these to the environment as well.

On reflection, I realised that the settings to only store dependencies from DESCRIPTION in renv.lock probably wouldn’t be great, in case hidden things were also installed, so changed this setting to “implicit” (which is default).

I ran the file and it did the command shiny, but said Error: object 'shiny' not found. I copied over all the files and tried again. It ran the script but nothing happened. Based on the Shiny documentation, I moved the files into a folder called app and run the following in R console:

library(shiny)
runApp("app")

This opened up a shiny app, but got an error “there is no package called ‘markdown’”. I add this to the environment and tried again.

This ran the app successfully.

However, from having looked at the app online, I knew that the figures it produced are not what I need to reproduce the results presented in the paper.

12.07-12.21, 13.00-13.21, 13.28-13.41: Getting the raw model results

I copied the function simulate_nav from server.R. Looking through it, there was only one part still using shiny - the progress bar - and I removed those lines of code, then add a call for the function at the end of the script (simulate_nav()), and ran it. This ran for a while, which was a little odd given how quick the app was.

I tried running it with a very short run time (simulate_nav(run_t=60)) and this returned results!

I borrowed from the plot_nav() function in server.R to help process the results.

I add the reproduction.Rmd to the Quarto site, but this had issues since the Quarto book renv is seperate to the analysis renv. Based on this forum post, there are two possible solutions:

Integrate the .html file produced from the .Rmd into the book, so it is pre-rendered, and set the .Rmd to Render on Save.
Add the packages needed for the book to the analysis renv.

However, it appears you’d have to copy the .html code into the Quarto document. So, decided on the simpler solution of adding the required packages for the book to the analysis environment. I deleted the environment in the main folder.

13.42-14.13: Checking model parameters

Comparing parameters

Table 1 provides the parameters for the model. I compared these against the function inputs (as the model have no comments/docstrings, it took a little while to make sure I was matching up the right things).

Physical and human resources:

Parameter	Paper	Script
Angiography machine for INR and IR	1	`angio_inr = 1`
Angiography machine for IR only	1	`angio_ir = 1`
CT	2	`ct = 2`
Interventional neuroradiologist	1 24h	`inr = 1` and `inr_night = 1`
Interventional radiologist	2 8am-5pm 1 5pm-8am	`ir = 1` and `ir_night = 1`
Angiography staff	6 8am-5pm 3 5pm-8am	`angio_staff = 10` and `angio_staff_night = 3`
ED team	10 24h	`ed_staff = 10`
Stroke team	1 24h	`stroke_staff = 1`

For the shifts parameter: shifts = c(8,17)

Patients:

Parameter	Paper N	Paper IAT	Script
ED	107,700	5	`ed_pt = 107000`
Suspected stroke	750	701	`st_pt = 750`
AIS	450	1168	`ais_pt = 450`
ECR	58	9062	`ecr_pt = 58`
Elective INR	104	5054	`inr_pt = 300`
Emergency IR	468	1123	`eir_pt= 1000`
Elective IR	3805	138	`ir_pt = 4000`

I also compared some other parameters mentioned in the paper:

Simulated each scenario 30 times - nsim = 1
Runtime 365 days - run_t = 10000

Correcting differences

Interventional neuroradiologist: inr_night = 0
Interventional radiologist: ir = 2
Angiography staff: angio_staff = 6
ED: ed_pt = 107700
Elective INR: inr_pt = 104
Emergency INR: eir_pt= 468
Elective IR: ir_pt = 3805
Replications: nsim=30
Run time…
- In the paper, run time is 365 days
- In the script, run_t = 10000 and RUN_T = run_t * 40320
- Deduced that time unit is minutes
- This is set up for the app, where user inputs the run time in months, and 40320 minutes = 28 days
- To more easily reproduce paper (with run time 365 days), modified script so input is in days (which are then converted to minutes for RUN_T)

14.22-14.27: Fixing environment

The build of the book on GitHub failed:

Configuration failed because libcurl was not found. Try installing:
 * deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)
 * rpm: libcurl-devel (Fedora, CentOS, RHEL)
If libcurl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libcurl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'

Based on this GitHub issue, add installation of this to the action.

14.29-15.11, 15.32-16.23: In-text results 1 and 2

The provided processing scripts may be able to help guide us, but not provided will create the Figures in the paper, so we do need to write that from scratch.

Although the article focuses on the AngioINR, the plot includes 6 resources. The resources are provided by simmer’s get_mon_resources().

We’ll start with Figure 2 and its scenarios - with the related in-text results 1 and 2 probbaly being the easiest to initially check.

The plot the angio INR wait times, which can be obtained from the arrivals dataframe.

I created a function that runs the model with the baseline parameters identified above, then looked to the Figure 2 model variants.

Exclusive use

In this scenario, AngioINR not available to elective IR patients. It is available to stroke, selective INR and emergency IR patients.

Looking at the model code, use of the angioINR is controlled by a seize("angio_inr", 1) statement in the trajectory() for each patient. Can see that it is seized in:

ecr_traj (ECR)
ir_traj (Elective IR)
inr_traj (Elective INR)
eir_traj (Emergency INR)

Hence, add an exclusive_use statement and conditional section to remove angio_inr as an option to choose from when the scenario is active for the ir_traj trajectory.

Two angioINRs scenario

This was super easy to change with the angio_inr parameter.

Check in-text results 1 and 2

Ran each of the scenarios and found the mean waiting time for each resource across all replications. Results for AngioINR were:

Baseline: 86.99 minutes
Exclusive use: 63.33 minutes
Two AngioINR: 52.78 minutes

This is markedly more than in the paper (exclusive reduces by 6 min, and two angioINRs reduces by 4 min). The median was even more different.

This was looking at the waiting time for all patient types though. And the interest of the paper is in stroke (“The elective INR, elective IR and emergency IR pathways are modeled because they utilize resources shared with the stroke pathway.” Huang et al. (2019))

The results have 4 categories: ed, ir, eir, and inr. Hence, it appears ED should be stroke - and indeed, in paper, the stroke pathway begins with a new patient in the emergency department (ED). When filtered just to those patients…

Mean wait times are:

Baseline 307.83 minutes
Exclusive: 292.18 minutes
Two AngioINR: 319.94 minutes

Median wait times for the AngioINR are:

Baseline: 206.53 minutes
Exclusive: 199.03 minutes
Two AngioINR: 244.27 minutes

By the median times, we’re fairly close to the paper (comparing the averages, it 7 minutes quicker). However, two angioINR is very different, and I’m a little sceptical as to whether I’ve got it quite right.

Reflections

Disregarding my attempts to backdate R and the packages, the provided code was actually quite simple to get up and running as a shiny app.

However, it was provided with the article more for that purpose, than to be producing the items in the article, as the base parameters of the model differ, and as there is no code to process and generate the figures and results.

I’ll keep working in latest R and packages, as current focus of this stage is just to try and reproduce the items. However, it would be good to try and figure out how to successfully backdate R and the packages, as that feels like an essential thing to be able to do, that I just hadn’t managed to get to the bottom of yet.

16.30-16.37, 16.43-16.48, 16.55-16.57: Continuing to troubleshoot in-text results 1 and 2

I realised that perhaps my issue as in incorrectly assuming that I should set inr_night to 0. There should be one INR person 24h, but because all the staff get put on a schedule, by removing the “night” person I technically only have someone during day time hours.

I changed this and re-ran (with timer - can see it took 6.3 minutes).

This was definitely a fix - the waiting times now look far closer to what I expected (around 10 minutes rather than 200!). The median results are very small, but looking at the mean results:

Baseline: 13.33 minutes
Exclusive: 8.58 minutes
Two AngioINR: 14.86 minutes

However, still not quite right… 4.7 minute reduction for exclusive (should be 6 minutes) and 1.5 minute increase for two machines (should be 4 minute reduction). The exclusive is pretty close, although its not yet clear to me how much it varies between runs, and whether that could be reasonably attributed to be stochasticity or not. Given we’re comparing fairly small numbers, it is a bit on the fence for me, and wouldn’t yet say I feel confident in it being successfully reproduced.

I tried re-running it all to see how much the results differed - and it was by a fair bit actually! Up to about a minute:

Baseline: 13.65 minutes
Exclusive: 9.20 minutes
Two AngioINR: 13.61 minutes

However, differences are still off the reported - 4.45 minute reduction and 0.01 minute reduction.

Timings

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 149

# Times from today
times = [
    ('09.46', '09.47'),
    ('09.48', '10.21'),
    ('10.40', '11.30'),
    ('11.35', '11.41'),
    ('11.41', '11.46'),
    ('11.52', '12.00'),
    ('12.07', '12.21'),
    ('13.00', '13.21'),
    ('13.28', '13.41'),
    ('13.42', '14.13'),
    ('14.22', '14.27'),
    ('14.29', '15.11'),
    ('15.32', '16.23'),
    ('16.30', '16.37'),
    ('16.43', '16.48'),
    ('16.55', '16.57')]

calculate_times(used_to_date, times)

Time spent today: 294m, or 4h 54m
Total used to date: 443m, or 7h 23m
Time remaining: 1957m, or 32h 37m
Used 18.5% of 40 hours max

References

Huang, Shiwei, Julian Maingard, Hong Kuan Kok, Christen D. Barras, Vincent Thijs, Ronil V. Chandra, Duncan Mark Brooks, and Hamed Asadi. 2019. “Optimizing Resources for Endovascular Clot Retrieval for Acute Ischemic Stroke, a Discrete Event Simulation.” Frontiers in Neurology 10 (June). https://doi.org/10.3389/fneur.2019.00653.