Note
Working on research compendium stage.
Untimed: Research compendium
Parallel processing
Tried adding parallel processing in model.R to speed it up:
- Added future.apply to the environment, set plan(multisession, workers=max(availableCores()-5, 1)), and used future_lapply()
- However, it took longer than usual! So I removed it
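For reference, the setup described above could look something like this minimal sketch. The function run_replication() is a hypothetical stand-in for a single model run (it is not taken from model.R), and the number of replications is arbitrary. A possible but unconfirmed explanation for the slowdown is that multisession workers are separate R processes, so the cost of starting them and passing data back and forth can outweigh the gain when each task is short.

```r
library(future)
library(future.apply)

# Hypothetical stand-in for one model replication (not the real model)
run_replication <- function(i) {
  mean(rnorm(1000))
}

# Leave five cores free, but always use at least one worker
plan(multisession, workers = max(availableCores() - 5, 1))

# future.seed = TRUE gives each task its own reproducible random number stream
results <- future_lapply(1:3, run_replication, future.seed = TRUE)

# Return to sequential processing when finished
plan(sequential)
```

Timing the sequential and parallel versions with system.time() on a small number of replications would be one way to check whether the overhead is worth it.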
Reorganising
- Moved scripts into a scripts/ folder
- Moved helper functions from reproduction.Rmd into a separate R script (primarily so they can be reused in tests more easily)
Fix image size
Set the ggsave() image width, as I realised it otherwise varied with the window size when running.
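As an illustration of the fix, here is a small sketch with a made-up plot. The output path, the dimensions and the use of mtcars are all assumptions for the example (the logbook only mentions setting the width; height is set here too so the whole size is fixed). When width and height are not supplied, ggsave() falls back to the size of the current graphics device, which is why the output changed with the window.

```r
library(ggplot2)

# Example plot using a built-in dataset (not a figure from the paper)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical output path and dimensions; explicit width and height mean
# the saved size no longer depends on the current plot window
out_path <- file.path(tempdir(), "example.png")
ggsave(out_path, plot = p, width = 7, height = 4, units = "in", dpi = 300)
```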
Tests
Created tests to check that model results are consistent:
- Started with a basic test that saves a csv to a tempfile, then loads it and compares it to another dataframe
- Then made a test that runs two example models for 3 replications each and compares the results
- Then, set up with two files, as testthat can run files in parallel, and configured parallel processing. This involved:
  - Adding Config/testthat/parallel: true to DESCRIPTION
  - Creating a project-specific environment file with nano reproduction/.Renviron and setting TESTTHAT_CPUS=4
- Ran testthat::test_dir("tests"), although it seemed to just run sequentially. Confirmed by checking testthat::is_parallel(), which returned FALSE.
- Tried adding Config/testthat/start-first: shifts, model to DESCRIPTION and it ignored the order, so it appears the issue is that it is not using info from the DESCRIPTION file
- Checked version and it is correct for running in parallel (testthat>=3.0.0)
- Tried instead running testthat::test_local(), and moving tests into a folder testthat/, and this returned the error: Could not find a root 'DESCRIPTION' file that starts with '^Package' in /home/amy/Documents/stars/stars-reproduce-huang-2019/reproduction.
- Changed DESCRIPTION to add Package and re-ran, but this had an error that installation of renv failed. The same error occurs when running testthat::test_dir(). It says to: Try removing ‘/home/amy/.cache/R/renv/library/reproduction-0912b448/linux-ubuntu-jammy/R-4.4/x86_64-pc-linux-gnu/00LOCK-renv’. I deleted this file (navigated there, then rm -r 00LOCK-renv) and re-ran. However, I kept getting the same error message, with that same file being created again.
- Tried removing Package from DESCRIPTION and running testthat::test_dir("tests/testthat", load_package="none"), but that ignores the order in DESCRIPTION
- Tried testthat::test_dir("tests/testthat", load_package="source"), which had the error: Field 'Version' not found. Once I had added Version and re-ran, it ran the tests in the specified order (from Config/testthat/start-first: shifts, model)!
- I then added in Config/testthat/parallel: true and Config/testthat/edition: 3, but it had the same renv error as before
- Then decided to just run without parallel for now, so removed those lines from DESCRIPTION, deleted the .Renviron file, and put the tests in a single file
Package: huang2019
Version: 0.1
Config/testthat/start-first: shifts, model
Config/testthat/parallel: true
Config/testthat/edition: 3
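Since two of the errors above came down to missing fields (Package, then Version), a quick way to see which fields a DESCRIPTION file contains is base R's read.dcf(). This is only a diagnostic sketch: it writes the fields shown above to a temporary file so that it is self-contained, and it makes no claim about how testthat itself reads the file.

```r
# Write an example DESCRIPTION to a temporary file (same fields as above)
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: huang2019",
  "Version: 0.1",
  "Config/testthat/start-first: shifts, model",
  "Config/testthat/parallel: true",
  "Config/testthat/edition: 3"
), desc_path)

# Read selected fields; any field that is absent comes back as NA
fields <- c("Package", "Version", "Config/testthat/start-first",
            "Config/testthat/parallel", "Config/testthat/edition")
desc <- read.dcf(desc_path, fields = fields)

# Names of any fields missing from the file
missing_fields <- fields[is.na(desc[1, ])]
print(missing_fields)
```

In the project itself, desc_path would point at the real DESCRIPTION file rather than a temporary copy.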
- Created a function to simplify testing, then wrote tests for a selection of scenarios (not all scenarios, to minimise run time).
- Test was failing with the error: Length mismatch: comparison on first 2 components. I tried changing from expect_equal to using all.equal() and then expect_true(is_true()) on the result. But this returned the same error!
- I tried running everything manually in the console so I could inspect the dataframes myself.
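# Load the saved expected results (run_model() must already be loaded in the session)
file <- "tests/testthat/expected_results/fig2_baseline.csv.gz"
exp <- as.data.frame(data.table::fread(file))
# Re-run the model with the same inputs
inputs <- list(seed = 200)
result <- do.call(run_model, inputs)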
- I realised the issue was that the expected result included a column shift where the value throughout was 5pm. This was likely due to changing it at some point but not having re-run the whole script since, so I did that (and timed it!). I removed some of the model variants that aren’t used to produce results from the paper (e.g. varying seeds).
  - It takes a while to run and, midway through, the R session encountered a fatal error and aborted. Tried again, and it failed again on exclusive_f5 <- run_model(exclusive_use = TRUE, seed = SEED, fig5=TRUE).
  - I suspected this might be due to the size of the dataframes produced, so tried removing them from the environment after saving and ran again, but it still crashed, this time on the next run_model() statement
  - I considered trying again with parallelisation but, given I hadn’t had much luck with that before, and given that the issue here is with R crashing (and so parallelisation actually may not help), I decided to instead split up reproduction.Rmd into a few smaller files.
  - I re-ran each of these in full, recording the run times.
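The kind of test described in this section, together with a column check that would have pointed at the stray shift column more directly than the length mismatch message, might be sketched as below. Everything here is illustrative: column_differences() is a hypothetical helper, the data are made up, and base read.csv() is used instead of data.table::fread() to keep the example dependency-light.

```r
library(testthat)

# Hypothetical helper: list column names that appear in only one data frame
column_differences <- function(expected, result) {
  list(
    only_in_expected = setdiff(names(expected), names(result)),
    only_in_result = setdiff(names(result), names(expected))
  )
}

test_that("saved results match re-loaded results", {
  # Made-up example data standing in for model output
  result <- data.frame(replication = 1:3, wait_time = c(1.5, 2.0, 2.5))

  # Save to a temporary csv and load it back in
  tmp <- tempfile(fileext = ".csv")
  write.csv(result, tmp, row.names = FALSE)
  expected <- read.csv(tmp)

  # Check the columns first, so a mismatch gives a readable message
  diffs <- column_differences(expected, result)
  expect_identical(diffs$only_in_expected, character(0))
  expect_identical(diffs$only_in_result, character(0))

  # Then compare the values
  expect_equal(result, expected)
})
```

Calling column_differences() on the real expected and result dataframes in the console would list any column, such as shift, that is present in only one of them.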
Docker
Used the RStudio documentation and this tutorial to write a Dockerfile.