Day 2 – Reproducing Lim et al. 2020

Note

Reformat tables, consensus on scope, set up environment, first run of model. Total time used: 4h 37m (11.5%)

09.18-10.08, 10.18-11.38: Converting supplementary tables to CSV

Copied each table into LibreOffice Calc and saved as CSV.

As all tables had the same structure, I wrote a script that reformats them. This is so that they don’t “share” cells, and instead each row has all the information needed to identify it, making them easier to use and compare against later on.
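A minimal sketch of this kind of reformatting, assuming the “shared” cells come through as blanks in the CSV after the first row of each group. The column names and values here are hypothetical placeholders, not those from the supplementary tables, and this is not the script I actually used.

```python
import io

import pandas as pd

# Hypothetical CSV export: the "group" label is only filled in on
# the first row of each block, as happens with merged/shared cells
raw = io.StringIO(
    "group,setting,value\n"
    "A,x,1\n"
    ",y,2\n"
    "B,x,3\n"
    ",y,4\n"
)
table = pd.read_csv(raw)

# Forward-fill the label so every row carries all identifying information
table["group"] = table["group"].ffill()
print(table)
```

After the fill, each row can be filtered or compared on its own without looking at the rows above it.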

Displayed these within the scope page.

Reflection

This took longer than expected to reformat and sort, so provision of the tables in a “usable” format would have facilitated reproduction, although I understand that the tables as provided are far more readable and easier to look at.

11.40-11.42: Updating README

Modifying from template.

11.48-11.55: Checking scope

Had another look over the paper to double check if I could spot anything that I think could be in scope.

It appears likely that the supplementary tables often contain the data used to produce the figures. This is handy: if I weren’t able to reproduce the results themselves, I could still use that data to produce the figures (although we might prefer to produce them from scratch).

Did spot another result - “The strongest protective effect is seen with the N95 masks (nearly equivalent to a FFP2 mask), which has the effect of reducing the odds of transmission by 0.09.” - which, on reflection, might be an additional item for the scope, as the odds of transmission are not calculated/provided within the tables or figures.

All other text though seems to be interpreting the figures and tables presented, and not providing new results.

15.23-15.30, 15.38-15.40: Consensus on scope and archive repository

Messaged with Tom who had a look over the scope and agreed he was happy with it. Also, happy with not emailing authors again re: starting reproduction.

Alison also later confirmed she was likewise happy with the scope.

Updated CHANGELOG.md and CITATION.cff before creating a release to archive the repository on Zenodo.

15.41-15.42: Look over code

Concise / short model code, looks like it will run the model but will need to save results to csv and generate figures and tables with own code.

15.44-15.56: Set-up environment

The article mentions that the code was written in Python 3 and uses numpy and pandas. Looking at the code, those do appear to be the only required packages outside the standard library (the only other import is random, which is part of the standard library). Looking at dates:

Article published online: 12 Sep 2020
Code put on GitHub: 29 Aug 2020

Hence, will base specific python version and package versions on being on or prior to 29 August 2020.

Python - 3.8 (first release 14 Oct 2019, then 3.9 came in Oct 2020)
Numpy - 1.19.1 (21 July 2020, https://pypi.org/project/numpy/#history)
Pandas - 1.1.1 (20 August 2020, https://pypi.org/project/pandas/#history)

Created an environment file and installed it with mamba - mamba env create -f environment.yaml. Once the environment was activated with mamba activate lim2020, conda list python showed Python 3.8.19. Checking online, I should switch to 3.8.5, which is the version released on 20 July 2020.
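As a complement to conda list, a small sketch of how the pinned versions could be checked from within Python. The helper names installed_version and find_mismatches are my own for illustration; the expected versions are those chosen in this log.

```python
import sys
from importlib import metadata  # standard library from Python 3.8

# Versions chosen in this log (on or before 29 August 2020)
EXPECTED = {"python": "3.8.5", "numpy": "1.19.1", "pandas": "1.1.1"}


def installed_version(name):
    """Return the installed version of Python or a package, or None if absent."""
    if name == "python":
        return ".".join(str(part) for part in sys.version_info[:3])
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each mismatching name to a tuple of (expected, found)."""
    return {
        name: (wanted, lookup(name))
        for name, wanted in expected.items()
        if lookup(name) != wanted
    }


# An empty dict means the active environment matches the pins
print(find_mismatches(EXPECTED))
```

Run inside the activated environment, this would have flagged the 3.8.19 vs 3.8.5 difference directly.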

15.57-15.58, 16.01-16.05, 16.15-16.33, 16.36-16.45, 17.15-17.16, 18.57-18.58: Run model

Copied Roster Schedule COVID Simulation-final.py and renamed it to model.py. Created reproduction.ipynb to run the model and then produce outputs, and added ipykernel to the environment (required for .ipynb).

Then ran the model. Whilst it ran, it printed lots of SettingWithCopyWarning warnings for stafflist['infected'][staff_infected[i]]=1 and stafflist['infected'][roster.iloc[0][0]]=1. (SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.) I anticipate these could be resolved with the following changes (which I’ll implement once it’s finished running):

stafflist.loc[staff_infected[i], 'infected'] = 1
stafflist.loc[roster.iloc[0][0], 'infected'] = 1
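To illustrate why the .loc form should avoid the warning, here is a small self-contained sketch. The DataFrame contents and the index labels in staff_infected are hypothetical stand-ins; only the column name and variable names come from the model code.

```python
import pandas as pd

# Hypothetical stand-in for the model's staff list
stafflist = pd.DataFrame({"infected": [0, 0, 0, 0]})
staff_infected = [1, 3]  # hypothetical index labels of infected staff

for i in range(len(staff_infected)):
    # A single .loc call selects row and column together, so the write
    # goes to stafflist itself rather than to a possible temporary copy
    # (which is what the chained stafflist['infected'][...] = 1 risks)
    stafflist.loc[staff_infected[i], "infected"] = 1

print(stafflist["infected"].tolist())
```

This assumes the values being used to index are index labels of stafflist, as they are with the default integer index here.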

Wasn’t certain how long model should take, but paused record of time whilst it ran. After 10 minutes though, I decided to cancel that, and instead just run one version of the input parameters for the model (i.e. one choice from staff per shift, shifts per day, strength, staff change).

I copied the functions over into the .ipynb and ran one version of the model. Can see this outputs a table where the columns are the strength and staff change. I tried running a few to understand how the results table was being produced. I then added code to save the results to .csv.
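A rough sketch of the saving step, assuming the results are held in a pandas DataFrame. The column labels, values and file name below are hypothetical placeholders rather than actual model output.

```python
import os
import tempfile

import pandas as pd

# Hypothetical results table with placeholder labels and values
results = pd.DataFrame({"scenario_a": [0.25, 0.5], "scenario_b": [0.75, 1.0]})

# Write to a temporary folder here; in the repository this would be
# a path inside the reproduction's outputs folder
path = os.path.join(tempfile.mkdtemp(), "results.csv")
results.to_csv(path, index=False)

# Read back to confirm the file round-trips
reloaded = pd.read_csv(path)
print(reloaded)
```

Saving without the index keeps the CSV to just the result columns, which makes later comparison against the reformatted supplementary tables simpler.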

Presently, I think this might only be for one shift per day, although I might be wrong.

Once home, set model to run the full loop from the .py script. This took 19 minutes 8 seconds.
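For recording run times like this one, a minimal timing sketch. The helpers timed and format_duration are hypothetical names of my own, and the workload is a stand-in rather than the model itself.

```python
import time


def format_duration(seconds):
    """Format a number of seconds as 'Xm Ys'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs}s"


def timed(func, *args, **kwargs):
    """Run func and return a tuple of (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; in practice this would be the call that runs the model
result, elapsed = timed(sum, range(1000))
print(format_duration(elapsed))

# The full run reported above, expressed in seconds
print(format_duration(19 * 60 + 8))
```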

Reflection

Model run time does have an impact on how easy it is to rerun (or then reuse) a model.

Timings

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 82

# Times from today
times = [
    ('9.18', '10.08'),
    ('10.18', '11.38'),
    ('11.40', '11.42'),
    ('11.48', '11.55'),
    ('15.23', '15.30'),
    ('15.38', '15.40'),
    ('15.41', '15.42'),
    ('15.44', '15.56'),
    ('15.57', '15.58'),
    ('16.01', '16.05'),
    ('16.15', '16.33'),
    ('16.36', '16.45'),
    ('17.15', '17.16'),
    ('18.57', '18.58')]

calculate_times(used_to_date, times)

Time spent today: 195m, or 3h 15m
Total used to date: 277m, or 4h 37m
Time remaining: 2123m, or 35h 23m
Used 11.5% of 40 hours max
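The timings module itself is not shown in this log. As a sketch of the arithmetic behind the summary above, and not the actual calculate_times implementation, the following assumes times are same-day 'H.MM' strings and a 40 hour limit. The function names are hypothetical.

```python
def minutes_between(start, end):
    """Minutes between two same-day 'H.MM' strings such as '9.18' and '10.08'."""
    start_h, start_m = (int(part) for part in start.split("."))
    end_h, end_m = (int(part) for part in end.split("."))
    return (end_h * 60 + end_m) - (start_h * 60 + start_m)


def summarise_times(used_to_date, times, limit_hours=40):
    """Return minutes spent today, total used, remaining, and percent used."""
    today = sum(minutes_between(start, end) for start, end in times)
    total = used_to_date + today
    limit = limit_hours * 60
    return {
        "today": today,
        "total": total,
        "remaining": limit - total,
        "percent_used": round(100 * total / limit, 1),
    }


# First two sessions from today as a small check
print(summarise_times(82, [("9.18", "10.08"), ("10.18", "11.38")]))
```

Splitting on the dot rather than converting to a float matters here, since '10.08' means eight minutes past ten and not 10.08 hours.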