Note
Continued working on research compendium (online coding environment, documentation).
Work log
Research compendium
Accessing the repository on browser without any installs
Explored options for virtual hosting of materials…
Google Colab:
- Insert ‘tocolab’ after ‘github’ in the URL so the host becomes ‘githubtocolab.com’ - e.g. https://githubtocolab.com/pythonhealthdatascience/stars-reproduce-allen-2020/blob/main/reproduction/reproduction.ipynb. This has a few issues though…
- Won’t have the environment set up, so you’ll need to pip install dependencies within the notebook.
- Can’t use local materials (e.g. the local sim package) without uploading those files to Google Drive
- Have to “hack” it to get the correct version of Python:
!python3 --version
!sudo apt-get update -y
!sudo apt-get install -y python3.8
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
!sudo update-alternatives --config python3
!python3 --version
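The URL rewrite described above can be sketched in a few lines of Python. This is only an illustration: the helper name `to_colab_url` is hypothetical, and it assumes the notebook lives in a public github.com repository.

```python
from urllib.parse import urlsplit, urlunsplit


def to_colab_url(github_url: str) -> str:
    """Rewrite a github.com notebook URL to its githubtocolab.com equivalent."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError(f"Expected a github.com URL, got host {parts.netloc!r}")
    # Only the host changes; the path to the notebook is kept as-is.
    return urlunsplit(parts._replace(netloc="githubtocolab.com"))


notebook = (
    "https://github.com/pythonhealthdatascience/stars-reproduce-allen-2020"
    "/blob/main/reproduction/reproduction.ipynb"
)
print(to_colab_url(notebook))
```

This only produces the link. It does not address the environment, local package, or Python version issues listed above.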
BinderHub:
- Created BinderHub config file to use the GHCR image (reproduction/binderhub/config.yaml)
config:
  BinderHub:
    use_registry: true
    image_prefix: ghcr.io/pythonhealthdatascience/covid19:latest
- Then set up on <mybinder.org>. The build completed, but there were some issues:
- Takes a long time to set up.
- Test-run of reproduction.ipynb failed - said there was “no module named ‘simpy’”.
- Displayed the whole repository (rather than just the reproduction folder). Restricting the view to a single folder does not appear to be supported by Binder as an option; the closest is to modify the URL so it points to a particular notebook - https://mybinder.org/v2/gh/pythonhealthdatascience/stars-reproduce-allen-2020/HEAD?labpath=reproduction%2Freproduction.ipynb.
- Tried simplifying this (as Binder recommends against Docker), so removed the config file and instead copied the environment file into a .binder folder in the root, as then Binder should ignore configuration files elsewhere…
- Requires the .binder/ folder to be in the root (so can’t be within the reproduction folder)
- Took 3 minutes 20s to set up
- Test-run of reproduction.ipynb failed - said there was “no module named ‘pandas’”, and !pip list shows that expected packages from environment.yaml haven’t been installed
- Check of python version (!python --version) returns 3.10.14 (incorrect)
- Tried a final time, creating a requirements.txt instead with packages, and runtime.txt with the Python version within the .binder/ folder…
- Took 3m 25s to set up
- When attempting to test it, connection to the kernel failed and it could not run the cells (“A connection to the notebook server could not be established. The notebook will continue trying to reconnect. Check your network connection or notebook server configuration.”). This persisted after a restart.
Requirements:
joblib==0.14.1
jupyterlab==1.2.6
matplotlib==3.1.3
notebook==6.0.3
numpy==1.18.1
pandas==1.0.1
pip==20.0.2
pytest==5.3.5
scipy==1.4.1
seaborn==0.10.0
simpy==3.0.11
statsmodels==0.11.0
Runtime:
python-3.8.1
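Since two of the failed attempts came down to the kernel not matching the intended environment, a quick check inside the notebook can make the mismatch explicit. The sketch below is not part of the repository. It assumes the pins listed above are the intended environment, and `find_mismatches` is a hypothetical helper built on the standard library's `importlib.metadata` (available from Python 3.8).

```python
import sys
from importlib import metadata

# Pins copied from the requirements and runtime listed above.
EXPECTED_PYTHON = (3, 8)
PINS = {
    "joblib": "0.14.1",
    "jupyterlab": "1.2.6",
    "matplotlib": "3.1.3",
    "notebook": "6.0.3",
    "numpy": "1.18.1",
    "pandas": "1.0.1",
    "pip": "20.0.2",
    "pytest": "5.3.5",
    "scipy": "1.4.1",
    "seaborn": "0.10.0",
    "simpy": "3.0.11",
    "statsmodels": "0.11.0",
}


def find_mismatches(pins, installed_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Report the interpreter version first, then any package differences.
if sys.version_info[:2] != EXPECTED_PYTHON:
    print(f"Python {sys.version.split()[0]} found, expected 3.8.x")
for name, (expected, found) in find_mismatches(PINS).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run as the first cell of the notebook, this prints nothing when the environment matches the pins and otherwise lists each difference.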
Conclusion: For the purposes of these reproduction assessments, decided not to pursue this any further for now, and to keep to the existing options of the environment file or the Binder image (locally or from GHCR).
Other aspects of research compendium
README:
- Add operating system (found by running cat /etc/os-release, lscpu and free -g (to see RAM in GB))
- Recorded runtime (with measurement of time elapsed added to the notebook)
- Add model summary (copied from final report, with addition of the pathway figure)
- Add reproduction scope (copied from scope.qmd)
- Created repository overview
- Tidied instructions on environment set up and running the model
- Add some extra details on citation and license.
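As a rough illustration of how the machine details and run time for the README could also be captured from within the notebook, here is a minimal standard-library sketch. The function names are hypothetical and this is not the code that was added to the notebook. It assumes a Linux machine for the RAM lookup.

```python
import os
import platform
import time


def system_summary():
    """Collect basic machine details using only the standard library."""
    summary = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }
    # Total RAM via sysconf works on Linux; recorded as None where unsupported.
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        summary["ram_gb"] = round(ram_bytes / 1024**3, 1)
    except (AttributeError, ValueError, OSError):
        summary["ram_gb"] = None
    return summary


def format_elapsed(seconds):
    """Format a duration in seconds as e.g. '3m 25s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs}s"


start = time.perf_counter()
# ... run the model here (placeholder for the notebook's existing cells) ...
elapsed = time.perf_counter() - start

print(system_summary())
print(f"Run time: {format_elapsed(elapsed)}")
```

The shell commands above (cat /etc/os-release, lscpu, free -g) give more detail. The benefit of a Python version is that it records the details alongside the results whenever the notebook is re-run.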
Notebook:
- Made some minor additions, but primarily referred back to the README.md