Day 3 – Reproducing Shoaib and Ramamohan 2022

Note

Succesful reproduction of Table 6, and start on figure 2. Total time used: 13h 49m (34.5%)

Any references to the original study are to Shoaib and Ramamohan (2021) which is licensed under CC BY-NC-ND.

9.20-10.03, 10.11-10.22, 10.31-11.43: Different configurations

Got parameters for each configuration from Table 3. Some discrepancies:

Configuration 4 (benchmark) is stated as 4 nurses in the table, but in the text, is described as having 5 nurses (1 NCD and 4 staff) - although the staff nurses are described as working alone in consecutive 8 hour shifts, which would work out to 3 nurses, so it is assumed that the standard configuration of 3 staff nurse and 1 NCD nurse (and so 4 nurses overall, as in the table) is used.
Table 3 caption describes all configurations as having 6 inpatient and 1 labour bed - whilst Table 6 describes configuration 3 as having 0 labour room beds. However, as there is no labour arrivals, this has no impact on results, so just left as default 1 bed as per Table 3.

Configuration 3 has no arrivals for childbirth or ANC, and so Table 3 states their inter-arrival time as NaN. However, it was unclear how to set this up in the model code.

Trying to figure how best to re-run the PHC.py script given it is set up with global variables. Based on this StackOverflow post, tried out creating all global parameters using a single class, so that - if you want to re-run and reset all global parameters - you just make a new instance of this class. Like this -

DEFAULT_OPD_IAT = 4
DEFAULT_DELIVERY_IAT = 1440
DEFAULT_IPD_IAT = 2880
DEFAULT_ANC_IAT = 1440


class ModelState:
    """
    Class containing all global parameters (so that, if you wish to reset
    values, you can do so easily by creating a new instance of the ModelState)
    """
    def __init__(self,
                 OPD_iat=DEFAULT_OPD_IAT,
                 delivery_iat=DEFAULT_DELIVERY_IAT,
                 IPD_iat=DEFAULT_IPD_IAT,
                 ANC_iat=DEFAULT_ANC_IAT):
        """
        Parameters for the simulation model.

        Parameters:
        -----------
        OPD_iat : int
            Inter-arrival time for OPD patients

        delivery_iat : int
        `   Inter-arrival time for labour patients

        IPD_iat : int
            Inter-arrival time for IPD patients

        ANC_iat : int
            Inter-arrival time for ANC patients
        """
        self.OPD_iat = OPD_iat
        self.delivery_iat = delivery_iat
        self.IPD_iat = IPD_iat
        self.ANC_iat = ANC_iat


c1_state = ModelState()
c1_state.__dict__

c2_state = ModelState()
c2_state.OPD_iat = 9
c2_state.__dict__

c3_state = ModelState(OPD_iat=5)
c3_state.__dict__

But then reflected that this was changing the code alot, and I wasn’t 100% confident on whether it would definitely deal with all the global variables correctly (as they are repeatedly defined throughout the script).

Chat with Tom and he suggested:

dir() should show global variables - but these just appear to be for notebook, so hopefully no overlap, but to check -
Return the value of the global variables after running the model. Then run the model again, and check the values are the same
He also suggested, for when there are no arrivals for two patient types, the simplest way to set this up is to set the inter-arrival time to something really really large (e.g. 10,000,000 * run time)

I created a function to return global values:

def return_globals():
    """
    Return global variables and their values from workspace. Used for
    verifying the environment is as expected.

    Returns:
    -------
    globals() : dict
        Dictionary of global variables and their values
    """
    return globals()

And then checked if the output was consistent between different calls of the model, and this returned True, so will assume that running the model in this way is not leading to results interefering with each other between runs. See GitHub commit 16f50f7 for notebook with these results.

12.00-12.05, 12.17-12.57: Different configurations (continued)

Ran the four different configurations (excluded run time from timings).

Kept setting IAT for ANC higher as was getting an average of 1 arrival (but want to be getting no arrivals), but even with IAT of 999999999999999999999999999999999*365*60*24, was still getting 1 ANC arrival with every run. Tried setting ANC() and Delivery() to comments in PHC.py, and this worked to prevent arrivals, so add True/False inputs to main() for whether these functions are called, instead of setting high IAT.

Found differences were:

Mean OPD queue length was very different for all configuration compared with Table 6.
Lots of the benchmark case results looked very different to Table 6.

Spent a while looking for why this might be, but not yet figured out.

13.55-14.30: Problem-solving benchmark model

To ensure it is not an error I’ve introduced, I copied the original PHC.py and modified it directly (the only change from config 1 that I have identified from the paper is that the OPD IAT is 3 instead of 4), and to produce both results spreadsheets. See GitHub commit 892cca4. This came out looking the same as my other run (e.g. doctor utilisation around 0.31 instead of 1.14).

Hence, wondered if the issue was in parameterising of model, so looked back over the paper. It is described as differing from the archetypal/generic model in two ways (not just one). I have only changed outpatient load (OPD IAT 4 to 3)

‘Note that this benchmark configuration differs from the archetypal configuration only in the outpatient load and the doctor’s consultation time for outpatients.’ , and had not spotted change in consultation time in the tables.

The consultation time for the benchmark model: “we have assumed the consultation time to be normally distributed with a mean of 5 minutes and standard deviation of 1 minute with a lower bound fixed at 2 minutes”

As in Table 4, the normal OPD consultation time is normally distribution with mean 0.87 and standard deviation of 0.21

Amended mean and sd, and set consultation boundary as a modifiable parameter (with consult_boundary_1 and 2, as the default boundary provided in the script varied depending on whether it was warm-up or not, between 0.5 and 0.3).

This pretty much fixed the issue - the only large difference for the benchmark model is now as for the other configurations: mean OPD queue length.

Reflections

Not certain if it was intentional for the original model to have two different boundaries before and after warm-up - but does bring a reflection that the duplication of code like that (before and after warm-up) does introduce more possibility of potential mistakes and that, where possible, should try to minimise that duplication.

14.36-14.53: Problem-solving mean OPD queue length

Checked example of a result in full .xlsx output for config 1 from a previous commit, but this matched the model output, which itself, was very similar to the paper output.

Hence, instead ran config 4 but enabled full result spreadsheet to be produced. That output 0.833 for Mean length of OPD queue, similar to Table 6’s 0.817 and unlike my previously calculated 6.932062.

Looking at the replication spreadsheet, I realised there were two OPD q len rows:

OPD q len (with values around 7)
- Calculated from OPD_q__list
- That is created by OPD_q__list.append(np.mean(an_list)), and an_list is delivery patients
opd q len (with values around 0.8)
- Calculated from OPD_q_length_list.
- That is created by OPD_q_length_list.append(waitingline_OPD.length.mean())

Hence, given the values match, and how it was created, the latter is evidently the appropriate row to use. Amended reproduce.ipynb to use that row, and now results match up, with no concerning deviation.

15.34-16.02 : Add SD to Table 6

Add standard deviation. Some percent change seem large as it’s comparing very small number - but actual change between the model and table 6 all are small enough that I feel happy to say this has been successfully reproduced for the mean values.

import pandas as pd

pd.read_csv('table6_10rep.csv')

	Unnamed: 0	model_outcome	model_t6_c1_old_mean	model_t6_c1_old_sd	model_t6_c2_old_mean	model_t6_c2_old_sd	model_t6_c3_old_mean	model_t6_c3_old_sd	model_t6_c4_old_mean	model_t6_c4_old_sd	t6_outcome
0	0	OPD patients	33133.600000	179.979135	14872.400000	1.386588e+02	14933.900000	53.875267	44161.600000	207.882018	NaN
1	1	IPD patients	184.000000	13.703203	190.300000	1.391282e+01	181.600000	13.309980	180.200000	11.773794	NaN
2	2	ANC patients	371.200000	12.839176	211.500000	1.314238e+01	0.000000	0.000000	373.900000	14.371654	NaN
3	3	Del patients	363.000000	25.064362	176.900000	8.252272e+00	NaN	NaN	363.100000	16.881614	NaN
4	4	OPD Q wt	0.009304	0.004588	0.180556	2.189756e-02	0.034672	0.000787	7.087377	0.171371	OPD queue waiting time (minutes)
5	5	Pharmacy Q wt	1.019412	0.012652	0.240693	4.652718e-03	0.231200	0.003927	1.277731	0.025166	Pharmacy queue waiting time (minutes)
6	6	Lab Q wt	2.087956	0.059170	0.605414	2.874245e-02	0.570976	0.014748	3.170211	0.061232	Lab queue waiting time (minutes)
7	7	doc occ	0.269412	0.003003	0.372110	2.163915e-03	0.355399	0.002388	1.145836	0.004580	Doctor utilisation
8	8	Lab patient list	189463.400000	104190.501449	84293.200000	4.652751e+04	86041.800000	47115.631487	254249.200000	139835.319294	NaN
9	9	OPD q len	0.008821	0.003825	0.167379	3.207569e-02	0.034586	0.000754	6.932062	0.304803	NaN
10	10	ipd occ	0.098000	0.004216	0.060000	1.462847e-17	0.014000	0.005164	0.095000	0.005270	NaN
11	11	opd q len	0.000822	0.000406	0.007131	8.457763e-04	0.001374	0.000035	0.832990	0.022424	Mean length of OPD queue (number of patients)
12	12	pharmacy q len	0.089752	0.001116	0.009479	2.278525e-04	0.009159	0.000191	0.149840	0.003193	Mean length of pharmacy queue (number of patie...
13	13	lab q len	0.094872	0.003335	0.012381	6.255212e-04	0.011342	0.000386	0.191373	0.004532	Mean length of Lab queue (number of patients)
14	14	NCD occ	0.869785	0.011039	0.469595	5.111055e-03	0.470006	0.006064	1.231760	0.022432	NCD Nurse utilisation
15	15	lab occ	0.557508	0.004960	0.253362	6.092231e-03	0.240857	0.003743	0.738856	0.012791	Lab utilisation
16	16	pharm occ	0.642954	0.003603	0.288769	3.379442e-03	0.289968	0.001035	0.856621	0.005442	Pharmacist utilisation
17	17	staff nurse occ	0.322609	0.008560	0.242049	4.190295e-03	0.160632	0.001316	0.321931	0.006419	Staff nurse utilisation
18	18	del occ	0.287000	0.012517	0.150000	8.164966e-03	NaN	NaN	0.278000	0.011353	Labour bed utilisation
19	19	del referred	55.600000	7.705698	16.600000	3.373096e+00	NaN	NaN	57.300000	6.074537	NaN
20	20	ipd bed occ	0.093243	0.005137	0.055268	2.650480e-03	0.011464	0.000896	0.093002	0.004686	Inpatient bed utilisation
21	21	prop_del_referred	0.152701	0.013231	0.093568	1.741344e-02	NaN	NaN	0.157772	0.015115	Fraction of childbirth cases referred

Will run 100 replications later, and confirm whether standard deviations are similar - but anticipate to take a while, so will work on something else for remainder of day, as not got time to run them all now.

As the long run-time prevents me from checking this now, I will not yet timestamp this as complete.

16.13-17.00: Starting on Figure 2A-D

Figure 2 presents results from sensitivity analysis of configuration 1.

It varies:

Number of outpatients per day:
- 170 (same as config 4) = OPD IAT 3
- 85
- 65
Average service time for outpatients - mean (SD):
- 0.87 (0.21) (same as config 1)
- 2.5 (0.5)
- 5 (1)

It states that “estimation of the outpatient arrival rate for this configuration is described in detail in Section 3.3.1”. Hence, looked back at provided times and patient counts in the Tables…

Config 1: 130 arrivals in a day and IAT of 4
Config 2/3: 60 arrivals in a day and IAT of 9
Config 4: 170 arrivals in a day and IAT of 3

In order to get those times:

print((60/4)*8.6666666666666666666666666666666)
print((60/9)*9)
print((60/3)*8.5)

130.0
60.0
170.0

Can see that, in order to get each of the times, the hours per day vary from 8.5 to 9. As such, it is unclear how I should calculate IAT for 85 and 65 per day, due to this variation. It is assumed that this is likely related to rounding?

Then looked to section 3.3.1 about this in the paper, Shoaib and Ramamohan (2021):

“The average interarrival times (and consequently the average number of patients) at each configuration were estimated in the following manner. The number of outpatients visiting configuration 1 PHCs (PHCs 1, 5, and 9) range from 80 to 150 patients per day. These include patients visiting for the first time for a given case of illness as well as patients visiting for follow-up consultations on a previous case. Thus we assumed that approximately 125 patients visit on a given day for these PHC configurations, which include 90 first-visit patients, 20% patients on their first follow-up, and 10% visiting for their second follow-up, yield approximately 126 patients. Therefore, the interarrival time of 4 minutes at configuration 1 PHCs corresponds to first-time visits, with follow-up visits scheduled at the same time on any day between the next 3 and 8 days.”

These values differ from the table - so looking again with these:

Config 1: 125 or 126 arrivals and IAT 4
Config 2/3: 55 arrivals and IAT 9

print((60/4)*8.33333333333333333)
print((60/4)*8.4)
print((60/9)*8.25)

125.00000000000001
126.0
55.0

Regarding the number of hours described in the article, this is described as “6 hours per day” for the outpatients - which differs to the 8.25 hours to 9 that align with the provided patient numbers and inter-arrival times.

Assuming this variation is simply related to rounding the IAT - as if we calculate these values based on there being 8.5 hours per day then round the IAT, we get the same results…

# IAT 3. Arrivals 170
print(60/(170/8.5))

# IAT 4. Arrivals 125-130
print(60/(125/8.5))
print(60/(130/8.5))

# IAT 9. Arrivals 55-60
print(60/(55/8.5))
print(60/(60/8.5))

3.0
4.08
3.923076923076923
9.272727272727273
8.5

As such, the inter-arrival times used were calculated as 3, 6 and 8…

# 170, 85 and 65 arrivals
print(60/(170/8.5))
print(60/(85/8.5))
print(60/(65/8.5))

3.0
6.0
7.846153846153846

Reflections from troubleshooting today

Challenges today

Adapting model to run model programmatically (rather than having to directly change the parameters in the script itself)

Difficulty identifying parameters of scenarios (when not all captured in a single table, but in combination with article text) (and when not provided in the script).

Time spent processing results into desired tables and/or figures (when not already created by the script).

Uncertainty on model parameters if not provided and calculation unclear (inter-arrival time).

Timings

import sys
sys.path.append('../')
from timings import calculate_times

# Minutes used prior to today
used_to_date = 531

# Times from today
times = [
    ('9.20', '10.03'),
    ('10.11', '10.22'),
    ('10.31', '11.43'),
    ('12.00', '12.05'),
    ('12.17', '12.57'),
    ('13.55', '14.30'),
    ('14.36', '14.53'),
    ('15.34', '16.02'),
    ('16.13', '17.00')]

calculate_times(used_to_date, times)

Time spent today: 298m, or 4h 58m
Total used to date: 829m, or 13h 49m
Time remaining: 1571m, or 26h 11m
Used 34.5% of 40 hours max

References

Shoaib, Mohd, and Varun Ramamohan. 2021. “Simulation Modelling and Analysis of Primary Health Centre Operations.” arXiv, June. https://doi.org/10.48550/arXiv.2104.12492.