Ran the remaining experiments and created the figures and tables, although unfortunately the results were not reproduced and I have no further troubleshooting ideas. Total time used: 17h 41m (44.2%)
09.20-09.35: Running and processing Experiment 3
I was going to alter Experiment 3 to just run the two shorter scenarios, but then realised those had already finished on Friday, and the process had only remained running for the longer scenario. However, the timings were missing, as those would only have been recorded once all scenarios had completed. Hence, I still amended the script to run just the two shorter scenarios and set them running, as I need to know the times.
I checked the results and, unsurprisingly given the previous experiments, they are similar to the article's but do not match it.
Once this finished, it had a run time of 3 hours 13 minutes.
09.36-09.41, 09.50-10.07: Experiment 4
Experiment 4 is the tri-objective problem with a maximum of 1, 2 or 3 line managers (known as greeters in the code), run with a population of 50 and 25 generations. The upper bound for the number of line managers is set in StaffAllocationProblem.py.
To do so programmatically, I'll need to make this an input to the class StaffAllocationProblem(), and then likewise in ExperimentRunner() and main.py where it is called, as I did for objectiveTypes.
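As a rough sketch of the change I have in mind (the signatures, defaults and bound ordering below are my own illustration rather than the repository's exact code):

# Sketch only: hypothetical signatures, mirroring how objectiveTypes is passed through.
# Bounds assumed to be ordered [greeter, screener, dispenser, medic] as in PODSimulation.py.
class StaffAllocationProblem(object):
    def __init__(self, objectiveTypes, upperBounds=None):
        self.objectiveTypes = objectiveTypes
        # Placeholder default maxima if no bounds are supplied
        self.upperBounds = upperBounds if upperBounds is not None else [3, 20, 30, 10]

class ExperimentRunner(object):
    def __init__(self, objectiveTypes, upperBounds=None):
        # Thread the bounds straight through to the problem definition
        self.problem = StaffAllocationProblem(objectiveTypes, upperBounds=upperBounds)

# In main.py: one run per line manager (greeter) maximum of 1, 2 or 3
objectiveTypes = ['throughput', 'time', 'resources']  # placeholder objectives
for maxGreeters in (1, 2, 3):
    runner = ExperimentRunner(objectiveTypes, upperBounds=[maxGreeters, 20, 30, 10])

That way, main.py can loop over the three maxima without editing StaffAllocationProblem.py each time.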
10.09-10.14, 10.18-10.28: Experiment 5
Experiment 5 has 6 dispensing, 6 screening, 4 line manager and 1 medical staff. It then varies the number of replications from 1 to 7.
To set the staff numbers, I'm assuming I'll need to set upperBounds and lowerBounds to the same values, so I modified the code accordingly to allow lowerBounds to be passed in as well.
To set the number of replications, I'm assuming this refers to runs, as that is an input that was already set up, and I have previously assumed that when the article says to run three times, it is referring to that parameter.
It doesn’t state population and generations, but I’m assuming generations is 1 (as it’s a fixed number of staff), and that population is 1000.
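Putting those assumptions together, the Experiment 5 configuration I'm aiming for looks roughly like the following (the parameter names mirror ones mentioned above, but the exact call signature is my assumption):

# Fixed staffing in the greeter/screener/dispenser/medic order used by PODSimulation.py:
# 4 line managers, 6 screening, 6 dispensing, 1 medical.
staff = [4, 6, 6, 1]

experiment5_settings = []
for replications in range(1, 8):  # vary the number of replications from 1 to 7
    experiment5_settings.append({
        'lowerBounds': staff,     # lower == upper pins the staff numbers
        'upperBounds': staff,
        'runs': replications,     # assuming "replications" maps onto the runs parameter
        'population': 1000,
        'generations': 1,
    })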
10.51-10.57: Processing Experiment 4
Run time: 37 minutes
As observed for similar figures previously, although patterns are similar, the axis values differ sufficiently that this is not reproduced (e.g. 10-120 staff members instead of 10-70, and 2000-6000 throughput rather than 500-5000 throughput).
However, the results for each maximum were identical. I'm wondering if I changed the right thing. Looking at PODSimulation.py, there is an option to amend the capacities:
########################
name = 'greeter'
n = 0
self.resources[name] = simpy.Resource(capacity=capacities[n],
                                      name=name,
                                      monitored=True)
self.monitors[name] = simpy.Monitor(name=name, ylab=ylab)
########################
name = 'screener'
n = 1
self.resources[name] = simpy.Resource(capacity=capacities[n],
                                      name=name,
                                      monitored=True)
self.monitors[name] = simpy.Monitor(name=name, ylab=ylab)
########################
name = 'dispenser'
n = 2
self.resources[name] = simpy.Resource(capacity=capacities[n],
                                      name=name,
                                      monitored=True)
self.monitors[name] = simpy.Monitor(name=name, ylab=ylab)
########################
name = 'medic'
n = 3
self.resources[name] = simpy.Resource(capacity=capacities[n],
                                      name=name,
                                      monitored=True)
self.monitors[name] = simpy.Monitor(name=name, ylab=ylab)
Here, n is just the index into the capacities list (capacities[0] for greeters, capacities[1] for screeners, capacities[2] for dispensers and capacities[3] for medics), so the actual staff numbers come from whatever capacities list is passed in.
I'm wondering whether I should be running this the way I have so far, i.e. by searching through candidate solutions, given that this should simply be the result of running the discrete event simulation. I therefore completely changed Experiment5.py to run PODSimulation() directly, borrowing code from StaffAllocationProblem.py.
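In outline, the direct run looks something like the sketch below; the import path, constructor arguments and run method are assumptions pieced together from identifiers mentioned in this log, not a verbatim copy of the repository code:

# Sketch only: running the discrete event simulation directly with a fixed
# staff allocation, rather than searching candidate solutions.
from PODSimulation import PODSimulation  # assumed module layout

capacities = [4, 6, 6, 1]          # greeter, screener, dispenser, medic
simul = PODSimulation(capacities)  # assumed constructor signature
simul.run()                        # assumed run method
print(simul.get_number_out())      # throughput for this run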
Running this manually, I found I could get the same result (459.333333 throughput, 64.820665 time).
Then, in PODSimulation.py, I stumbled across code that looks like it was designed to run this exact experiment.
I copied this into Experiment5.py, adapting it so that it saved the individual results to a file (rather than printing average results) and saved the run time to a file too. Getting the individual results took a bit of work to figure out.
From StaffAllocationProblem.py, we know throughput = simulatorRunner.get_processed_count().
Hence, to get the throughput for each individual simulation run, we just need simul.get_number_out(). I saved these to a .txt file.
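The file-writing part of the adaptation is simple once that accessor is known; a minimal sketch, with sims standing in for the completed simulation objects:

sims = []  # stand-in for the list of completed PODSimulation objects
with open('exp5_throughput.txt', 'w') as out:
    out.write('replication\tthroughput\n')
    for i, simul in enumerate(sims):
        # one tab-separated line per replication
        out.write('%d\t%s\n' % (i + 1, simul.get_number_out()))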
Run time: 8 seconds
I checked plotting_staff_results.r but it didn’t seem to have any code for this figure, so I wrote some to produce the figure.
And, as I have found for other figures, I see a similar pattern in the results, although with different values on the axes.
13.21-13.51, 13.58-14.27: Appendix A.1 (Table 3)
Appendix A.1 shows the mean and confidence interval for several different metrics. It is run with:
10% pre-screened
4 line managers
6 dispensing
6 screening
1 medical
Each metric is based on twenty replications. Hence, it appears this just directly uses the segment of code I had previously identified in PODSimulation.py and adapted for Experiment 5. These return the average and half-width (from this source, I understand that the distance from the mean to the edge of the confidence interval can be called the precision, margin of error or half-width).
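As a quick worked example of that half-width calculation (not the repository's code, just the standard formula with made-up replication values):

# half-width = t(0.975, n-1) * s / sqrt(n) for a 95% confidence interval
import numpy as np
from scipy import stats

reps = np.array([459.3, 473.2, 468.0, 455.1, 480.6])  # made-up replication results
n = len(reps)
avg = reps.mean()
half_width = stats.t.ppf(0.975, n - 1) * reps.std(ddof=1) / np.sqrt(n)
print(avg - half_width, avg, avg + half_width)  # LowerBound, Avg, UpperBound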
It appears to have all the metrics needed, so all I needed to do was convert the output into a table. From def __str__() I could see which components were used to make the printed output, and so what I needed for the table.
I added pandas to the environment so I could output a dataframe, although I had to find a version compatible with numpy 1.8.0, which meant installing pandas from pip, as conda doesn't go back that far. It took a long time to install the pip dependencies, and I then got ImportError: No module named dateutil.tz, which remained unresolved after installing python-dateutil, so I decided to just output to CSV and process it in R.
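The CSV fallback only needs the standard library; a minimal sketch (the row values here are illustrative, matching the first row of the output further down):

# Write the metrics out with the csv module (no pandas), then format the table in R.
import csv

rows = [('Wait time', 66.66, 63.75, 69.57)]  # one illustrative row
with open('table3.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['Estimate', 'Avg', 'LowerBound', 'UpperBound'])
    writer.writerows(rows)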
However, I then started getting an error when importing myutils: ImportError: matplotlib requires dateutil. I think this probably resulted from adding python-dateutil and then just pruning the environment, so I deleted the environment and rebuilt it from scratch.
Run time: 10 seconds
Again, however, this differed from the original.
import pandas as pd
pd.read_csv('table3.csv')

    Estimate                     Avg  LowerBound  UpperBound
0   Wait time                  66.66       63.75       69.57
1   No. in (Designees)      11979.15    11935.90    12022.40
2   No. out (Designees)       473.20      458.87      487.53
3   Dispensing wait time       18.84       18.33       19.35
4   Line mngr. wait time       23.73       23.60       23.86
5   Med. eval. wait time        9.06        6.05       12.06
6   Screening wait time        15.04       14.79       15.29
7   Dispensing no. waiting    434.81      423.35      446.27
8   Line mngr. no. waiting   4733.55     4705.73     4761.38
9   Med. eval. no. waiting      2.19        1.42        2.96
10  Screening no. waiting     569.06      560.52      577.61
14.29-14.38: Appendix A.2 (Table 4)
Table 4 is described as being the results with line managers and 10% pre-screened. Hence, I'm assuming it's just the pre-screen 10 results from Experiment 1?
These have 255 rows though, so I’m then assuming it’s just the first 40 rows.
It doesn't appear to be reproduced, with quite different results, as I expected given the prior results.
res = pd.read_csv('exp1_prescreen10.txt', sep='\t')
print(res.shape)
res.head(40)

(255, 7)

    greeter  screener  dispenser  medic  resources   throughput       time
0         4         1         14      1         20   435.000000  52.355130
1         4         1         14      3         22   435.666667  51.441467
2         7         1         14      1         23   650.000000  46.791091
3         8         1         14      1         24   669.666667  48.432571
4         7         1         14      3         25   647.000000  46.533784
5         4         6         14      1         25  1046.000000  53.590160
6         9         1         14      2         26   752.333333  43.787625
7         4         6         14      2         26  1048.333333  46.674927
8         5         6         14      1         26  1058.333333  61.717741
9         9         1         14      3         27   753.666667  43.688801
10        4         6         14      3         27  1096.666667  45.042496
11        4         6         14      4         28  1073.666667  44.577338
12        8         6         14      1         29  1103.000000  60.900434
13        4         6         14      6         30  1073.333333  44.503737
14       16         1         14      1         32  1079.666667  35.059892
15        4        10         14      4         32  1103.333333  43.141365
16        8         6         14      4         32  1111.666667  48.838884
17        8         6         14      5         33  1107.666667  48.787158
18        4        10         14      6         34  1094.000000  42.299897
19        4         6         25      1         36  1347.666667  46.158593
20        4         6         25      2         37  1351.666667  41.018076
21        5         6         25      1         37  1436.666667  50.700946
22        4         6         25      3         38  1371.666667  38.838681
23        5         6         25      2         38  1449.000000  42.375245
24        4         6         25      4         39  1371.000000  38.189643
25        5         6         25      3         39  1432.666667  42.123040
26        7         6         25      1         39  1546.666667  45.183401
27        5         6         25      4         40  1433.000000  39.702079
28        7         6         25      2         40  1545.333333  42.707594
29        4        10         25      1         40  1876.000000  43.735103
30        4         6         25      6         41  1373.666667  38.180682
31        4        10         25      2         41  1877.666667  38.542261
32        4        10         27      1         42  2055.000000  43.479165
33        4        10         25      4         43  1862.000000  31.677448
34        4        10         27      2         43  2045.666667  37.892850
35        4        10         25      5         44  1909.000000  31.112816
36        4        10         27      3         44  2006.333333  35.759831
37       18         1         25      1         45  1322.333333  30.449133
38        4        10         25      6         45  1915.333333  31.019967
39        4        10         27      4         45  2045.333333  32.732751
14.49-14.57: Final look over
I looked over the code again, trying to spot anything I could alter to help resolve the discrepancy, but had no further ideas. I double-checked the capacities in PODSimulation.py but was satisfied that these were coming from the provided inputs.
With no further ideas, I will stop at this point, and will (a) get consensus from another team member on reproduction success, and (b) message the author to inform them and ask if they have any suggestions, should they wish to offer them (although, being aware that this work was done a very long time ago, I don't imagine that would be appropriate in this case).