library(simmer)
library(simmer.bricks)
library(magrittr)
suppressMessages(library(simEd))
suppressMessages(library(tidyr))
Sampling in R
Overview
By default R offers less control over random sampling than python and commercial simulation software. It uses a single random number stream for all sampling and does not allow you to create individual random number streams (each with its own seed) for each sampling distribution.
This is not ideal for DES, and has a range of impacts. The one you will likely experience is additional noise between experiments due to this lack of control. Another way to describe this is that changes in sampling distribution parameters and particularly arrival rates may cause experiments to go out of sync where the same patients experience different activity duration and routing due to random sampling differences across scenarios rather than the systematic differences you have introduced in your experiments.
The result of this random noise is that typically you will need to run a lot more replications to carefully assess difference between experiments than if it were reduced/eliminated. It is also harder to debug experiments.
This notebook will:
Demonstrate the shortcomings of a single random number stream and how noise is introduced between experiments.
Illustrate that problem with a simple simmer model that varies arrival rates
Introduce up to 25 random streams for sampling using the
SimEd
R package.
1. Imports
2. A simple motivating example
To illustrate the impact of this design consider the example below. Imagine we are going to simulate 5 patients length of stay (LoS) in an acute hospital followed by rehabilitation. Acute LoS is exponentially distribution while rehabilitation LoS follows a uniform distribution (the choice of distribution does not matter).
IMPORTANT: To make the as results “repeatable” as possible we will set a random seed. With a single random stream we will see that this does not guarantee repeatable samples for patients between experiments.
2.1 Constants
<- 42
SEED <- 32.0
ACUTE_MEAN <- 15.0
REHAB_MIN <- 80.0 REHAB_MAX
2.2 Experiment 1
<- 5
n_patients
set.seed(SEED)
<- rexp(n=n_patients, rate=1.0/ACUTE_MEAN)
acute_los <- runif(n=n_patients, min=REHAB_MIN, max=REHAB_MAX)
rehab_los
print(acute_los)
[1] 6.346778 21.148648 9.071713 1.222141 15.141652
print(rehab_los)
[1] 23.75333 57.70450 60.82921 44.75322 61.74230
2.3 Experiment 2
We will now reset the random stream using the same seed and limit the number of patients simulated to 2.
When we re-run the code we might expect to get
Acute Los:
6.346778 21.148648
Rehab Los:
23.75333 57.70450
But we will see that this does not happen. This is because all sampling makes use of a pseudo random number stream that generates uniformly distribution numbers \(U\)’s between 0 and 1. When only 1 stream is used for all sampling we can end up with lots of noise between experiments simply because different \(U\)’s are used for the same patients.
<- 2
n_patients
set.seed(SEED)
<- rexp(n=n_patients, rate=1.0/ACUTE_MEAN)
acute_los <- runif(n=n_patients, min=REHAB_MIN, max=REHAB_MAX)
rehab_los
print(acute_los)
[1] 6.346778 21.148648
print(rehab_los)
[1] 56.71346 48.74124
3. Force the order of sampling
To force the order of sampling within a single random number stream each patient must do their sampling upfront and in process order.
Below we re-run Experiments 1 and 2, but this time we make sure the process is sampled in order (acute then rehab) for each patient.
3.1 A Return to Experiment 1
<- 3
n_patients
set.seed(SEED)
for (patient_i in 1:n_patients) {
print(paste("Patient ", patient_i))
print(paste("Acute ", rexp(n=1, rate=1.0/ACUTE_MEAN)))
print(paste("Rehab ", runif(n=1, min=REHAB_MIN, max=REHAB_MAX)))
}
[1] "Patient 1"
[1] "Acute 6.34677797708443"
[1] "Rehab 68.9790956943762"
[1] "Patient 2"
[1] "Acute 9.07171320915222"
[1] "Rehab 48.7412366934586"
[1] "Patient 3"
[1] "Acute 15.1416521370411"
[1] "Rehab 23.7533288204577"
3.2 A Return to Experiment 2
<- 2
n_patients
set.seed(SEED)
for (patient_i in 1:n_patients) {
print(paste("Patient ", patient_i))
print(paste("Acute ", rexp(n=1, rate=1.0/ACUTE_MEAN)))
print(paste("Rehab ", runif(n=1, min=REHAB_MIN, max=REHAB_MAX)))
}
[1] "Patient 1"
[1] "Acute 6.34677797708443"
[1] "Rehab 68.9790956943762"
[1] "Patient 2"
[1] "Acute 9.07171320915222"
[1] "Rehab 48.7412366934586"
4. A simmer
model
By default simmer
will be affected by the single random stream for all sampling. This is because in a DES there is no guarantee that sampling will not occur in process order like we saw above. The numbers generated will vary depending on when events are scheduled to take place.
We will first consider this in experiments where we set the exact number of arrivals to the model. In these experiments we will not use resources. This means that there is no impact on the model due to queuing if we increase or decrease the number of arrivals to the model.
4.1 Experiment 1 (5 arrivals)
# set the seed
set.seed(SEED)
# create simmer environment
<- simmer("Experiment_1", log_level=1)
env
# setup simple patient trajectory
<- trajectory("patient_pathway") %>%
patient set_attribute("start_acute", function() {now(env)}) %>%
timeout(function() rexp(1, rate=1.0/ACUTE_MEAN)) %>%
set_attribute("acute_los", function() {now(env) - get_attribute(env, "start_acute")}) %>%
log_(function() {paste("Acute LoS ", now(env) - get_attribute(env, "start_acute"))},
level=1) %>%
set_attribute("start_rehab", function() {now(env)}) %>%
timeout(function() runif(n=1, min=REHAB_MIN, max=REHAB_MAX)) %>%
set_attribute("rehab_los", function() {now(env) - get_attribute(env, "start_rehab")}) %>%
log_(function() {paste("Rehab LoS ", now(env) - get_attribute(env, "start_rehab"))},
level=1)
%>%
env # add 5 arrivals all at the same time.
add_generator("patient", patient, at(0, 0, 0, 0, 0)) %>%
invisible
%>%
env run() %>%
invisible
1.22214: patient3: Acute LoS 1.2221407443285
6.34678: patient0: Acute LoS 6.34677797708443
9.07171: patient2: Acute LoS 9.07171320915222
15.1417: patient4: Acute LoS 15.1416521370411
21.1486: patient1: Acute LoS 21.1486480683088
24.9755: patient3: Rehab LoS 23.7533288204577
59.8949: patient4: Rehab LoS 44.7532154561486
64.0513: patient0: Rehab LoS 57.704498876119
69.9009: patient2: Rehab LoS 60.8292109624017
82.8909: patient1: Rehab LoS 61.7422963574063
4.2 Experiment 2 (3 arrivals)
Here we setup the model to simulate 3 patients that all arrive as the unit opens. Arrival times are the same, so we may expect the acute and rehab lengths of stay to remain the same. However, we can see that the acute length of stay and rehab length of stay quickly goes out of sync i.e. we have introduced noise between experiments that is nothing to do with the variation in the number of patients (that we changed between experiments). Let’s take patient 0 as an example.
In experiment 1:
Acute treatment is 6.3 days
Rehab treatment is 57.7 days
In experiment 2:
Acute treatment is 6.3 days
Rehab treatment is 48.7 days
# reset the seed
set.seed(SEED)
# create simmer environment
<- simmer("Experiment_2", log_level=1)
env
%>%
env # now limit to 3 patients.
add_generator("patient", patient, at(0, 0, 0)) %>%
invisible
%>%
env run() %>%
invisible
6.34678: patient0: Acute LoS 6.34677797708443
9.07171: patient2: Acute LoS 9.07171320915222
21.1486: patient1: Acute LoS 21.1486480683088
44.902: patient1: Rehab LoS 23.7533288204577
55.088: patient0: Rehab LoS 48.7412366934586
71.95: patient2: Rehab LoS 62.8782404516824
5. A simmer
model with random arrivals
Finally we demonstrate that the effect is still observed across two experiments that vary the parameter of exponentially distributed inter-arrival times. We will run two new experiments. In the first IAT is 10 minutes. The second experiment increases the intensity of arrivals to an IAT of 2.0 minutes.
The function get_results_for_patient
helps us trace patient 0 as they flowed through the model. The results illustratrate that the acute LoS remains the same, but the rehab LoS is different. As a single stream of random numbers was used, we were unable to control the order in which \(U\)’s were used to generate samples from the rehab L distributions.
# helper function to process results
<- function(sim_env, patient_id){
get_results_for_patient <- subset(get_mon_attributes(sim_env), select = c(name, key, value))
results <- spread(results, key, value)
results return(results[results$name == patient_id,])
}
5.1 IAT ~ Exponential(10.0)
= 10.0
mean_iat
# reset the seed
set.seed(SEED)
# create simmer environment
<- simmer("Experiment_simmer1")
env
%>%
env # expontially distr arrivals mean IAT = 10.0
add_generator("patient", patient, function() rexp(1, rate=1.0/mean_iat),
mon=2) %>%
invisible
%>%
env run(90) %>%
invisible
<- get_results_for_patient(env, "patient0") results_exp1
5.2 IAT ~ Exponential(2.0)
= 2.0
mean_iat
# reset the seed
set.seed(SEED)
# create simmer environment
<- simmer("Experiment_simmer2")
env
%>%
env # modify expontially distr arrivals to mean IAT = 9.0
add_generator("patient", patient, function() rexp(1, rate=1.0/mean_iat),
mon=2) %>%
invisible
%>%
env run(90) %>%
invisible
<- get_results_for_patient(env, "patient0") results_exp2
<- rbind(results_exp1, results_exp2)
comparison $name <- c("Patient 0 in Exp1", "Patient 0 in Exp2")
comparison comparison
name | acute_los | rehab_los | start_acute | start_rehab |
---|---|---|---|---|
Patient 0 in Exp1 | 9.071713 | 60.82921 | 1.9833681 | 11.055081 |
Patient 0 in Exp2 | 9.071713 | 76.53344 | 0.3966736 | 9.468387 |
6. Using the simEd
package
simEd
is an R package aimed at improving simulation education. It makes use of a package called rstream
that provides multiple random number streams for DES. simEd
provides up to 25 streams and re-implements a useful selection of statistical distributions that can be used.
Details of the package can be found here: https://www.rdocumentation.org/packages/simEd/versions/2.0.1
B. Lawson and L. M. Leemis, “An R package for simulation education,” 2017 Winter Simulation Conference (WSC), Las Vegas, NV, USA, 2017, pp. 4175-4186, doi: 10.1109/WSC.2017.8248124 https://ieeexplore.ieee.org/document/8248124
6.1 Revisiting our initial experiment
The code below re-implements the original experiments we conducted using simEd
equivalent sampling functions. The main difference is that we prefix functions with v
instead of r
. For example rexp
becomes vexp
. We also introduce a third parameter called stream (settable 1:25).
This time when we reduce the number of patients from 5 to 2 the acute and rehab samples remain the same for the first two patients.
<- 5
n_patients
set.seed(SEED)
# replace rexp with vexp and set stream number to 1
<- vexp(n=n_patients, rate=1.0/ACUTE_MEAN, stream=1)
acute_los
# replace runif with vunif and set stream number to 2
<- vunif(n=n_patients, min=REHAB_MIN, max=REHAB_MAX, stream=2)
rehab_los
print(acute_los)
[1] 0.4526612 13.6325644 29.0375715 38.6327714 24.0971868
print(rehab_los)
[1] 79.39406 54.01103 59.64871 55.21520 36.20446
<- 2
n_patients
set.seed(SEED)
# replace rexp with vexp and set stream number to 1
<- vexp(n=n_patients, rate=1.0/ACUTE_MEAN, stream=1)
acute_los
# replace runif with vunif and set stream number to 2
<- vunif(n=n_patients, min=REHAB_MIN, max=REHAB_MAX, stream=2)
rehab_los
print(acute_los)
[1] 0.4526612 13.6325644
print(rehab_los)
[1] 79.39406 54.01103
6.2 An updated simmer model
Finally we update the simmer
model to use simEd
and compare the results for patient 0 again. As expected we confirm that the sampling of acute and rehab duration is now in sync across the experiments.
# reset the seed
set.seed(SEED)
# create simmer environment
<- simmer("Experiment_1_simEd")
env
# redefine simple patient trajectory to use simEd variate functions
# each distribution in the model has its own stream
<- trajectory("patient_pathway") %>%
patient set_attribute("start_acute", function() {now(env)}) %>%
timeout(function() vexp(1, rate=1.0/ACUTE_MEAN, stream=1)) %>%
set_attribute("acute_los", function() {now(env) - get_attribute(env, "start_acute")}) %>%
set_attribute("start_rehab", function() {now(env)}) %>%
timeout(function() vunif(n=1, min=REHAB_MIN, max=REHAB_MAX, stream=2)) %>%
set_attribute("rehab_los", function() {now(env) - get_attribute(env, "start_rehab")})
%>%
env # exponentially distr arrivals mean IAT = 10.0
add_generator("patient", patient, function() vexp(1, rate=1.0/10.0, stream=3),
mon=2) %>%
invisible
%>%
env run(90) %>%
invisible
# store results for experiment 1 and patient 0
<- get_results_for_patient(env, "patient0") results_exp1
# reset the seed
set.seed(SEED)
# create simmer environment
<- simmer("Experiment_2_simEd")
env
%>%
env # exponentially distr arrivals mean IAT = 9.0
add_generator("patient", patient, function() vexp(1, rate=1.0/2.0, stream=3),
mon=2) %>%
invisible
%>%
env run(90) %>%
invisible
# store results for experiment 2 and patient 0
<- get_results_for_patient(env, "patient0") results_exp2
# updated comparison
<- rbind(results_exp1, results_exp2)
comparison $name <- c("Patient 0 in Exp1", "Patient 0 in Exp2")
comparison comparison
name | acute_los | rehab_los | start_acute | start_rehab |
---|---|---|---|---|
Patient 0 in Exp1 | 0.4526612 | 79.39406 | 2.2381819 | 2.6908430 |
Patient 0 in Exp2 | 0.4526612 | 79.39406 | 0.4476364 | 0.9002976 |