🎲 Randomness

🎯 Objectives

This page covers:

  • 🎲 Randomness: sampling and pseudorandom number generators.
  • 🛠️ Tools for random sampling.
  • ⛓️‍💥 Independent random number streams.
  • 👨‍💻 Adding random sampling to your simulation.

🔗 Reproducibility guidelines

This page helps you meet reproducibility criteria from:

  • Heather et al. 2025: Control randomness.
# Import required packages
import numpy as np
from sim_tools.distributions import Exponential, Normal

🎲 Randomness

When simulating real-world systems (e.g. patients arriving, prescriptions queueing), using fixed values is unrealistic. Instead, discrete-event simulations rely on randomness: they generate event times and outcomes by sampling randomly from probability distributions.

However, computers can’t generate “true” randomness. Instead, they use pseudorandom number generators (PRNGs): algorithms which produce a sequence of numbers that appear random, but are in fact completely determined by an initial starting point called a seed. If you use the same seed, you’ll get the same random sequence every time - a crucial feature for reproducibility in research and development.

By default, most software PRNGs initialise the seed based on something that changes rapidly and unpredictably, such as the current system time. This ensures a unique sequence with every fresh run. But for reproducible experiments, you should manually set the seed yourself, allowing you and others to exactly recreate simulation results.
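As a minimal sketch of this behaviour in NumPy (introduced properly in the next section): unseeded generators draw their seed from fresh operating-system entropy, while a fixed seed pins the sequence down exactly.

```python
import numpy as np

# Two generators with no seed: each is initialised from fresh OS entropy,
# so their sequences will (almost certainly) differ between runs.
a = np.random.default_rng().exponential(scale=17, size=3)
b = np.random.default_rng().exponential(scale=17, size=3)

# Two generators with the same fixed seed: identical sequences, every run.
c = np.random.default_rng(42).exponential(scale=17, size=3)
d = np.random.default_rng(42).exponential(scale=17, size=3)

print((a == b).all())  # almost certainly False
print((c == d).all())  # always True
```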

🛠️ Introducing tools for random sampling

NumPy

NumPy is a widely used Python package for generating random numbers. Since NumPy version 1.17, the recommended approach for random sampling is the Generator API.

To start, create a generator object using np.random.default_rng(). This generator is an independent random number source and manages its own random state. It offers methods for generating random samples from many distributions, such as normal, exponential, and uniform.

For example, to generate 3 samples from an exponential distribution (with a mean of 17):

rng = np.random.default_rng()
rng.exponential(scale=17, size=3)
array([12.36121824,  6.96683301,  3.08983849])

To ensure the same results every time the code runs, initialise with a fixed seed:

rng = np.random.default_rng(20)
rng.exponential(scale=17, size=3)
array([3.64795914, 9.04242889, 2.27536902])
rng = np.random.default_rng(20)
rng.exponential(scale=17, size=3)
array([3.64795914, 9.04242889, 2.27536902])

sim-tools

The sim-tools package (developed by Prof. Tom Monks) provides an object-oriented approach to sampling from a variety of statistical distributions.

sim-tools builds on NumPy’s random number generator and distributions. This means you get the same reliable, well-tested methods, with a convenient object-oriented interface. The package also includes some additional distribution types and tools for simulation.

To use sim-tools, you import the class for the distribution you need, specify parameters, and then use the instance to generate samples. Each object has its own seed and random state.

For example:

# Create an Exponential distribution object with mean 6 and seed 3
exp = Exponential(mean=6, random_seed=3)

# Generate 3 random samples
samples = exp.sample(size=3)
print(samples)
[0.66008888 2.33794124 8.39724575]

R

In R, random sampling is typically done with functions from the stats package. To ensure reproducibility, set a global random seed with set.seed(). Then use the appropriate function for your desired distribution.

For example, the exponential distribution is sampled with rexp() which requires the rate parameter (the inverse of the mean). To generate 3 samples from an exponential distribution with a mean of 6, you would use:

set.seed(3)
rexp(n=3L, rate=1L/6L)
[1] 10.383759  3.690197  7.397500

⛓️‍💥 Independent random number streams

In your simulation, you often have multiple distinct processes that involve random sampling (e.g. patients arriving, consultation lengths, routing decisions).

If these processes share the same random number generator, changes in one part of your model (like altering the number of samples drawn for arrivals) can unintentionally affect the random numbers used elsewhere. This can introduce subtle, hard-to-detect errors, making your simulation harder to debug and compare across scenarios.

To demonstrate this problem, we will simulate two queues: arrivals and processing times.

Problem: Shared generators

rng = np.random.default_rng(1)
arrivals = rng.exponential(scale=3, size=3)
processing = rng.normal(loc=10, scale=2, size=3)

print(f"Arrivals: {arrivals}\nProcessing: {processing}")
Arrivals: [ 3.21908708  0.92535943 16.12631062]
Processing: [11.81071173 10.89274914  8.92609353]

Change to five arrivals samples:

rng = np.random.default_rng(1)
arrivals = rng.exponential(scale=3, size=5)
processing = rng.normal(loc=10, scale=2, size=3)

print(f"Arrivals: {arrivals}\nProcessing: {processing}")
Arrivals: [ 3.21908708  0.92535943 16.12631062  1.09928134  0.34608612]
Processing: [ 8.92609353 11.16223621 10.72914479]

⚠️ Processing samples have changed, despite no changes to their sampling code! This is because they share a generator: drawing more arrival samples moves the random number sequence forward, so the processing draws start from a different place.

Solution: Independent generators

rng_a = np.random.default_rng(1)
rng_p = np.random.default_rng(2)
arrivals = rng_a.exponential(scale=3, size=3)
processing = rng_p.normal(loc=10, scale=2, size=3)

print(f"Arrivals: {arrivals}\nProcessing: {processing}")
Arrivals: [ 3.21908708  0.92535943 16.12631062]
Processing: [10.37810676  8.95450312  9.17387291]
rng_a = np.random.default_rng(1)
rng_p = np.random.default_rng(2)
arrivals = rng_a.exponential(scale=3, size=5)
processing = rng_p.normal(loc=10, scale=2, size=3)

print(f"Arrivals: {arrivals}\nProcessing: {processing}")
Arrivals: [ 3.21908708  0.92535943 16.12631062  1.09928134  0.34608612]
Processing: [10.37810676  8.95450312  9.17387291]

✅ Processing samples remain the same.

sim-tools

The same rule applies for sim-tools: use one object per process, each with its own seed.

arrivals_exp = Exponential(mean=3, random_seed=1)
processing_norm = Normal(mean=10, sigma=2, random_seed=2)

arrivals = arrivals_exp.sample(3)
processing = processing_norm.sample(3)

print(f"Arrivals: {arrivals}\nProcessing: {processing}")
Arrivals: [ 3.21908708  0.92535943 16.12631062]
Processing: [10.37810676  8.95450312  9.17387291]
arrivals_exp = Exponential(mean=3, random_seed=1)
processing_norm = Normal(mean=10, sigma=2, random_seed=2)

arrivals = arrivals_exp.sample(5)
processing = processing_norm.sample(3)

print(f"Arrivals: {arrivals}\nProcessing: {processing}")
Arrivals: [ 3.21908708  0.92535943 16.12631062  1.09928134  0.34608612]
Processing: [10.37810676  8.95450312  9.17387291]

Which package to use?

We’ve used sim-tools for sampling in the example models in this book. This is because, with sim-tools, you’re encouraged to use independent random number streams by design, as each distribution gets its own instance of a distribution class, each with its own parameters and random seed.

You can still use NumPy directly if you prefer - you’ll just need to remember to deliberately create separate random generators, as it’s not “enforced”. If you don’t explicitly create one per stream, it’s easy to accidentally fall back on using a single generator.
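If you do stick with NumPy, one convenient pattern (not used in the examples above, but part of NumPy's standard API) is to derive all your streams from a single master seed using SeedSequence.spawn(), which produces statistically independent child seeds - one per sampling process:

```python
import numpy as np

# One master seed for the whole experiment; spawn() derives
# independent child seeds, one per sampling process.
streams = np.random.SeedSequence(42).spawn(2)
rng_arrivals = np.random.default_rng(streams[0])
rng_processing = np.random.default_rng(streams[1])

# Each process now has its own generator: drawing more arrival
# samples cannot shift the processing stream.
arrivals = rng_arrivals.exponential(scale=3, size=3)
processing = rng_processing.normal(loc=10, scale=2, size=3)
```

This keeps only one seed to record for reproducibility, while still guaranteeing a separate stream per process.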

Problem: Shared generators in R

set.seed(1)
arrivals <- rexp(n=3L, rate=1L/3L)
processing <- rnorm(n=3L, mean=10L, sd=2L)

print(arrivals)
[1] 2.2655455 3.5449283 0.4371202
print(processing)
[1] 12.65960 12.54486 10.82928

Change to five arrivals samples:

set.seed(1)
arrivals <- rexp(n=5L, rate=1L/3L)
processing <- rnorm(n=3L, mean=10L, sd=2L)

print(arrivals)
[1] 2.2655455 3.5449283 0.4371202 0.4193858 1.3082059
print(processing)
[1] 6.920100 8.142866 9.410559

⚠️ Processing samples have changed, despite no changes to their sampling code! This is because they share a generator: drawing more arrival samples moves the random number sequence forward, so the processing draws start from a different place.

Solution: Independent streams in R

This section is adapted from https://pythonhealthdatascience.github.io/stars-treat-simmer/02_model/03_r_sampling.html.

By default, R’s stats package uses a single random number stream and does not support managing multiple, independent streams.

The simEd package, last updated in 2023, provides equivalent functions that allow separate random number streams. The package is described in:

B. Lawson and L. M. Leemis, “An R package for simulation education,” 2017 Winter Simulation Conference (WSC), Las Vegas, NV, USA, 2017, pp. 4175-4186, doi: 10.1109/WSC.2017.8248124 https://ieeexplore.ieee.org/document/8248124.

We did not use simEd in this work because it slowed down model execution.

An example of how it works is available in the adapted source: https://pythonhealthdatascience.github.io/stars-treat-simmer/02_model/03_r_sampling.html#using-the-simed-package.