Code organisation

🎯 Objectives

This page introduces functions and classes with practical examples first, before stepping back to explain how these fit into broader programming paradigms.

  • ♻️ Reusing code: Rationale for using functions or classes.
  • 🧩 Functions: Group reusable logic into callable blocks.
  • 🧱 Classes: Bundle data and behaviour for more complex tasks.
  • 🧠 Programming paradigms: Understand how functions and classes reflect different styles of organising code.
  • 🧪 Activities: Have a go at refactoring your code!
  • 📎 Further information.

🔗 Reproducibility guidelines

This page helps you meet reproducibility criteria from:

  • Heather et al. 2025: Minimise code duplication.
  • NHS Levels of RAP (🥈): Reusable functions and/or classes are used where appropriate.

🔍 Note: This page includes some simplified, standalone examples to illustrate how functions and classes work. They are not taken from the full example simulation models in this book.

♻️ Reusing code

It might seem convenient to write all your code in a single script and simply copy-and-paste sections when you need to reuse code. However, this approach quickly becomes unmanageable as your project grows.

The main tools you can use to organise your code are functions and classes. These allow you to break your code into modular components, each focused on a specific task. Modular code is easier to manage, test, and reuse - and these functions or classes can then be stored in separate files.

Using functions and classes has several benefits:

  • Easier to read and collaborate on. Code is clearer and easier to understand for you and others.
  • Simpler to maintain with fewer errors. Complex processes are broken into manageable parts. This makes them easier to update, test, and debug.
  • No duplication. Writing reusable code (like functions and classes) means changes only need to be made in one place, such as when you are updating or fixing a bug. In contrast, duplicated code requires updates everywhere it appears, and you may miss some.
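
To make the duplication point concrete, here is a minimal sketch. The helper name to_hours and the wait times are hypothetical, chosen only for illustration. It shows the same conversion written twice by hand, then moved into one function, so any later fix only has to be made in one place.

```python
# Without a function: the same conversion is copied in two places.
# If the conversion ever changes, every copy has to be found and updated.
triage_wait_hours = 90 / 60
consult_wait_hours = 45 / 60


# With a function: the conversion lives in one place.
def to_hours(minutes):
    """Convert a duration in minutes to hours (hypothetical helper)."""
    return minutes / 60


triage_wait_hours = to_hours(90)
consult_wait_hours = to_hours(45)
print(triage_wait_hours, consult_wait_hours)  # 1.5 0.75
```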

Functions

Functions group code into reusable blocks that perform a specific task. You define inputs (parameters), a sequence of steps (the function body), and outputs (return values).

Functions are ideal when you need to repeat the same operation or calculation in several places.

For example, this function estimates the wait time from the queue length and average service time:

def estimate_wait_time(queue_length, avg_service_time):
    return queue_length * avg_service_time

# There are 4 patients ahead, average service time is 15 minutes
print(estimate_wait_time(queue_length=4, avg_service_time=15))
60

estimate_wait_time <- function(queue_length, avg_service_time) {
  return(queue_length * avg_service_time)
}

# There are 4 patients ahead, average service time is 15 minutes
print(estimate_wait_time(queue_length = 4, avg_service_time = 15))
[1] 60
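
Once the calculation is in a function, it can be reused for as many inputs as you like. The sketch below repeats the Python function from above and applies it to three clinics. The clinic names and queue lengths are made up for illustration.

```python
def estimate_wait_time(queue_length, avg_service_time):
    """Return the estimated wait, in the same time unit as avg_service_time."""
    return queue_length * avg_service_time


# Hypothetical queue lengths for three clinics (illustrative values only)
queue_lengths = {"clinic_a": 4, "clinic_b": 0, "clinic_c": 7}

# Reuse the same function for every clinic instead of repeating the formula
estimates = {
    name: estimate_wait_time(queue_length=length, avg_service_time=15)
    for name, length in queue_lengths.items()
}
print(estimates)  # {'clinic_a': 60, 'clinic_b': 0, 'clinic_c': 105}
```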

Classes

Classes bundle together data (“attributes”) and behaviour (“methods”). They become useful when you:

  • Need to keep track of state. This means you want to remember information about an object over time. For example, if you have a Patient class, each patient object can keep track of its patient ID, patient type, etc. You can update or check these attributes whenever you want.
  • Have several related functions that operate on the same kind of data. Instead of writing separate functions that all work on the same data, you can put these inside a class as methods. This keeps your code organised and easier to use.

In Python, you initialise the class with a special method called __init__. This method runs when you create an object from the class. You pass parameters to it to set up the initial attributes for that object.

You then have methods, which are like mini functions inside the class. They can access and change the object’s attributes, and can also accept additional inputs.

You create an instance of the class (called an “object”) to use it. Each object has its own copy of the attributes and can use the methods to work with its own data.

Example:

class Patient:
    def __init__(self, patient_id, arrival_time):
        self.patient_id = patient_id
        self.arrival_time = arrival_time
        self.status = "waiting"
        
    def admit(self):
        self.status = "admitted"
        
    def discharge(self):
        self.status = "discharged"


alice = Patient(patient_id=1, arrival_time=3)
print(alice.status)
waiting
alice.admit()
print(alice.status)
admitted
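
To illustrate that each object keeps its own state, the sketch below creates a second, hypothetical patient (bob) alongside alice. It uses a trimmed copy of the Patient class so that it runs on its own. Admitting alice leaves bob unchanged.

```python
class Patient:
    def __init__(self, patient_id, arrival_time):
        self.patient_id = patient_id
        self.arrival_time = arrival_time
        self.status = "waiting"

    def admit(self):
        self.status = "admitted"


alice = Patient(patient_id=1, arrival_time=3)
bob = Patient(patient_id=2, arrival_time=8)  # hypothetical second patient

# Only alice's status changes; bob has his own copy of the attributes
alice.admit()
print(alice.status)  # admitted
print(bob.status)    # waiting
```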

Classes are less common in R than functions, but they are sometimes used for complex or structured tasks. R supports multiple class systems:

  • S3 - the most informal (few rules and very little enforced structure), used widely in base R.
  • S4 - stricter (it has rules, and you have to formally declare each class and its properties) and used in some packages like Bioconductor.
  • R6 - a newer system that allows reference-style classes (edits modify the object “in place”, unlike most R objects which use a “copy-on-modify” approach where edits create new copies) and works more like Python classes.

The example below uses R6, since this style is closest to how Python handles classes, making it easier to draw parallels between the two languages.

library(R6)

Patient <- R6Class("Patient",
  public = list(
    patient_id = NULL,
    arrival_time = NULL,
    status = NULL,
    initialize = function(patient_id, arrival_time) {
      self$patient_id <- patient_id
      self$arrival_time <- arrival_time
      self$status <- "waiting"
    },
    admit = function() {
      self$status <- "admitted"
    },
    discharge = function() {
      self$status <- "discharged"
    }
  )
)


alice <- Patient$new(patient_id=1, arrival_time=3)
print(alice$status)
[1] "waiting"
alice$admit()
print(alice$status)
[1] "admitted"

Subclasses and inheritance

A subclass (or “child class”) is a class that inherits from another class (the “parent” or “superclass”). Subclasses can reuse or extend the behaviour of their parent.

Here, EmergencyPatient is a subclass of Patient with an extra attribute (severity) and an extra method (triage()).

class EmergencyPatient(Patient):
    def __init__(self, patient_id, arrival_time, severity):
        super().__init__(patient_id, arrival_time)
        self.severity = severity

    def triage(self):
        if self.severity > 7:
            return "High priority"
        else:
            return "Standard priority"


ben = EmergencyPatient(patient_id=2, arrival_time=5, severity=9)
print(ben.status)
waiting
print(ben.triage())
High priority

EmergencyPatient <- R6Class("EmergencyPatient",
  inherit = Patient,
  public = list(
    severity = NULL,
    initialize = function(patient_id, arrival_time, severity) {
      super$initialize(patient_id, arrival_time)
      self$severity <- severity
    },
    triage = function() {
      if (self$severity > 7) {
        return("High priority")
      } else {
        return("Standard priority")
      }
    }
  )
)


ben <- EmergencyPatient$new(patient_id=2, arrival_time=5, severity=9)
print(ben$status)
[1] "waiting"
print(ben$triage())
[1] "High priority"
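
As well as adding new methods, a subclass can extend a method it inherits. The sketch below is an illustration only: the summary() method is hypothetical and is not part of the examples above. The parent defines summary(), and the subclass builds on it using super().

```python
class Patient:
    def __init__(self, patient_id, arrival_time):
        self.patient_id = patient_id
        self.arrival_time = arrival_time
        self.status = "waiting"

    def summary(self):
        # Hypothetical method added for this illustration
        return f"Patient {self.patient_id}: {self.status}"


class EmergencyPatient(Patient):
    def __init__(self, patient_id, arrival_time, severity):
        super().__init__(patient_id, arrival_time)
        self.severity = severity

    def summary(self):
        # Extend the parent's summary rather than rewriting it
        return f"{super().summary()} (severity {self.severity})"


alice = Patient(patient_id=1, arrival_time=3)
ben = EmergencyPatient(patient_id=2, arrival_time=5, severity=9)

# The same call gives a different result depending on the object's class
for patient in [alice, ben]:
    print(patient.summary())
# Patient 1: waiting
# Patient 2: waiting (severity 9)

# An EmergencyPatient is still a Patient
print(isinstance(ben, Patient))  # True
```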

Programming paradigms

Now that you’ve seen how functions and classes help structure your code, we can step back and look at what broader programming paradigms these patterns reflect.

A programming paradigm is a general style or approach to organising and structuring code. The most common paradigms in Python and R are:

  • 📜 Using functions - procedural or functional programming.
  • 🧱 Using classes - object-oriented programming (OOP).

This table provides a brief overview of the programming paradigms. If you want to find out more, check out the resources linked in the further information section.

| Paradigm | Main object used | Key characteristics |
|---|---|---|
| Procedural programming | Functions | Code runs step-by-step using functions, often passing data between them. Data structures (like lists) are usually mutable (i.e. they can be changed directly). Use loops like for and while to repeat actions. |
| Functional programming | Functions | Uses pure functions, which always return the same output for same input, and do not alter anything outside themselves (only effect is to return value/s). Functions are values - they can be saved in variables, passed to other functions, or returned just like other data types. Data structures are immutable, meaning they can’t be changed directly - instead, a new version is created when a change is needed. Often replaces loops with recursion (a function calling itself) and higher-order functions (like Python’s map or R’s sapply). These features tend to make code more predictable, easier to debug, and more maintainable than procedural approaches. |
| Object-oriented programming (OOP) | Classes/Objects | Organises code using objects - instances of classes that combine data (attributes) and behaviour (methods). Supports encapsulation - combining related data and functions, and hiding internal details when needed. Enables inheritance, where one class can build on another. Supports polymorphism, allowing different objects to respond differently to the same operation depending on their type. |
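
To make the difference between the first two paradigms concrete, here is a minimal sketch of the same task in a procedural and a functional style. The task itself (adding a 2-minute room-cleaning allowance to each service time) and the function name add_cleaning_time are hypothetical, chosen only for illustration.

```python
service_times = (5, 7, 3, 4, 6)  # minutes, stored in an immutable tuple

# Procedural style: a loop that builds up a mutable list step by step
with_cleaning_procedural = []
for minutes in service_times:
    with_cleaning_procedural.append(minutes + 2)


# Functional style: a pure function applied with the higher-order function map
def add_cleaning_time(minutes):
    """Return the service time plus a 2-minute allowance (illustrative)."""
    return minutes + 2


with_cleaning_functional = tuple(map(add_cleaning_time, service_times))

print(with_cleaning_procedural)   # [7, 9, 5, 6, 8]
print(with_cleaning_functional)   # (7, 9, 5, 6, 8)
print(service_times)              # (5, 7, 3, 4, 6) - the input is unchanged
```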

Normally, a mix of programming paradigms will be used.
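
As a rough illustration of mixing paradigms, the sketch below keeps patient state in a class (OOP) and summarises it with a small pure function (functional style). The function count_by_status and the arrival times are hypothetical and not taken from the book’s models.

```python
class Patient:
    def __init__(self, patient_id, arrival_time):
        self.patient_id = patient_id
        self.arrival_time = arrival_time
        self.status = "waiting"

    def admit(self):
        self.status = "admitted"


def count_by_status(patients, status):
    """Pure function: counts matching patients without modifying them."""
    return sum(1 for patient in patients if patient.status == status)


# Create three patients with illustrative arrival times, then admit the first
patients = [
    Patient(patient_id=i, arrival_time=t)
    for i, t in enumerate([1, 3, 4], start=1)
]
patients[0].admit()

print(count_by_status(patients, "waiting"))   # 2
print(count_by_status(patients, "admitted"))  # 1
```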

In this book, you’ll see both functions and classes used, depending on what made sense for the task. Generally, we’ve used more classes in Python and more functions in R - as is typical, since R code often relies less on classes - but neither approach is exclusive, and often either could have been used. The choice is mostly a matter of language style, clarity, and what fits best for the problem at hand.

🧪 Activities

Modular scripts are easier to maintain, reuse, and extend, so this is the perfect point to start tracking them with version control.

Try putting some of your existing scripts into a GitHub repository, and committing updates as you refine your functions and classes.

Not sure how? Revisit our page on version control and GitHub for a quick walkthrough.

Don’t have any scripts? Here is some example code which you could try turning into functions and/or classes. It simulates patient arrivals and service times (without using functions or classes).

You could turn this into:

  • A function e.g. simulate_queue(arrivals, service_times) that returns the results table.
  • A class e.g. a Patient class and/or a Clinic class that runs the simulation.

import numpy as np
import pandas as pd

# Set seed for reproducibility
np.random.seed(123)

# Input data
patient_arrivals = [1, 3, 4, 10, 12]  # in minutes
service_times = [5, 7, 3, 4, 6]       # in minutes
arrival_ids = list(range(1, len(patient_arrivals) + 1))

# Initialize tracking variables
start_times = [0] * len(patient_arrivals)
end_times = [0] * len(patient_arrivals)
waiting_times = [0] * len(patient_arrivals)

# Simulate service
for i in range(len(patient_arrivals)):
    if i == 0:
        start_times[i] = patient_arrivals[i]
    else:
        # Next patient starts when they arrive or when previous is done
        start_times[i] = max(patient_arrivals[i], end_times[i - 1])
    
    end_times[i] = start_times[i] + service_times[i]
    waiting_times[i] = start_times[i] - patient_arrivals[i]

# Combine into a DataFrame
results = pd.DataFrame({
    'id': arrival_ids,
    'arrival': patient_arrivals,
    'start': start_times,
    'end': end_times,
    'waiting': waiting_times
})

print(results)
   id  arrival  start  end  waiting
0   1        1      1    6        0
1   2        3      6   13        3
2   3        4     13   16        9
3   4       10     16   20        6
4   5       12     20   26        8

set.seed(123)

# Input parameters
patient_arrivals <- c(1, 3, 4, 10, 12)  # in minutes
service_times <- c(5, 7, 3, 4, 6)       # in minutes
arrival_ids <- seq_along(patient_arrivals)

# Simulation tracking variables
start_times <- numeric(length(patient_arrivals))
end_times <- numeric(length(patient_arrivals))
waiting_times <- numeric(length(patient_arrivals))

# Simulate each patient's start and end time
for (i in seq_along(patient_arrivals)) {
  if (i == 1) {
    start_times[i] <- patient_arrivals[i]
  } else {
    # Next patient starts after previous is done — or when they arrive, whichever is later
    start_times[i] <- max(patient_arrivals[i], end_times[i - 1])
  }
  end_times[i] <- start_times[i] + service_times[i]
  waiting_times[i] <- start_times[i] - patient_arrivals[i]
}

# Combine into a data frame
results <- data.frame(
  id = arrival_ids,
  arrival = patient_arrivals,
  start = start_times,
  end = end_times,
  waiting = waiting_times
)

print(results)
  id arrival start end waiting
1  1       1     1   6       0
2  2       3     6  13       3
3  3       4    13  16       9
4  4      10    16  20       6
5  5      12    20  26       8
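
If you would like something to compare your attempt against, here is one possible sketch of the function option in Python. It is not the only way to do it. It assumes arrival times are non-negative and already sorted, and it returns a list of dictionaries (one per patient) rather than a DataFrame so that it runs without extra packages. Passing the result to pd.DataFrame() should give a table like the one above.

```python
def simulate_queue(arrivals, service_times):
    """Simulate a single-server, first-come-first-served queue.

    Assumes arrivals are non-negative and in ascending order.
    Returns a list of dicts, one per patient.
    """
    results = []
    previous_end = 0  # time at which the previous patient finished
    for patient_id, (arrival, service) in enumerate(
        zip(arrivals, service_times), start=1
    ):
        # A patient starts when they arrive or when the previous one is done
        start = max(arrival, previous_end)
        end = start + service
        results.append({
            "id": patient_id,
            "arrival": arrival,
            "start": start,
            "end": end,
            "waiting": start - arrival,
        })
        previous_end = end
    return results


results = simulate_queue(
    arrivals=[1, 3, 4, 10, 12],
    service_times=[5, 7, 3, 4, 6],
)
for row in results:
    print(row)
```

The last row printed should match the final row of the results table above (start 20, end 26, waiting 8).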

📎 Further information