Testing in Research Workflows

This site contains materials for the testing module on HDR UK’s RSE001 Research Software Engineering training course. It was developed as part of the STARS project.


Regression tests

What is a regression test?

A regression test involves running your workflow on historical data and confirming that results are consistent over time.

It’s not focused on whether results are theoretically correct. It’s about consistency and reproducibility - confirming that code changes, environment updates, or data pipeline tweaks have not silently changed results.
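As a minimal sketch of the idea (with made-up numbers and a stand-in function, not the case-study workflow), a regression test recomputes a result from frozen historical data and compares it against a previously recorded value within a tolerance:

```python
import numpy as np

# Reference value recorded from an earlier, trusted run of the workflow
EXPECTED_MEAN = 4.2


def mean_wait_time(wait_times):
    """Stand-in for the workflow step being protected against drift."""
    return float(np.mean(wait_times))


# Historical data frozen alongside the expected value
historical_waits = [1.5, 3.0, 4.5, 5.0, 7.0]

result = mean_wait_time(historical_waits)
assert np.isclose(result, EXPECTED_MEAN, rtol=1e-4), (
    f"Result {result} drifted from recorded value {EXPECTED_MEAN}"
)
```

If a later code change shifts the computed mean, the assertion fails and flags the drift rather than letting it pass silently.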

Example: waiting times case study

We will run regression tests using the dataset we introduced for our waiting times case study.

In our Python test script, we need to import:

from pathlib import Path
import numpy as np
from waitingtimes.patient_analysis import (
    import_patient_data, calculate_wait_times, summary_stats
)

Regression test

def test_reproduction():
    """Re-running on historical data should produce consistent results."""
    # Specify path to historical data
    csv_path = Path(__file__).parent.joinpath("data/patient_data.csv")

    # Run functions
    df = import_patient_data(csv_path)
    df = calculate_wait_times(df)
    stats = summary_stats(df["waittime"])

    # Verify the workflow produces consistent results
    assert np.isclose(stats["mean"], 4.1666, rtol=0.0001)
    assert np.isclose(stats["std_dev"], 2.7869, rtol=0.0001)
    assert np.isclose(stats["ci_lower"], 1.2420, rtol=0.0001)
    assert np.isclose(stats["ci_upper"], 7.0913, rtol=0.0001)

The equivalent test in R, using testthat:

test_that("re-running on historical data produces consistent results", {

  # Specify path to historical data
  csv_path <- testthat::test_path("data", "patient_data.csv")

  # Run functions
  df <- import_patient_data(csv_path)
  df <- calculate_wait_times(df)
  stats <- summary_stats(df$waittime)

  # Verify the workflow produces consistent results
  expect_equal(stats$mean,     4.1666, tolerance = 1e-4)
  expect_equal(stats$std_dev,  2.7869, tolerance = 1e-4)
  expect_equal(stats$ci_lower, 1.2420, tolerance = 1e-4)
  expect_equal(stats$ci_upper, 7.0913, tolerance = 1e-4)
})

Running our example test

Test output
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /__w/stars-testing-intro/stars-testing-intro/examples/python_package
configfile: pyproject.toml
plugins: cov-7.0.0
collected 1 item

../examples/python_package/tests/test_regression.py .                    [100%]

============================== 1 passed in 0.97s ===============================

══ Testing test_regression.R ═══════════════════════════════════════════════════

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 3 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 4 ] Done!
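Outputs like the above can be produced locally with commands along these lines (a sketch: it assumes pytest and testthat are installed, and that the file paths match your own project layout):

```shell
# Python: run only the regression test file with pytest
pytest tests/test_regression.py

# R: run the matching testthat file from the package root
Rscript -e 'testthat::test_file("tests/testthat/test_regression.R")'
```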

When should you update your regression tests?

Errors: If you identify an error in your pipeline, you first fix the code and then deliberately update the regression test in isolation, so you know the only change in behaviour is the error fix and not something unintended elsewhere.

Changes over time: As your research evolves, you may update the workflow (e.g., improve the wait time calculation method) or use more recent datasets. You can keep the old regression test running alongside new ones - this verifies that changes to the workflow don’t accidentally alter results on historical data, while new regression tests validate that updated methods work correctly on current data.
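As the suite grows, old and new expectations can sit side by side. A hedged sketch of that pattern, using inline toy data and invented expected values rather than the real case-study files:

```python
import numpy as np


def mean_wait_time(wait_times):
    """Stand-in for the workflow's summary step."""
    return float(np.mean(wait_times))


# Each entry pairs a frozen dataset with the result recorded when it was frozen
regression_cases = {
    "2023_historical": ([2.0, 4.0, 6.0], 4.0),
    "2024_current": ([1.0, 3.0, 5.0, 7.0], 4.0),
}

for name, (data, expected) in regression_cases.items():
    result = mean_wait_time(data)
    assert np.isclose(result, expected, rtol=1e-4), (
        f"{name}: {result} drifted from recorded value {expected}"
    )
```

In a real project the inline lists would be replaced by paths to the frozen data snapshots, for example by parameterising the test over (dataset, expected results) pairs as covered in the parameterising tests section.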

  • Code licence: MIT. Text licence: CC-BY-SA 4.0.