HDR UK Futures: Testing in Research Workflows
  1. Types of test
  2. Functional tests

This site contains materials for the testing module on HDR UK’s RSE001 Research Software Engineering training course. It was developed as part of the STARS project.

  • When and why to run tests?
  • Case study
  • Introduction to writing and running tests
    • How to write a basic test
    • How to run tests
    • Parameterising tests
  • Types of test
    • Unit tests
    • Functional tests
    • Back tests
  • What was the point? Let’s break it and see!
  • Test coverage
  • Running tests via GitHub actions
  • Example repositories

Functional tests

Examples on this page are shown in both Python (pytest) and R (testthat).

What is a functional test?

Functional tests are broader than unit tests. They focus on a whole feature or workflow, often involving several functions or classes working together. They check whether this end-to-end behaviour gives the correct outputs for given inputs, matching our requirements or expected results.
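As a toy illustration of the difference (using hypothetical `double` and `total` functions, not from the case study): a unit test checks one function in isolation, while a functional test chains the steps and checks the end result.

```python
def double(x):
    # Step 1 of a toy two-step pipeline
    return [v * 2 for v in x]

def total(x):
    # Step 2: aggregate the transformed values
    return sum(x)

def test_double_unit():
    # Unit test: one function, in isolation
    assert double([1, 2]) == [2, 4]

def test_pipeline_functional():
    # Functional test: both steps run together, end-to-end
    assert total(double([1, 2, 3])) == 12
```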

Example: waiting times case study

We will return to our waiting times case study which involved three functions:

  • import_patient_data() - imports raw patient data and checks that the required columns are present.
  • calculate_wait_times() - adds arrival and service datetimes, and waiting time in minutes.
  • summary_stats() - calculates mean, standard deviation and 95% confidence interval.

Unlike unit tests, which check each function in isolation, functional tests run all three steps together and verify the end-to-end workflow produces correct results.

We will need the following imports in our test script:

import numpy as np
import pandas as pd
import pytest
from waitingtimes.patient_analysis import (
    import_patient_data, calculate_wait_times, summary_stats
)

How to write functional tests

1. Identify the feature or workflow to test

Start by choosing the complete feature or workflow you want to validate.

In our case study, we only have a simple three-step pipeline - but more complex projects may have multiple intersecting workflows you want to focus on.

2. Define inputs and expected outputs

Think about realistic scenarios that cover:

  • Success cases: standard inputs where everything should work correctly.
  • Variations: realistic variations in the inputs (e.g., different sample sizes, different distributions in the data input).
  • Edge cases: unusual but plausible inputs (e.g., unusual sample sizes, boundary values).
  • Error cases: invalid inputs that should trigger errors.
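When defining expected outputs, it helps to derive them by hand rather than trusting the code under test. A minimal sketch of computing one expected wait time with Python's standard library (the `"%Y-%m-%d %H%M"` parsing format is an assumption about how the case-study date and HHMM strings combine):

```python
from datetime import datetime

# Combine the date and HHMM time strings, then difference in minutes
fmt = "%Y-%m-%d %H%M"
arrival = datetime.strptime("2024-01-01 0800", fmt)
service = datetime.strptime("2024-01-01 0830", fmt)
wait_minutes = (service - arrival).total_seconds() / 60
# wait_minutes is 30.0 - the expected value used in the tests
```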

3. Write tests for these scenarios

Success case: typical data

This test confirms the workflow succeeds with standard inputs and produces correct summary statistics.

def test_workflow_success(tmp_path):
    """Complete workflow should calculate correct wait statistics."""

    # Create test data with known values
    test_data = pd.DataFrame({
        "PATIENT_ID": ["p1", "p2", "p3"],
        "ARRIVAL_DATE": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "ARRIVAL_TIME": ["0800", "0930", "1015"],
        "SERVICE_DATE": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "SERVICE_TIME": ["0830", "1000", "1045"],
    })

    # Write test CSV
    csv_path = tmp_path / "patients.csv"
    test_data.to_csv(csv_path, index=False)

    # Run complete workflow
    df = import_patient_data(csv_path)
    df = calculate_wait_times(df)
    stats = summary_stats(df["waittime"])

    # Verify the workflow produces correct results
    # Expected wait times: 30, 30, 30 minutes
    assert stats["mean"] == 30.0
    assert stats["std_dev"] == 0.0
    assert stats["ci_lower"] == 30.0
    assert stats["ci_upper"] == 30.0

The equivalent test in R:

test_that("complete workflow should calculate correct wait statistics", {
  # Complete workflow should calculate correct wait statistics.

  # Create test data with known values
  test_data <- tibble::tibble(
    PATIENT_ID   = c("p1", "p2", "p3"),
    ARRIVAL_DATE = c("2024-01-01", "2024-01-01", "2024-01-02"),
    ARRIVAL_TIME = c("0800", "0930", "1015"),
    SERVICE_DATE = c("2024-01-01", "2024-01-01", "2024-01-02"),
    SERVICE_TIME = c("0830", "1000", "1045")
  )

  # Write test CSV
  csv_path <- tempfile(fileext = ".csv")
  readr::write_csv(test_data, csv_path)

  # Run complete workflow
  df <- import_patient_data(csv_path)
  df <- calculate_wait_times(df)
  stats <- summary_stats(df$waittime)

  # Verify the workflow produces correct results
  # Expected wait times: 30, 30, 30 minutes
  expect_identical(stats$mean, 30)
  expect_identical(stats$std_dev, 0)
  expect_identical(stats$ci_lower, 30)
  expect_identical(stats$ci_upper, 30)
})

Variation: data with different distributions

This test confirms the workflow handles realistic variation in wait times.

def test_workflow_with_variation(tmp_path):
    """Workflow should correctly compute statistics for variable wait times."""

    # Create test data with known wait times: 15, 30, 45 minutes
    test_data = pd.DataFrame({
        "PATIENT_ID": ["p1", "p2", "p3"],
        "ARRIVAL_DATE": ["2024-01-01", "2024-01-01", "2024-01-01"],
        "ARRIVAL_TIME": ["0800", "0900", "1000"],
        "SERVICE_DATE": ["2024-01-01", "2024-01-01", "2024-01-01"],
        "SERVICE_TIME": ["0815", "0930", "1045"],
    })

    csv_path = tmp_path / "patients.csv"
    test_data.to_csv(csv_path, index=False)

    # Run complete workflow
    df = import_patient_data(csv_path)
    df = calculate_wait_times(df)
    stats = summary_stats(df["waittime"])

    # Verify mean and standard deviation
    assert stats["mean"] == 30
    assert np.isclose(stats["std_dev"], 15)

    # CI should be symmetric around mean for this small sample
    assert stats["ci_lower"] < stats["mean"] < stats["ci_upper"]

The equivalent test in R:

test_that("workflow should correctly compute statistics for variable wait times", {
  # Workflow should correctly compute statistics for variable wait times.

  # Create test data with known wait times: 15, 30, 45 minutes
  test_data <- tibble::tibble(
    PATIENT_ID   = c("p1", "p2", "p3"),
    ARRIVAL_DATE = c("2024-01-01", "2024-01-01", "2024-01-01"),
    ARRIVAL_TIME = c("0800", "0900", "1000"),
    SERVICE_DATE = c("2024-01-01", "2024-01-01", "2024-01-01"),
    SERVICE_TIME = c("0815", "0930", "1045")
  )

  csv_path <- tempfile(fileext = ".csv")
  readr::write_csv(test_data, csv_path)

  # Run complete workflow
  df <- import_patient_data(csv_path)
  df <- calculate_wait_times(df)
  stats <- summary_stats(df$waittime)

  # Verify mean and standard deviation
  expect_identical(stats$mean, 30)
  expect_equal(stats$std_dev, 15, tolerance = 1e-8)

  # CI should be symmetric around mean for this small sample
  expect_lt(stats$ci_lower, stats$mean)
  expect_gt(stats$ci_upper, stats$mean)
})
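The confidence-interval assertion is deliberately loose because, for n = 3, a t-based 95% CI is very wide. A sketch of the hand calculation (assuming summary_stats() uses a t-distribution interval; the critical value 4.303 for df = 2 is taken from standard tables):

```python
import math
import statistics

waits = [15, 30, 45]
mean = statistics.mean(waits)    # 30
sd = statistics.stdev(waits)     # sample standard deviation = 15
t_crit = 4.303                   # t(0.975, df = 2), from tables (assumption)
half_width = t_crit * sd / math.sqrt(len(waits))
ci_lower, ci_upper = mean - half_width, mean + half_width
# ci_lower is about -7.3 and ci_upper about 67.3,
# so the test only asserts ci_lower < mean < ci_upper
```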

Error case: invalid input data

This test confirms the workflow fails appropriately when given invalid data.

def test_missing_date_error(tmp_path):
    """Workflow should raise error when dates are missing."""

    test_data = pd.DataFrame({
        "PATIENT_ID": ["p1", "p2", "p3"],
        "ARRIVAL_DATE": ["2024-01-01", "2024-01-01", "2024-01-01"],
        "ARRIVAL_TIME": ["0800", "0900", "1000"],
        "SERVICE_DATE": ["2024-01-01", pd.NaT, "2024-01-01"],
        "SERVICE_TIME": ["0830", "1000", "1045"],
    })

    csv_path = tmp_path / "patients.csv"
    test_data.to_csv(csv_path, index=False)

    # Workflow should fail when calculating wait times with missing dates
    df = import_patient_data(csv_path)
    with pytest.raises(ValueError, match="time data"):
        df = calculate_wait_times(df)

The equivalent test in R:

test_that("workflow should raise error when dates are missing", {
  # Workflow should raise error when dates are missing.

  test_data <- tibble::tibble(
    PATIENT_ID   = c("p1", "p2", "p3"),
    ARRIVAL_DATE = c("2024-01-01", "2024-01-01", "2024-01-01"),
    ARRIVAL_TIME = c("0800", "0900", "1000"),
    SERVICE_DATE = c("2024-01-01", NA, "2024-01-01"),
    SERVICE_TIME = c("0830", "1000", "1045")
  )

  csv_path <- tempfile(fileext = ".csv")
  readr::write_csv(test_data, csv_path)

  # Workflow should fail when calculating wait times with missing dates
  # Will also have warning from ymd_hm() about returning NA
  df <- import_patient_data(csv_path)
  expect_warning(
    expect_error(
        calculate_wait_times(df),
        regexp = "Failed to parse arrival or service datetimes"
    ),
    regexp = "failed to parse"
  )
})
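Since import_patient_data() checks that the required columns are present, a missing-column error case is also worth covering. A sketch of the pattern, using a hypothetical stand-in for that check (the real function's error type and message may differ):

```python
import pandas as pd
import pytest

REQUIRED = {"PATIENT_ID", "ARRIVAL_DATE", "ARRIVAL_TIME",
            "SERVICE_DATE", "SERVICE_TIME"}

def check_columns(df):
    # Hypothetical stand-in for the column check in import_patient_data()
    missing = REQUIRED - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    return df

def test_missing_column_error():
    # A frame with only one of the required columns should be rejected
    bad = pd.DataFrame({"PATIENT_ID": ["p1"]})
    with pytest.raises(ValueError, match="Missing required columns"):
        check_columns(bad)
```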

Running our example tests

Test output:
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /__w/hdruk_tests/hdruk_tests/examples/python_package
configfile: pyproject.toml
plugins: cov-7.0.0
collected 3 items

../examples/python_package/tests/test_functional.py ...                  [100%]

============================== 3 passed in 0.97s ===============================
<ExitCode.OK: 0>

══ Testing test_functional.R ═══════════════════════════════════════════════════

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 3 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 4 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 5 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 6 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 7 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 8 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 9 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 10 ] Done!

When to stop writing tests

You cannot test everything. You’ve written enough tests when:

  • Critical workflows are covered with at least one success case.
  • Input variations and edge cases are tested.
  • Key error conditions are verified.

Focus your testing effort on workflows that matter to your research and scenarios you’re likely to encounter in practice. You’re building confidence in your code, not trying to test every theoretical possibility.

  • Code licence: MIT. Text licence: CC-BY-SA 4.0.