System tests
What is a system test?
System tests are broader than unit tests. They focus on a whole feature or workflow, often involving several functions or classes working together. They check whether this end-to-end behaviour gives the correct outputs for given inputs, matching our requirements or expected results.
Another useful type of test that sits between unit and system tests is the integration test.
Integration tests focus on how two or more components work together (e.g., how a data import function hands data to a processing function), without necessarily running the whole workflow end-to-end.
In this small case study, adding separate integration tests would not add much beyond our unit and system tests, but in larger projects they are very helpful for checking the interactions between parts of your code.
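For illustration, an integration test for this case study might check just the hand-off between the import and calculation steps, without asserting on the final statistics. This is a minimal Python sketch, assuming the same fixtures and column names used in the system tests below:

```python
def test_import_feeds_calculate(tmp_path):
    """Integration test: import output is accepted by calculate_wait_times."""
    test_data = pd.DataFrame({
        "PATIENT_ID": ["p1"],
        "ARRIVAL_DATE": ["2024-01-01"],
        "ARRIVAL_TIME": ["0800"],
        "SERVICE_DATE": ["2024-01-01"],
        "SERVICE_TIME": ["0830"],
    })
    csv_path = tmp_path / "patients.csv"
    test_data.to_csv(csv_path, index=False)

    # Exercise only the first two steps: does the imported frame flow
    # into the calculation step and gain a waittime column?
    df = calculate_wait_times(import_patient_data(csv_path))
    assert "waittime" in df.columns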
Example: waiting times case study
We will return to our waiting times case study, which involves three functions:
- `import_patient_data()` - imports raw patient data and checks that the required columns are present.
- `calculate_wait_times()` - adds arrival and service datetimes, and the waiting time in minutes.
- `summary_stats()` - calculates the mean, standard deviation and 95% confidence interval.
Unlike unit tests, which check each function in isolation, system tests run all three steps together and verify the end‑to‑end workflow produces correct results.
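In outline, the end-to-end workflow we will exercise is simply the three calls chained together (the file path here is illustrative):

```python
df = import_patient_data("data/patients.csv")  # illustrative path
df = calculate_wait_times(df)
stats = summary_stats(df["waittime"])
```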
We will need the following imports in our test script:

```python
import numpy as np
import pandas as pd
import pytest

from waitingtimes.patient_analysis import (
    import_patient_data, calculate_wait_times, summary_stats
)
```

The R tests below call the same three functions from the R implementation of the package, with tibble and readr used to build the test fixtures.
How to write system tests
1. Identify the feature or workflow to test
Start by choosing the complete feature or workflow you want to validate.
In our case study, we only have a simple three-step pipeline - but more complex projects may have multiple intersecting workflows you want to focus on.
2. Define inputs and expected outputs
Think about realistic scenarios that cover:
- Clean/positive/success cases: standard inputs where everything should work correctly, including realistic variations in the inputs (e.g., different sample sizes, different distributions in the data input).
- Edge/extreme cases: unusual but plausible inputs (e.g., unusual sample sizes, boundary values); a sketch of one such test follows this list.
- Error/negative/dirty cases: invalid inputs that should trigger errors.
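As an example of a boundary value, a system test could check a wait that crosses midnight. This is a sketch, assuming the same package functions and column names as the tests below; two identical rows are used so the standard deviation is defined:

```python
def test_wait_crossing_midnight(tmp_path):
    """Workflow should handle waits that span midnight."""
    # Edge case: arrival just before midnight, service just after
    test_data = pd.DataFrame({
        "PATIENT_ID": ["p1", "p2"],
        "ARRIVAL_DATE": ["2024-01-01", "2024-01-01"],
        "ARRIVAL_TIME": ["2350", "2350"],
        "SERVICE_DATE": ["2024-01-02", "2024-01-02"],
        "SERVICE_TIME": ["0020", "0020"],
    })
    csv_path = tmp_path / "patients.csv"
    test_data.to_csv(csv_path, index=False)

    df = import_patient_data(csv_path)
    df = calculate_wait_times(df)
    stats = summary_stats(df["waittime"])

    # 23:50 on 1 Jan to 00:20 on 2 Jan is a 30-minute wait
    assert stats["mean"] == 30.0
    assert stats["std_dev"] == 0.0
```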
3. Write tests for these scenarios
Clean case: typical data
This test confirms the workflow succeeds with standard inputs and produces correct summary statistics.
```python
def test_workflow_success(tmp_path):
    """Complete workflow should calculate correct wait statistics."""
    # Create test data with known values
    test_data = pd.DataFrame({
        "PATIENT_ID": ["p1", "p2", "p3"],
        "ARRIVAL_DATE": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "ARRIVAL_TIME": ["0800", "0930", "1015"],
        "SERVICE_DATE": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "SERVICE_TIME": ["0830", "1000", "1045"],
    })

    # Write test CSV
    csv_path = tmp_path / "patients.csv"
    test_data.to_csv(csv_path, index=False)

    # Run complete workflow
    df = import_patient_data(csv_path)
    df = calculate_wait_times(df)
    stats = summary_stats(df["waittime"])

    # Verify the workflow produces correct results
    # Expected wait times: 30, 30, 30 minutes
    assert stats["mean"] == 30.0
    assert stats["std_dev"] == 0.0
    assert stats["ci_lower"] == 30.0
    assert stats["ci_upper"] == 30.0
```

```r
test_that("complete workflow should calculate correct wait statistics", {
  # Create test data with known values
  test_data <- tibble::tibble(
    PATIENT_ID = c("p1", "p2", "p3"),
    ARRIVAL_DATE = c("2024-01-01", "2024-01-01", "2024-01-02"),
    ARRIVAL_TIME = c("0800", "0930", "1015"),
    SERVICE_DATE = c("2024-01-01", "2024-01-01", "2024-01-02"),
    SERVICE_TIME = c("0830", "1000", "1045")
  )

  # Write test CSV
  csv_path <- tempfile(fileext = ".csv")
  readr::write_csv(test_data, csv_path)

  # Run complete workflow
  df <- import_patient_data(csv_path)
  df <- calculate_wait_times(df)
  stats <- summary_stats(df$waittime)

  # Verify the workflow produces correct results
  # Expected wait times: 30, 30, 30 minutes
  expect_identical(stats$mean, 30)
  expect_identical(stats$std_dev, 0)
  expect_identical(stats$ci_lower, 30)
  expect_identical(stats$ci_upper, 30)
})
```

Clean case: variation using data with different distributions
This test confirms the workflow handles realistic variation in wait times.
```python
def test_workflow_with_variation(tmp_path):
    """Workflow should correctly compute statistics for variable wait times."""
    # Create test data with known wait times: 15, 30, 45 minutes
    test_data = pd.DataFrame({
        "PATIENT_ID": ["p1", "p2", "p3"],
        "ARRIVAL_DATE": ["2024-01-01", "2024-01-01", "2024-01-01"],
        "ARRIVAL_TIME": ["0800", "0900", "1000"],
        "SERVICE_DATE": ["2024-01-01", "2024-01-01", "2024-01-01"],
        "SERVICE_TIME": ["0815", "0930", "1045"],
    })
    csv_path = tmp_path / "patients.csv"
    test_data.to_csv(csv_path, index=False)

    # Run complete workflow
    df = import_patient_data(csv_path)
    df = calculate_wait_times(df)
    stats = summary_stats(df["waittime"])

    # Verify mean and standard deviation
    assert stats["mean"] == 30
    assert np.isclose(stats["std_dev"], 15)

    # The CI should bracket the mean for this small sample
    assert stats["ci_lower"] < stats["mean"] < stats["ci_upper"]
```

```r
test_that("workflow should give correct statistics for variable wait times", {
  # Create test data with known wait times: 15, 30, 45 minutes
  test_data <- tibble::tibble(
    PATIENT_ID = c("p1", "p2", "p3"),
    ARRIVAL_DATE = c("2024-01-01", "2024-01-01", "2024-01-01"),
    ARRIVAL_TIME = c("0800", "0900", "1000"),
    SERVICE_DATE = c("2024-01-01", "2024-01-01", "2024-01-01"),
    SERVICE_TIME = c("0815", "0930", "1045")
  )
  csv_path <- tempfile(fileext = ".csv")
  readr::write_csv(test_data, csv_path)

  # Run complete workflow
  df <- import_patient_data(csv_path)
  df <- calculate_wait_times(df)
  stats <- summary_stats(df$waittime)

  # Verify mean and standard deviation
  expect_identical(stats$mean, 30)
  expect_equal(stats$std_dev, 15, tolerance = 1e-8)

  # The CI should bracket the mean for this small sample
  expect_lt(stats$ci_lower, stats$mean)
  expect_gt(stats$ci_upper, stats$mean)
})
```
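Note that the final assertions only check that the confidence interval brackets the mean, rather than pinning exact bounds. With a sample mean of 30 and standard deviation of 15 over n = 3 observations, a t-based 95% interval would be 30 ± 4.30 × 15/√3 ≈ (-7.3, 67.3), while a normal approximation gives roughly (13.0, 47.0). Either way the interval contains the mean, so this looser assertion keeps the test valid whichever method summary_stats uses.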
Error case: invalid input data
This test confirms the workflow fails appropriately when given invalid data.
```python
def test_missing_date_error(tmp_path):
    """Workflow should raise an error when dates are missing."""
    test_data = pd.DataFrame({
        "PATIENT_ID": ["p1", "p2", "p3"],
        "ARRIVAL_DATE": ["2024-01-01", "2024-01-01", "2024-01-01"],
        "ARRIVAL_TIME": ["0800", "0900", "1000"],
        "SERVICE_DATE": ["2024-01-01", pd.NaT, "2024-01-01"],
        "SERVICE_TIME": ["0830", "1000", "1045"],
    })
    csv_path = tmp_path / "patients.csv"
    test_data.to_csv(csv_path, index=False)

    # Workflow should fail when calculating wait times with missing dates
    df = import_patient_data(csv_path)
    with pytest.raises(ValueError, match="time data"):
        df = calculate_wait_times(df)
```

```r
test_that("workflow should raise error when dates are missing", {
  test_data <- tibble::tibble(
    PATIENT_ID = c("p1", "p2", "p3"),
    ARRIVAL_DATE = c("2024-01-01", "2024-01-01", "2024-01-01"),
    ARRIVAL_TIME = c("0800", "0900", "1000"),
    SERVICE_DATE = c("2024-01-01", NA, "2024-01-01"),
    SERVICE_TIME = c("0830", "1000", "1045")
  )
  csv_path <- tempfile(fileext = ".csv")
  readr::write_csv(test_data, csv_path)

  # Workflow should fail when calculating wait times with missing dates
  # Will also produce a warning from ymd_hm() about returning NA
  df <- import_patient_data(csv_path)
  expect_warning(
    expect_error(
      calculate_wait_times(df),
      regexp = "Failed to parse arrival or service datetimes"
    ),
    regexp = "failed to parse"
  )
})
```

Running our example tests

Running the Python tests with pytest produces output like:
```
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /__w/stars-testing-intro/stars-testing-intro/examples/python_package
configfile: pyproject.toml
plugins: cov-7.0.0
collected 3 items

../examples/python_package/tests/test_system.py ...                      [100%]

============================== 3 passed in 0.99s ===============================
<ExitCode.OK: 0>
```
Running the R tests with testthat gives:

```
══ Testing test_system.R ═══════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 3 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 4 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 5 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 6 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 7 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 8 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 9 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 10 ] Done!
```
When to stop writing tests
You cannot test everything. You’ve written enough tests when:
- Critical workflows are covered with at least one success case.
- Input variations and edge cases are tested.
- Key error conditions are verified.
Focus your testing effort on workflows that matter to your research and scenarios you’re likely to encounter in practice. You’re building confidence in your code, not trying to test every theoretical possibility.