Parameterising tests

There are many tools you can make use of when testing - one example is parameterising tests.

When you need to test the same logic with different inputs and expected outputs, you can parameterise your tests instead of writing repetitive test functions. This minimises code duplication and makes it easy to add new test cases. The same example is shown below in Python (using pytest) and in R (using testthat with the patrick package).
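
To make the contrast concrete, here is a minimal sketch (the add() function is a hypothetical helper, not part of the case study) of the same checks written twice over, and then once with parameterisation:

import pytest


# Hypothetical helper, defined here purely for illustration
def add(a, b):
    return a + b


# Without parameterisation: one near-identical test function per case
def test_add_positive():
    assert add(1, 2) == 3


def test_add_negative():
    assert add(-1, -2) == -3


# With parameterisation: one test function and a table of cases
@pytest.mark.parametrize("a, b, expected", [(1, 2, 3), (-1, -2, -3)])
def test_add(a, b, expected):
    assert add(a, b) == expected

Adding a new case then only means appending a tuple to the list, not writing another test function.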

Example: Testing summary_stats()

Let’s say we want to verify that our summary_stats() function works correctly for different datasets.

Note: the summary_stats() function is shown below, first in Python and then in R.

# Imports needed by summary_stats()
import numpy as np
import scipy.stats as st


def summary_stats(data):
    """
    Calculate mean, standard deviation and 95% confidence interval (CI).

    CI is calculated using the t-distribution, which is appropriate for
    small samples and converges to the normal distribution as the sample
    size increases.

    Parameters
    ----------
    data : pandas.Series
        Data to use in the calculation.

    Returns
    -------
    dict[str, float]
        A dictionary with keys `mean`, `std_dev`, `ci_lower` and `ci_upper`.
        Each value is a float, or `numpy.nan` if it can't be computed.
    """
    # Drop missing values
    data = data.dropna()

    # Find number of observations
    count = len(data)

    # If there are no observations, then set all to NaN
    if count == 0:
        mean, std_dev, ci_lower, ci_upper = np.nan, np.nan, np.nan, np.nan

    # If there are 1 or 2 observations, can do mean but not other statistics
    elif count < 3:
        mean = data.mean()
        std_dev, ci_lower, ci_upper = np.nan, np.nan, np.nan

    # With more than two observations, can calculate all...
    else:
        mean = data.mean()
        std_dev = data.std()

        # If there is no variation, then CI is equal to the mean
        if np.var(data) == 0:
            ci_lower, ci_upper = mean, mean
        else:
            # 95% CI based on the t-distribution
            ci_lower, ci_upper = st.t.interval(
                confidence=0.95,
                df=count-1,
                loc=mean,
                scale=st.sem(data)
            )

    return {
        "mean": mean,
        "std_dev": std_dev,
        "ci_lower": ci_lower,
        "ci_upper": ci_upper
    }

The equivalent R implementation:

#' Calculate mean, standard deviation and 95% confidence interval (CI).
#'
#' CI is calculated using the t-distribution, which is appropriate for
#' small samples and converges to the normal distribution as the sample
#' size increases.
#'
#' @param data Numeric vector of data to use in the calculation.
#'
#' @return A named list with elements `mean`, `std_dev`, `ci_lower` and 
#'   `ci_upper`. Each value is a numeric, or `NA` if it can't be computed.
#'
#' @export
summary_stats <- function(data) {
  tibble::tibble(value = data) |>
    dplyr::reframe(
      n_complete = sum(!is.na(value)),
      mean = mean(value, na.rm = TRUE),
      std_dev = stats::sd(value, na.rm = TRUE),
      ci_lower   = {
        if (n_complete < 2L) {
          NA_real_
        } else if (std_dev == 0 || is.na(std_dev)) {
          mean       # CI collapses to mean when no variation
        } else {
          stats::t.test(value)$conf.int[1L]
        }
      },
      ci_upper   = {
        if (n_complete < 2L) {
          NA_real_
        } else if (std_dev == 0 || is.na(std_dev)) {
          mean       # CI collapses to mean when no variation
        } else {
          stats::t.test(value)$conf.int[2L]
        }
      }
    ) |>
    dplyr::select(-n_complete) |>
    as.list()
}

For the Python test, we will need the following imports in our test script:

import pandas as pd
import pytest
from waitingtimes.patient_analysis import summary_stats

Instead of writing separate test functions for each case, we can use pytest’s @pytest.mark.parametrize decorator:

@pytest.mark.parametrize(
    "data, expected_mean, expected_std, expected_ci_lower, expected_ci_upper",
    [
        # Five value sample with known summary statistics
        ([1.0, 2.0, 3.0, 4.0, 5.0], 3.0, 1.58, 1.04, 4.96),
        # No variation: CI collapses to the mean
        ([5, 5, 5], 5, 0, 5, 5),
    ]
)
def test_summary_stats(
    data, expected_mean, expected_std, expected_ci_lower, expected_ci_upper
):
    """Running summary_stats returns expected values."""
    res = summary_stats(pd.Series(data))
    assert res["mean"] == pytest.approx(expected_mean, rel=5e-3)
    assert res["std_dev"] == pytest.approx(expected_std, rel=5e-3)
    assert res["ci_lower"] == pytest.approx(expected_ci_lower, rel=5e-3)
    assert res["ci_upper"] == pytest.approx(expected_ci_upper, rel=5e-3)

How it works

The @pytest.mark.parametrize decorator takes two arguments:

  1. Parameter names (as a string). These variable names will be passed to your test function. For example:
"data, expected_mean, expected_std, expected_ci_lower, expected_ci_upper"
  2. Test cases (as a list of tuples). Each tuple contains values for one test case. For example:
[
    # Five value sample with known summary statistics
    ([1.0, 2.0, 3.0, 4.0, 5.0], 3.0, 1.58, 1.04, 4.96),
    # No variation: CI collapses to the mean
    ([5, 5, 5], 5, 0, 5, 5),
]

If any test case fails, pytest will clearly indicate which parameters were used, making debugging straightforward.
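
Two optional refinements can make parameterised tests easier to work with (this is a sketch, not part of the course example): pytest.param() attaches a readable id to each case, which appears in the test report, and pytest.approx(..., nan_ok=True) lets a case assert the numpy.nan values that summary_stats() returns for very small samples.

import numpy as np
import pandas as pd
import pytest
from waitingtimes.patient_analysis import summary_stats


@pytest.mark.parametrize(
    "data, expected_mean, expected_std",
    [
        # id= labels each case in the report, e.g. test_small_samples[two_values]
        pytest.param([1.0, 2.0, 3.0, 4.0, 5.0], 3.0, 1.58, id="five_values"),
        # With fewer than three observations, std_dev is returned as NaN
        pytest.param([1.0, 2.0], 1.5, np.nan, id="two_values"),
    ]
)
def test_small_samples(data, expected_mean, expected_std):
    """Mean and standard deviation behave as documented for small samples."""
    res = summary_stats(pd.Series(data))
    assert res["mean"] == pytest.approx(expected_mean, rel=5e-3)
    assert res["std_dev"] == pytest.approx(expected_std, rel=5e-3, nan_ok=True)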


In R, instead of test_that(), we will use the function with_parameters_test_that() from Google’s patrick package. This lets us write our test code once, then provide a list of input/output combinations (the “cases”) that are run through the same test code.

The general pattern is:

patrick::with_parameters_test_that(
  "Description of test",
  {
    # Test code using the parameters e.g., expect_...
  },
  patrick::cases(
    list(input1 = 5L, input2 = 10L, output = 500L),
    list(input1 = 6L, input2 = 11L, output = 600L)
  )
)

Each list() inside cases() defines one test case, with named elements matching the arguments used inside the code block.
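
One extra convenience (based on patrick's documentation; treat the exact naming behaviour as an assumption): if you name the cases inside cases(), each name is added to the test description, making it easier to see which case failed. A minimal sketch, unrelated to the case study:

patrick::with_parameters_test_that(
  "doubling works for",
  {
    expect_equal(2 * input, output)
  },
  patrick::cases(
    small = list(input = 1L, output = 2L),
    large = list(input = 100L, output = 200L)
  )
)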


For summary_stats, we can write our test as:

patrick::with_parameters_test_that(
  "summary_stats returns expected values",
  {
    res <- summary_stats(data)

    expect_equal(res$mean, expected_mean, tolerance = 5e-3)
    expect_equal(res$std_dev, expected_std, tolerance = 5e-3)
    expect_equal(res$ci_lower, expected_ci_lower, tolerance = 5e-3)
    expect_equal(res$ci_upper, expected_ci_upper, tolerance = 5e-3)
  },
  patrick::cases(
    # Five value sample with known summary statistics
    list(
      data = c(1.0, 2.0, 3.0, 4.0, 5.0),
      expected_mean = 3.0,
      expected_std = 1.58,
      expected_ci_lower = 1.04,
      expected_ci_upper = 4.96
    ),
    # No variation: CI collapses to the mean
    list(
      data = c(5, 5, 5),
      expected_mean = 5,
      expected_std = 0,
      expected_ci_lower = 5,
      expected_ci_upper = 5
    )
  )
)

When this test runs, with_parameters_test_that() executes the code block once for each case, substituting in the corresponding data and expected values.
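
If you want to try this file on its own, one way (the file path is an assumption about the project layout) is to call it directly from the package root:

# Run a single test file (path assumed)
testthat::test_file("tests/testthat/test_intro_parametrised.R")

# Or run the package's entire test suite
devtools::test()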

Running our test

Note: test output (Python first, then R)
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /__w/hdruk_tests/hdruk_tests/examples/python_package
configfile: pyproject.toml
plugins: cov-7.0.0
collected 2 items

../examples/python_package/tests/test_intro_parametrised.py ..           [100%]

============================== 2 passed in 0.90s ===============================
<ExitCode.OK: 0>

══ Testing test_intro_parametrised.R ═══════════════════════════════════════════

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 3 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 4 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 5 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 6 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 7 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 8 ] Done!