import numpy as np
import scipy.stats as st


def summary_stats(data):
"""
Calculate mean, standard deviation and 95% confidence interval (CI).
CI is calculated using the t-distribution, which is appropriate for
small samples and converges to the normal distribution as the sample
size increases.
Parameters
----------
data : pandas.Series
Data to use in the calculation.
Returns
-------
dict[str, float]
A dictionary with keys `mean`, `std_dev`, `ci_lower` and `ci_upper`.
Each value is a float, or `numpy.nan` if it can't be computed.
"""
# Drop missing values
data = data.dropna()
# Find number of observations
count = len(data)
# If there are no observations, then set all to NaN
if count == 0:
mean, std_dev, ci_lower, ci_upper = np.nan, np.nan, np.nan, np.nan
# If there are 1 or 2 observations, can do mean but not other statistics
elif count < 3:
mean = data.mean()
std_dev, ci_lower, ci_upper = np.nan, np.nan, np.nan
# With more than two observations, can calculate all...
else:
mean = data.mean()
std_dev = data.std()
# If there is no variation, then CI is equal to the mean
if np.var(data) == 0:
ci_lower, ci_upper = mean, mean
else:
# 95% CI based on the t-distribution
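            # i.e. mean ± t * SEM, where t is the 97.5th percentile of the
            # t-distribution with (count - 1) degrees of freedom and SEM = std_dev / sqrt(count)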
ci_lower, ci_upper = st.t.interval(
confidence=0.95,
df=count-1,
loc=mean,
scale=st.sem(data)
)
return {
"mean": mean,
"std_dev": std_dev,
"ci_lower": ci_lower,
"ci_upper": ci_upper
    }
Parameterising tests
There are many tools you can use when testing - one example is parameterising tests.
When you need to test the same logic with different inputs and expected outputs, you can parameterise your tests instead of writing repetitive test functions. This minimises code duplication and makes it easy to add new test cases.
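Without parameterisation, checking the same function against several datasets usually means one near-identical test function per dataset. As a minimal sketch of that repetitive pattern (using the summary_stats() cases that appear later on this page):
import pandas as pd
import pytest

from waitingtimes.patient_analysis import summary_stats


def test_summary_stats_known_values():
    """A five-value sample with known summary statistics."""
    res = summary_stats(pd.Series([1.0, 2.0, 3.0, 4.0, 5.0]))
    assert res["mean"] == pytest.approx(3.0, rel=5e-3)
    assert res["std_dev"] == pytest.approx(1.58, rel=5e-3)


def test_summary_stats_no_variation():
    """With no variation, the CI collapses to the mean."""
    res = summary_stats(pd.Series([5, 5, 5]))
    assert res["ci_lower"] == pytest.approx(5)
    assert res["ci_upper"] == pytest.approx(5)
Both functions repeat the same arrange/act/assert steps; parameterising keeps that logic in one place.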
Example: Testing summary_stats()
Let’s say we want to verify that our summary_stats() function works correctly for different datasets.
summary_stats()
#' Calculate mean, standard deviation and 95% confidence interval (CI).
#'
#' CI is calculated using the t-distribution, which is appropriate for
#' small samples and converges to the normal distribution as the sample
#' size increases.
#'
#' @param data Numeric vector of data to use in the calculation.
#'
#' @return A named list with elements `mean`, `std_dev`, `ci_lower` and
#' `ci_upper`. Each value is a numeric, or `NA` if it can't be computed.
#'
#' @export
summary_stats <- function(data) {
tibble::tibble(value = data) |>
dplyr::reframe(
n_complete = sum(!is.na(value)),
mean = mean(value, na.rm = TRUE),
std_dev = stats::sd(value, na.rm = TRUE),
ci_lower = {
if (n_complete < 2L) {
NA_real_
} else if (std_dev == 0 || is.na(std_dev)) {
mean # CI collapses to mean when no variation
} else {
stats::t.test(value)$conf.int[1L]
}
},
ci_upper = {
if (n_complete < 2L) {
NA_real_
} else if (std_dev == 0 || is.na(std_dev)) {
mean # CI collapses to mean when no variation
} else {
stats::t.test(value)$conf.int[2L]
}
}
) |>
dplyr::select(-n_complete) |>
as.list()
}
We will need the following imports in our test script:
import pandas as pd
import pytest
from waitingtimes.patient_analysis import summary_stats
Instead of writing separate test functions for each case, we can use pytest’s @pytest.mark.parametrize decorator:
@pytest.mark.parametrize(
    "data, expected_mean, expected_std, expected_ci_lower, expected_ci_upper",
    [
        # Five value sample with known summary statistics
        ([1.0, 2.0, 3.0, 4.0, 5.0], 3.0, 1.58, 1.04, 4.96),
        # No variation: CI collapses to the mean
        ([5, 5, 5], 5, 0, 5, 5),
    ]
)
def test_summary_stats(
data, expected_mean, expected_std, expected_ci_lower, expected_ci_upper
):
"""Running summary_stats returns expected values."""
res = summary_stats(pd.Series(data))
assert res["mean"] == pytest.approx(expected_mean, rel=5e-3)
assert res["std_dev"] == pytest.approx(expected_std, rel=5e-3)
assert res["ci_lower"] == pytest.approx(expected_ci_lower, rel=5e-3)
    assert res["ci_upper"] == pytest.approx(expected_ci_upper, rel=5e-3)
How it works
The @pytest.mark.parametrize decorator takes two arguments:
- Parameter names (as a string). These variable names will be passed to your test function. For example:
"data, expected_mean, expected_std, expected_ci_lower, expected_ci_upper"
- Test cases (as a list of tuples). Each tuple contains values for one test case. For example:
[
# Five value sample with known summary statistics
([1.0, 2.0, 3.0, 4.0, 5.0], 3.0, 1.58, 1.04, 4.96),
    # No variation: CI collapses to the mean
([5, 5, 5], 5, 0, 5, 5),
]
If any test case fails, pytest will clearly indicate which parameters were used, making debugging straightforward.
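Optionally (this is a refinement of our own, not something the example above requires), each tuple can be wrapped in pytest.param() and given an id, so the generated tests get readable names in the report:
import pytest


@pytest.mark.parametrize(
    "data, expected_mean, expected_std, expected_ci_lower, expected_ci_upper",
    [
        pytest.param(
            [1.0, 2.0, 3.0, 4.0, 5.0], 3.0, 1.58, 1.04, 4.96, id="known_values"
        ),
        pytest.param([5, 5, 5], 5, 0, 5, 5, id="no_variation"),
    ],
)
def test_summary_stats(
    data, expected_mean, expected_std, expected_ci_lower, expected_ci_upper
):
    ...  # same assertions as above
Failures are then reported against names like test_summary_stats[no_variation] rather than the auto-generated parameter ids.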
Instead of test_that(), we will use the function with_parameters_test_that() from Google’s patrick package. This lets us write our test code once, then provide a list of input/output combinations (the “cases”) that are run through the same test code.
The general pattern is:
patrick::with_parameters_test_that(
"Description of test",
{
# Test code using the parameters e.g., expect_...
},
patrick::cases(
list(input1 = 5L, input2 = 10L, output = 500L),
list(input1 = 6L, input2 = 11L, output = 600L)
)
)
Each list() inside cases() defines one test case, with named elements matching the arguments used inside the code block.
For summary_stats, we can write our test as:
patrick::with_parameters_test_that(
"summary_stats returns expected values",
{
res <- summary_stats(data)
expect_equal(res$mean, expected_mean, tolerance = 5e-3)
expect_equal(res$std_dev, expected_std, tolerance = 5e-3)
expect_equal(res$ci_lower, expected_ci_lower, tolerance = 5e-3)
expect_equal(res$ci_upper, expected_ci_upper, tolerance = 5e-3)
},
patrick::cases(
# Five value sample with known summary statistics
list(
data = c(1.0, 2.0, 3.0, 4.0, 5.0),
expected_mean = 3.0,
expected_std = 1.58,
expected_ci_lower = 1.04,
expected_ci_upper = 4.96
),
    # No variation: CI collapses to the mean
list(
data = c(5, 5, 5),
expected_mean = 5,
expected_std = 0,
expected_ci_lower = 5,
expected_ci_upper = 5
)
)
)
When this test runs, with_parameters_test_that() executes the code block once for each case, substituting in the corresponding data and expected values.
Running our test
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /__w/hdruk_tests/hdruk_tests/examples/python_package
configfile: pyproject.toml
plugins: cov-7.0.0
collected 2 items
../examples/python_package/tests/test_intro_parametrised.py .. [100%]
============================== 2 passed in 0.90s ===============================
<ExitCode.OK: 0>
══ Testing test_intro_parametrised.R ═══════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 3 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 4 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 5 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 6 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 7 ]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 8 ] Done!