HDR UK Futures: Testing in Research Workflows

This site contains materials for the testing module on HDR UK’s RSE001 Research Software Engineering training course. It was developed as part of the STARS project.

How to write a basic test


On this page, we explain how to write a basic automated test in Python using pytest.

What is pytest?

Pytest is a popular framework for testing in Python, widely used in software development and data science.

You should install the pytest package into your environment from either conda or PyPI:

conda install pytest
# or
pip install pytest

What a pytest test looks like

In pytest, any function whose name starts with test_ is treated as a test, provided it lives in a file named test_*.py or *_test.py (pytest's default discovery rules). Inside the function you write one or more assert statements. If an assertion fails, pytest marks the test as failed and prints a report explaining what went wrong.

For example:

import pytest


def test_example():
    """Simple test confirming 2 + 2 = 4"""
    result = 2 + 2
    assert result == 4
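
To make the mechanics concrete, here is a minimal sketch of what happens when an assertion fails. The helper `add` and the function `failing_assertion_message` are hypothetical names invented for illustration; they are not part of the course materials. A failing `assert` raises an `AssertionError`, and pytest reports that exception as a failed test. An optional message after the comma is included in the report.

```python
def add(a, b):
    """Hypothetical function under test (illustration only)."""
    return a + b


def test_add_two_and_two():
    """Passes: the assertion holds, so nothing is raised."""
    result = add(2, 2)
    # The text after the comma is only used if the assertion fails
    assert result == 4, f"expected 4, got {result}"


def failing_assertion_message():
    """Trigger a failing assert by hand and return its message."""
    try:
        result = add(2, 2)
        assert result == 5, f"expected 5, got {result}"
    except AssertionError as err:
        # pytest catches this exception for you and reports a failed test
        return str(err)
    return None
```

If this were saved in a file such as test_add.py (an assumed filename), pytest would collect only test_add_two_and_two, because the other two function names do not start with test_.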

On this page, we explain how to write a basic automated test in R using testthat.

What is testthat?

testthat is a popular framework for testing in R, widely used in software development and data science.

You should install the testthat package into your environment from CRAN.

install.packages("testthat")

If you are structuring your research as a package, we suggest also installing usethis and running:

usethis::use_testthat()

What a testthat test looks like

Tests are created using test_that(). They are built around expectations like expect_true(), expect_false(), expect_equal(), expect_error(), and others (see package index for more). If an expectation fails, testthat will return an error message explaining what went wrong.

For example:

library(testthat)


test_that("2 add 2 equals 4", {
  result <- 2L + 2L
  expect_equal(result, 4L)
})
Test passed with 1 success 🥇.

A simple test for summary_stats

Here is a minimal example using the summary_stats function from the case study.

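View summary_stats() (Python)

# Imports needed by the function below (np and st are used in its body)
import numpy as np
import scipy.stats as st


def summary_stats(data):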
    """
    Calculate mean, standard deviation and 95% confidence interval (CI).

    CI is calculated using the t-distribution, which is appropriate for
    small samples and converges to the normal distribution as the sample
    size increases.

    Parameters
    ----------
    data : pandas.Series
        Data to use in the calculation.

    Returns
    -------
    dict[str, float]
        A dictionary with keys `mean`, `std_dev`, `ci_lower` and `ci_upper`.
        Each value is a float, or `numpy.nan` if it can't be computed.
    """
    # Drop missing values
    data = data.dropna()

    # Find number of observations
    count = len(data)

    # If there are no observations, then set all to NaN
    if count == 0:
        mean, std_dev, ci_lower, ci_upper = np.nan, np.nan, np.nan, np.nan

    # If there are 1 or 2 observations, can do mean but not other statistics
    elif count < 3:
        mean = data.mean()
        std_dev, ci_lower, ci_upper = np.nan, np.nan, np.nan

    # With more than two observations, can calculate all...
    else:
        mean = data.mean()
        std_dev = data.std()

        # If there is no variation, then CI is equal to the mean
        if np.var(data) == 0:
            ci_lower, ci_upper = mean, mean
        else:
            # 95% CI based on the t-distribution
            ci_lower, ci_upper = st.t.interval(
                confidence=0.95,
                df=count-1,
                loc=mean,
                scale=st.sem(data)
            )

    return {
        "mean": mean,
        "std_dev": std_dev,
        "ci_lower": ci_lower,
        "ci_upper": ci_upper
    }

View summary_stats() (R)
#' Calculate mean, standard deviation and 95% confidence interval (CI).
#'
#' CI is calculated using the t-distribution, which is appropriate for
#' small samples and converges to the normal distribution as the sample
#' size increases.
#'
#' @param data Numeric vector of data to use in the calculation.
#'
#' @return A named list with elements `mean`, `std_dev`, `ci_lower` and 
#'   `ci_upper`. Each value is a numeric, or `NA` if it can't be computed.
#'
#' @export
summary_stats <- function(data) {
  tibble::tibble(value = data) |>
    dplyr::reframe(
      n_complete = sum(!is.na(value)),
      mean = mean(value, na.rm = TRUE),
      std_dev = stats::sd(value, na.rm = TRUE),
      ci_lower   = {
        if (n_complete < 2L) {
          NA_real_
        } else if (std_dev == 0 || is.na(std_dev)) {
          mean       # CI collapses to mean when no variation
        } else {
          stats::t.test(value)$conf.int[1L]
        }
      },
      ci_upper   = {
        if (n_complete < 2L) {
          NA_real_
        } else if (std_dev == 0 || is.na(std_dev)) {
          mean       # CI collapses to mean when no variation
        } else {
          stats::t.test(value)$conf.int[2L]
        }
      }
    ) |>
    dplyr::select(-n_complete) |>
    as.list()
}

For a single value, the function should return that value as the mean and a missing value (NaN in Python, NA in R) for the other statistics, because one observation is not enough to define a standard deviation or confidence interval.

import pandas as pd

from waitingtimes.patient_analysis import summary_stats


def test_summary_stats_single_value():
    """Running summary_stats on a single value should only return mean."""
    data = pd.Series([10])
    res = summary_stats(data)
    assert res["mean"] == 10
    assert pd.isna(res["std_dev"])
    assert pd.isna(res["ci_lower"])
    assert pd.isna(res["ci_upper"])
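
The Python test above checks the missing statistics with pd.isna() rather than with ==. The sketch below illustrates why, using only the standard library so that it runs without pandas: NaN never compares equal to anything, including itself, so an equality assertion against NaN would always fail. Here math.isnan stands in for pd.isna, and the test names are illustrative rather than taken from the course materials.

```python
import math

# A NaN value, like the one summary_stats returns for undefined statistics
NAN = float("nan")


def test_nan_never_equals_itself():
    """Equality checks cannot be used to detect NaN."""
    assert NAN != NAN
    assert not (NAN == NAN)


def test_detect_nan_with_isnan():
    """A dedicated NaN check works where == does not."""
    assert math.isnan(NAN)
    assert not math.isnan(10.0)
```
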

test_that("running summary_stats on a single value only returns the mean", {
  data <- c(10)
  res <- summary_stats(data)

  expect_identical(res$mean, 10)
  expect_true(is.na(res$std_dev))
  expect_true(is.na(res$ci_lower))
  expect_true(is.na(res$ci_upper))
})
  • Code licence: MIT. Text licence: CC-BY-SA 4.0.