Validation Primer

About This Primer

This page provides a high-level introduction to the science validation work carried out by the UKSRC-ST team. It is intended for stakeholders, collaborators, and contributors who want to understand what validation is, why it matters, and how it is implemented in the context of precision science with the SKA and its precursors.

Whether you are a scientist, instrument team member, or programme manager, this primer aims to clarify how validation supports your work – from model reliability to science decision-making.

For a more detailed discussion of validation in the context of precision astrophysics and cosmology see e.g. Aguirre+22, Sims+25a, b and references therein.

πŸ” 1. What is Validation?

In the context of radio astronomy and big-data science, validation is the process of determining whether our models, pipelines, and simulations can be trusted to deliver accurate and reliable scientific results.

Put simply:

Validation ensures that the tools we use to interpret data actually work for the science we care about.

Validation plays a different role from verification:

Term

Question Answered

Example

Verification

β€œDid we build the system right?”

Does the power spectrum pipeline run without errors on test data?

Validation

β€œDid we build the right system (for science)?”

Can we recover a known cosmological 21-cm signal from realistic mock data?

In the SKA context, this means asking:

  • Is our knowledge of the instrument (e.g. beam, layout, calibration) good enough for the highest precision science cases?

  • Are our simulations and pipelines accurate representations of the real sky and telescope?

  • Can we confidently go beyond the current observational and modelling frontier to detect and interpret never-before-seen faint signals?

Because real SKA and precursor data are complex and contaminated with systematics (e.g. foregrounds, calibration errors, instrumental effects), validation gives us a principled way to:

  • Quantify our confidence in detections or upper limits.

  • Identify and mitigate potential sources of bias.

  • Set requirements on instrument performance and data quality.

Key takeaway:

We validate to ensure that precision astrophysics is built on solid foundations. Without validation, even sophisticated analyses can lead to incorrect or misleading conclusions.


🎯 2. Why is Validation Critical for SKA Science?

Modern radio cosmology aims to detect some of the faintest signals ever measured – such as the redshifted 21-cm line of neutral hydrogen from the early Universe – using increasingly complex instruments and analysis pipelines. In this regime, it is the meticulous characterisation of the instrument and high fidelity modelling of the data that presents the greatest challenge, not intrinsic instrumental sensitivity. Precision science requires precision validation.

🚨 The Challenge

Many of the most important SKA and precursor science goals, such as probing the Epoch of Reionization (EoR), measuring Baryon Acoustic Oscillations or constraining properties of dark matter and gravity, rely on:

  • Detecting extremely weak signals (e.g. the 21-cm power spectrum is >105Γ— fainter than foregrounds)

  • Modelling instruments with complex, evolving systematics (e.g. beam chromaticity, mutual coupling)

  • Separating multiple overlapping signal components in the data (e.g. cosmological signal, sky foregrounds, instrument noise, systematics)

These challenges introduce failure modes that are hard to detect without formal validation:

Failure Mode

Impact on Science

Spurious detections from unmodelled systematics

False positives or biased inferences

Overfitting noise or foregrounds

Signal suppression and loss of sensitivity

Mischaracterised instrument response

Incorrect astrophysical parameter estimation

Foreground leakage into cosmological modes

Invalid upper limits, contaminated measurements


πŸ§ͺ The Role of Validation

Validation mitigates these risks by testing whether our models and analysis pipelines are good enough for the science goal at hand. For SKA and precursor science, this involves:

  • Simulating realistic observations, including known sky, instrument, and noise models

  • Processing those simulations through the full analysis pipeline

  • Comparing the output to the known input to check for bias, loss, or spurious structure

These tests can be done at multiple levels:

  • Module-level: Does this beam model correctly simulate primary beam effects?

  • Pipeline-level: Does the entire pipeline recover the input Signal of Interest (SOI) within expected uncertainty?

  • Bayesian model-level: Does our inference framework favour the correct model when presented with SOI-free data?


πŸ“Œ Why Stakeholders Should Care

Validation provides evidence-based confidence that:

  • Instrument specs (e.g. beam FWHM accuracy, antenna layout) are sufficient

  • Science pipelines are not introducing systematic biases

  • Reported results – such as detections or upper limits – are credible

It also helps set priorities:
If validation shows a particular systematic effect (e.g. beam mischaracterization) dominates the science error budget, this can guide investments in better measurements or modelling.

Without validation, we’re building science conclusions on untested assumptions. With validation, we make those assumptions testable – and trustworthy.


πŸ§ͺ 3. Types of Validation We Perform

Validation within the Science Validation Tooling (UKSRC-ST) team is structured across multiple layers of the analysis pipeline, from individual components to end-to-end science inferences. This layered approach ensures that each part of the system – and their interactions – are tested for reliability, accuracy, and scientific credibility.



🧩 A. Module-Level Validation

β€œDoes each component do what it claims to do?”

We perform controlled, unit-style validation on individual simulation and modelling tools. These are often benchmarked against analytic solutions or cross-validated between simulators.

Examples:

  • Testing pyuvsim and fftvis against known visibility solutions (e.g. point sources, Gaussian beams)

  • Verifying beam model perturbations produce expected changes in effective FWHM

  • Cross-checks between OSKAR, pyuvsim, and matvis for sky model consistency


πŸ” B. Pipeline-Level (End-to-End) Validation

β€œIf we feed in a known sky and instrument model, do we recover the expected science result?”

End-to-end tests allow us to measure the overall performance of the pipeline under realistic conditions. These simulations mimic the full complexity of the observing process, including sky foregrounds, instrument effects, and noise.

Examples:

  • Injecting a known 21-cm signal into a realistic HERA simulation and checking whether it is recovered

  • Measuring signal loss, bias, or foreground leakage across the pipeline

  • Comparing different calibration schemes to assess robustness of power spectrum estimates

This approach was central to the HERA Phase I validation effort (Aguirre+2022), which uncovered subtle scale-independent signal loss and informed the development of a robust analysis framework.


πŸ“Š C. Model Validation in a Bayesian Framework

β€œIs the model structure itself trustworthy for science inference?”

We apply formal Bayesian model comparison techniques to determine whether models can be trusted to yield unbiased science results. This includes:

βœ… Bayes-Factor-Based Model Comparison (BFBMC)

Used to rank models based on their statistical consistency with the data – especially useful for comparing cosmological models under similar assumptions.

🚫 Limitations of BFBMC

BFBMC alone cannot always detect when signal components are biased by interactions with unmodelled systematics (e.g. residual foreground structure leaking into cosmological modes).


🧰 D. BaNTER: Bayesian Null Test Evidence Ratio

β€œCan this model be trusted before seeing the signal?”

We use the BaNTER validation framework (Sims+2025) to assess the risk of biased inference before using a model on real data. It compares how well composite models fit signal-free validation data and identifies whether the nuisance model is sufficient.

Key idea:

If a composite model fits signal-free data better than the nuisance model alone, it may falsely attribute residuals to the signal – leading to bias.

BaNTER provides a model validation prior, which is then used to weight model comparisons and ensure robust science inference.


πŸ“ˆ Summary Table


Type of Validation

Goal

Tools

Module-level

Test correctness of simulation components

pyuvsim, fftvis, analytic benchmarks

Pipeline-level

Recover known inputs from full simulations

End-to-end simulations, power spectra

Bayesian model comparison

Rank models based on evidence (predictivity)

BayesEoR, PolyChord

BaNTER

Assess model credibility before inference

Null tests, posterior odds adjustment



πŸ› οΈ 4. Our Approach

Our validation strategy is modular, layered, and Bayesian. It combines domain-specific simulation tools with statistical rigor to build stakeholder confidence in the reliability of our science pipelines.

🧱 Modular Testing and Integration

We validate individual components – such as simulators, sky models, and instrument configurations – before integrating them into full pipelines.

Benefits:

  • Easier to identify the source of bugs or biases

  • Encourages reuse and reproducibility

  • Scales to increasingly complex science cases


πŸ§ͺ End-to-End Simulations

We simulate full observing scenarios using realistic sky models, beam models, array layouts, and thermal noise. These simulations are processed using the exact same pipeline used for real data.

🎯 If an SKA or precursor pipeline cannot recover a known input from a simulation, it cannot be trusted to recover unknown inputs from real data.

Key tools:

  • pyuvsim, fftvis, OSKAR (for visibility simulation)

  • BayesEoR, PolyChord (for Bayesian inference)

  • Custom parameter sweep + perturbation frameworks (e.g. beam FWHM studies)


🧠 Bayesian Inference and Model Validation

Rather than relying on visual inspection or point estimates, we use Bayesian inference to:

  • Quantify parameter uncertainties

  • Compare competing models via evidence

  • Identify risk of bias due to model mis-specification (e.g. foreground leakage, calibration errors)

We extend this with the BaNTER framework, which allows us to validate models on signal-free data before using them for inference.


πŸ’» Our Infrastructure

Validation requires reproducibility, scalability, and traceability. We use a modern, science-friendly stack:

Component

Details

Computing

HPC access via Azimuth and Galahad (incl. CPU and GPU nodes for accelerated computation )

Versioning

All validation code tracked in UKSRC GitHub repos (e.g. the ValSKA repository)

Workflow

Notebooks for exploratory work, containers (Docker/Singularity) for reproducibility

Agile Tracking

Work organized via JIRA Epics and Stories (e.g. SKAO Jiraad75ab71-1245-3349-8713-12bcc32bca7cSAPP-146)


πŸ”„ Iteration and Feedback

We design validation activities to be iterative:

  • Early-stage simulations help set science and calibration requirements

  • Mid-stage validations feed into model refinement

  • Final-stage validations provide confidence before release of science results

Feedback loops with instrument teams, sky model developers, and domain scientists are key to this approach.


πŸ“Œ In Practice…

Each analysis we validate goes through a tailored process:

  • We define the signal of interest

  • Identify the relevant nuisance models (e.g. beam, FG, calibration)

  • Build a suite of simulations

  • Validate individual components and the pipeline as a whole

  • Apply Bayesian model comparison and BaNTER

  • Document performance, signal loss, and bias

This structured process ensures science credibility is not left to chance.


🧠 5. Key Concepts

Validation uses a specific vocabulary to describe how we assess the credibility of models, pipelines, and inferences. Here we define the most important concepts that underpin our methodology.


🎯 Signal of Interest (SOI)

The specific astrophysical signal we are trying to detect or constrain.
In our context, this is often the 21-cm signal from the Epoch of Reionization or Cosmic Dawn.

Everything in the validation process is designed to ensure that our measurement of the SOI is accurate, unbiased, and robust.


🌌 Composite Model

A model that describes multiple components in the data, such as:

  • the SOI

  • astrophysical foregrounds

  • instrumental effects

  • noise

Example:

A 21-cm power spectrum model that includes both foreground emission and beam effects.

Composite models are required for realism – but they can hide biases if components interact poorly or absorb each other’s errors.


πŸ”Ž Predictivity vs. Accuracy


Term

Definition

Predictivity

How well a model fits the data overall (i.e. total evidence)

Accuracy

Whether individual components of the model correctly describe their signals

A model can be predictive but not physically accurate – fitting the data well but getting the SOI wrong due to unmodelled systematics.


βš–οΈ Bayesian Evidence and Bayes Factors

Bayesian evidence quantifies how well a model fits the data on average over its parameter space.
The Bayes factor compares the evidence of two models:
  • Value >1: model A preferred

  • Value <1: model B preferred

Bayes factors are used in Bayesian model comparison (BFBMC), but this alone doesn’t guarantee accurate SOI recovery (see next concept).


🚫 Failure of Bayes-Factor-Only Comparison

Bayesian model comparison can mislead if:

  • A model fits the data well in aggregate

  • But does so by misattributing signal between components

This is particularly dangerous when the SOI is sub-dominant (e.g. buried beneath foregrounds).


βœ… BaNTER (Bayesian Null Test Evidence Ratio)

BaNTER is a validation framework designed to detect when a model is at risk of producing biased SOI estimates.

It works by:

  1. Running a null test on simulated data that contains no SOI

  2. Comparing the evidence of the nuisance-only model vs. the full composite model

  3. If the composite model fits the SOI-free data better, it may be absorbing residual structure into the SOI – a red flag for bias

BaNTER provides a prior penalty against risky models before they are used for inference.


🧰 Validated Posterior Odds

This combines:

  • Bayes factor (a posteriori evidence from real data)

  • BaNTER null test (a priori model credibility)

Result: a robust, bias-aware model selection criterion, improving confidence in the final inference.


Validation Workflow


The diagram highlights three main stages – model specification, model validation, and data analysis – and distinguishes between a standard unvalidated Bayesian inference workflow (dashed red lines) and the enhanced BaNTER-validated approach (solid black lines). This formalises the process of introducing prior credibility assessments before posterior model comparison, improving robustness of signal inference. (Figure credit: Sims+2025)



These concepts form the intellectual backbone of our validation framework – combining physical realism, statistical rigour, and practical insight to support trustworthy SKA science.



🧰 6. Tools We Use

Robust validation requires powerful tools that can model, simulate, and statistically analyse SKA-class data and systems. The UKSRC-ST team leverages a suite of open-source and in-house tools, tailored to the unique challenges of 21-cm cosmology and interferometric calibration.


πŸ–₯️ Simulation Tools

These generate realistic data for testing pipelines, evaluating sensitivity, and injecting known signals.

Tool

Purpose

pyuvsim

High-accuracy visibility simulator based on the Measurement Equation

OSKAR

Full-featured radio telescope simulator with beam and ionosphere support

fftvis

High-performance visibility simulator using non-uniform FFT (NUFFT)

These tools allow us to simulate full datasets with known ground truth, including beams, sky models, noise, and layout.


πŸ“ˆ Bayesian Inference Tools

These are used to fit models to simulated (or real) data, and evaluate the evidence for different scientific hypotheses.

Tool

Purpose

BayesEoR

Bayesian inference framework for the 21-cm power spectrum

PolyChord

Nested sampler used for evidence computation

BaNTER

Framework for null-test-based model validation

These tools enable us to perform:

  • Posterior estimation

  • Model comparison

  • Validation via posterior odds


πŸ§ͺ Validation Frameworks and Perturbation Studies

Science recovery requires knowledge of the instrument. This knowledge is necessarily incomplete (e.g. element patterns and signal chain characterised to x% and y% accuracy, with x and y < 100). Thus, the question naturally arises: is our knowledge sufficient for precision cosmology? To answer this, we can test sensitivity to systematic uncertainties by perturbing model components:

  • Beam FWHM perturbation framework
    (e.g. ValSKA GitHub repo)
  • Instrument layout variation

  • Foreground modelling variations

  • Sky model incompleteness tests

These are used to probe how errors in inputs propagate to science outputs – and what level of precision is required to keep them under control.


πŸ–§ Infrastructure


Platform

Purpose

Azimuth

UKSRC GPU-enabled local workstation (setup for validation)

Galahad

UoM HPC platform (batch + interactive modes)

We use Singularity and Docker containers for reproducible software environments, including custom builds for BayesEoR, pyuvsim, and PolyChord.


πŸ—ƒοΈ Version Control and Collaboration


System

Use

GitHub

Code repositories (e.g. simulations, notebooks, validation tooling)

JIRA

Epic and story tracking

Confluence

Documentation, meeting minutes, validation primer

Slack

Team and stakeholder communications

Miro

Visual planning, feature mapping


This toolset is modular, extensible, and FAIR-aligned, allowing us to scale from tightly controlled validation cases to complex, high-dimensional SKA use cases.



🀝 7. How Stakeholders Benefit

While the methods we use are technical, the value of validation is strategic. It ensures that science with the SKA and its precursors is credible, accurate, and actionable.

Here’s how validation supports different stakeholder groups:


πŸ”¬ For Science Leads and Analysts

  • Confidence in Results
    Know that reported detections or upper limits are statistically sound, not artefacts of systematics.
  • Model Selection with Rigor
    Use validated posterior odds to compare competing theories or astrophysical models.
  • Targeted Model Refinement
    Identify where modelling effort is most needed (e.g. better beams vs. better sky models).

πŸ› οΈ For Instrumentation and Calibration Teams

  • Requirements Derivation
    Quantify how precisely components (e.g. beam FWHM, antenna layout) must be known to support specific science goals.
  • Feedback Loops
    Understand how instrument choices affect scientific accuracy, even before deployment.

πŸ’Ό For Programme Managers and Funders

  • Risk Reduction
    Validation reduces the risk of costly reanalysis, spurious claims, or retraction of results.
  • Investment Prioritisation
    Enables data-driven decisions on where to focus calibration, simulation, or modelling efforts.
  • Impact Tracking
    Provides evidence of progress toward precision cosmology readiness – key for reporting and review cycles.

πŸ“£ Stakeholder Benefits Summary

Why This Matters to Stakeholders

  • Validation builds trust in science outputs before real data even arrives

  • It provides quantitative answers to questions like: β€œIs our beam model good enough?” or β€œWhat’s the risk of false detection?”

  • Helps prioritise which instrument parameters or modelling choices matter most

  • Makes UKSRC science contributions more credible and internationally competitive



βœ… In short: Validation turns β€œwe hope this works” into β€œwe have evidence that it does.”


πŸ”­ 8. Looking Ahead

Validation is not a one-off task – it is a continuous process that evolves alongside our instruments, software, and science goals. As we scale toward full SKA operations, the UKSRC-ST team will work on ensuring that validation keeps pace with the growing complexity and ambition of the science.


πŸ“ˆ Scaling to SKA-Class Complexity

  • Bigger Simulations
    We are expanding to simulate full SKA-class observations, with thousands of antennas and high-resolution sky models.
  • More Science Cases
    After establishing pipelines for 21-cm cosmology, we aim to extend validation to other domains such as pulsar timing, continuum imaging, and intensity mapping.
  • Cross-Validation Across Tools
    Ongoing efforts to compare outputs across multiple simulators (e.g. pyuvsim, fftvis, OSKAR) will help identify hidden assumptions and strengthen robustness.

βš™οΈ Automation and Reproducibility

  • FAIR-Aligned Pipelines
    We’re building validation workflows that are Findable, Accessible, Interoperable, and Reproducible.
  • CI/CD for Science
    Moving toward containerized, automated validation tests that run as part of our development cycle.
  • Traceable Deliverables
    Versioned validation reports will allow stakeholders to track performance over time.

🧠 Deeper Integration with Inference

  • Validated Posterior Odds
    Future analyses can directly incorporate model credibility (via BaNTER) into parameter estimation.
  • Uncertainty-Aware Decision Making
    Helping scientists and instrument teams understand not just what a result is, but how reliable it is – and why.

πŸ‘₯ Strengthening Stakeholder Engagement

  • Transparent Reporting
    We’ll continue to publish validation results, failure modes, and assumptions clearly and accessibly.
  • Collaborative Requirements Setting
    Validation outputs should inform calibration requirements, observing strategies, and data processing plans.
  • Training and Onboarding Support
    Materials and tutorials will help new team members, scientists, and partners understand and contribute to validation efforts.

πŸš€ The future of SKA science depends on getting the analysis right – and that starts with validation.

By investing in rigorous, scalable validation now, we’re laying the foundation for credible, world-leading science with SKA and its pathfinders.