Validation Primer
About This Primer
This page provides a high-level introduction to the science validation work carried out by the UKSRC-ST team. It is intended for stakeholders, collaborators, and contributors who want to understand what validation is, why it matters, and how it is implemented in the context of precision science with the SKA and its precursors.
Whether you are a scientist, instrument team member, or programme manager, this primer aims to clarify how validation supports your work, from model reliability to science decision-making.
For a more detailed discussion of validation in the context of precision astrophysics and cosmology see e.g. Aguirre+22, Sims+25a, b and references therein.
1. What is Validation?
In the context of radio astronomy and big-data science, validation is the process of determining whether our models, pipelines, and simulations can be trusted to deliver accurate and reliable scientific results.
Put simply:
Validation ensures that the tools we use to interpret data actually work for the science we care about.
Validation plays a different role from verification:
| Term | Question Answered | Example |
|---|---|---|
| Verification | "Did we build the system right?" | Does the power spectrum pipeline run without errors on test data? |
| Validation | "Did we build the right system (for science)?" | Can we recover a known cosmological 21-cm signal from realistic mock data? |
In the SKA context, this means asking:
Is our knowledge of the instrument (e.g. beam, layout, calibration) good enough for the highest precision science cases?
Are our simulations and pipelines accurate representations of the real sky and telescope?
Can we confidently go beyond the current observational and modelling frontier to detect and interpret never-before-seen faint signals?
Because real SKA and precursor data are complex and contaminated with systematics (e.g. foregrounds, calibration errors, instrumental effects), validation gives us a principled way to:
Quantify our confidence in detections or upper limits.
Identify and mitigate potential sources of bias.
Set requirements on instrument performance and data quality.
Key takeaway:
We validate to ensure that precision astrophysics is built on solid foundations. Without validation, even sophisticated analyses can lead to incorrect or misleading conclusions.
2. Why is Validation Critical for SKA Science?
Modern radio cosmology aims to detect some of the faintest signals ever measured, such as the redshifted 21-cm line of neutral hydrogen from the early Universe, using increasingly complex instruments and analysis pipelines. In this regime, the greatest challenge is the meticulous characterisation of the instrument and high-fidelity modelling of the data, not intrinsic instrumental sensitivity. Precision science requires precision validation.
The Challenge
Many of the most important SKA and precursor science goals, such as probing the Epoch of Reionization (EoR), measuring Baryon Acoustic Oscillations or constraining properties of dark matter and gravity, rely on:
Detecting extremely weak signals (e.g. the 21-cm power spectrum is >10^5× fainter than foregrounds)
Modelling instruments with complex, evolving systematics (e.g. beam chromaticity, mutual coupling)
Separating multiple overlapping signal components in the data (e.g. cosmological signal, sky foregrounds, instrument noise, systematics)
These challenges introduce failure modes that are hard to detect without formal validation:
| Failure Mode | Impact on Science |
|---|---|
| Spurious detections from unmodelled systematics | False positives or biased inferences |
| Overfitting noise or foregrounds | Signal suppression and loss of sensitivity |
| Mischaracterised instrument response | Incorrect astrophysical parameter estimation |
| Foreground leakage into cosmological modes | Invalid upper limits, contaminated measurements |
The Role of Validation
Validation mitigates these risks by testing whether our models and analysis pipelines are good enough for the science goal at hand. For SKA and precursor science, this involves:
Simulating realistic observations, including known sky, instrument, and noise models
Processing those simulations through the full analysis pipeline
Comparing the output to the known input to check for bias, loss, or spurious structure
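The simulate, process, compare loop above can be sketched with a toy injection-recovery test. This is a minimal illustration and not the team's pipeline: `toy_pipeline`, `signal_transfer`, the quadratic foreground, and the sinusoidal "signal" are all hypothetical stand-ins chosen so the example stays self-contained.

```python
import numpy as np


def toy_pipeline(data, x, poly_degree):
    """Hypothetical stand-in for an analysis pipeline: remove a smooth
    foreground by fitting and subtracting a Chebyshev polynomial."""
    fit = np.polynomial.Chebyshev.fit(x, data, poly_degree)
    return data - fit(x)


def signal_transfer(recovered, injected):
    """Fraction of the injected signal that survives the pipeline
    (1 means no loss, 0 means the signal was fully removed)."""
    return float(recovered @ injected / (injected @ injected))


# Mock observation with known ground truth (all values are illustrative).
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 128)                  # normalised frequency axis
foreground = 1e3 * (1.0 + 0.5 * x + 0.2 * x**2)  # bright, spectrally smooth
signal = np.sin(4.0 * np.pi * x)                 # faint injected signal
noise = rng.normal(0.0, 0.1, x.size)
mock = foreground + signal + noise

# A low-order foreground model keeps most of the signal; an over-flexible
# one absorbs it, which shows up as signal loss in the comparison step.
transfer_low_order = signal_transfer(toy_pipeline(mock, x, 2), signal)
transfer_high_order = signal_transfer(toy_pipeline(mock, x, 20), signal)
print(f"degree 2:  transfer = {transfer_low_order:.3f}")
print(f"degree 20: transfer = {transfer_high_order:.3f}")
```

Because the input signal is known, signal loss can be stated as a number rather than judged by eye, and the same comparison can be repeated whenever the pipeline changes.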
These tests can be done at multiple levels:
Module-level: Does this beam model correctly simulate primary beam effects?
Pipeline-level: Does the entire pipeline recover the input Signal of Interest (SOI) within expected uncertainty?
Bayesian model-level: Does our inference framework favour the correct model when presented with SOI-free data?
Why Stakeholders Should Care
Validation provides evidence-based confidence that:
Instrument specs (e.g. beam FWHM accuracy, antenna layout) are sufficient
Science pipelines are not introducing systematic biases
Reported results, such as detections or upper limits, are credible
Without validation, we're building science conclusions on untested assumptions. With validation, we make those assumptions testable and trustworthy.
3. Types of Validation We Perform
Validation within the Science Validation Tooling (UKSRC-ST) team is structured across multiple layers of the analysis pipeline, from individual components to end-to-end science inferences. This layered approach ensures that each part of the system, and the interactions between parts, is tested for reliability, accuracy, and scientific credibility.
A. Module-Level Validation
"Does each component do what it claims to do?"
We perform controlled, unit-style validation on individual simulation and modelling tools. These are often benchmarked against analytic solutions or cross-validated between simulators.
Examples:
Testing pyuvsim and fftvis against known visibility solutions (e.g. point sources, Gaussian beams)
Verifying beam model perturbations produce expected changes in effective FWHM
Cross-checks between OSKAR, pyuvsim, and matvis for sky model consistency
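A module-level check of this kind can be sketched as a comparison between a simulator and a closed-form visibility. The example below assumes a flat-sky, unit-beam measurement equation with a particular phase sign convention. `simulate_visibilities` is a hypothetical direct-sum stand-in, not the pyuvsim, fftvis, or OSKAR API; in a real test the simulator under test would take its place.

```python
import numpy as np


def simulate_visibilities(flux, l, m, u, v):
    """Hypothetical direct-sum simulator (flat sky, unit beam):
    V(u, v) = sum_k S_k * exp(-2*pi*i * (u*l_k + v*m_k))."""
    phase = -2j * np.pi * (np.outer(u, l) + np.outer(v, m))
    return np.exp(phase) @ flux


# Two equal point sources placed symmetrically about the phase centre.
source_flux = 1.5                      # Jy, illustrative
l0 = 0.01                              # direction cosine offset, illustrative
flux = np.array([source_flux, source_flux])
l = np.array([-l0, l0])
m = np.zeros(2)

# East-west baselines, in wavelengths.
u = np.linspace(-100.0, 100.0, 21)
v = np.zeros_like(u)

simulated = simulate_visibilities(flux, l, m, u, v)

# Analytic solution for this configuration: a real cosine fringe.
analytic = 2.0 * source_flux * np.cos(2.0 * np.pi * u * l0)

max_abs_error = float(np.max(np.abs(simulated - analytic)))
print(f"maximum |simulated - analytic| = {max_abs_error:.2e}")
```

The symmetric source pair is a convenient case because the imaginary parts cancel exactly, so a sign or convention error in the simulator shows up as a non-zero imaginary component.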
B. Pipeline-Level (End-to-End) Validation
"If we feed in a known sky and instrument model, do we recover the expected science result?"
End-to-end tests allow us to measure the overall performance of the pipeline under realistic conditions. These simulations mimic the full complexity of the observing process, including sky foregrounds, instrument effects, and noise.
Examples:
Injecting a known 21-cm signal into a realistic HERA simulation and checking whether it is recovered
Measuring signal loss, bias, or foreground leakage across the pipeline
Comparing different calibration schemes to assess robustness of power spectrum estimates
This approach was central to the HERA Phase I validation effort (Aguirre+2022), which uncovered subtle scale-independent signal loss and informed the development of a robust analysis framework.
C. Model Validation in a Bayesian Framework
"Is the model structure itself trustworthy for science inference?"
We apply formal Bayesian model comparison techniques to determine whether models can be trusted to yield unbiased science results. This includes:
Bayes-Factor-Based Model Comparison (BFBMC)
Used to rank models based on their statistical consistency with the data. This is especially useful for comparing cosmological models under similar assumptions.
Limitations of BFBMC
BFBMC alone cannot always detect when signal components are biased by interactions with unmodelled systematics (e.g. residual foreground structure leaking into cosmological modes).
D. BaNTER: Bayesian Null Test Evidence Ratio
"Can this model be trusted before seeing the signal?"
We use the BaNTER validation framework (Sims+2025) to assess the risk of biased inference before using a model on real data. It compares how well composite models fit signal-free validation data and identifies whether the nuisance model is sufficient.
Key idea:
If a composite model fits signal-free data better than the nuisance model alone, it may falsely attribute residuals to the signal β leading to bias.
BaNTER provides a model validation prior, which is then used to weight model comparisons and ensure robust science inference.
Summary Table
| Type of Validation | Goal | Tools |
|---|---|---|
| Module-level | Test correctness of simulation components | |
| Pipeline-level | Recover known inputs from full simulations | End-to-end simulations, power spectra |
| Bayesian model comparison | Rank models based on evidence (predictivity) | |
| BaNTER | Assess model credibility before inference | Null tests, posterior odds adjustment |
4. Our Approach
Our validation strategy is modular, layered, and Bayesian. It combines domain-specific simulation tools with statistical rigor to build stakeholder confidence in the reliability of our science pipelines.
Modular Testing and Integration
We validate individual components, such as simulators, sky models, and instrument configurations, before integrating them into full pipelines.
Benefits:
Easier to identify the source of bugs or biases
Encourages reuse and reproducibility
Scales to increasingly complex science cases
End-to-End Simulations
We simulate full observing scenarios using realistic sky models, beam models, array layouts, and thermal noise. These simulations are processed using the exact same pipeline used for real data.
If an SKA or precursor pipeline cannot recover a known input from a simulation, it cannot be trusted to recover unknown inputs from real data.
Key tools:
pyuvsim, fftvis, OSKAR (for visibility simulation)
BayesEoR, PolyChord (for Bayesian inference)
Custom parameter sweep + perturbation frameworks (e.g. beam FWHM studies)
Bayesian Inference and Model Validation
Rather than relying on visual inspection or point estimates, we use Bayesian inference to:
Quantify parameter uncertainties
Compare competing models via evidence
Identify risk of bias due to model mis-specification (e.g. foreground leakage, calibration errors)
We extend this with the BaNTER framework, which allows us to validate models on signal-free data before using them for inference.
Our Infrastructure
Validation requires reproducibility, scalability, and traceability. We use a modern, science-friendly stack:
| Component | Details |
|---|---|
| Computing | HPC access via Azimuth and Galahad (incl. CPU and GPU nodes for accelerated computation) |
| Versioning | All validation code tracked in UKSRC GitHub repos (e.g. |
| Workflow | Notebooks for exploratory work, containers (Docker/Singularity) for reproducibility |
| Agile Tracking | Work organised via JIRA Epics and Stories (e.g. SKAO Jira SAPP-146) |
Iteration and Feedback
We design validation activities to be iterative:
Early-stage simulations help set science and calibration requirements
Mid-stage validations feed into model refinement
Final-stage validations provide confidence before release of science results
Feedback loops with instrument teams, sky model developers, and domain scientists are key to this approach.
In Practice...
Each analysis we validate goes through a tailored process:
Define the signal of interest
Identify the relevant nuisance models (e.g. beam, FG, calibration)
Build a suite of simulations
Validate individual components and the pipeline as a whole
Apply Bayesian model comparison and BaNTER
Document performance, signal loss, and bias
This structured process ensures science credibility is not left to chance.
5. Key Concepts
Validation uses a specific vocabulary to describe how we assess the credibility of models, pipelines, and inferences. Here we define the most important concepts that underpin our methodology.
Signal of Interest (SOI)
Everything in the validation process is designed to ensure that our measurement of the SOI is accurate, unbiased, and robust.
Composite Model
A model that describes multiple components in the data, such as:
the SOI
astrophysical foregrounds
instrumental effects
noise
Example:
A 21-cm power spectrum model that includes both foreground emission and beam effects.
Composite models are required for realism, but they can hide biases if components interact poorly or absorb each other's errors.
Predictivity vs. Accuracy
| Term | Definition |
|---|---|
| Predictivity | How well a model fits the data overall (i.e. total evidence) |
| Accuracy | Whether individual components of the model correctly describe their signals |
A model can be predictive but not physically accurate: it can fit the data well while getting the SOI wrong due to unmodelled systematics.
Bayesian Evidence and Bayes Factors
Value >1: model A preferred
Value <1: model B preferred
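For reference, the values above refer to the Bayes factor, which in standard Bayesian notation is the ratio of the evidences of two models. The symbols below (data $D$, models $M_A$ and $M_B$, parameters $\theta_i$, likelihood $\mathcal{L}$, prior $\pi$) are notation introduced here for illustration and are not taken from the original page:

```latex
\mathcal{Z}_i = P(D \mid M_i)
  = \int \mathcal{L}(D \mid \theta_i, M_i)\, \pi(\theta_i \mid M_i)\, \mathrm{d}\theta_i ,
\qquad
B_{AB} = \frac{\mathcal{Z}_A}{\mathcal{Z}_B}
```

The evidence $\mathcal{Z}_i$ averages the likelihood over the prior, so a model is rewarded for fitting the data and penalised for unnecessary flexibility.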
Bayes factors are used in Bayesian model comparison (BFBMC), but this alone doesn't guarantee accurate SOI recovery (see next concept).
Failure of Bayes-Factor-Only Comparison
Bayesian model comparison can mislead if:
A model fits the data well in aggregate
But does so by misattributing signal between components
This is particularly dangerous when the SOI is sub-dominant (e.g. buried beneath foregrounds).
BaNTER (Bayesian Null Test Evidence Ratio)
BaNTER is a validation framework designed to detect when a model is at risk of producing biased SOI estimates.
It works by:
Running a null test on simulated data that contains no SOI
Comparing the evidence of the nuisance-only model vs. the full composite model
If the composite model fits the SOI-free data better, it may be absorbing residual structure into the SOI, which is a red flag for bias
BaNTER provides a prior penalty against risky models before they are used for inference.
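The null-test logic can be illustrated with a toy linear-Gaussian problem in which the evidence has a closed form. This is a sketch of the idea only, not the BaNTER implementation from Sims+2025: the function names, polynomial nuisance models, cosine "signal" template, and prior widths are all assumptions made for illustration, and real analyses would compute evidences with a nested sampler rather than analytically.

```python
import numpy as np


def log_evidence(data, basis, prior_sd, noise_sd):
    """Analytic log-evidence for data = basis @ theta + noise, assuming
    zero-mean Gaussian priors on theta and white Gaussian noise. The data
    are then marginally Gaussian with covariance
    C = noise_sd^2 * I + basis @ diag(prior_sd^2) @ basis.T."""
    n = data.size
    cov = noise_sd**2 * np.eye(n) + (basis * prior_sd**2) @ basis.T
    _, logdet = np.linalg.slogdet(cov)
    chi2 = data @ np.linalg.solve(cov, data)
    return -0.5 * (chi2 + logdet + n * np.log(2.0 * np.pi))


def null_test_log_ratio(data, nuisance_basis, nuisance_prior_sd,
                        signal_template, signal_prior_sd, noise_sd):
    """ln(Z_composite / Z_nuisance) evaluated on signal-free data."""
    composite_basis = np.column_stack([nuisance_basis, signal_template])
    composite_prior_sd = np.append(nuisance_prior_sd, signal_prior_sd)
    ln_z_nuisance = log_evidence(data, nuisance_basis, nuisance_prior_sd, noise_sd)
    ln_z_composite = log_evidence(data, composite_basis, composite_prior_sd, noise_sd)
    return ln_z_composite - ln_z_nuisance


# Signal-free validation data: a quadratic "foreground" plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
noise_sd = 0.1
signal_free = 10.0 + 5.0 * x + 3.0 * x**2 + rng.normal(0.0, noise_sd, x.size)
signal_template = np.cos(np.pi * x)    # hypothetical SOI shape

# Case 1: nuisance model too simple (linear), so the signal term can
# absorb the unmodelled curvature.
log_ratio_insufficient = null_test_log_ratio(
    signal_free, np.vander(x, 2, increasing=True), np.full(2, 100.0),
    signal_template, 5.0, noise_sd)

# Case 2: nuisance model matches the foreground (quadratic).
log_ratio_sufficient = null_test_log_ratio(
    signal_free, np.vander(x, 3, increasing=True), np.full(3, 100.0),
    signal_template, 5.0, noise_sd)

print(f"ln(Z_comp / Z_nuis), linear nuisance:    {log_ratio_insufficient:.1f}")
print(f"ln(Z_comp / Z_nuis), quadratic nuisance: {log_ratio_sufficient:.1f}")
```

In this toy setup the insufficient nuisance model yields a strongly positive log ratio on data that contain no signal, which is the red flag described above. With an adequate nuisance model the extra signal parameter is expected to buy little improvement in fit, so the ratio is far smaller.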
Validated Posterior Odds
This combines:
Bayes factor (a posteriori evidence from real data)
BaNTER null test (a priori model credibility)
Result: a robust, bias-aware model selection criterion, improving confidence in the final inference.
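The way these two ingredients combine can be read from the standard Bayesian identity for posterior odds, shown below with notation introduced here for illustration ($B_{AB}$ is the Bayes factor between models $M_A$ and $M_B$ given data $D$). The working assumption, based on the description above, is that the BaNTER null test informs the prior odds term; the specific mapping from null-test evidence ratio to prior is defined in Sims+2025 and is not reproduced here.

```latex
\underbrace{\frac{P(M_A \mid D)}{P(M_B \mid D)}}_{\text{posterior odds}}
  = \underbrace{B_{AB}}_{\text{Bayes factor (real data)}}
    \times
    \underbrace{\frac{P(M_A)}{P(M_B)}}_{\text{prior odds (model credibility)}}
```

With equal prior odds this reduces to a Bayes-factor-only comparison, which is why a validation-informed prior is what makes the criterion bias-aware.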
Validation Workflow
The diagram highlights three main stages (model specification, model validation, and data analysis) and distinguishes between a standard unvalidated Bayesian inference workflow (dashed red lines) and the enhanced BaNTER-validated approach (solid black lines). This formalises the process of introducing prior credibility assessments before posterior model comparison, improving robustness of signal inference. (Figure credit: Sims+2025)
These concepts form the intellectual backbone of our validation framework, combining physical realism, statistical rigour, and practical insight to support trustworthy SKA science.
6. Tools We Use
Robust validation requires powerful tools that can model, simulate, and statistically analyse SKA-class data and systems. The UKSRC-ST team leverages a suite of open-source and in-house tools, tailored to the unique challenges of 21-cm cosmology and interferometric calibration.
Simulation Tools
These generate realistic data for testing pipelines, evaluating sensitivity, and injecting known signals.
| Tool | Purpose |
|---|---|
| | High-accuracy visibility simulator based on the Measurement Equation |
| OSKAR | Full-featured radio telescope simulator with beam and ionosphere support |
| fftvis | High-performance visibility simulator using non-uniform FFT (NUFFT) |
These tools allow us to simulate full datasets with known ground truth, including beams, sky models, noise, and layout.
Bayesian Inference Tools
These are used to fit models to simulated (or real) data, and evaluate the evidence for different scientific hypotheses.
| Tool | Purpose |
|---|---|
| | Bayesian inference framework for the 21-cm power spectrum |
| | Nested sampler used for evidence computation |
| | Framework for null-test-based model validation |
These tools enable us to perform:
Posterior estimation
Model comparison
Validation via posterior odds
Validation Frameworks and Perturbation Studies
Science recovery requires knowledge of the instrument. This knowledge is necessarily incomplete (e.g. element patterns and signal chain characterised to x% and y% accuracy, with x and y < 100). Thus, the question naturally arises: is our knowledge sufficient for precision cosmology? To answer this, we can test sensitivity to systematic uncertainties by perturbing model components:
- Beam FWHM perturbation framework (e.g. ValSKA-HERA-beam-FWHM GitHub repo)
- Instrument layout variation
- Foreground modelling variations
- Sky model incompleteness tests
These are used to probe how errors in inputs propagate to science outputs, and what level of precision is required to keep them under control.
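As a simple illustration of how an input error propagates, the sketch below perturbs the FWHM of a Gaussian power beam and measures the resulting fractional error in the beam response. It assumes an idealised Gaussian beam and illustrative numbers; the function names are hypothetical and this is not the interface of the ValSKA-HERA-beam-FWHM repository.

```python
import numpy as np


def gaussian_beam(theta, fwhm):
    """Gaussian power beam normalised to 1 on axis and 0.5 at theta = fwhm / 2."""
    return np.exp(-4.0 * np.log(2.0) * (theta / fwhm) ** 2)


def fractional_beam_error(theta, fwhm_true, frac_perturbation):
    """Fractional error in the beam response at angle theta when the assumed
    FWHM differs from the true FWHM by the given fraction."""
    fwhm_assumed = fwhm_true * (1.0 + frac_perturbation)
    return gaussian_beam(theta, fwhm_assumed) / gaussian_beam(theta, fwhm_true) - 1.0


fwhm_true = 10.0                                   # degrees, illustrative
perturbations = np.array([-0.05, -0.01, 0.0, 0.01, 0.05])

# Evaluate the error one full FWHM off axis, where the beam is weak and
# most sensitive to width errors.
errors_at_fwhm = fractional_beam_error(fwhm_true, fwhm_true, perturbations)

# First-order expectation: dB/B is approximately 8 ln(2) (theta/FWHM)^2 * eps.
first_order = 8.0 * np.log(2.0) * perturbations

for eps, err, lin in zip(perturbations, errors_at_fwhm, first_order):
    print(f"FWHM error {eps:+.0%}: beam error {err:+.4f} (first order {lin:+.4f})")
```

In this idealised case a 1% FWHM error already produces a beam error of several percent at one FWHM off axis, which illustrates why requirements on instrument knowledge need to be derived quantitatively rather than assumed.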
Infrastructure
| Platform | Purpose |
|---|---|
| Azimuth | UKSRC GPU-enabled local workstation (setup for validation) |
| Galahad | UoM HPC platform (batch + interactive modes) |
We use Singularity and Docker containers for reproducible software environments, including custom builds for BayesEoR, pyuvsim, and PolyChord.
Version Control and Collaboration
| System | Use |
|---|---|
| GitHub | Code repositories (e.g. simulations, notebooks, validation tooling) |
| JIRA | Epic and story tracking |
| Confluence | Documentation, meeting minutes, validation primer |
| Slack | Team and stakeholder communications |
| Miro | Visual planning, feature mapping |
This toolset is modular, extensible, and FAIR-aligned, allowing us to scale from tightly controlled validation cases to complex, high-dimensional SKA use cases.
7. How Stakeholders Benefit
While the methods we use are technical, the value of validation is strategic. It ensures that science with the SKA and its precursors is credible, accurate, and actionable.
Here's how validation supports different stakeholder groups:
For Science Leads and Analysts
- Confidence in Results: Know that reported detections or upper limits are statistically sound, not artefacts of systematics.
- Model Selection with Rigour: Use validated posterior odds to compare competing theories or astrophysical models.
- Targeted Model Refinement: Identify where modelling effort is most needed (e.g. better beams vs. better sky models).
For Instrumentation and Calibration Teams
- Requirements Derivation: Quantify how precisely components (e.g. beam FWHM, antenna layout) must be known to support specific science goals.
- Feedback Loops: Understand how instrument choices affect scientific accuracy, even before deployment.
For Programme Managers and Funders
- Risk Reduction: Validation reduces the risk of costly reanalysis, spurious claims, or retraction of results.
- Investment Prioritisation: Enables data-driven decisions on where to focus calibration, simulation, or modelling efforts.
- Impact Tracking: Provides evidence of progress toward precision cosmology readiness, which is key for reporting and review cycles.
Stakeholder Benefits Summary
Why This Matters to Stakeholders
Validation builds trust in science outputs before real data even arrives
It provides quantitative answers to questions like: "Is our beam model good enough?" or "What's the risk of false detection?"
It helps prioritise which instrument parameters or modelling choices matter most
It makes UKSRC science contributions more credible and internationally competitive
In short: Validation turns "we hope this works" into "we have evidence that it does."
8. Looking Ahead
Validation is not a one-off task: it is a continuous process that evolves alongside our instruments, software, and science goals. As we scale toward full SKA operations, the UKSRC-ST team will work to ensure that validation keeps pace with the growing complexity and ambition of the science.
Scaling to SKA-Class Complexity
- Bigger Simulations: We are expanding to simulate full SKA-class observations, with thousands of antennas and high-resolution sky models.
- More Science Cases: After establishing pipelines for 21-cm cosmology, we aim to extend validation to other domains such as pulsar timing, continuum imaging, and intensity mapping.
- Cross-Validation Across Tools: Ongoing efforts to compare outputs across multiple simulators (e.g. pyuvsim, fftvis, OSKAR) will help identify hidden assumptions and strengthen robustness.
Automation and Reproducibility
- FAIR-Aligned Pipelines: We're building validation workflows that are Findable, Accessible, Interoperable, and Reusable.
- CI/CD for Science: Moving toward containerised, automated validation tests that run as part of our development cycle.
- Traceable Deliverables: Versioned validation reports will allow stakeholders to track performance over time.
Deeper Integration with Inference
- Validated Posterior Odds: Future analyses can directly incorporate model credibility (via BaNTER) into parameter estimation.
- Uncertainty-Aware Decision Making: Helping scientists and instrument teams understand not just what a result is, but how reliable it is, and why.
Strengthening Stakeholder Engagement
- Transparent Reporting: We'll continue to publish validation results, failure modes, and assumptions clearly and accessibly.
- Collaborative Requirements Setting: Validation outputs should inform calibration requirements, observing strategies, and data processing plans.
- Training and Onboarding Support: Materials and tutorials will help new team members, scientists, and partners understand and contribute to validation efforts.
The future of SKA science depends on getting the analysis right, and that starts with validation.
By investing in rigorous, scalable validation now, we're laying the foundation for credible, world-leading science with SKA and its pathfinders.