Tool Implementer’s Guide
Version: 1.0.0-draft Status: Draft Last updated: 2026-01-21
1. Introduction
1.1 Purpose
This guide provides a practical walkthrough for implementing a new external tool integration in ValSKA. It complements the External Tool Integration Specification by offering step-by-step instructions, concrete examples, and implementation patterns.
1.2 Audience
This guide is intended for developers who:
Are adding a new external tool to ValSKA
Have read the External Tool Integration Specification
Are familiar with Python packaging and CLI development
Understand SLURM batch scheduling concepts
1.3 Running example
This guide uses pyuvsim as the running example — a visibility simulator that will be integrated as a single-stage, CPU-based tool. The patterns shown apply equally to other tools with different characteristics.
1.4 Prerequisites
Before starting, ensure you have:
A working ValSKA development environment
Access to the external tool (pyuvsim) and its documentation
Understanding of the tool’s configuration format and execution model
2. Planning your integration
2.1 Key questions to answer
Before writing code, answer these questions about your tool:
Question |
Example (pyuvsim) |
|---|---|
What stages does execution involve? |
Single stage: |
What resources does each stage require? |
CPU-bound, multi-core |
Is any stage resumable? |
No |
What configuration format does the tool use? |
YAML |
How is the tool invoked? |
|
What execution environments are supported? |
Conda, container |
What input data is required? |
Telescope config, sky model, observation parameters |
What outputs are produced? |
UVH5 visibility file |
2.2 Define your stages
Document your stages using the format from §8.2 of the specification:
pyuvsim stages:
Stage |
Purpose |
Dependencies |
Resumable |
Resource profile |
|---|---|---|---|---|
|
Generate simulated visibilities |
None |
No |
CPU, 16–64 cores, 32GB+ memory |
For single-stage tools, this table has one row — that’s fine. The stage abstraction still applies.
2.3 Define your taxonomy
The taxonomy determines the directory hierarchy under <results_root>/<tool>/. Choose a structure that:
Groups related runs logically
Supports your typical query patterns
Remains stable as usage evolves
Example taxonomies:
Tool |
Taxonomy |
Rationale |
|---|---|---|
BayesEoR |
|
Primary analysis dimensions |
pyuvsim |
|
Groups by instrument and observation |
Your taxonomy is encoded in how you construct the run directory path (see §4.4).
3. Module structure
3.1 Create the module directory
Create the tool module under src/valska_hera_beam/external_tools/:
mkdir -p src/valska_hera_beam/external_tools/pyuvsim/templates
touch src/valska_hera_beam/external_tools/pyuvsim/__init__.py
touch src/valska_hera_beam/external_tools/pyuvsim/cli_prepare.py
touch src/valska_hera_beam/external_tools/pyuvsim/cli_submit.py
touch src/valska_hera_beam/external_tools/pyuvsim/setup.py
touch src/valska_hera_beam/external_tools/pyuvsim/submit.py
touch src/valska_hera_beam/external_tools/pyuvsim/runner.py
touch src/valska_hera_beam/external_tools/pyuvsim/slurm.py
touch src/valska_hera_beam/external_tools/pyuvsim/templates.py
3.2 File responsibilities
File |
Responsibility |
|---|---|
|
Public API exports |
|
CLI entry point for |
|
CLI entry point for |
|
Core prepare logic (run directory creation, manifest writing) |
|
Core submit logic (SLURM interaction, jobs.json writing) |
|
Runner class definitions (CondaRunner, ContainerRunner) |
|
SLURM submit script generation |
|
Template discovery and loading utilities |
|
Shipped configuration templates |
3.3 Register entry points
Add CLI entry points to pyproject.toml:
[project.scripts]
# ...existing entries...
valska-pyuvsim-prepare = "valska_hera_beam.external_tools.pyuvsim.cli_prepare:main"
valska-pyuvsim-submit = "valska_hera_beam.external_tools.pyuvsim.cli_submit:main"
After adding these, reinstall the package in development mode:
pip install -e .
4. Implementing the prepare phase
4.1 Overview
The prepare phase:
Parses CLI arguments
Loads configuration from
runtime_paths.yamlResolves paths and merges overrides
Creates the run directory
Writes tool configuration files
Generates SLURM submit scripts
Writes
manifest.json
4.2 CLI argument design
Design arguments that are:
Specific — Named for what they control, not implementation details
Overridable — Allow CLI to override config file defaults
Documentable — Clear help text for each argument
Example argument structure for pyuvsim:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/cli_prepare.py
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
prog="valska-pyuvsim-prepare",
description="Prepare a pyuvsim simulation run directory.",
)
# Required identifiers
parser.add_argument(
"--run-id",
required=True,
help="Unique identifier for this run.",
)
# Taxonomy components
parser.add_argument(
"--telescope",
required=True,
help="Telescope name (e.g., HERA, SKA-Low).",
)
parser.add_argument(
"--observation-id",
default="default",
help="Observation identifier for grouping runs.",
)
# Input specification
parser.add_argument(
"--config",
type=Path,
required=True,
help="Path to pyuvsim configuration YAML.",
)
parser.add_argument(
"--telescope-config",
type=Path,
help="Path to telescope configuration (overrides config file).",
)
parser.add_argument(
"--sky-model",
type=Path,
help="Path to sky model file (overrides config file).",
)
# Template-based configuration (alternative to --config)
parser.add_argument(
"--template",
help="Use a shipped template by name (e.g., 'hera_validation').",
)
# Path overrides
parser.add_argument(
"--results-root",
type=Path,
help="Override results root directory.",
)
# SLURM overrides
parser.add_argument(
"--partition",
help="SLURM partition override.",
)
parser.add_argument(
"--time",
help="SLURM time limit override.",
)
parser.add_argument(
"--cpus-per-task",
type=int,
help="SLURM CPUs per task override.",
)
parser.add_argument(
"--mem",
help="SLURM memory override.",
)
# Execution control
parser.add_argument(
"--dry-run",
action="store_true",
help="Preview actions without creating files.",
)
return parser
4.3 Loading configuration
Load and merge configuration following the hierarchy:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/cli_prepare.py
from pathlib import Path
from typing import Any
import yaml
def load_runtime_config() -> dict[str, Any]:
"""Load runtime_paths.yaml configuration."""
config_path = Path(__file__).parents[4] / "config" / "runtime_paths.yaml"
if not config_path.exists():
return {}
with config_path.open() as f:
return yaml.safe_load(f) or {}
def get_tool_config(runtime: dict[str, Any]) -> dict[str, Any]:
"""Extract pyuvsim-specific configuration."""
return runtime.get("pyuvsim", {})
def resolve_results_root(
cli_override: Path | None,
runtime: dict[str, Any],
) -> Path:
"""Resolve results root with precedence: CLI > config > error."""
if cli_override is not None:
return cli_override
root = runtime.get("results_root")
if root is not None:
return Path(root)
raise ValueError(
"No results_root configured. "
"Set in runtime_paths.yaml or pass --results-root."
)
4.4 Building the run directory path
Construct the run directory following your taxonomy:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/setup.py
from pathlib import Path
def build_run_dir(
results_root: Path,
telescope: str,
observation_id: str,
run_id: str,
unique: bool = False,
) -> Path:
"""
Construct the run directory path.
Structure: <results_root>/pyuvsim/<telescope>/<observation_id>/<run_id>/
"""
run_dir = results_root / "pyuvsim" / telescope / observation_id / run_id
if unique:
from datetime import datetime, timezone
timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
run_dir = run_dir / timestamp
return run_dir
4.5 Writing the manifest
Create manifest.json with all required fields:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/setup.py
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
from valska_hera_beam import __version__
def write_manifest(
run_dir: Path,
*,
run_id: str,
results_root: Path,
telescope: str,
observation_id: str,
config_path: Path,
template_name: str | None,
slurm_config: dict[str, Any],
tool_config: dict[str, Any],
) -> Path:
"""Write manifest.json to run directory."""
manifest = {
# Required fields (per spec §6.1.3)
"tool": "pyuvsim",
"created_utc": datetime.now(timezone.utc).isoformat(),
"valska_version": __version__,
"run_id": run_id,
"run_dir": str(run_dir.resolve()),
"results_root": str(results_root.resolve()),
# Taxonomy
"telescope": telescope,
"observation_id": observation_id,
# Configuration sources
"config_yaml": str(config_path.resolve()),
"template_name": template_name,
# SLURM configuration by stage
"slurm": {
"simulate": slurm_config,
},
# Tool-specific section
"pyuvsim": tool_config,
}
manifest_path = run_dir / "manifest.json"
with manifest_path.open("w") as f:
json.dump(manifest, f, indent=2)
return manifest_path
4.6 Generating SLURM submit scripts
Create the submit script generator:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/slurm.py
from pathlib import Path
from typing import Any
from .runner import CondaRunner, ContainerRunner
def render_submit_script(
*,
runner: CondaRunner | ContainerRunner,
config_yaml: Path,
run_dir: Path,
slurm: dict[str, Any],
) -> str:
"""
Render a SLURM submit script for pyuvsim.
Parameters
----------
runner
Execution environment (conda or container).
config_yaml
Path to pyuvsim configuration file.
run_dir
Run directory for outputs and logs.
slurm
SLURM directives. Keys set to None are omitted.
"""
lines = ["#!/bin/bash"]
# Job name
job_name = slurm.get("job_name") or f"pyuvsim_{run_dir.name}"
lines.append(f"#SBATCH --job-name={job_name}")
# Output logs
output = slurm.get("output") or run_dir / "slurm-%j.out"
lines.append(f"#SBATCH --output={output}")
# Resource directives (only emit if not None)
directive_map = {
"partition": "--partition",
"constraint": "--constraint",
"qos": "--qos",
"account": "--account",
"time": "--time",
"mem": "--mem",
"nodes": "--nodes",
"ntasks": "--ntasks",
"cpus_per_task": "--cpus-per-task",
}
for key, directive in directive_map.items():
value = slurm.get(key)
if value is not None:
lines.append(f"#SBATCH {directive}={value}")
# Extra directives
for extra in slurm.get("extra_sbatch", []):
lines.append(f"#SBATCH {extra}")
lines.append("")
lines.append("set -euo pipefail")
lines.append("")
# Diagnostic output
lines.append("echo '=== Job Information ==='")
lines.append("echo \"Job ID: $SLURM_JOB_ID\"")
lines.append("echo \"Node: $(hostname)\"")
lines.append(f"echo \"Run directory: {run_dir}\"")
lines.append(f"echo \"Config: {config_yaml}\"")
lines.append("echo \"Start time: $(date -u +'%Y-%m-%dT%H:%M:%SZ')\"")
lines.append("echo")
lines.append("")
# Environment setup
if isinstance(runner, CondaRunner):
lines.append("# Activate conda environment")
lines.append(runner.conda_sh)
lines.append(f"conda activate {runner.conda_env}")
lines.append("")
# Execution command
lines.append("# Run pyuvsim")
lines.append(f"cd {run_dir}")
lines.append(f"python -m pyuvsim.uvsim {config_yaml}")
elif isinstance(runner, ContainerRunner):
lines.append("# Run via container")
bind_opts = f"--bind {runner.container_bind}" if runner.container_bind else ""
lines.append(
f"apptainer exec {bind_opts} {runner.container_image} "
f"python -m pyuvsim.uvsim {config_yaml}"
)
lines.append("")
lines.append("echo")
lines.append("echo \"End time: $(date -u +'%Y-%m-%dT%H:%M:%SZ')\"")
lines.append("echo '=== Job Complete ==='")
return "\n".join(lines)
4.7 Runner definitions
Define your runner classes:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/runner.py
from dataclasses import dataclass
from pathlib import Path
@dataclass(frozen=True)
class CondaRunner:
"""Execute pyuvsim within a conda environment."""
conda_sh: str
"""Command to source conda (e.g., 'source /path/to/conda.sh')."""
conda_env: str
"""Name of the conda environment."""
@dataclass(frozen=True)
class ContainerRunner:
"""Execute pyuvsim within an Apptainer/Singularity container."""
container_image: Path
"""Path to the container image (.sif file)."""
container_bind: str | None = None
"""Bind mount specification (e.g., '/data:/data,/scratch:/scratch')."""
4.8 Main prepare function
Bring it all together:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/setup.py
from pathlib import Path
from typing import Any
import shutil
from .runner import CondaRunner, ContainerRunner
from .slurm import render_submit_script
def prepare_pyuvsim_run(
*,
run_id: str,
results_root: Path,
telescope: str,
observation_id: str,
config_path: Path,
runner: CondaRunner | ContainerRunner,
slurm_config: dict[str, Any],
template_name: str | None = None,
tool_config: dict[str, Any] | None = None,
dry_run: bool = False,
) -> dict[str, Any]:
"""
Prepare a pyuvsim run directory.
Returns a dict with:
- run_dir: Path to created run directory
- manifest: Path to manifest.json
- submit_script: Path to submit script
"""
# Build run directory path
run_dir = build_run_dir(
results_root=results_root,
telescope=telescope,
observation_id=observation_id,
run_id=run_id,
)
if dry_run:
return {
"run_dir": run_dir,
"dry_run": True,
"would_create": [
run_dir / "manifest.json",
run_dir / "config.yaml",
run_dir / "submit_simulate.sh",
],
}
# Create run directory
run_dir.mkdir(parents=True, exist_ok=True)
# Copy configuration to run directory
run_config = run_dir / "config.yaml"
shutil.copy(config_path, run_config)
# Generate submit script
submit_script_content = render_submit_script(
runner=runner,
config_yaml=run_config,
run_dir=run_dir,
slurm=slurm_config,
)
submit_script = run_dir / "submit_simulate.sh"
submit_script.write_text(submit_script_content)
submit_script.chmod(0o755)
# Write manifest
manifest_path = write_manifest(
run_dir=run_dir,
run_id=run_id,
results_root=results_root,
telescope=telescope,
observation_id=observation_id,
config_path=config_path,
template_name=template_name,
slurm_config=slurm_config,
tool_config=tool_config or {},
)
return {
"run_dir": run_dir,
"manifest": manifest_path,
"submit_script": submit_script,
"config": run_config,
}
4.9 CLI main function
Wire up the CLI:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/cli_prepare.py
#!/usr/bin/env python3
"""CLI entry point for valska-pyuvsim-prepare."""
from __future__ import annotations
import argparse
import sys
from pathlib import Path
from typing import Any
from .runner import CondaRunner, ContainerRunner
from .setup import prepare_pyuvsim_run
def build_parser() -> argparse.ArgumentParser:
# ... (as shown in §4.2)
pass
def load_runtime_config() -> dict[str, Any]:
# ... (as shown in §4.3)
pass
def main(argv: list[str] | None = None) -> int:
"""Main entry point."""
parser = build_parser()
args = parser.parse_args(argv)
try:
# Load configuration
runtime = load_runtime_config()
tool_config = runtime.get("pyuvsim", {})
# Resolve paths
results_root = resolve_results_root(args.results_root, runtime)
# Build runner
runner = CondaRunner(
conda_sh=tool_config.get("conda_sh", "source ~/conda.sh"),
conda_env=tool_config.get("conda_env", "pyuvsim"),
)
# Build SLURM config with overrides
slurm_config = build_slurm_config(tool_config, args)
# Resolve config path
if args.template:
config_path = get_template_path(args.template)
elif args.config:
config_path = args.config
else:
print("ERROR: Must specify --config or --template", file=sys.stderr)
return 2
# Prepare the run
result = prepare_pyuvsim_run(
run_id=args.run_id,
results_root=results_root,
telescope=args.telescope,
observation_id=args.observation_id,
config_path=config_path,
runner=runner,
slurm_config=slurm_config,
template_name=args.template,
dry_run=args.dry_run,
)
# Output
if args.dry_run:
print("[DRY RUN] Would create:")
for path in result["would_create"]:
print(f" {path}")
print(f"\nRun directory: {result['run_dir']}")
else:
print(f"Run directory: {result['run_dir']}")
print(f"Manifest: {result['manifest']}")
print(f"Submit script: {result['submit_script']}")
print()
print("Next steps:")
print(f" valska-pyuvsim-submit {result['run_dir']}")
return 0
except ValueError as e:
print(f"ERROR: {e}", file=sys.stderr)
return 2
except Exception as e:
print(f"ERROR: {e}", file=sys.stderr)
return 1
if __name__ == "__main__":
sys.exit(main())
5. Implementing the submit phase
5.1 Overview
The submit phase:
Validates the run directory exists and is prepared
Reads the manifest
Submits SLURM jobs
Records job IDs in
jobs.json
5.2 Core submit logic
# filepath: src/valska_hera_beam/external_tools/pyuvsim/submit.py
from __future__ import annotations
import json
import re
import subprocess
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
class SubmissionError(RuntimeError):
"""Raised when SLURM submission fails."""
pass
_JOBID_RE = re.compile(r"Submitted\s+batch\s+job\s+(\d+)", re.IGNORECASE)
def _utc_now_iso() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
def load_manifest(run_dir: Path) -> dict[str, Any]:
"""Load and validate manifest.json."""
manifest_path = run_dir / "manifest.json"
if not manifest_path.exists():
raise SubmissionError(
f"Manifest not found: {manifest_path}\n"
f"Hint: Run 'valska-pyuvsim-prepare' first."
)
with manifest_path.open() as f:
return json.load(f)
def submit_pyuvsim_run(
run_dir: Path,
*,
stage: str = "all",
dry_run: bool = False,
resubmit: bool = False,
sbatch: str = "sbatch",
) -> dict[str, Any]:
"""
Submit a prepared pyuvsim run to SLURM.
Parameters
----------
run_dir
Path to prepared run directory.
stage
Stage to submit ('simulate' or 'all').
dry_run
If True, print commands without executing.
resubmit
If True, archive existing jobs.json before submitting.
sbatch
Path to sbatch executable.
Returns
-------
dict with job submission details.
"""
run_dir = Path(run_dir).resolve()
# Validate run directory
if not run_dir.exists():
raise SubmissionError(f"Run directory not found: {run_dir}")
manifest = load_manifest(run_dir)
# Handle resubmission
jobs_path = run_dir / "jobs.json"
if jobs_path.exists() and not resubmit:
raise SubmissionError(
f"jobs.json already exists: {jobs_path}\n"
f"Use --resubmit to archive and resubmit."
)
if jobs_path.exists() and resubmit:
archive_path = run_dir / f"jobs_{_utc_now_iso().replace(':', '')}.json"
jobs_path.rename(archive_path)
# Find submit script
submit_script = run_dir / "submit_simulate.sh"
if not submit_script.exists():
raise SubmissionError(f"Submit script not found: {submit_script}")
# Build submission command
cmd = [sbatch, str(submit_script)]
if dry_run:
print(f"[DRY RUN] Would execute: {' '.join(cmd)}")
job_id = None
else:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
cwd=run_dir,
)
if result.returncode != 0:
raise SubmissionError(
f"sbatch failed with code {result.returncode}:\n{result.stderr}"
)
match = _JOBID_RE.search(result.stdout)
if not match:
raise SubmissionError(
f"Could not parse job ID from sbatch output:\n{result.stdout}"
)
job_id = match.group(1)
print(f"Submitted job {job_id}")
# Write jobs.json
jobs_record = {
"schema_version": "1.0",
"run_dir": str(run_dir),
"manifest": str(run_dir / "manifest.json"),
"sbatch": sbatch,
"dry_run": dry_run,
"submitted_utc": _utc_now_iso(),
"stage": stage,
"jobs": {
"simulate": {
"job_id": job_id,
"script": str(submit_script),
"dependency": None,
},
},
"commands": [" ".join(cmd)],
}
if not dry_run:
with jobs_path.open("w") as f:
json.dump(jobs_record, f, indent=2)
return jobs_record
5.3 Submit CLI
# filepath: src/valska_hera_beam/external_tools/pyuvsim/cli_submit.py
#!/usr/bin/env python3
"""CLI entry point for valska-pyuvsim-submit."""
from __future__ import annotations
import argparse
import sys
from pathlib import Path
from .submit import SubmissionError, submit_pyuvsim_run
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
prog="valska-pyuvsim-submit",
description="Submit a prepared pyuvsim run to SLURM.",
)
parser.add_argument(
"run_dir",
type=Path,
help="Path to prepared run directory.",
)
parser.add_argument(
"--stage",
default="all",
choices=["simulate", "all"],
help="Stage to submit (default: all).",
)
parser.add_argument(
"--dry-run",
action="store_true",
help="Print commands without executing.",
)
parser.add_argument(
"--resubmit",
action="store_true",
help="Archive existing jobs.json and resubmit.",
)
parser.add_argument(
"--sbatch",
default="sbatch",
help="Path to sbatch executable.",
)
return parser
def main(argv: list[str] | None = None) -> int:
"""Main entry point."""
parser = build_parser()
args = parser.parse_args(argv)
try:
result = submit_pyuvsim_run(
run_dir=args.run_dir,
stage=args.stage,
dry_run=args.dry_run,
resubmit=args.resubmit,
sbatch=args.sbatch,
)
if args.dry_run:
print("\n[DRY RUN] No jobs submitted.")
else:
print(f"\njobs.json written to: {args.run_dir / 'jobs.json'}")
return 0
except SubmissionError as e:
print(f"ERROR: {e}", file=sys.stderr)
return 3 if "not found" in str(e).lower() else 4
except Exception as e:
print(f"ERROR: {e}", file=sys.stderr)
return 1
if __name__ == "__main__":
sys.exit(main())
6. Configuration in runtime_paths.yaml
6.1 Tool section structure
Add a section for your tool in config/runtime_paths.yaml:
# filepath: config/runtime_paths.yaml
# ... existing content ...
pyuvsim:
# Conda settings
conda_sh: "source /home/user/miniforge3/etc/profile.d/conda.sh"
conda_env: pyuvsim
# Container settings (alternative to conda)
# container_image: /path/to/pyuvsim.sif
# container_bind: "/data:/data,/scratch:/scratch"
# Default template
default_template: hera_validation.yaml
# SLURM defaults for simulate stage
slurm_defaults:
partition: cpu
time: "04:00:00"
mem: "32G"
cpus_per_task: 16
nodes: 1
ntasks: 1
6.2 Documenting hardcoded defaults
In your module, document which values have hardcoded defaults:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/cli_prepare.py
def build_slurm_config(
tool_config: dict[str, Any],
args: argparse.Namespace,
) -> dict[str, Any]:
"""
Build SLURM configuration with defaults and overrides.
Hardcoded defaults (overridden by runtime_paths.yaml, then CLI):
- time: "04:00:00"
- mem: "16G"
- cpus_per_task: 8
- nodes: 1
- ntasks: 1
- job_name_prefix: "pyuvsim"
"""
# Start with hardcoded defaults
config = {
"time": "04:00:00",
"mem": "16G",
"cpus_per_task": 8,
"nodes": 1,
"ntasks": 1,
"job_name_prefix": "pyuvsim",
}
# Merge tool defaults from runtime_paths.yaml
slurm_defaults = tool_config.get("slurm_defaults", {})
config.update(slurm_defaults)
# Apply CLI overrides
if args.partition:
config["partition"] = args.partition
if args.time:
config["time"] = args.time
if args.cpus_per_task:
config["cpus_per_task"] = args.cpus_per_task
if args.mem:
config["mem"] = args.mem
return config
7. Templates
7.1 Creating templates
Place configuration templates in the templates/ subdirectory:
# filepath: src/valska_hera_beam/external_tools/pyuvsim/templates/hera_validation.yaml
# HERA validation simulation template
# Values marked __SET_BY_VALSKA__ are replaced at prepare time
# Telescope configuration
telescope:
name: HERA
config_path: "__SET_BY_VALSKA__"
# Observation parameters
observation:
start_time: 2458115.5
duration_hours: 4.0
integration_time: 10.0
# Frequency settings
frequency:
start_mhz: 150.0
end_mhz: 160.0
n_channels: 128
# Sky model
sky_model:
path: "__SET_BY_VALSKA__"
type: "gleam"
# Output
output:
path: "__SET_BY_VALSKA__"
format: "uvh5"
7.2 Template utilities
# filepath: src/valska_hera_beam/external_tools/pyuvsim/templates.py
from pathlib import Path
def _templates_dir() -> Path:
"""Return the templates directory path."""
return Path(__file__).parent / "templates"
def list_templates() -> list[str]:
"""List available template names."""
templates_dir = _templates_dir()
if not templates_dir.exists():
return []
return sorted(p.name for p in templates_dir.glob("*.yaml"))
def get_template_path(name: str) -> Path:
"""
Get the full path to a template by name.
Parameters
----------
name
Template name (with or without .yaml extension).
Returns
-------
Path to the template file.
Raises
------
FileNotFoundError
If template does not exist.
"""
if not name.endswith(".yaml"):
name = f"{name}.yaml"
path = _templates_dir() / name
if not path.exists():
available = list_templates()
raise FileNotFoundError(
f"Template not found: {name}\n"
f"Available templates: {', '.join(available) or '(none)'}"
)
return path
8. Public API
8.1 Module exports
# filepath: src/valska_hera_beam/external_tools/pyuvsim/__init__.py
"""
pyuvsim integration for ValSKA.
Primary entry points:
- prepare_pyuvsim_run: Prepare a simulation run directory.
- submit_pyuvsim_run: Submit a prepared run to SLURM.
- get_template_path: Access shipped simulation templates.
"""
from .runner import CondaRunner, ContainerRunner
from .setup import prepare_pyuvsim_run
from .submit import SubmissionError, submit_pyuvsim_run
from .templates import get_template_path, list_templates
__all__ = [
"prepare_pyuvsim_run",
"submit_pyuvsim_run",
"SubmissionError",
"get_template_path",
"list_templates",
"CondaRunner",
"ContainerRunner",
]
9. Testing
9.1 Required tests
Create tests in tests/external_tools/pyuvsim/:
# filepath: tests/external_tools/pyuvsim/test_prepare.py
import pytest
from pathlib import Path
from valska_hera_beam.external_tools.pyuvsim import (
prepare_pyuvsim_run,
CondaRunner,
)
class TestPrepareDryRun:
"""Test prepare with --dry-run."""
def test_dry_run_creates_no_files(self, tmp_path):
"""Verify dry run does not create any files."""
results_root = tmp_path / "results"
result = prepare_pyuvsim_run(
run_id="test_001",
results_root=results_root,
telescope="HERA",
observation_id="test",
config_path=Path("dummy.yaml"),
runner=CondaRunner("source conda.sh", "pyuvsim"),
slurm_config={},
dry_run=True,
)
assert result["dry_run"] is True
assert not results_root.exists()
class TestPrepareExecution:
"""Test actual prepare execution."""
def test_creates_run_directory(self, tmp_path):
"""Verify run directory is created with expected contents."""
results_root = tmp_path / "results"
config_file = tmp_path / "config.yaml"
config_file.write_text("test: config")
result = prepare_pyuvsim_run(
run_id="test_001",
results_root=results_root,
telescope="HERA",
observation_id="obs1",
config_path=config_file,
runner=CondaRunner("source conda.sh", "pyuvsim"),
slurm_config={"time": "01:00:00"},
dry_run=False,
)
run_dir = result["run_dir"]
assert run_dir.exists()
assert (run_dir / "manifest.json").exists()
assert (run_dir / "submit_simulate.sh").exists()
assert (run_dir / "config.yaml").exists()
def test_manifest_has_required_fields(self, tmp_path):
"""Verify manifest contains all required fields."""
import json
results_root = tmp_path / "results"
config_file = tmp_path / "config.yaml"
config_file.write_text("test: config")
result = prepare_pyuvsim_run(
run_id="test_001",
results_root=results_root,
telescope="HERA",
observation_id="obs1",
config_path=config_file,
runner=CondaRunner("source conda.sh", "pyuvsim"),
slurm_config={},
dry_run=False,
)
manifest = json.loads(result["manifest"].read_text())
# Required fields per spec §6.1.3
assert manifest["tool"] == "pyuvsim"
assert "created_utc" in manifest
assert "valska_version" in manifest
assert manifest["run_id"] == "test_001"
assert "run_dir" in manifest
assert "results_root" in manifest
9.2 Test configuration resolution
# filepath: tests/external_tools/pyuvsim/test_config.py
import pytest
from valska_hera_beam.external_tools.pyuvsim.cli_prepare import (
build_slurm_config,
)
class TestConfigResolution:
"""Test configuration hierarchy."""
def test_cli_overrides_config(self):
"""CLI arguments take precedence over config file."""
tool_config = {
"slurm_defaults": {
"partition": "default",
"time": "04:00:00",
}
}
# Simulate CLI args
class Args:
partition = "override"
time = None
cpus_per_task = None
mem = None
config = build_slurm_config(tool_config, Args())
assert config["partition"] == "override"
assert config["time"] == "04:00:00" # from config
def test_config_overrides_defaults(self):
"""Config file overrides hardcoded defaults."""
tool_config = {
"slurm_defaults": {
"time": "12:00:00", # override default 04:00:00
}
}
class Args:
partition = None
time = None
cpus_per_task = None
mem = None
config = build_slurm_config(tool_config, Args())
assert config["time"] == "12:00:00"
10. Checklist
Before considering your integration complete, verify:
10.1 Structure
[ ] Module created under
src/valska_hera_beam/external_tools/<tool>/[ ] Entry points registered in
pyproject.toml[ ]
__init__.pyexports public API
10.2 Prepare phase
[ ] CLI parses required and optional arguments
[ ]
--dry-runpreviews without creating files[ ]
--results-rootoverrides configured default[ ] Run directory created with correct structure
[ ]
manifest.jsonincludes all required fields[ ] SLURM submit script generated for each stage
[ ] Exit codes follow specification (0, 1, 2)
10.3 Submit phase
[ ] CLI accepts run directory as primary argument
[ ] Validates run directory exists and is prepared
[ ]
--dry-runpreviews without submitting[ ]
--stageselects specific stages[ ]
--resubmitarchives existing jobs.json[ ]
jobs.jsonwritten with all required fields[ ] Exit codes follow specification (0, 1, 3, 4)
10.4 Configuration
[ ] Tool section documented in
runtime_paths.yaml[ ] Hardcoded defaults documented in code
[ ] Configuration hierarchy respected (CLI > config > defaults)
[ ] SLURM directives nullable (None omits directive)
10.5 Testing
[ ] Prepare dry-run test
[ ] Prepare execution test
[ ] Manifest schema test
[ ] Submit dry-run test
[ ] Configuration resolution tests
Appendix A: Common patterns
A.1 Path resolution helper
def resolve_path(
cli_value: Path | None,
config_value: str | None,
default: Path | None = None,
name: str = "path",
) -> Path:
"""Resolve a path with precedence: CLI > config > default."""
if cli_value is not None:
return cli_value
if config_value is not None:
return Path(config_value)
if default is not None:
return default
raise ValueError(f"No {name} configured.")
A.2 UTC timestamp helper
from datetime import datetime, timezone
def utc_now_iso() -> str:
"""Return current UTC time in ISO 8601 format."""
return datetime.now(timezone.utc).isoformat()
def utc_now_compact() -> str:
"""Return current UTC time in compact format (for filenames)."""
return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
A.3 Safe JSON writing
import json
from pathlib import Path
def write_json_atomic(path: Path, data: dict) -> None:
"""Write JSON atomically to avoid partial writes."""
tmp_path = path.with_suffix(".json.tmp")
with tmp_path.open("w") as f:
json.dump(data, f, indent=2)
tmp_path.rename(path)
Appendix B: Troubleshooting
B.1 Entry points not found after installation
Symptom: valska-pyuvsim-prepare: command not found
Solution:
pip install -e .
hash -r # Clear shell command cache
B.2 Import errors
Symptom: ModuleNotFoundError: No module named 'valska_hera_beam.external_tools.pyuvsim'
Check:
__init__.pyexists in the module directoryPackage reinstalled after adding module
B.3 Manifest missing required fields
Symptom: Tests fail on manifest validation
Check: Ensure tool field is included:
manifest = {
"tool": "pyuvsim",
# ...
}