# Tool Implementer's Guide **Version:** 1.0.0-draft **Status:** Draft **Last updated:** 2026-01-21 --- ## 1. Introduction ### 1.1 Purpose This guide provides a practical walkthrough for implementing a new external tool integration in ValSKA. It complements the [External Tool Integration Specification](external_tool_integration_spec.md) by offering step-by-step instructions, concrete examples, and implementation patterns. ### 1.2 Audience This guide is intended for developers who: - Are adding a new external tool to ValSKA - Have read the External Tool Integration Specification - Are familiar with Python packaging and CLI development - Understand SLURM batch scheduling concepts ### 1.3 Running example This guide uses **pyuvsim** as the running example — a visibility simulator that will be integrated as a single-stage, CPU-based tool. The patterns shown apply equally to other tools with different characteristics. ### 1.4 Prerequisites Before starting, ensure you have: - A working ValSKA development environment - Access to the external tool (pyuvsim) and its documentation - Understanding of the tool's configuration format and execution model --- ## 2. Planning your integration ### 2.1 Key questions to answer Before writing code, answer these questions about your tool: | Question | Example (pyuvsim) | |----------|-------------------| | What stages does execution involve? | Single stage: `simulate` | | What resources does each stage require? | CPU-bound, multi-core | | Is any stage resumable? | No | | What configuration format does the tool use? | YAML | | How is the tool invoked? | `python -m pyuvsim.uvsim ` | | What execution environments are supported? | Conda, container | | What input data is required? | Telescope config, sky model, observation parameters | | What outputs are produced? | UVH5 visibility file | ### 2.2 Define your stages Document your stages using the format from §8.2 of the specification: **pyuvsim stages:** | Stage | Purpose | Dependencies | Resumable | Resource profile | |-------|---------|--------------|-----------|------------------| | `simulate` | Generate simulated visibilities | None | No | CPU, 16–64 cores, 32GB+ memory | For single-stage tools, this table has one row — that's fine. The stage abstraction still applies. ### 2.3 Define your taxonomy The taxonomy determines the directory hierarchy under `//`. Choose a structure that: - Groups related runs logically - Supports your typical query patterns - Remains stable as usage evolves **Example taxonomies:** | Tool | Taxonomy | Rationale | |------|----------|-----------| | BayesEoR | `/` | Primary analysis dimensions | | pyuvsim | `/` | Groups by instrument and observation | Your taxonomy is encoded in how you construct the run directory path (see §4.4). --- ## 3. Module structure ### 3.1 Create the module directory Create the tool module under `src/valska/external_tools/`: ```bash mkdir -p src/valska/external_tools/pyuvsim/templates touch src/valska/external_tools/pyuvsim/__init__.py touch src/valska/external_tools/pyuvsim/cli_prepare.py touch src/valska/external_tools/pyuvsim/cli_submit.py touch src/valska/external_tools/pyuvsim/setup.py touch src/valska/external_tools/pyuvsim/submit.py touch src/valska/external_tools/pyuvsim/runner.py touch src/valska/external_tools/pyuvsim/slurm.py touch src/valska/external_tools/pyuvsim/templates.py ``` ### 3.2 File responsibilities | File | Responsibility | |------|----------------| | `__init__.py` | Public API exports | | `cli_prepare.py` | CLI entry point for `valska-pyuvsim-prepare` | | `cli_submit.py` | CLI entry point for `valska-pyuvsim-submit` | | `setup.py` | Core prepare logic (run directory creation, manifest writing) | | `submit.py` | Core submit logic (SLURM interaction, jobs.json writing) | | `runner.py` | Runner class definitions (CondaRunner, ContainerRunner) | | `slurm.py` | SLURM submit script generation | | `templates.py` | Template discovery and loading utilities | | `templates/` | Shipped configuration templates | ### 3.3 Register entry points Add CLI entry points to `pyproject.toml`: ```toml [project.scripts] # ...existing entries... valska-pyuvsim-prepare = "valska.external_tools.pyuvsim.cli_prepare:main" valska-pyuvsim-submit = "valska.external_tools.pyuvsim.cli_submit:main" ``` After adding these, reinstall the package in development mode: ```bash pip install -e . ``` --- ## 4. Implementing the prepare phase ### 4.1 Overview The prepare phase: 1. Parses CLI arguments 2. Loads configuration from `runtime_paths.yaml` 3. Resolves paths and merges overrides 4. Creates the run directory 5. Writes tool configuration files 6. Generates SLURM submit scripts 7. Writes `manifest.json` ### 4.2 CLI argument design Design arguments that are: - **Specific** — Named for what they control, not implementation details - **Overridable** — Allow CLI to override config file defaults - **Documentable** — Clear help text for each argument **Example argument structure for pyuvsim:** ```python # filepath: src/valska/external_tools/pyuvsim/cli_prepare.py def build_parser() -> argparse.ArgumentParser: parser = argparse.ArgumentParser( prog="valska-pyuvsim-prepare", description="Prepare a pyuvsim simulation run directory.", ) # Required identifiers parser.add_argument( "--run-id", required=True, help="Unique identifier for this run.", ) # Taxonomy components parser.add_argument( "--telescope", required=True, help="Telescope name (e.g., HERA, SKA-Low).", ) parser.add_argument( "--observation-id", default="default", help="Observation identifier for grouping runs.", ) # Input specification parser.add_argument( "--config", type=Path, required=True, help="Path to pyuvsim configuration YAML.", ) parser.add_argument( "--telescope-config", type=Path, help="Path to telescope configuration (overrides config file).", ) parser.add_argument( "--sky-model", type=Path, help="Path to sky model file (overrides config file).", ) # Template-based configuration (alternative to --config) parser.add_argument( "--template", help="Use a shipped template by name (e.g., 'hera_validation').", ) # Path overrides parser.add_argument( "--results-root", type=Path, help="Override results root directory.", ) # SLURM overrides parser.add_argument( "--partition", help="SLURM partition override.", ) parser.add_argument( "--time", help="SLURM time limit override.", ) parser.add_argument( "--cpus-per-task", type=int, help="SLURM CPUs per task override.", ) parser.add_argument( "--mem", help="SLURM memory override.", ) # Execution control parser.add_argument( "--dry-run", action="store_true", help="Preview actions without creating files.", ) return parser ``` ### 4.3 Loading configuration Load and merge configuration following the hierarchy: ```python # filepath: src/valska/external_tools/pyuvsim/cli_prepare.py from pathlib import Path from typing import Any import yaml def load_runtime_config() -> dict[str, Any]: """Load runtime_paths.yaml configuration.""" config_path = Path(__file__).parents[4] / "config" / "runtime_paths.yaml" if not config_path.exists(): return {} with config_path.open() as f: return yaml.safe_load(f) or {} def get_tool_config(runtime: dict[str, Any]) -> dict[str, Any]: """Extract pyuvsim-specific configuration.""" return runtime.get("pyuvsim", {}) def resolve_results_root( cli_override: Path | None, runtime: dict[str, Any], ) -> Path: """Resolve results root with precedence: CLI > config > error.""" if cli_override is not None: return cli_override root = runtime.get("results_root") if root is not None: return Path(root) raise ValueError( "No results_root configured. " "Set in runtime_paths.yaml or pass --results-root." ) ``` ### 4.4 Building the run directory path Construct the run directory following your taxonomy: ```python # filepath: src/valska/external_tools/pyuvsim/setup.py from pathlib import Path def build_run_dir( results_root: Path, telescope: str, observation_id: str, run_id: str, unique: bool = False, ) -> Path: """ Construct the run directory path. Structure: /pyuvsim//// """ run_dir = results_root / "pyuvsim" / telescope / observation_id / run_id if unique: from datetime import datetime, timezone timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ") run_dir = run_dir / timestamp return run_dir ``` ### 4.5 Writing the manifest Create `manifest.json` with all required fields: ```python # filepath: src/valska/external_tools/pyuvsim/setup.py import json from datetime import datetime, timezone from pathlib import Path from typing import Any from valska import __version__ def write_manifest( run_dir: Path, *, run_id: str, results_root: Path, telescope: str, observation_id: str, config_path: Path, template_name: str | None, slurm_config: dict[str, Any], tool_config: dict[str, Any], ) -> Path: """Write manifest.json to run directory.""" manifest = { # Required fields (per spec §6.1.3) "tool": "pyuvsim", "created_utc": datetime.now(timezone.utc).isoformat(), "valska_version": __version__, "run_id": run_id, "run_dir": str(run_dir.resolve()), "results_root": str(results_root.resolve()), # Taxonomy "telescope": telescope, "observation_id": observation_id, # Configuration sources "config_yaml": str(config_path.resolve()), "template_name": template_name, # SLURM configuration by stage "slurm": { "simulate": slurm_config, }, # Tool-specific section "pyuvsim": tool_config, } manifest_path = run_dir / "manifest.json" with manifest_path.open("w") as f: json.dump(manifest, f, indent=2) return manifest_path ``` ### 4.6 Generating SLURM submit scripts Create the submit script generator: ```python # filepath: src/valska/external_tools/pyuvsim/slurm.py from pathlib import Path from typing import Any from .runner import CondaRunner, ContainerRunner def render_submit_script( *, runner: CondaRunner | ContainerRunner, config_yaml: Path, run_dir: Path, slurm: dict[str, Any], ) -> str: """ Render a SLURM submit script for pyuvsim. Parameters ---------- runner Execution environment (conda or container). config_yaml Path to pyuvsim configuration file. run_dir Run directory for outputs and logs. slurm SLURM directives. Keys set to None are omitted. """ lines = ["#!/bin/bash"] # Job name job_name = slurm.get("job_name") or f"pyuvsim_{run_dir.name}" lines.append(f"#SBATCH --job-name={job_name}") # Output logs output = slurm.get("output") or run_dir / "slurm-%j.out" lines.append(f"#SBATCH --output={output}") # Resource directives (only emit if not None) directive_map = { "partition": "--partition", "constraint": "--constraint", "qos": "--qos", "account": "--account", "time": "--time", "mem": "--mem", "nodes": "--nodes", "ntasks": "--ntasks", "cpus_per_task": "--cpus-per-task", } for key, directive in directive_map.items(): value = slurm.get(key) if value is not None: lines.append(f"#SBATCH {directive}={value}") # Extra directives for extra in slurm.get("extra_sbatch", []): lines.append(f"#SBATCH {extra}") lines.append("") lines.append("set -euo pipefail") lines.append("") # Diagnostic output lines.append("echo '=== Job Information ==='") lines.append("echo \"Job ID: $SLURM_JOB_ID\"") lines.append("echo \"Node: $(hostname)\"") lines.append(f"echo \"Run directory: {run_dir}\"") lines.append(f"echo \"Config: {config_yaml}\"") lines.append("echo \"Start time: $(date -u +'%Y-%m-%dT%H:%M:%SZ')\"") lines.append("echo") lines.append("") # Environment setup if isinstance(runner, CondaRunner): lines.append("# Activate conda environment") lines.append(runner.conda_sh) lines.append(f"conda activate {runner.conda_env}") lines.append("") # Execution command lines.append("# Run pyuvsim") lines.append(f"cd {run_dir}") lines.append(f"python -m pyuvsim.uvsim {config_yaml}") elif isinstance(runner, ContainerRunner): lines.append("# Run via container") bind_opts = f"--bind {runner.container_bind}" if runner.container_bind else "" lines.append( f"apptainer exec {bind_opts} {runner.container_image} " f"python -m pyuvsim.uvsim {config_yaml}" ) lines.append("") lines.append("echo") lines.append("echo \"End time: $(date -u +'%Y-%m-%dT%H:%M:%SZ')\"") lines.append("echo '=== Job Complete ==='") return "\n".join(lines) ``` ### 4.7 Runner definitions Define your runner classes: ```python # filepath: src/valska/external_tools/pyuvsim/runner.py from dataclasses import dataclass from pathlib import Path @dataclass(frozen=True) class CondaRunner: """Execute pyuvsim within a conda environment.""" conda_sh: str """Command to source conda (e.g., 'source /path/to/conda.sh').""" conda_env: str """Name of the conda environment.""" @dataclass(frozen=True) class ContainerRunner: """Execute pyuvsim within an Apptainer/Singularity container.""" container_image: Path """Path to the container image (.sif file).""" container_bind: str | None = None """Bind mount specification (e.g., '/data:/data,/scratch:/scratch').""" ``` ### 4.8 Main prepare function Bring it all together: ```python # filepath: src/valska/external_tools/pyuvsim/setup.py from pathlib import Path from typing import Any import shutil from .runner import CondaRunner, ContainerRunner from .slurm import render_submit_script def prepare_pyuvsim_run( *, run_id: str, results_root: Path, telescope: str, observation_id: str, config_path: Path, runner: CondaRunner | ContainerRunner, slurm_config: dict[str, Any], template_name: str | None = None, tool_config: dict[str, Any] | None = None, dry_run: bool = False, ) -> dict[str, Any]: """ Prepare a pyuvsim run directory. Returns a dict with: - run_dir: Path to created run directory - manifest: Path to manifest.json - submit_script: Path to submit script """ # Build run directory path run_dir = build_run_dir( results_root=results_root, telescope=telescope, observation_id=observation_id, run_id=run_id, ) if dry_run: return { "run_dir": run_dir, "dry_run": True, "would_create": [ run_dir / "manifest.json", run_dir / "config.yaml", run_dir / "submit_simulate.sh", ], } # Create run directory run_dir.mkdir(parents=True, exist_ok=True) # Copy configuration to run directory run_config = run_dir / "config.yaml" shutil.copy(config_path, run_config) # Generate submit script submit_script_content = render_submit_script( runner=runner, config_yaml=run_config, run_dir=run_dir, slurm=slurm_config, ) submit_script = run_dir / "submit_simulate.sh" submit_script.write_text(submit_script_content) submit_script.chmod(0o755) # Write manifest manifest_path = write_manifest( run_dir=run_dir, run_id=run_id, results_root=results_root, telescope=telescope, observation_id=observation_id, config_path=config_path, template_name=template_name, slurm_config=slurm_config, tool_config=tool_config or {}, ) return { "run_dir": run_dir, "manifest": manifest_path, "submit_script": submit_script, "config": run_config, } ``` ### 4.9 CLI main function Wire up the CLI: ```python # filepath: src/valska/external_tools/pyuvsim/cli_prepare.py #!/usr/bin/env python3 """CLI entry point for valska-pyuvsim-prepare.""" from __future__ import annotations import argparse import sys from pathlib import Path from typing import Any from .runner import CondaRunner, ContainerRunner from .setup import prepare_pyuvsim_run def build_parser() -> argparse.ArgumentParser: # ... (as shown in §4.2) pass def load_runtime_config() -> dict[str, Any]: # ... (as shown in §4.3) pass def main(argv: list[str] | None = None) -> int: """Main entry point.""" parser = build_parser() args = parser.parse_args(argv) try: # Load configuration runtime = load_runtime_config() tool_config = runtime.get("pyuvsim", {}) # Resolve paths results_root = resolve_results_root(args.results_root, runtime) # Build runner runner = CondaRunner( conda_sh=tool_config.get("conda_sh", "source ~/conda.sh"), conda_env=tool_config.get("conda_env", "pyuvsim"), ) # Build SLURM config with overrides slurm_config = build_slurm_config(tool_config, args) # Resolve config path if args.template: config_path = get_template_path(args.template) elif args.config: config_path = args.config else: print("ERROR: Must specify --config or --template", file=sys.stderr) return 2 # Prepare the run result = prepare_pyuvsim_run( run_id=args.run_id, results_root=results_root, telescope=args.telescope, observation_id=args.observation_id, config_path=config_path, runner=runner, slurm_config=slurm_config, template_name=args.template, dry_run=args.dry_run, ) # Output if args.dry_run: print("[DRY RUN] Would create:") for path in result["would_create"]: print(f" {path}") print(f"\nRun directory: {result['run_dir']}") else: print(f"Run directory: {result['run_dir']}") print(f"Manifest: {result['manifest']}") print(f"Submit script: {result['submit_script']}") print() print("Next steps:") print(f" valska-pyuvsim-submit {result['run_dir']}") return 0 except ValueError as e: print(f"ERROR: {e}", file=sys.stderr) return 2 except Exception as e: print(f"ERROR: {e}", file=sys.stderr) return 1 if __name__ == "__main__": sys.exit(main()) ``` --- ## 5. Implementing the submit phase ### 5.1 Overview The submit phase: 1. Validates the run directory exists and is prepared 2. Reads the manifest 3. Submits SLURM jobs 4. Records job IDs in `jobs.json` ### 5.2 Core submit logic ```python # filepath: src/valska/external_tools/pyuvsim/submit.py from __future__ import annotations import json import re import subprocess from dataclasses import dataclass from datetime import datetime, timezone from pathlib import Path from typing import Any class SubmissionError(RuntimeError): """Raised when SLURM submission fails.""" pass _JOBID_RE = re.compile(r"Submitted\s+batch\s+job\s+(\d+)", re.IGNORECASE) def _utc_now_iso() -> str: return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") def load_manifest(run_dir: Path) -> dict[str, Any]: """Load and validate manifest.json.""" manifest_path = run_dir / "manifest.json" if not manifest_path.exists(): raise SubmissionError( f"Manifest not found: {manifest_path}\n" f"Hint: Run 'valska-pyuvsim-prepare' first." ) with manifest_path.open() as f: return json.load(f) def submit_pyuvsim_run( run_dir: Path, *, stage: str = "all", dry_run: bool = False, resubmit: bool = False, sbatch: str = "sbatch", ) -> dict[str, Any]: """ Submit a prepared pyuvsim run to SLURM. Parameters ---------- run_dir Path to prepared run directory. stage Stage to submit ('simulate' or 'all'). dry_run If True, print commands without executing. resubmit If True, archive existing jobs.json before submitting. sbatch Path to sbatch executable. Returns ------- dict with job submission details. """ run_dir = Path(run_dir).resolve() # Validate run directory if not run_dir.exists(): raise SubmissionError(f"Run directory not found: {run_dir}") manifest = load_manifest(run_dir) # Handle resubmission jobs_path = run_dir / "jobs.json" if jobs_path.exists() and not resubmit: raise SubmissionError( f"jobs.json already exists: {jobs_path}\n" f"Use --resubmit to archive and resubmit." ) if jobs_path.exists() and resubmit: archive_path = run_dir / f"jobs_{_utc_now_iso().replace(':', '')}.json" jobs_path.rename(archive_path) # Find submit script submit_script = run_dir / "submit_simulate.sh" if not submit_script.exists(): raise SubmissionError(f"Submit script not found: {submit_script}") # Build submission command cmd = [sbatch, str(submit_script)] if dry_run: print(f"[DRY RUN] Would execute: {' '.join(cmd)}") job_id = None else: result = subprocess.run( cmd, capture_output=True, text=True, cwd=run_dir, ) if result.returncode != 0: raise SubmissionError( f"sbatch failed with code {result.returncode}:\n{result.stderr}" ) match = _JOBID_RE.search(result.stdout) if not match: raise SubmissionError( f"Could not parse job ID from sbatch output:\n{result.stdout}" ) job_id = match.group(1) print(f"Submitted job {job_id}") # Write jobs.json jobs_record = { "schema_version": "1.0", "run_dir": str(run_dir), "manifest": str(run_dir / "manifest.json"), "sbatch": sbatch, "dry_run": dry_run, "submitted_utc": _utc_now_iso(), "stage": stage, "jobs": { "simulate": { "job_id": job_id, "script": str(submit_script), "dependency": None, }, }, "commands": [" ".join(cmd)], } if not dry_run: with jobs_path.open("w") as f: json.dump(jobs_record, f, indent=2) return jobs_record ``` ### 5.3 Submit CLI ```python # filepath: src/valska/external_tools/pyuvsim/cli_submit.py #!/usr/bin/env python3 """CLI entry point for valska-pyuvsim-submit.""" from __future__ import annotations import argparse import sys from pathlib import Path from .submit import SubmissionError, submit_pyuvsim_run def build_parser() -> argparse.ArgumentParser: parser = argparse.ArgumentParser( prog="valska-pyuvsim-submit", description="Submit a prepared pyuvsim run to SLURM.", ) parser.add_argument( "run_dir", type=Path, help="Path to prepared run directory.", ) parser.add_argument( "--stage", default="all", choices=["simulate", "all"], help="Stage to submit (default: all).", ) parser.add_argument( "--dry-run", action="store_true", help="Print commands without executing.", ) parser.add_argument( "--resubmit", action="store_true", help="Archive existing jobs.json and resubmit.", ) parser.add_argument( "--sbatch", default="sbatch", help="Path to sbatch executable.", ) return parser def main(argv: list[str] | None = None) -> int: """Main entry point.""" parser = build_parser() args = parser.parse_args(argv) try: result = submit_pyuvsim_run( run_dir=args.run_dir, stage=args.stage, dry_run=args.dry_run, resubmit=args.resubmit, sbatch=args.sbatch, ) if args.dry_run: print("\n[DRY RUN] No jobs submitted.") else: print(f"\njobs.json written to: {args.run_dir / 'jobs.json'}") return 0 except SubmissionError as e: print(f"ERROR: {e}", file=sys.stderr) return 3 if "not found" in str(e).lower() else 4 except Exception as e: print(f"ERROR: {e}", file=sys.stderr) return 1 if __name__ == "__main__": sys.exit(main()) ``` --- ## 6. Configuration in runtime_paths.yaml ### 6.1 Tool section structure Add a section for your tool in `config/runtime_paths.yaml`: ```yaml # filepath: config/runtime_paths.yaml # ... existing content ... pyuvsim: # Conda settings conda_sh: "source /home/user/miniforge3/etc/profile.d/conda.sh" conda_env: pyuvsim # Container settings (alternative to conda) # container_image: /path/to/pyuvsim.sif # container_bind: "/data:/data,/scratch:/scratch" # Default template default_template: hera_validation.yaml # SLURM defaults for simulate stage slurm_defaults: partition: cpu time: "04:00:00" mem: "32G" cpus_per_task: 16 nodes: 1 ntasks: 1 ``` ### 6.2 Documenting hardcoded defaults In your module, document which values have hardcoded defaults: ```python # filepath: src/valska/external_tools/pyuvsim/cli_prepare.py def build_slurm_config( tool_config: dict[str, Any], args: argparse.Namespace, ) -> dict[str, Any]: """ Build SLURM configuration with defaults and overrides. Hardcoded defaults (overridden by runtime_paths.yaml, then CLI): - time: "04:00:00" - mem: "16G" - cpus_per_task: 8 - nodes: 1 - ntasks: 1 - job_name_prefix: "pyuvsim" """ # Start with hardcoded defaults config = { "time": "04:00:00", "mem": "16G", "cpus_per_task": 8, "nodes": 1, "ntasks": 1, "job_name_prefix": "pyuvsim", } # Merge tool defaults from runtime_paths.yaml slurm_defaults = tool_config.get("slurm_defaults", {}) config.update(slurm_defaults) # Apply CLI overrides if args.partition: config["partition"] = args.partition if args.time: config["time"] = args.time if args.cpus_per_task: config["cpus_per_task"] = args.cpus_per_task if args.mem: config["mem"] = args.mem return config ``` --- ## 7. Templates ### 7.1 Creating templates Place configuration templates in the `templates/` subdirectory: ```yaml # filepath: src/valska/external_tools/pyuvsim/templates/hera_validation.yaml # HERA validation simulation template # Values marked __SET_BY_VALSKA__ are replaced at prepare time # Telescope configuration telescope: name: HERA config_path: "__SET_BY_VALSKA__" # Observation parameters observation: start_time: 2458115.5 duration_hours: 4.0 integration_time: 10.0 # Frequency settings frequency: start_mhz: 150.0 end_mhz: 160.0 n_channels: 128 # Sky model sky_model: path: "__SET_BY_VALSKA__" type: "gleam" # Output output: path: "__SET_BY_VALSKA__" format: "uvh5" ``` ### 7.2 Template utilities ```python # filepath: src/valska/external_tools/pyuvsim/templates.py from pathlib import Path def _templates_dir() -> Path: """Return the templates directory path.""" return Path(__file__).parent / "templates" def list_templates() -> list[str]: """List available template names.""" templates_dir = _templates_dir() if not templates_dir.exists(): return [] return sorted(p.name for p in templates_dir.glob("*.yaml")) def get_template_path(name: str) -> Path: """ Get the full path to a template by name. Parameters ---------- name Template name (with or without .yaml extension). Returns ------- Path to the template file. Raises ------ FileNotFoundError If template does not exist. """ if not name.endswith(".yaml"): name = f"{name}.yaml" path = _templates_dir() / name if not path.exists(): available = list_templates() raise FileNotFoundError( f"Template not found: {name}\n" f"Available templates: {', '.join(available) or '(none)'}" ) return path ``` --- ## 8. Public API ### 8.1 Module exports ```python # filepath: src/valska/external_tools/pyuvsim/__init__.py """ pyuvsim integration for ValSKA. Primary entry points: - prepare_pyuvsim_run: Prepare a simulation run directory. - submit_pyuvsim_run: Submit a prepared run to SLURM. - get_template_path: Access shipped simulation templates. """ from .runner import CondaRunner, ContainerRunner from .setup import prepare_pyuvsim_run from .submit import SubmissionError, submit_pyuvsim_run from .templates import get_template_path, list_templates __all__ = [ "prepare_pyuvsim_run", "submit_pyuvsim_run", "SubmissionError", "get_template_path", "list_templates", "CondaRunner", "ContainerRunner", ] ``` --- ## 9. Testing ### 9.1 Required tests Create tests in `tests/external_tools/pyuvsim/`: ```python # filepath: tests/external_tools/pyuvsim/test_prepare.py import pytest from pathlib import Path from valska.external_tools.pyuvsim import ( prepare_pyuvsim_run, CondaRunner, ) class TestPrepareDryRun: """Test prepare with --dry-run.""" def test_dry_run_creates_no_files(self, tmp_path): """Verify dry run does not create any files.""" results_root = tmp_path / "results" result = prepare_pyuvsim_run( run_id="test_001", results_root=results_root, telescope="HERA", observation_id="test", config_path=Path("dummy.yaml"), runner=CondaRunner("source conda.sh", "pyuvsim"), slurm_config={}, dry_run=True, ) assert result["dry_run"] is True assert not results_root.exists() class TestPrepareExecution: """Test actual prepare execution.""" def test_creates_run_directory(self, tmp_path): """Verify run directory is created with expected contents.""" results_root = tmp_path / "results" config_file = tmp_path / "config.yaml" config_file.write_text("test: config") result = prepare_pyuvsim_run( run_id="test_001", results_root=results_root, telescope="HERA", observation_id="obs1", config_path=config_file, runner=CondaRunner("source conda.sh", "pyuvsim"), slurm_config={"time": "01:00:00"}, dry_run=False, ) run_dir = result["run_dir"] assert run_dir.exists() assert (run_dir / "manifest.json").exists() assert (run_dir / "submit_simulate.sh").exists() assert (run_dir / "config.yaml").exists() def test_manifest_has_required_fields(self, tmp_path): """Verify manifest contains all required fields.""" import json results_root = tmp_path / "results" config_file = tmp_path / "config.yaml" config_file.write_text("test: config") result = prepare_pyuvsim_run( run_id="test_001", results_root=results_root, telescope="HERA", observation_id="obs1", config_path=config_file, runner=CondaRunner("source conda.sh", "pyuvsim"), slurm_config={}, dry_run=False, ) manifest = json.loads(result["manifest"].read_text()) # Required fields per spec §6.1.3 assert manifest["tool"] == "pyuvsim" assert "created_utc" in manifest assert "valska_version" in manifest assert manifest["run_id"] == "test_001" assert "run_dir" in manifest assert "results_root" in manifest ``` ### 9.2 Test configuration resolution ```python # filepath: tests/external_tools/pyuvsim/test_config.py import pytest from valska.external_tools.pyuvsim.cli_prepare import ( build_slurm_config, ) class TestConfigResolution: """Test configuration hierarchy.""" def test_cli_overrides_config(self): """CLI arguments take precedence over config file.""" tool_config = { "slurm_defaults": { "partition": "default", "time": "04:00:00", } } # Simulate CLI args class Args: partition = "override" time = None cpus_per_task = None mem = None config = build_slurm_config(tool_config, Args()) assert config["partition"] == "override" assert config["time"] == "04:00:00" # from config def test_config_overrides_defaults(self): """Config file overrides hardcoded defaults.""" tool_config = { "slurm_defaults": { "time": "12:00:00", # override default 04:00:00 } } class Args: partition = None time = None cpus_per_task = None mem = None config = build_slurm_config(tool_config, Args()) assert config["time"] == "12:00:00" ``` --- ## 10. Checklist Before considering your integration complete, verify: ### 10.1 Structure - [ ] Module created under `src/valska/external_tools//` - [ ] Entry points registered in `pyproject.toml` - [ ] `__init__.py` exports public API ### 10.2 Prepare phase - [ ] CLI parses required and optional arguments - [ ] `--dry-run` previews without creating files - [ ] `--results-root` overrides configured default - [ ] Run directory created with correct structure - [ ] `manifest.json` includes all required fields - [ ] SLURM submit script generated for each stage - [ ] Exit codes follow specification (0, 1, 2) ### 10.3 Submit phase - [ ] CLI accepts run directory as primary argument - [ ] Validates run directory exists and is prepared - [ ] `--dry-run` previews without submitting - [ ] `--stage` selects specific stages - [ ] `--resubmit` archives existing jobs.json - [ ] `jobs.json` written with all required fields - [ ] Exit codes follow specification (0, 1, 3, 4) ### 10.4 Configuration - [ ] Tool section documented in `runtime_paths.yaml` - [ ] Hardcoded defaults documented in code - [ ] Configuration hierarchy respected (CLI > config > defaults) - [ ] SLURM directives nullable (None omits directive) ### 10.5 Testing - [ ] Prepare dry-run test - [ ] Prepare execution test - [ ] Manifest schema test - [ ] Submit dry-run test - [ ] Configuration resolution tests --- ## Appendix A: Common patterns ### A.1 Path resolution helper ```python def resolve_path( cli_value: Path | None, config_value: str | None, default: Path | None = None, name: str = "path", ) -> Path: """Resolve a path with precedence: CLI > config > default.""" if cli_value is not None: return cli_value if config_value is not None: return Path(config_value) if default is not None: return default raise ValueError(f"No {name} configured.") ``` ### A.2 UTC timestamp helper ```python from datetime import datetime, timezone def utc_now_iso() -> str: """Return current UTC time in ISO 8601 format.""" return datetime.now(timezone.utc).isoformat() def utc_now_compact() -> str: """Return current UTC time in compact format (for filenames).""" return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ") ``` ### A.3 Safe JSON writing ```python import json from pathlib import Path def write_json_atomic(path: Path, data: dict) -> None: """Write JSON atomically to avoid partial writes.""" tmp_path = path.with_suffix(".json.tmp") with tmp_path.open("w") as f: json.dump(data, f, indent=2) tmp_path.rename(path) ``` --- ## Appendix B: Troubleshooting ### B.1 Entry points not found after installation **Symptom:** `valska-pyuvsim-prepare: command not found` **Solution:** ```bash pip install -e . hash -r # Clear shell command cache ``` ### B.2 Import errors **Symptom:** `ModuleNotFoundError: No module named 'valska.external_tools.pyuvsim'` **Check:** 1. `__init__.py` exists in the module directory 2. Package reinstalled after adding module ### B.3 Manifest missing required fields **Symptom:** Tests fail on manifest validation **Check:** Ensure `tool` field is included: ```python manifest = { "tool": "pyuvsim", # ... } ```