Shared Infrastructure Roadmap
Version: 1.0.0-draft
Status: Draft
Last updated: 2026-03-10
1. Introduction
1.1 Purpose
This document outlines a future refactoring to extract shared infrastructure from individual external tool implementations into a common core module. It captures the rationale, identifies candidates for consolidation, and proposes a migration strategy.
1.2 Status
This is a roadmap, not a specification. The approach described here is intentionally deferred:
Current state: Each tool (BayesEoR, pyuvsim) implements its own utilities
Future state: Common patterns extracted into valska_hera_beam.external_tools.core
Trigger: Refactoring should occur after two or more tools are implemented and patterns are validated in practice
1.3 Rationale for deferral
Extracting shared infrastructure before implementing multiple tools risks:
Premature abstraction — Guessing at the right interfaces without concrete evidence
Over-engineering — Building flexibility that is never used
Churn — Refactoring the shared layer as each new tool reveals edge cases
By implementing BayesEoR and pyuvsim as standalone modules first, we gain:
Concrete duplication — Clear evidence of what is genuinely shared
Validated patterns — Interfaces proven to work for different tool characteristics
Lower risk — Refactoring existing working code is safer than designing upfront
2. Current state
2.1 BayesEoR module structure
src/valska_hera_beam/external_tools/bayeseor/
├── __init__.py
├── cli_prepare.py
├── cli_submit.py
├── cli_sweep.py
├── runner.py
├── setup.py
├── slurm.py
├── submit.py
├── sweep.py
├── templates.py
└── templates/
2.2 pyuvsim module structure (planned)
src/valska_hera_beam/external_tools/pyuvsim/
├── __init__.py
├── cli_prepare.py
├── cli_submit.py
├── runner.py
├── setup.py
├── slurm.py
├── submit.py
├── templates.py
└── templates/
2.3 Observed duplication
Based on the BayesEoR implementation and the pyuvsim patterns described in the Tool Implementer’s Guide, the following areas exhibit significant overlap:
| Concern | BayesEoR location | pyuvsim location | Duplication level |
|---|---|---|---|
| Run directory creation | | | High |
| Manifest writing | | | High |
| Jobs.json writing | | | High |
| SLURM submission | | | High |
| Runner definitions | | | Medium |
| Template utilities | | | High |
| Configuration loading | | | Medium |
| SLURM directive handling | | | Medium |
| UTC timestamp helpers | Multiple files | Multiple files | High |
| Dry-run semantics | CLI files | CLI files | Medium |
3. Proposed shared infrastructure
3.1 Target structure
src/valska_hera_beam/external_tools/
├── core/
│   ├── __init__.py
│   ├── run_directory.py   # Run directory creation and validation
│   ├── manifest.py        # Manifest reading/writing
│   ├── jobs.py            # Jobs.json reading/writing
│   ├── slurm.py           # SLURM submission utilities
│   ├── runner.py          # Base runner classes
│   ├── templates.py       # Template discovery utilities
│   ├── config.py          # Configuration loading and merging
│   └── utils.py           # Common utilities (timestamps, JSON, etc.)
├── bayeseor/
│   └── ...                # Tool-specific implementation
└── pyuvsim/
    └── ...                # Tool-specific implementation
3.2 Module responsibilities
3.2.1 core/run_directory.py
Responsibility: Run directory creation, validation, and path construction.
Candidate functions:
from pathlib import Path


def build_run_dir(
    results_root: Path,
    tool: str,
    taxonomy: dict[str, str],
    run_id: str,
    unique: bool = False,
) -> Path:
    """
    Construct a run directory path.

    Parameters
    ----------
    results_root
        Base results directory.
    tool
        Tool identifier (e.g., 'bayeseor', 'pyuvsim').
    taxonomy
        Tool-specific hierarchy components as key-value pairs.
        e.g., {'beam_model': 'achromatic_Gaussian', 'sky_model': 'GLEAM'}
    run_id
        User-provided run identifier.
    unique
        If True, append UTC timestamp for uniqueness.

    Returns
    -------
    Path to run directory.
    """
    pass


def validate_run_dir(run_dir: Path) -> None:
    """
    Validate that a run directory exists and contains required files.

    Raises
    ------
    RunDirectoryError
        If validation fails.
    """
    pass


def ensure_run_dir(run_dir: Path, exist_ok: bool = False) -> None:
    """
    Create run directory, optionally failing if it exists.
    """
    pass
Current duplication:
bayeseor/setup.py: Manual path construction
pyuvsim/setup.py: build_run_dir() function
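The stub above leaves the path layout open. A minimal sketch of one possible layout follows, assuming the directory is `results_root/tool/<taxonomy values in insertion order>/run_id` and that `unique=True` appends a compact UTC stamp to `run_id`. Both choices are assumptions for illustration, not decisions recorded in this roadmap.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_run_dir(
    results_root: Path,
    tool: str,
    taxonomy: dict[str, str],
    run_id: str,
    unique: bool = False,
) -> Path:
    """Assemble results_root/tool/<taxonomy values...>/run_id (assumed layout)."""
    # Assumption: taxonomy values become nested directories, in insertion order.
    parent = Path(results_root, tool, *taxonomy.values())
    if unique:
        # Assumption: uniqueness comes from a compact UTC stamp joined with "_".
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        run_id = f"{run_id}_{stamp}"
    return parent / run_id


example = build_run_dir(
    Path("/data/results"),
    "bayeseor",
    {"beam_model": "achromatic_Gaussian", "sky_model": "GLEAM"},
    "run01",
)
print(example)
```

Because the function only builds a path and never touches the filesystem, creation and validation can stay in `ensure_run_dir` and `validate_run_dir`.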
3.2.2 core/manifest.py
Responsibility: Manifest definition, reading, and writing.
Candidate functions:
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Any


@dataclass
class ManifestBase:
    """Required fields for all manifests."""

    tool: str
    created_utc: str
    valska_version: str
    run_id: str
    run_dir: str
    results_root: str


def write_manifest(
    run_dir: Path,
    *,
    tool: str,
    run_id: str,
    results_root: Path,
    extra_fields: dict[str, Any] | None = None,
) -> Path:
    """
    Write manifest.json with required fields plus tool-specific extras.

    Automatically includes:
    - tool
    - created_utc
    - valska_version
    - run_id
    - run_dir
    - results_root
    """
    pass


def load_manifest(run_dir: Path) -> dict[str, Any]:
    """
    Load and validate manifest.json from a run directory.

    Raises
    ------
    ManifestError
        If manifest is missing or invalid.
    """
    pass


def validate_manifest(manifest: dict[str, Any]) -> None:
    """
    Validate that manifest contains all required fields and raise clear errors.
    """
    pass
Current duplication:
bayeseor/setup.py: write_manifest() with BayesEoR-specific fields
bayeseor/submit.py: load_manifest()
pyuvsim/setup.py: write_manifest() with pyuvsim-specific fields
pyuvsim/submit.py: load_manifest()
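To illustrate how required fields and tool-specific extras might be combined, here is a minimal sketch. The required field names come from `ManifestBase`. The `ManifestError` definition, the `build_manifest` helper, and the rule that extras may not override required fields are all assumptions made for this example.

```python
from __future__ import annotations

from typing import Any

# Field names taken from ManifestBase.
REQUIRED_FIELDS = (
    "tool", "created_utc", "valska_version", "run_id", "run_dir", "results_root",
)


class ManifestError(ValueError):
    """Hypothetical definition; the roadmap names this error but does not define it."""


def validate_manifest(manifest: dict[str, Any]) -> None:
    """Raise ManifestError listing every required field that is absent."""
    missing = [name for name in REQUIRED_FIELDS if name not in manifest]
    if missing:
        raise ManifestError(f"manifest is missing required fields: {', '.join(missing)}")


def build_manifest(
    required: dict[str, Any],
    extra_fields: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: merge extras into the required fields, then validate."""
    extras = dict(extra_fields or {})
    # Assumed policy: tool-specific extras must not shadow a required field.
    clash = sorted(set(extras) & set(REQUIRED_FIELDS))
    if clash:
        raise ManifestError(f"extra_fields may not override: {', '.join(clash)}")
    manifest = {**required, **extras}
    validate_manifest(manifest)
    return manifest


base = {
    "tool": "bayeseor",
    "created_utc": "2026-03-10T00:00:00+00:00",
    "valska_version": "0.0.0",  # placeholder value for the example
    "run_id": "run01",
    "run_dir": "/data/results/bayeseor/run01",
    "results_root": "/data/results",
}
manifest = build_manifest(base, {"beam_model": "achromatic_Gaussian"})
```

Reporting all missing fields in one message, rather than failing on the first, keeps the error useful when a manifest was written by an older version.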
3.2.3 core/jobs.py
Responsibility: Jobs.json schema definition, reading, writing, and archival.
Candidate functions:
from pathlib import Path
from typing import Any


def write_jobs_json(
    run_dir: Path,
    *,
    stage: str,
    jobs: dict[str, dict[str, Any]],
    commands: list[str],
    dry_run: bool = False,
    sbatch: str = "sbatch",
    extra_fields: dict[str, Any] | None = None,
) -> Path:
    """
    Write jobs.json with submission details.

    Automatically includes:
    - run_dir
    - manifest path
    - submitted_utc
    """
    pass


def load_jobs_json(run_dir: Path) -> dict[str, Any] | None:
    """
    Load jobs.json if it exists, otherwise return None.
    """
    pass


def archive_jobs_json(run_dir: Path) -> Path | None:
    """
    Archive existing jobs.json with timestamp suffix.

    Returns path to archived file, or None if no jobs.json existed.
    """
    pass


def has_submitted_stage(jobs: dict[str, Any] | None, stage: str) -> bool:
    """
    Check if a stage has been submitted.
    """
    pass
Current duplication:
bayeseor/submit.py: Jobs.json writing and archival
bayeseor/cli_submit.py: _archive_jobs_json(), _load_jobs_json()
pyuvsim/submit.py: Jobs.json writing
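A minimal sketch of the archival step, assuming the archived file is named `jobs.<UTC stamp>.json` and kept alongside the original. The naming scheme is an assumption; the existing `_archive_jobs_json()` in BayesEoR may use a different one.

```python
from __future__ import annotations

import tempfile
from datetime import datetime, timezone
from pathlib import Path


def archive_jobs_json(run_dir: Path) -> Path | None:
    """Move jobs.json aside under a timestamped name and return the new path."""
    jobs_path = run_dir / "jobs.json"
    if not jobs_path.exists():
        return None
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Assumed naming scheme: jobs.<UTC stamp>.json next to the original.
    archived = run_dir / f"jobs.{stamp}.json"
    jobs_path.rename(archived)
    return archived


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    first = archive_jobs_json(run_dir)        # nothing to archive yet
    (run_dir / "jobs.json").write_text("{}")
    second = archive_jobs_json(run_dir)       # jobs.json is moved aside
    archived_name = second.name
    original_still_there = (run_dir / "jobs.json").exists()
```

The stamp has one-second resolution, so two archives within the same second would collide. A real implementation would need to decide how to handle that case.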
3.2.4 core/slurm.py
Responsibility: SLURM job submission and output parsing.
Candidate functions:
import re
import subprocess
from pathlib import Path
from typing import Any

JOBID_REGEX = re.compile(r"Submitted\s+batch\s+job\s+(\d+)", re.IGNORECASE)


class SlurmSubmissionError(RuntimeError):
    """Raised when sbatch submission fails."""

    pass


def submit_script(
    script_path: Path,
    *,
    sbatch: str = "sbatch",
    dependency: str | None = None,
    cwd: Path | None = None,
    dry_run: bool = False,
) -> str | None:
    """
    Submit a SLURM script and return the job ID.

    Parameters
    ----------
    script_path
        Path to submit script.
    sbatch
        Path to sbatch executable.
    dependency
        Dependency specification (e.g., 'afterok:12345').
    cwd
        Working directory for submission.
    dry_run
        If True, print command without executing.

    Returns
    -------
    Job ID as string, or None if dry_run.

    Raises
    ------
    SlurmSubmissionError
        If submission fails.
    """
    pass


def parse_job_id(sbatch_output: str) -> str:
    """
    Extract job ID from sbatch output.

    Raises
    ------
    SlurmSubmissionError
        If job ID cannot be parsed.
    """
    pass


def build_dependency_string(job_ids: list[str], mode: str = "afterok") -> str:
    """
    Build a SLURM dependency string.

    Parameters
    ----------
    job_ids
        List of job IDs to depend on.
    mode
        Dependency mode ('afterok', 'afterany', etc.).

    Returns
    -------
    Dependency string (e.g., 'afterok:123:456').
    """
    pass
Current duplication:
bayeseor/submit.py: _JOBID_RE, sbatch invocation
pyuvsim/submit.py: _JOBID_RE, sbatch invocation
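The three stubs can be filled in along the following lines. This is a sketch rather than the existing BayesEoR code: it assumes the dependency is passed as `--dependency=<spec>`, that dry-run prints the shell-quoted command, and that any failure to launch or run `sbatch` is wrapped in `SlurmSubmissionError`.

```python
from __future__ import annotations

import re
import shlex
import subprocess
from pathlib import Path

JOBID_REGEX = re.compile(r"Submitted\s+batch\s+job\s+(\d+)", re.IGNORECASE)


class SlurmSubmissionError(RuntimeError):
    """Raised when sbatch submission fails."""


def parse_job_id(sbatch_output: str) -> str:
    """Extract the numeric job ID from sbatch's 'Submitted batch job N' line."""
    match = JOBID_REGEX.search(sbatch_output)
    if match is None:
        raise SlurmSubmissionError(f"could not parse job ID from: {sbatch_output!r}")
    return match.group(1)


def build_dependency_string(job_ids: list[str], mode: str = "afterok") -> str:
    """Join mode and job IDs with colons, e.g. 'afterok:123:456'."""
    if not job_ids:
        raise ValueError("at least one job ID is required")
    return ":".join([mode, *job_ids])


def submit_script(
    script_path: Path,
    *,
    sbatch: str = "sbatch",
    dependency: str | None = None,
    cwd: Path | None = None,
    dry_run: bool = False,
) -> str | None:
    """Run sbatch on script_path and return the job ID (None on dry run)."""
    cmd = [sbatch]
    if dependency:
        cmd.append(f"--dependency={dependency}")
    cmd.append(str(script_path))
    if dry_run:
        # Assumed dry-run behaviour: show the command, submit nothing.
        print(shlex.join(cmd))
        return None
    try:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        raise SlurmSubmissionError(f"sbatch failed: {exc}") from exc
    return parse_job_id(proc.stdout)


dep = build_dependency_string(["123", "456"])
dry = submit_script(Path("submit_cpu.sh"), dependency=dep, dry_run=True)
```

Keeping `parse_job_id` separate from `submit_script` means the parsing can be unit-tested without a SLURM installation.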
3.2.5 core/runner.py
Responsibility: Base runner class definitions.
Candidate classes:
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CondaRunner:
    """Execute a tool within a conda environment."""

    conda_sh: str
    """Command to source conda (e.g., 'source /path/to/conda.sh')."""

    conda_env: str
    """Name of the conda environment."""

    def activation_commands(self) -> list[str]:
        """Return shell commands to activate the environment."""
        return [self.conda_sh, f"conda activate {self.conda_env}"]


@dataclass(frozen=True)
class ContainerRunner:
    """Execute a tool within an Apptainer/Singularity container."""

    container_image: Path
    """Path to the container image (.sif file)."""

    container_bind: str | None = None
    """Bind mount specification."""

    def exec_prefix(self) -> str:
        """Return the apptainer exec prefix."""
        bind = f"--bind {self.container_bind} " if self.container_bind else ""
        return f"apptainer exec {bind}{self.container_image}"
Current duplication:
bayeseor/runner.py: CondaRunner, ContainerRunner, BayesEoRInstall
pyuvsim/runner.py: CondaRunner, ContainerRunner
Note: Tool-specific install classes (e.g., BayesEoRInstall) remain in tool modules.
3.2.6 core/templates.py
Responsibility: Template discovery utilities.
Candidate functions:
from pathlib import Path


def get_templates_dir(tool_module_path: Path) -> Path:
    """
    Return the templates directory for a tool module.

    Parameters
    ----------
    tool_module_path
        Path to a file in the tool module (typically __file__).

    Returns
    -------
    Path to templates/ directory.
    """
    return tool_module_path.parent / "templates"


def list_templates(templates_dir: Path, suffix: str = ".yaml") -> list[str]:
    """
    List available template names in a directory.
    """
    if not templates_dir.exists():
        return []
    return sorted(p.name for p in templates_dir.glob(f"*{suffix}"))


def get_template_path(
    templates_dir: Path,
    name: str,
    suffix: str = ".yaml",
) -> Path:
    """
    Get full path to a template by name.

    Raises
    ------
    FileNotFoundError
        If template does not exist.
    """
    if not name.endswith(suffix):
        name = f"{name}{suffix}"
    path = templates_dir / name
    if not path.exists():
        available = list_templates(templates_dir, suffix)
        raise FileNotFoundError(
            f"Template not found: {name}\n"
            f"Available: {', '.join(available) or '(none)'}"
        )
    return path
Current duplication:
bayeseor/templates.py: _templates_dir(), list_templates(), get_template_path()
pyuvsim/templates.py: Identical pattern
3.2.7 core/config.py
Responsibility: Configuration loading and merging.
Candidate functions:
from pathlib import Path
from typing import Any

import yaml


def load_runtime_paths() -> dict[str, Any]:
    """
    Load runtime_paths.yaml from the config directory.

    Returns empty dict if file does not exist.
    """
    pass


def get_tool_config(runtime: dict[str, Any], tool: str) -> dict[str, Any]:
    """
    Extract tool-specific configuration section.
    """
    return runtime.get(tool, {})


def get_nested(d: dict[str, Any], *keys: str, default: Any = None) -> Any:
    """
    Safely navigate nested dictionary keys.
    """
    cur = d
    for k in keys:
        if not isinstance(cur, dict):
            return default
        cur = cur.get(k, default)
        if cur is default:
            return default
    return cur


def merge_slurm_config(
    defaults: dict[str, Any],
    tool_config: dict[str, Any],
    cli_overrides: dict[str, Any],
) -> dict[str, Any]:
    """
    Merge SLURM configuration with precedence: CLI > tool config > defaults.

    None values in higher-precedence layers suppress the key.
    """
    # Apply layers from lowest to highest precedence, keeping None values so
    # that an explicit None in a higher layer overrides the lower layers.
    # Callers should omit unset options rather than pass them as None.
    result = dict(defaults)
    result.update(tool_config)
    result.update(cli_overrides)
    # Drop every key whose winning value is None.
    return {k: v for k, v in result.items() if v is not None}
Current duplication:
bayeseor/cli_prepare.py: _get_nested(), _slurm_defaults()
pyuvsim/cli_prepare.py: load_runtime_config(), build_slurm_config()
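The precedence rule (CLI > tool config > defaults, with `None` suppressing a key) is easiest to check on a worked example. The sketch below uses a hypothetical `merge_layers` helper that takes any number of layers, lowest precedence first. The SLURM keys and values are illustrative only.

```python
from typing import Any


def merge_layers(*layers: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: later layers win; a winning None removes the key."""
    merged: dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return {key: value for key, value in merged.items() if value is not None}


defaults = {"partition": "compute", "time": "01:00:00", "mem": "8G"}
tool_config = {"time": "12:00:00", "mem": None}   # None: emit no mem directive
cli_overrides = {"partition": "gpu"}              # only options the user passed

result = merge_layers(defaults, tool_config, cli_overrides)
print(result)  # {'partition': 'gpu', 'time': '12:00:00'}
```

Under this rule a CLI layer built directly from `argparse` defaults would suppress every option the user left unset, so the CLI layer should contain only the options that were actually supplied.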
3.2.8 core/utils.py
Responsibility: Common utility functions.
Candidate functions:
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def utc_now_iso() -> str:
    """Return current UTC time in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


def utc_now_compact() -> str:
    """Return current UTC time in compact format (for filenames)."""
    return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def write_json_atomic(path: Path, data: dict[str, Any], indent: int = 2) -> None:
    """
    Write JSON atomically to avoid partial writes on failure.
    """
    # Write to a sibling temporary file so the final move stays on one filesystem.
    tmp_path = path.with_name(path.name + ".tmp")
    with tmp_path.open("w") as f:
        json.dump(data, f, indent=indent)
    # replace() overwrites an existing target on every platform; rename() does
    # not overwrite on Windows.
    tmp_path.replace(path)


def load_json(path: Path) -> dict[str, Any]:
    """Load JSON file."""
    with path.open() as f:
        return json.load(f)
Current duplication:
bayeseor/setup.py: _utc_stamp()
bayeseor/cli_submit.py: _utc_now_compact()
bayeseor/submit.py: _utc_now_iso()
pyuvsim/submit.py: _utc_now_iso()
4. Migration strategy
4.1 Prerequisites
Before beginning migration:
pyuvsim implemented — At least two tools exist to validate patterns
Tests passing — Both tool modules have comprehensive test coverage
Patterns validated — Identified shared code confirmed to be genuinely common
4.2 Phase 1: Extract utilities (low risk)
Scope: core/utils.py
Steps:
Create core/ directory with __init__.py
Implement core/utils.py with timestamp and JSON helpers
Update BayesEoR to import from core.utils
Update pyuvsim to import from core.utils
Remove duplicated helpers from tool modules
Run tests, verify no regressions
Estimated effort: 1–2 hours
4.3 Phase 2: Extract templates (low risk)
Scope: core/templates.py
Steps:
Implement core/templates.py with generic template utilities
Update tool-specific templates.py to use core utilities
Tool modules retain thin wrappers that specify their templates directory
Run tests
Estimated effort: 1–2 hours
4.4 Phase 3: Extract runner base classes (medium risk)
Scope: core/runner.py
Steps:
Implement core/runner.py with CondaRunner and ContainerRunner
Tool-specific install classes (e.g., BayesEoRInstall) remain in tool modules
Update tool modules to import base runners from core
Run tests
Estimated effort: 2–3 hours
4.5 Phase 4: Extract SLURM submission (medium risk)
Scope: core/slurm.py
Steps:
Implement core/slurm.py with submission utilities
Update tool submit.py modules to use core submission
Tool modules retain script generation (tool-specific)
Run tests
Estimated effort: 3–4 hours
4.6 Phase 5: Extract manifest/jobs handling (medium risk)
Scope: core/manifest.py, core/jobs.py
Steps:
Implement core/manifest.py with generic manifest utilities
Implement core/jobs.py with jobs.json utilities
Update tool modules to use core for reading/writing
Tool modules provide tool-specific extra fields
Run tests
Estimated effort: 4–6 hours
4.7 Phase 6: Extract configuration loading (medium risk)
Scope: core/config.py
Steps:
Implement core/config.py with configuration utilities
Update tool CLI modules to use core configuration loading
Tool modules retain tool-specific argument parsing
Run tests
Estimated effort: 3–4 hours
4.8 Phase 7: Extract run directory handling (low–medium risk)
Scope: core/run_directory.py
Steps:
Implement core/run_directory.py with generic path construction
Tool modules provide taxonomy definition, core handles construction
Update tool modules
Run tests
Estimated effort: 2–3 hours
5. Post-migration structure
5.1 Core module
src/valska_hera_beam/external_tools/core/
├── __init__.py
├── config.py
├── jobs.py
├── manifest.py
├── run_directory.py
├── runner.py
├── slurm.py
├── templates.py
└── utils.py
5.2 Tool module (simplified)
src/valska_hera_beam/external_tools/pyuvsim/
├── __init__.py
├── cli_prepare.py # Argument parsing, tool-specific logic
├── cli_submit.py # Thin wrapper around core submission
├── runner.py # Tool-specific install class (if needed)
├── slurm.py # SLURM script generation (tool-specific)
├── templates.py # Thin wrapper around core templates
└── templates/
5.3 Dependency direction
┌─────────────────────┐
│ core module │ ← No dependencies on tool modules
└─────────────────────┘
▲
│ imports
│
┌─────────────────────┐
│ bayeseor module │
└─────────────────────┘
┌─────────────────────┐
│ pyuvsim module │
└─────────────────────┘
Tool modules depend on core; core never imports from tools.
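The dependency rule can be checked mechanically. The sketch below parses source text with the standard `ast` module and reports any import that mentions a tool subpackage. The helper names `tool_imports_in` and `find_violations` are hypothetical, and the check is deliberately coarse: it would also flag a direct import of a third-party package that shares a tool's name.

```python
import ast
from pathlib import Path

# Subpackage names that core must never import (taken from section 3.1).
FORBIDDEN = ("bayeseor", "pyuvsim")


def tool_imports_in(source: str) -> list[str]:
    """Return dotted import names in `source` that mention a tool subpackage."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from .. import x"; keep the imported name
            # so relative imports of a tool package are still detected.
            names = [f"{node.module or ''}.{alias.name}" for alias in node.names]
        else:
            continue
        found.extend(
            name for name in names
            if any(part in FORBIDDEN for part in name.split("."))
        )
    return found


def find_violations(core_dir: Path) -> dict[str, list[str]]:
    """Map each offending file under core_dir to the tool imports it contains."""
    violations: dict[str, list[str]] = {}
    for path in sorted(core_dir.rglob("*.py")):
        hits = tool_imports_in(path.read_text(encoding="utf-8"))
        if hits:
            violations[str(path)] = hits
    return violations


clean = tool_imports_in("import json\nfrom pathlib import Path\n")
dirty = tool_imports_in("from ..bayeseor import setup\n")
```

A test that asserts `find_violations(core_dir) == {}` would turn the rule into something continuous integration can enforce.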
6. Success criteria
6.1 Quantitative
| Metric | Target |
|---|---|
| Lines of code reduction per tool | 30–50% |
| Test duplication reduction | 40–60% |
| Time to implement new tool | 50% less than current |
6.2 Qualitative
New tool implementations focus on tool-specific concerns
Common bugs fixed once in core, all tools benefit
Consistent behaviour across tools for shared operations
Clear separation between “what all tools do” and “what this tool does”
7. Risks and mitigations
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Core abstraction doesn’t fit new tool | High | Medium | Keep core interfaces minimal; allow bypass |
| Breaking changes during migration | Medium | Medium | Migrate one module at a time; comprehensive tests |
| Over-abstraction | Medium | Low | Extract only proven patterns; resist “future-proofing” |
| Increased coupling | Medium | Low | Core has no knowledge of specific tools |
8. Decision log
| Date | Decision | Rationale |
|---|---|---|
| 2026-01-21 | Defer shared infrastructure extraction | Avoid premature abstraction; validate patterns with pyuvsim first |
| 2026-01-21 | Document intended structure as roadmap | Capture intent while allowing standalone tool development |
9. Open questions
9.1 To be resolved before migration
Exception hierarchy: Should core define a base exception class that tool-specific exceptions inherit from?
Logging: Should core provide a logging configuration, or leave this to tools?
CLI framework: Is argparse sufficient, or should core provide argument parsing helpers?
Schema validation: Should core provide formal validators (e.g., Pydantic) or lightweight validate-on-read helpers that produce user-friendly errors and include valska_version in messages?
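To make the exception-hierarchy question concrete, one possible shape is sketched below. It is an option for discussion rather than a decision: the base class name `ExternalToolError` is hypothetical, while `RunDirectoryError`, `ManifestError`, and `SlurmSubmissionError` are the names already used in section 3.2.

```python
class ExternalToolError(Exception):
    """Hypothetical common base for errors raised by core."""


class RunDirectoryError(ExternalToolError):
    """Run directory is missing or incomplete."""


class ManifestError(ExternalToolError):
    """manifest.json is missing or invalid."""


class SlurmSubmissionError(ExternalToolError, RuntimeError):
    """sbatch submission failed; still a RuntimeError, as in section 3.2.4."""


def describe(exc: Exception) -> str:
    """Show how a CLI could handle every core error in one place."""
    if isinstance(exc, ExternalToolError):
        return f"error: {exc}"
    return f"unexpected failure: {exc!r}"


message = describe(ManifestError("manifest.json not found"))
```

A shared base lets each CLI entry point catch one type for user-facing errors while leaving genuinely unexpected exceptions to surface with a traceback.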
9.2 To be resolved during migration
Backwards compatibility — How long to maintain deprecated imports in tool modules?
Documentation — Should core have its own API documentation, or integrate with tool docs?
10. References
BayesEoR reference implementation:
src/valska_hera_beam/external_tools/bayeseor/
11. Related future direction: public Python API and documentation strategy
11.1 Purpose
This roadmap primarily covers shared infrastructure for external tool
integration. A closely related, but distinct, future task is to define a small,
intentional public Python API for valska_hera_beam and align the
documentation around that supported surface.
This section records the intended direction so that later refactoring can be planned deliberately rather than emerging accidentally from documentation or import convenience.
11.2 Earlier approach and current interim state
At present, the repository still exposes only a minimal top-level Python API, while the documentation includes a broader set of module pages.
The earlier recursive documentation approach relied more heavily on broad discovery from the package root. In practice, that created a few avoidable problems:
Fragile documentation builds: API generation became sensitive to how Sphinx discovered and imported submodules, rather than depending only on the pages we intended to build.
Accidental coupling to package layout: Internal module organisation and generated intermediate package pages began to affect the success of the docs build, even when the runtime behaviour of the package itself was unchanged.
Ambiguity about what is public: Recursive discovery made it easier for internal modules to look supported merely because they appeared in the generated reference output.
Higher maintenance cost when imports evolved: Changes in import structure, optional dependencies, or notebook-related modules could cause documentation failures that were disproportionate to the actual code change.
The new interim explicit-module approach improves this situation substantially. In particular, it:
reduces dependence on recursive module discovery;
removes the need for fragile intermediate package-summary pages to remain consistent;
makes the documentation build depend on an intentional list of module pages; and
provides a practical, low-risk fix that keeps continuous integration stable without changing runtime behaviour.
This interim approach is acceptable for now, because it:
avoids making premature compatibility promises through valska_hera_beam.__init__;
keeps the branch-protection-critical documentation build stable; and
allows BayesEoR and related tooling to continue evolving without implying that every internal module is public and supported.
11.3 Recommended target state
The recommended long-term model is the one commonly used by public-facing scientific Python packages:
Curate a supported public API: Re-export only the functions, classes, and helpers that are intended for external users.
Document the public API first: Treat the top-level package, and a small number of genuinely public subpackages, as the main reference surface.
Keep internal modules internal: Workflow orchestration helpers, implementation details, and CLI plumbing should not become de facto public merely because they appear in recursive documentation output.
Use narrative documentation for workflows: Operational and scientific workflows should continue to be explained through guides, examples, and CLI documentation rather than relying on low-level API pages alone.
This long-term direction is still preferable to the present explicit-module listing, because it offers a better overall balance for a public-facing repository:
Clearer support boundary: Users can distinguish more easily between supported public imports and internal implementation detail.
Lower ongoing documentation churn: A curated public API reduces the need to update long lists of individual modules whenever internal structure changes.
Greater refactoring freedom: Internal modules can evolve with less risk of accidental public commitments.
Better alignment with user needs: Most users need stable entry points and workflow documentation rather than a near-complete map of internal source files.
11.4 Why this is deferred
This change should be deferred until after the current modularisation work is merged and the public surface can be reviewed intentionally.
Deferral is recommended because:
The package surface is still evolving: BayesEoR orchestration code is currently being restructured, so it would be unhelpful to freeze import paths prematurely.
Not every documented module should be public: Some modules are implementation detail, developer utility, or CLI support code rather than stable library API.
Top-level re-exports create maintenance obligations: Once symbols are presented as public, later refactoring becomes more constrained.
Documentation stability is currently the priority: The present explicit module listing is a pragmatic fix that keeps the docs build reliable without changing runtime behaviour.
11.5 Interim approach
Until a public API review is completed, the preferred documentation strategy is:
keep the current explicit API reference list in docs/source/api.rst;
avoid broad recursive discovery from the package root;
continue documenting workflows and CLI usage separately from the Python API;
treat any future top-level re-export as an explicit design decision.
11.6 Proposed review questions for the follow-up task
When this work is taken up, the review should answer the following:
Which functions and classes are intended for external Python users?
Which import paths should be considered stable across releases?
Which modules are internal implementation detail and should remain outside the top-level namespace?
Which CLI-facing modules should be documented as commands rather than as Python API surface?
Which imports are lightweight and safe enough to expose at package-import time?
11.7 Suggested implementation sequence
Review the existing Python surface and identify genuinely public objects.
Define a small curated export set in valska_hera_beam.__init__.
Add or refine top-level package documentation around that curated surface.
Retain detailed module pages only where they are useful for advanced users or developers.
Update contributor guidance once the public/private boundary has been agreed.
11.8 Decision note
For the avoidance of doubt, the current documentation fix does not commit the project to the present module layout as a permanent public API. It is an interim documentation-stability measure, not a statement that every documented module path is part of the long-term supported interface.