valska_hera_beam.external_tools.bayeseor.slurm
SLURM submit-script rendering for BayesEoR runs.
Functions
render_submit_script(): Render a SLURM submit script for a BayesEoR run.
- valska_hera_beam.external_tools.bayeseor.slurm.render_submit_script(*, runner: CondaRunner | ContainerRunner, install: BayesEoRInstall, config_yaml: Path, run_dir: Path, slurm: Mapping[str, object] | None = None, mode: str = 'cpu') -> str
Render a SLURM submit script for a BayesEoR run.
This function aims to be HPC-friendly and debugging-friendly:
- emits clear “what am I running” information
- prints SLURM_* environment variables (helpful when diagnosing scheduling issues)
- uses srun --mpi=<mpi> -n "$SLURM_NTASKS" by default to match common site setups
- uses BayesEoR’s CLI flags:
  - CPU stage: --cpu
  - GPU stage: --gpu --run
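As a rough illustration of the launch line described above, the sketch below assembles an srun invocation from an MPI type, a mode, and a launch command. The helper name srun_line and the placeholder path/to/run_script.py are hypothetical and not part of this module; only the srun options and the BayesEoR stage flags come from the description above.

```python
import shlex

# Stage flags as listed above: CPU stage uses --cpu, GPU stage uses --gpu --run.
MODE_FLAGS = {"cpu": ["--cpu"], "gpu_run": ["--gpu", "--run"]}


def srun_line(mpi: str, mode: str, command: list[str]) -> str:
    """Hypothetical helper: build the srun launch line for one stage."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    # Quote the launch command safely; $SLURM_NTASKS is left for the shell
    # to expand at job run time, so it is written literally.
    launched = shlex.join([*command, *MODE_FLAGS[mode]])
    return f'srun --mpi={shlex.quote(mpi)} -n "$SLURM_NTASKS" {launched}'


# The script path is a placeholder, not the real BayesEoR entry point.
print(srun_line("pmi2", "gpu_run", ["python", "path/to/run_script.py"]))
```

Leaving "$SLURM_NTASKS" unexpanded means the task count follows whatever the scheduler actually granted, rather than a value fixed at render time.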
Parameters
- runner
How BayesEoR is executed (currently via conda; container support is planned).
- install
Where BayesEoR lives and how to locate its run script.
- config_yaml
Path to the rendered BayesEoR config YAML to run.
- run_dir
Run directory containing configs/logs/manifests.
- slurm
SLURM settings map. Usually derived from runtime_paths.yaml defaults plus CLI overrides.
Any key set to ``None`` (or not present) will be omitted from the generated script. This allows cluster-specific suppression of directives that are not supported or not needed.
Supported keys (all optional):
Job identification:
- ``job_name``: Full job name (overrides prefix-based naming)
- ``job_name_prefix``: Prefix for auto-generated job name (default: "bayeseor")
Resource selection:
- ``partition``: Partition/queue name (omit for constraint-based scheduling)
- ``constraint``: Node feature constraint (e.g., "A100", "skylake")
- ``qos``: Quality of service
- ``account``: Account/project to charge
- ``reservation``: Reservation name
Time and memory:
- ``time``: Wall time limit (default: "12:00:00")
- ``mem``: Memory per node (default: "8G")
- ``mem_per_cpu``: Memory per CPU (mutually exclusive with ``mem``)
- ``mem_per_gpu``: Memory per GPU
CPU/task configuration:
- ``nodes``: Number of nodes (default: 1)
- ``ntasks``: Total number of tasks (default: 1)
- ``ntasks_per_node``: Tasks per node (default: 1)
- ``cpus_per_task``: CPUs per task (default: 4)
GPU configuration:
- ``gpus``: Total GPUs (e.g., ``2`` or ``"a100:2"``)
- ``gpus_per_node``: GPUs per node
- ``gpus_per_task``: GPUs per task (common for single-GPU jobs)
- ``gres``: Generic resources (e.g., ``"gpu:1"``, ``"gpu:a100:2"``)
Execution control:
- ``mpi``: MPI type for srun (default: "pmi2")
- ``exclusive``: Request exclusive node access (True emits ``--exclusive``)
Output:
- ``output``: Stdout log path (default: ``run_dir/slurm-{mode}-%j.out``)
- ``error``: Stderr log path (if separate from stdout)
Extensibility:
- ``extra_sbatch``: Additional ``#SBATCH`` lines (without the ``"#SBATCH "`` prefix)
- mode
Execution mode:
- "cpu": precompute instrument transfer matrices (BayesEoR --cpu)
- "gpu_run": run sampling, assuming the precompute exists (BayesEoR --gpu --run)
Notes on timing
We intentionally do NOT rely on /usr/bin/time or shell time, since some compute-node images do not provide them. Timing is implemented using only:
- date (UTC timestamps + epoch seconds)
This should work essentially everywhere.
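The date-only approach can be sketched as follows. This is an illustrative reconstruction, not the text the renderer actually emits: the helper timing_wrapper, the shell variable names, and the echo wording are all assumptions.

```python
def timing_wrapper(command: str) -> str:
    """Hypothetical helper: wrap a shell command with date-only timing."""
    return "\n".join(
        [
            # Human-readable UTC timestamp plus epoch seconds for arithmetic.
            'echo "start (UTC): $(date -u +%Y-%m-%dT%H:%M:%SZ)"',
            "t_start=$(date +%s)",
            command,
            "status=$?",  # keep the command's exit code for the final exit
            "t_end=$(date +%s)",
            'echo "end (UTC): $(date -u +%Y-%m-%dT%H:%M:%SZ)"',
            'echo "elapsed seconds: $((t_end - t_start))"',
            'exit "$status"',
        ]
    )


print(timing_wrapper('srun --mpi=pmi2 -n "$SLURM_NTASKS" echo hello'))
```

Elapsed time comes from subtracting two epoch-second readings with shell arithmetic, so no external timing tool is needed. If the surrounding script runs under set -e, a failing command would exit before the end timestamp is printed.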
Notes on cluster portability
Different HPC sites have different scheduling policies:
- Some use --partition, others use --constraint
- Some require --account, others don’t
- GPU syntax varies: --gpus-per-task vs --gres=gpu:N
Set unsupported/unwanted directives to None in your runtime_paths.yaml to omit them from generated scripts.
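The sketch below shows how per-site settings might be layered so that one site schedules by partition and another by constraint. The helper layer_settings, the site dictionaries, and the account and partition names are hypothetical; the real merge of runtime_paths.yaml defaults with CLI overrides may differ. In YAML itself, the Python value None is normally written as null.

```python
from typing import Mapping, Optional


def layer_settings(
    defaults: Mapping[str, object],
    overrides: Optional[Mapping[str, object]] = None,
) -> dict:
    """Hypothetical helper: apply overrides, then drop keys set to None."""
    merged = dict(defaults)
    merged.update(overrides or {})  # later layers win, including explicit None
    return {key: value for key, value in merged.items() if value is not None}


# Illustrative shared defaults (values are made up for this example).
defaults = {"partition": "gpu", "account": "proj123", "gpus_per_task": 1}

# Site A: partition-based scheduling, account required.
site_a = layer_settings(defaults)

# Site B: constraint-based scheduling, no account, GPUs requested via gres.
site_b = layer_settings(
    defaults,
    {
        "partition": None,
        "account": None,
        "gpus_per_task": None,
        "constraint": "A100",
        "gres": "gpu:1",
    },
)

print(site_a)
print(site_b)
```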