valska_hera_beam.external_tools.bayeseor.slurm

SLURM submit-script rendering for BayesEoR runs.

Functions

render_submit_script(*, runner, install, ...)

Render a SLURM submit script for a BayesEoR run.

valska_hera_beam.external_tools.bayeseor.slurm.render_submit_script(*, runner: CondaRunner | ContainerRunner, install: BayesEoRInstall, config_yaml: Path, run_dir: Path, slurm: Mapping[str, object] | None = None, mode: str = 'cpu') → str

Render a SLURM submit script for a BayesEoR run.

This function aims to be HPC-friendly and debugging-friendly:

  • emits clear “what am I running” information

  • prints SLURM_* environment variables (helpful when diagnosing scheduling issues)

  • uses srun --mpi=<mpi> -n "$SLURM_NTASKS" by default to match common site setups

  • uses BayesEoR’s CLI flags:

      - CPU stage: --cpu

      - GPU stage: --gpu --run
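As an illustration of the first two points, the logging preamble of such a script could be assembled roughly as follows. This is a minimal sketch, not the function's actual implementation: render_preamble and the example config path are hypothetical names used only here.

```python
import shlex


def render_preamble(mode: str, config_yaml: str) -> str:
    """Return bash lines that log what the job is about to run.

    Hypothetical helper for illustration; not part of valska_hera_beam.
    """
    lines = [
        "echo '=== BayesEoR job ==='",
        # shlex.quote keeps paths with spaces or shell metacharacters intact.
        f"echo 'mode:' {shlex.quote(mode)}",
        f"echo 'config:' {shlex.quote(config_yaml)}",
        'echo "host: $(hostname)"',
        "echo '--- SLURM environment ---'",
        # Keep only SLURM_* variables; `|| true` stops a missing match from
        # failing the job when the script runs under `set -e -o pipefail`.
        "env | grep '^SLURM_' | sort || true",
    ]
    return "\n".join(lines) + "\n"


# The config path below is a placeholder.
print(render_preamble("cpu", "/data/run/config.yaml"))
```

Printing the SLURM_* variables at the top of the job log means the allocation the scheduler actually granted is recorded next to the command that ran in it.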

Parameters

runner

How BayesEoR is executed. Currently only conda is supported; container support is planned for later.

install

Where BayesEoR lives and how to locate its run script.

config_yaml

Path to the rendered BayesEoR config YAML to run.

run_dir

Run directory containing configs/logs/manifests.

slurm

SLURM settings map. Usually derived from runtime_paths.yaml defaults plus CLI overrides.

Any key set to ``None`` (or not present) will be omitted from the generated script. This allows cluster-specific suppression of directives that are not supported or not needed.

Supported keys (all optional):

Job identification:
  - ``job_name``: Full job name (overrides prefix-based naming)
  - ``job_name_prefix``: Prefix for auto-generated job name
    (default: "bayeseor")
Resource selection:
  - ``partition``: Partition/queue name (omit for constraint-based scheduling)
  - ``constraint``: Node feature constraint (e.g., "A100", "skylake")
  - ``qos``: Quality of service
  - ``account``: Account/project to charge
  - ``reservation``: Reservation name
Time and memory:
  - ``time``: Wall time limit (default: "12:00:00")
  - ``mem``: Memory per node (default: "8G")
  - ``mem_per_cpu``: Memory per CPU (mutually exclusive with ``mem``)
  - ``mem_per_gpu``: Memory per GPU
CPU/task configuration:
  - ``nodes``: Number of nodes (default: 1)
  - ``ntasks``: Total number of tasks (default: 1)
  - ``ntasks_per_node``: Tasks per node (default: 1)
  - ``cpus_per_task``: CPUs per task (default: 4)
GPU configuration:
  - ``gpus``: Total GPUs (e.g., ``2`` or ``"a100:2"``)
  - ``gpus_per_node``: GPUs per node
  - ``gpus_per_task``: GPUs per task (common for single-GPU jobs)
  - ``gres``: Generic resources (e.g., ``"gpu:1"``, ``"gpu:a100:2"``)
Execution control:
  - ``mpi``: MPI type for srun (default: "pmi2")
  - ``exclusive``: Request exclusive node access (True emits ``--exclusive``)
Output:
  - ``output``: Stdout log path (default: ``run_dir/slurm-{mode}-%j.out``)
  - ``error``: Stderr log path (if separate from stdout)
Extensibility:
  - ``extra_sbatch``: Additional ``#SBATCH`` lines
    (without the ``"#SBATCH "`` prefix)
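
The omission rule for ``None`` values can be sketched as below. This is an assumption about how a mapping like the one above might be turned into ``#SBATCH`` lines, not the function's actual code: sbatch_lines and _NOT_DIRECTIVES are hypothetical names, and the underscore-to-hyphen conversion is an illustrative choice that happens to match the corresponding sbatch option names.

```python
from __future__ import annotations

from collections.abc import Mapping

# Keys from the list above that are not plain "--key=value" directives
# (hypothetical grouping, chosen for this sketch).
_NOT_DIRECTIVES = {"job_name_prefix", "mpi", "exclusive", "extra_sbatch"}


def sbatch_lines(slurm: Mapping[str, object]) -> list[str]:
    """Render #SBATCH lines, skipping every key whose value is None."""
    lines = []
    for key, value in slurm.items():
        if value is None or key in _NOT_DIRECTIVES:
            continue
        lines.append(f"#SBATCH --{key.replace('_', '-')}={value}")
    # Boolean flag: emitted without a value, and only when explicitly True.
    if slurm.get("exclusive") is True:
        lines.append("#SBATCH --exclusive")
    # Pass-through lines are given without the "#SBATCH " prefix.
    for extra in slurm.get("extra_sbatch") or []:
        lines.append(f"#SBATCH {extra}")
    return lines


example = {
    "partition": None,  # suppressed: this site schedules by constraint
    "constraint": "A100",
    "time": "12:00:00",
    "gpus_per_task": 1,
    "exclusive": True,
    "extra_sbatch": ["--mail-type=END"],
}
print("\n".join(sbatch_lines(example)))
```

With this input no ``--partition`` line is produced, while ``constraint``, ``time`` and ``gpus_per_task`` each yield one directive.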

mode

Execution mode:

  • "cpu": precompute instrument transfer matrices (BayesEoR --cpu)

  • "gpu_run": run sampling assuming precompute exists (BayesEoR --gpu --run)
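The mapping from mode to BayesEoR flags, combined with the srun launcher described earlier, could look roughly like this. It is a sketch under stated assumptions: launch_line and MODE_FLAGS are hypothetical names, run_script.py and config.yaml are placeholders, and launching through python with a --config option is assumed rather than taken from this page.

```python
import shlex

# Flags per mode, as listed above.
MODE_FLAGS = {
    "cpu": ["--cpu"],
    "gpu_run": ["--gpu", "--run"],
}


def launch_line(mode: str, run_script: str, config_yaml: str, mpi: str = "pmi2") -> str:
    """Build the srun line for one stage (hypothetical helper)."""
    try:
        flags = MODE_FLAGS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_FLAGS)}"
        ) from None
    # Assumption: the run script is started with `python` and takes --config.
    cmd = ["python", run_script, "--config", config_yaml, *flags]
    quoted = " ".join(shlex.quote(part) for part in cmd)
    # $SLURM_NTASKS stays unexpanded here so bash resolves it inside the job.
    return f'srun --mpi={mpi} -n "$SLURM_NTASKS" {quoted}'


print(launch_line("gpu_run", "run_script.py", "config.yaml"))
```

Rejecting unknown modes while rendering surfaces a typo such as "gpu" before the job is submitted rather than after it has waited in the queue.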

Notes on timing

We intentionally do NOT rely on /usr/bin/time or the shell’s time built-in, since some compute-node images do not provide them. Timing is implemented using only:

  • date (UTC timestamps + epoch seconds)

Since date is available on practically every system image, this should work essentially everywhere.
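
A date-only timing wrapper of the kind described here might be generated along these lines. This is a minimal sketch: render_timed, the shell variable names and the log format are assumptions made for illustration, not the strings the function actually emits.

```python
def render_timed(command: str, label: str = "bayeseor") -> str:
    """Wrap a shell command with date-based timing (hypothetical helper)."""
    lines = [
        # Human-readable UTC timestamp for the log.
        f'echo "[{label}] start: $(date -u +%Y-%m-%dT%H:%M:%SZ)"',
        # Epoch seconds; %s is supported by both GNU and BSD date.
        "t_start=$(date +%s)",
        command,
        # Capture the exit status right away. Under `set -e` a failing
        # command would end the script before this line is reached.
        "status=$?",
        "t_end=$(date +%s)",
        f'echo "[{label}] end: $(date -u +%Y-%m-%dT%H:%M:%SZ)"',
        # Plain shell arithmetic, so no external timing tool is needed.
        f'echo "[{label}] elapsed: $(( t_end - t_start )) s (exit status $status)"',
    ]
    return "\n".join(lines) + "\n"


print(render_timed("sleep 1"))
```

The elapsed time has one-second resolution, which is usually enough for jobs that run for minutes or hours.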

Notes on cluster portability

Different HPC sites have different scheduling policies:

  • Some use --partition, others use --constraint

  • Some require --account, others don’t

  • GPU syntax varies: --gpus-per-task vs --gres=gpu:N

Set unsupported or unwanted directives to None (null in YAML) in your runtime_paths.yaml to omit them from the generated scripts.
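
The effect of such a site-specific override can be sketched as follows. The default values are the ones documented above; effective_settings, the example override and the merge order are assumptions for illustration only, and the real merge logic may differ.

```python
# Defaults as documented for the slurm parameter above.
DEFAULTS = {
    "job_name_prefix": "bayeseor",
    "time": "12:00:00",
    "mem": "8G",
    "nodes": 1,
    "ntasks": 1,
    "ntasks_per_node": 1,
    "cpus_per_task": 4,
    "mpi": "pmi2",
}

# Hypothetical site that schedules GPUs by constraint and accounts memory
# per CPU, so the per-node ``mem`` default has to be suppressed.
SITE_OVERRIDES = {
    "mem": None,
    "mem_per_cpu": "4G",
    "constraint": "A100",
    "gpus_per_task": 1,
}


def effective_settings(defaults: dict, overrides: dict) -> dict:
    """Overlay overrides on defaults, then drop keys set to None."""
    merged = {**defaults, **overrides}
    return {key: value for key, value in merged.items() if value is not None}


settings = effective_settings(DEFAULTS, SITE_OVERRIDES)
print(settings)
```

Setting ``mem`` to None here removes the default instead of leaving it next to ``mem_per_cpu``, which the key list describes as mutually exclusive with it.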