valska_hera_beam.utils

Utility functions for the ValSKA package.

This module provides:

  • Path management via PathManager

  • Loading of analysis paths from config/paths.yaml

  • Loading of site/user runtime paths from config/runtime_paths.yaml

  • Helpers to build perturbation groups and readable labels

  • Simple filtering helpers for perturbation keys

Notes on runtime_paths.yaml

config/runtime_paths.yaml is intended for site/user-specific settings, e.g.

  • results_root (where all ValSKA outputs go)

  • data.root (a default root for input datasets; used to resolve relative --data paths)

  • BayesEoR repo_path / conda_sh / conda_env defaults

  • Other external tool paths (pyuvsim, OSKAR) in future

Functions

build_group_labels(groups)

Build a simple group_labels dict (identity mapping).

build_pp_groups_from_paths(prefixes[, ...])

Build groups for perturbation runs from paths.yaml for one or more prefixes.

filter_chain_pairs(pairs[, min_value, max_value])

Filter chain pairs by signed perturbation value in percentage points.

filter_chain_pairs_absolute_range(pairs[, ...])

Filter chain pairs by absolute perturbation value in percentage points.

get_default_path_manager()

Get a PathManager instance with default settings.

load_paths([custom_paths_file])

Load analysis paths from a YAML configuration file.

load_runtime_paths([base_dir, ...])

Load site/user runtime paths from config/runtime_paths.yaml if present.

make_timestamp()

Create a timestamp string for naming files and directories.

resolve_data_path(data_path[, runtime_paths])

Resolve an input dataset path using runtime_paths.yaml defaults.

Classes

PathManager([base_dir, chains_dir, ...])

Manage paths for ValSKA workflows and generated outputs.

class valska_hera_beam.utils.PathManager(base_dir: str | Path | None = None, chains_dir: str | Path | None = None, data_dir: str | Path | None = None, results_dir: str | Path | None = None, results_root: str | Path | None = None, runtime_paths_file: str | Path | None = None)

Manage paths for ValSKA workflows and generated outputs.

Initialize the PathManager with configurable directories.

If directories are not specified, they will be automatically determined relative to the package location and common HPC conventions.

Parameters

base_dir

Base directory of the project. If None, it’s determined automatically.

chains_dir

Directory containing chain files. If None, falls back through environment/config-aware defaults.

data_dir

Directory containing data files (input datasets). If None, attempts:

  • config/runtime_paths.yaml: data.root

  • <base_dir>/data (created)

results_dir

Directory for ValSKA-produced results (tables/plots/summaries). If None, defaults to <results_root>/validation (created).

results_root

Root directory for all ValSKA-generated outputs, including external tool outputs.

If None, PathManager resolves this from runtime config, environment, and then a repository-local default.

runtime_paths_file

Optional explicit path to a runtime paths YAML file. If None, uses <base_dir>/config/runtime_paths.yaml.

create_subdir(parent: str, name: str) → Path

Create and return a subdirectory in one of the managed directories.

Parameters

parent

Name of the parent directory (one of the managed paths).

name

Name of the subdirectory to create.

Returns

Path

Path to the created subdirectory.

Raises

KeyError

If the parent directory name is invalid.

find_file(pattern: str, path_name: str | None = None) → list[Path]

Find files matching a pattern in a specified directory.

Parameters

pattern

Glob pattern to match files.

path_name

Name of the directory to search in. If None, searches in base_dir.

Returns

list of Path

List of paths to files matching the pattern.

get_path(name: str) → Path

Get a specific path by name.

Parameters

name

Name of the path to retrieve.

Returns

Path

Requested path.

Raises

KeyError

If the requested path name doesn’t exist.

get_paths() → dict[str, Path]

Get a dictionary of all managed paths.

Returns

dict

Dictionary mapping path names to pathlib.Path objects.

resolve_data_path(data_path: str | Path) → Path

Resolve a dataset path using this PathManager’s runtime_paths.

See module-level resolve_data_path() for the rules.
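The method contracts documented above (name-based lookup that raises KeyError, subdirectory creation, glob-based search) can be illustrated with a toy stand-in. This is only a sketch: MiniPathManager and its three managed names are hypothetical, and the real PathManager may track different names and resolve them differently.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


class MiniPathManager:
    """Toy stand-in mimicking the documented contract; NOT the real class."""

    def __init__(self, base_dir: Path) -> None:
        # Hypothetical managed names, chosen only for illustration.
        self._paths: Dict[str, Path] = {
            "base_dir": base_dir,
            "data_dir": base_dir / "data",
            "results_dir": base_dir / "results",
        }

    def get_path(self, name: str) -> Path:
        # A plain dict lookup raises KeyError for an unknown name.
        return self._paths[name]

    def create_subdir(self, parent: str, name: str) -> Path:
        # Create <parent>/<name>, including missing parents, and return it.
        subdir = self.get_path(parent) / name
        subdir.mkdir(parents=True, exist_ok=True)
        return subdir

    def find_file(self, pattern: str, path_name: Optional[str] = None) -> List[Path]:
        # Search base_dir when no directory name is given.
        root = self.get_path(path_name or "base_dir")
        return sorted(root.glob(pattern))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pm = MiniPathManager(Path(tmp))
        plots = pm.create_subdir("results_dir", "plots")
        (plots / "summary.txt").write_text("ok")
        print(pm.find_file("plots/*.txt", "results_dir"))
```

Whether the real find_file searches recursively is not stated above; the sketch uses a non-recursive glob unless the pattern itself contains a directory component.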

valska_hera_beam.utils._default_results_root(base_dir: Path) → Path

Resolve a sensible default results_root for HPC and local use.

Resolution order:

  1. runtime_paths.yaml (results_root key) [handled by PathManager]

  2. $VALSKA_RESULTS_ROOT

  3. $SCRATCH/UKSRC/ValSKA/results

  4. $HOME/UKSRC/ValSKA/results

  5. <base_dir>/results
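Steps 2 to 5 of the order above can be sketched as a simple fallback chain. The function name default_results_root_sketch and the env parameter are hypothetical, added so the logic can be exercised without touching the real environment. The sketch also assumes a variable counts as usable whenever it is set and non-empty; the actual helper may apply further checks (for example, whether the directory exists or is writable).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_results_root_sketch(
    base_dir: Path, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Follow steps 2-5 of the documented order (step 1 is left to PathManager)."""
    env = os.environ if env is None else env

    # Step 2: an explicit override wins.
    if env.get("VALSKA_RESULTS_ROOT"):
        return Path(env["VALSKA_RESULTS_ROOT"]).expanduser()

    # Steps 3 and 4: scratch space first, then the home directory.
    for var in ("SCRATCH", "HOME"):
        if env.get(var):
            return Path(env[var]) / "UKSRC" / "ValSKA" / "results"

    # Step 5: repository-local fallback.
    return base_dir / "results"
```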

valska_hera_beam.utils._parse_pp_key_to_float(key: str) → float

Parse a perturbation key of the form '<something><value>pp' to float.

This is a small internal utility to convert keys like 'GSM_FgEoR_-1e0pp' or 'GL_FgOnly_1.0e-01pp' into the numeric value (already in percentage points) used for filtering.

Parameters

key

Perturbation key ending with 'pp'.

Returns

float

The parsed numeric value.

Raises

ValueError

If the key does not end with 'pp' or the numeric part cannot be parsed as a float.
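A minimal sketch of this parsing, under the assumption that the numeric part is whatever follows the last underscore in the key. The name parse_pp_key_sketch is hypothetical and the real helper may locate the number differently.

```python
def parse_pp_key_sketch(key: str) -> float:
    """Return the numeric value embedded in a '<something><value>pp' key."""
    if not key.endswith("pp"):
        raise ValueError(f"key does not end with 'pp': {key!r}")
    # Assumption: the value is the text after the last underscore.
    numeric = key[:-2].rsplit("_", 1)[-1]
    # float() itself raises ValueError when the text is not a number.
    return float(numeric)


if __name__ == "__main__":
    print(parse_pp_key_sketch("GSM_FgEoR_-1e0pp"))     # -1.0
    print(parse_pp_key_sketch("GL_FgOnly_1.0e-01pp"))  # 0.1
```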

valska_hera_beam.utils._pp_key_to_percent_label(key: str, prefix: str, label_prefix: str | None = None) → str | None

Convert '<prefix><pp>' key into a '<label_prefix> ±X%' label.

Parameters

key : str

Full analysis key (e.g. 'GSM_FgEoR_-1e0pp', 'GL_FgEoR_1.0e-01pp').

prefix : str

The prefix to strip before the numeric part (e.g. 'GSM_FgEoR_', 'GL_FgEoR_').

label_prefix : str, optional

Text to put in front of the percentage (default: derived from prefix).

Returns

str or None

Readable label (e.g. 'GSM -1%', 'GL +0.1%') or None if the key does not match the expected format.
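The conversion can be sketched as follows. Two points are assumptions rather than documented behaviour: the default label prefix is taken to be the text before the first underscore of prefix, and the number is formatted with Python's "+g" format so the sign is always shown. The name pp_key_to_percent_label_sketch is hypothetical.

```python
from typing import Optional


def pp_key_to_percent_label_sketch(
    key: str, prefix: str, label_prefix: Optional[str] = None
) -> Optional[str]:
    """Turn '<prefix><value>pp' into '<label_prefix> ±X%', or None on mismatch."""
    if not (key.startswith(prefix) and key.endswith("pp")):
        return None
    try:
        value = float(key[len(prefix):-2])
    except ValueError:
        return None
    if label_prefix is None:
        # Assumption: derive the label from the first component of the prefix.
        label_prefix = prefix.split("_", 1)[0]
    # '+g' keeps the sign and drops trailing zeros.
    return f"{label_prefix} {value:+g}%"


if __name__ == "__main__":
    print(pp_key_to_percent_label_sketch("GSM_FgEoR_-1e0pp", "GSM_FgEoR_"))
    print(pp_key_to_percent_label_sketch("GL_FgEoR_1.0e-01pp", "GL_FgEoR_"))
```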

valska_hera_beam.utils.build_group_labels(groups: dict[str, list[str]]) → dict[str, str]

Build a simple group_labels dict (identity mapping).
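Read literally, an identity mapping sends each group name to itself. A one-line sketch under that reading follows; the name build_group_labels_sketch and the example group names are hypothetical.

```python
from typing import Dict, List


def build_group_labels_sketch(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every group name to itself, ignoring the group members."""
    return {name: name for name in groups}


if __name__ == "__main__":
    # Hypothetical group names and keys, for illustration only.
    groups = {"GSM -1%": ["GSM_FgEoR_-1e0pp"], "GL +0.1%": ["GL_FgEoR_1.0e-01pp"]}
    print(build_group_labels_sketch(groups))
```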

valska_hera_beam.utils.build_pp_groups_from_paths(prefixes: list[str], custom_paths_file: str | Path | None = None, label_prefixes: dict[str, str] | None = None) → dict[str, list[str]]

Build groups for perturbation runs from paths.yaml for one or more prefixes.

Examples

prefixes=['GSM_FgEoR_'] -> GSM v5d0 EoR+Fg

prefixes=['GL_FgEoR_'] -> GSM+GLEAM v7d0 EoR+Fg

prefixes=['GSM_FgEoR_', 'GL_FgEoR_'] -> combined

label_prefixes={'GSM_FgEoR_': 'GSM', 'GL_FgEoR_': 'GL'} -> labels like ‘GSM -1%’, ‘GL -1%’ instead of both ‘GSM …’

valska_hera_beam.utils.filter_chain_pairs(pairs: Mapping[str, object], min_value: float = -0.1, max_value: float = 0.1) → dict[str, object]

Filter chain pairs by signed perturbation value in percentage points.

Parameters

pairs

Mapping from perturbation key (ending with 'pp') to any value (e.g. ChainPair objects).

min_value

Minimum allowed value (inclusive), in percentage points.

max_value

Maximum allowed value (inclusive), in percentage points.

Returns

dict

Filtered mapping containing only keys with min_value <= value <= max_value.

Notes

The numeric value is taken directly from the '<value>pp' suffix, e.g. '-1e0pp' -> -1.0, '1.0e-01pp' -> 0.1.
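The signed filter described above can be sketched with a dictionary comprehension. The helper _pp_value is a hypothetical stand-in for the package's internal key parser and assumes the value follows the last underscore; filter_chain_pairs_sketch is likewise an illustrative name.

```python
from typing import Dict, Mapping


def _pp_value(key: str) -> float:
    # Hypothetical parser: value is the text between the last '_' and 'pp'.
    if not key.endswith("pp"):
        raise ValueError(f"key does not end with 'pp': {key!r}")
    return float(key[:-2].rsplit("_", 1)[-1])


def filter_chain_pairs_sketch(
    pairs: Mapping[str, object], min_value: float = -0.1, max_value: float = 0.1
) -> Dict[str, object]:
    """Keep entries whose signed value lies in [min_value, max_value]."""
    return {
        key: value
        for key, value in pairs.items()
        if min_value <= _pp_value(key) <= max_value
    }


if __name__ == "__main__":
    pairs = {
        "GSM_FgEoR_-1e0pp": "pair A",
        "GSM_FgEoR_1.0e-01pp": "pair B",
        "GSM_FgEoR_-1.0e-02pp": "pair C",
    }
    # With the default bounds, only the 0.1 and -0.01 entries remain.
    print(sorted(filter_chain_pairs_sketch(pairs)))
```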

valska_hera_beam.utils.filter_chain_pairs_absolute_range(pairs: Mapping[str, object], min_abs_value: float = 0.001, max_abs_value: float = 0.1) → dict[str, object]

Filter chain pairs by absolute perturbation value in percentage points.

Parameters

pairs

Mapping from perturbation key (ending with 'pp') to any value (e.g. ChainPair objects).

min_abs_value

Minimum allowed absolute value (inclusive), in percentage points.

max_abs_value

Maximum allowed absolute value (inclusive), in percentage points.

Returns

dict

Filtered mapping containing only keys with min_abs_value <= |value| <= max_abs_value.
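The absolute-value variant differs from the signed filter only in comparing |value| against the bounds, so perturbations of equal size and opposite sign are kept or dropped together. A sketch, with the same hypothetical key parser as an assumption:

```python
from typing import Dict, Mapping


def _pp_value(key: str) -> float:
    # Hypothetical parser: value is the text between the last '_' and 'pp'.
    if not key.endswith("pp"):
        raise ValueError(f"key does not end with 'pp': {key!r}")
    return float(key[:-2].rsplit("_", 1)[-1])


def filter_chain_pairs_absolute_range_sketch(
    pairs: Mapping[str, object],
    min_abs_value: float = 0.001,
    max_abs_value: float = 0.1,
) -> Dict[str, object]:
    """Keep entries with min_abs_value <= |value| <= max_abs_value."""
    return {
        key: value
        for key, value in pairs.items()
        if min_abs_value <= abs(_pp_value(key)) <= max_abs_value
    }


if __name__ == "__main__":
    pairs = {
        "GSM_FgEoR_-1e-1pp": "pair A",  # |value| = 0.1, kept
        "GSM_FgEoR_1e-4pp": "pair B",   # |value| = 0.0001, dropped
        "GSM_FgEoR_-1e0pp": "pair C",   # |value| = 1.0, dropped
    }
    print(sorted(filter_chain_pairs_absolute_range_sketch(pairs)))
```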

valska_hera_beam.utils.get_default_path_manager() → PathManager

Get a PathManager instance with default settings.

valska_hera_beam.utils.load_paths(custom_paths_file: str | Path | None = None) → dict[str, str]

Load analysis paths from a YAML configuration file.

Parameters

custom_paths_file

Custom paths file to load. If None, loads the default paths file.

Returns

dict

Dictionary of analysis path keys to relative chain subdirectories.

valska_hera_beam.utils.load_runtime_paths(base_dir: str | Path | None = None, runtime_paths_file: str | Path | None = None) → dict[str, object]

Load site/user runtime paths from config/runtime_paths.yaml if present.

This configuration is intended for site/user-specific settings such as results_root, data.root, and default external-tool paths.

Parameters

base_dir

Repository base directory. If None, inferred from this module's location. Only used when runtime_paths_file is None.

runtime_paths_file

Explicit path to a runtime paths YAML. If None, resolution order is $VALSKA_RUNTIME_PATHS_FILE first, then <base_dir>/config/runtime_paths.yaml, then $PWD/config/runtime_paths.yaml when <base_dir> appears to be an installed site-packages/dist-packages path.

Returns

dict

Parsed YAML mapping. Returns an empty dict if the file does not exist.

Raises

ValueError

If the YAML exists but does not contain a mapping at top level.
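The lookup order described for runtime_paths_file can be sketched as an ordered list of candidate files, from which a loader would presumably take the first one that exists. The function name runtime_paths_candidates, the env and cwd parameters, and the check for "site-packages" or "dist-packages" among the path components are all assumptions made for illustration. YAML parsing is left out.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Union


def runtime_paths_candidates(
    base_dir: Path,
    runtime_paths_file: Optional[Union[str, Path]] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
) -> List[Path]:
    """Return candidate runtime_paths.yaml locations in lookup order."""
    # An explicit file bypasses every other rule.
    if runtime_paths_file is not None:
        return [Path(runtime_paths_file)]

    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd

    candidates: List[Path] = []
    if env.get("VALSKA_RUNTIME_PATHS_FILE"):
        candidates.append(Path(env["VALSKA_RUNTIME_PATHS_FILE"]))
    candidates.append(base_dir / "config" / "runtime_paths.yaml")

    # Assumption: an installed package is detected via its path components.
    if {"site-packages", "dist-packages"} & set(base_dir.parts):
        candidates.append(cwd / "config" / "runtime_paths.yaml")
    return candidates
```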

valska_hera_beam.utils.make_timestamp() → str

Create a timestamp string for naming files and directories.

Returns

str

Current timestamp in format YYYY-MM-DD_HHMMSS.
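The documented format corresponds to the strftime pattern "%Y-%m-%d_%H%M%S". A sketch follows; make_timestamp_sketch and its optional now argument are hypothetical, and whether the real function uses local time or UTC is not stated above (local time is assumed here).

```python
from datetime import datetime
from typing import Optional


def make_timestamp_sketch(now: Optional[datetime] = None) -> str:
    """Format a datetime as YYYY-MM-DD_HHMMSS (local time by default)."""
    now = datetime.now() if now is None else now
    return now.strftime("%Y-%m-%d_%H%M%S")


if __name__ == "__main__":
    print(make_timestamp_sketch(datetime(2024, 1, 2, 3, 4, 5)))  # 2024-01-02_030405
```

Because the fields run from most to least significant, names built from such timestamps sort chronologically when sorted as plain strings.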

valska_hera_beam.utils.resolve_data_path(data_path: str | Path, runtime_paths: Mapping[str, object] | None = None) → Path

Resolve an input dataset path using runtime_paths.yaml defaults.

Rules

  • If data_path is absolute: return it (expanded + resolved).

  • If data_path is relative and runtime_paths contains data.root:

    return <data.root>/<data_path> (expanded + resolved).

  • Otherwise: resolve relative to the current working directory.

This keeps CLIs explicit (--data is still required) while reducing boilerplate and enabling site-specific mount points.

Parameters

data_path

Dataset path provided by a user/CLI.

runtime_paths

Parsed runtime paths mapping (typically from load_runtime_paths()).

Returns

Path

Fully resolved absolute path.
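The three rules listed for resolve_data_path can be sketched as below. The sketch assumes that data.root denotes a nested mapping, i.e. a "data" key whose value contains a "root" key; the function name resolve_data_path_sketch is hypothetical.

```python
from collections.abc import Mapping
from pathlib import Path
from typing import Optional, Union


def resolve_data_path_sketch(
    data_path: Union[str, Path], runtime_paths: Optional[Mapping] = None
) -> Path:
    """Apply the absolute / data.root / working-directory rules in order."""
    path = Path(data_path).expanduser()

    # Rule 1: absolute paths are returned as given (expanded and resolved).
    if path.is_absolute():
        return path.resolve()

    # Rule 2: relative paths are anchored at data.root when it is configured.
    # Assumption: 'data.root' means runtime_paths["data"]["root"].
    data_cfg = (runtime_paths or {}).get("data")
    root = data_cfg.get("root") if isinstance(data_cfg, Mapping) else None
    if root:
        return (Path(str(root)).expanduser() / path).resolve()

    # Rule 3: otherwise resolve against the current working directory.
    return path.resolve()


if __name__ == "__main__":
    cfg = {"data": {"root": "/mnt/datasets"}}  # hypothetical mount point
    print(resolve_data_path_sketch("obs/night1.uvh5", cfg))
    print(resolve_data_path_sketch("obs/night1.uvh5"))
```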