valska.utils
Utility functions for the ValSKA package.
This module provides:
Path management via PathManager
Loading of analysis paths from config/paths.yaml
Loading of site/user runtime paths from config/runtime_paths.yaml
Helpers to build perturbation groups and readable labels
Simple filtering helpers for perturbation keys
Notes on runtime_paths.yaml
config/runtime_paths.yaml is intended for site/user-specific settings, e.g.
results_root (where all ValSKA outputs go)
data.root (a default root for input datasets; used to resolve relative --data paths)
BayesEoR repo_path / conda_sh / conda_env defaults
Other external tool paths (pyuvsim, OSKAR) in future
Functions
build_group_labels | Build a simple group_labels dict (identity mapping).
build_pp_groups_from_paths | Build groups for perturbation runs from paths.yaml for one or more prefixes.
filter_chain_pairs | Filter chain pairs by signed perturbation value in percentage points.
filter_chain_pairs_absolute_range | Filter chain pairs by absolute perturbation value in percentage points.
get_default_path_manager | Get a PathManager instance with default settings.
load_paths | Load analysis paths from a YAML configuration file.
load_runtime_paths | Load site/user runtime paths from config/runtime_paths.yaml if present.
make_timestamp | Create a timestamp string for naming files and directories.
resolve_data_path | Resolve an input dataset path using runtime_paths.yaml defaults.
Classes
PathManager | Manage paths for ValSKA workflows and generated outputs.
- class valska.utils.PathManager(base_dir: str | Path | None = None, chains_dir: str | Path | None = None, data_dir: str | Path | None = None, results_dir: str | Path | None = None, results_root: str | Path | None = None, runtime_paths_file: str | Path | None = None)
Manage paths for ValSKA workflows and generated outputs.
Initialize the PathManager with configurable directories.
If directories are not specified, they will be automatically determined relative to the package location and common HPC conventions.
Parameters
- base_dir
Base directory of the project. If None, it’s determined automatically.
- chains_dir
Directory containing chain files. If None, falls back through environment/config-aware defaults.
- data_dir
Directory containing data files (input datasets). If None, tries in order:
config/runtime_paths.yaml: data.root
<base_dir>/data (created)
- results_dir
Directory for ValSKA-produced results (tables/plots/summaries). If None, defaults to <results_root>/validation (created).
- results_root
Root directory for all ValSKA-generated outputs, including external tool outputs. If None, PathManager resolves this from runtime config, environment, and then a repository-local default.
- runtime_paths_file
Optional explicit path to a runtime paths YAML file. If None, uses <base_dir>/config/runtime_paths.yaml.
- create_subdir(parent: str, name: str) -> Path
Create and return a subdirectory in one of the managed directories.
Parameters
- parent
Name of the parent directory (one of the managed paths).
- name
Name of the subdirectory to create.
Returns
- Path
Path to the created subdirectory.
Raises
- KeyError
If the parent directory name is invalid.
- find_file(pattern: str, path_name: str | None = None) -> list[Path]
Find files matching a pattern in a specified directory.
Parameters
- pattern
Glob pattern to match files.
- path_name
Name of the directory to search in. If None, searches in base_dir.
Returns
- list of Path
List of paths to files matching the pattern.
- get_path(name: str) -> Path
Get a specific path by name.
Parameters
- name
Name of the path to retrieve.
Returns
- Path
Requested path.
Raises
- KeyError
If the requested path name doesn’t exist.
- get_paths() -> dict[str, Path]
Get a dictionary of all managed paths.
Returns
- dict
Dictionary mapping path names to pathlib.Path objects.
- resolve_data_path(data_path: str | Path) -> Path
Resolve a dataset path using this PathManager’s runtime_paths.
See module-level resolve_data_path() for the rules.
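The method contracts above (named paths, KeyError on unknown names, glob-based search) can be illustrated with a small stand-in class. This is a hypothetical sketch, not the PathManager implementation: the class name MiniPathManager, the path names "base_dir", "data_dir" and "results_dir", and the non-recursive glob are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


class MiniPathManager:
    """Hypothetical stand-in that mimics the documented method contracts."""

    def __init__(self, base_dir: Path) -> None:
        # Assumption: managed paths are stored under these names.
        self._paths: Dict[str, Path] = {
            "base_dir": base_dir,
            "data_dir": base_dir / "data",
            "results_dir": base_dir / "results" / "validation",
        }
        for path in self._paths.values():
            path.mkdir(parents=True, exist_ok=True)

    def get_path(self, name: str) -> Path:
        # A plain dict lookup raises KeyError for unknown names.
        return self._paths[name]

    def get_paths(self) -> Dict[str, Path]:
        # Return a copy so callers cannot mutate the internal mapping.
        return dict(self._paths)

    def create_subdir(self, parent: str, name: str) -> Path:
        subdir = self.get_path(parent) / name
        subdir.mkdir(parents=True, exist_ok=True)
        return subdir

    def find_file(self, pattern: str, path_name: Optional[str] = None) -> List[Path]:
        # Search base_dir when no path name is given.
        root = self.get_path(path_name or "base_dir")
        return sorted(root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    manager = MiniPathManager(Path(tmp))
    plots = manager.create_subdir("results_dir", "plots")
    (plots / "corner.png").touch()
    found = manager.find_file("plots/*.png", "results_dir")
    print([p.name for p in found])
```

Keeping every managed location behind a name means callers never build paths by hand, and a mistyped name fails immediately with KeyError instead of silently creating a stray directory.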
- valska.utils._default_results_root(base_dir: Path) -> Path
Resolve a sensible default results_root for HPC and local use.
Resolution order:
runtime_paths.yaml (results_root key) [handled by PathManager]
$VALSKA_RESULTS_ROOT
$SCRATCH/UKSRC/ValSKA/results
$HOME/UKSRC/ValSKA/results
<base_dir>/results
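The environment-based part of this resolution order can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the function's actual code: the name default_results_root_sketch and the env parameter are hypothetical, the runtime_paths.yaml step is omitted because it is handled by PathManager, and whether the real function also checks that $SCRATCH or $HOME exist on disk is not documented here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_results_root_sketch(
    base_dir: Path, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Pick a results root following the documented fallback order."""
    # The env parameter is an assumption added so the sketch is easy to test.
    env = os.environ if env is None else env

    # An explicit override wins over every other environment-based default.
    explicit = env.get("VALSKA_RESULTS_ROOT")
    if explicit:
        return Path(explicit).expanduser()

    # Otherwise prefer scratch space, then the home directory.
    for var in ("SCRATCH", "HOME"):
        root = env.get(var)
        if root:
            return Path(root) / "UKSRC" / "ValSKA" / "results"

    # Last resort: a directory inside the repository.
    return Path(base_dir) / "results"


print(default_results_root_sketch(Path("/repo"), {"SCRATCH": "/scratch/user"}))
```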
- valska.utils._parse_pp_key_to_float(key: str) -> float
Parse a perturbation key of the form '<something><value>pp' to float.
This is a small internal utility to convert keys like 'GSM_FgEoR_-1e0pp' or 'GL_FgOnly_1.0e-01pp' into the numeric value (already in percentage points) used for filtering.
Parameters
- key
Perturbation key ending with 'pp'.
Returns
- float
The parsed numeric value.
Raises
- ValueError
If the key does not end with 'pp' or the numeric part cannot be parsed as a float.
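One way to implement the parsing described above is to match a trailing float literal immediately before the 'pp' suffix. The sketch below is an assumption about how such a parser could look, not the package's code; the name parse_pp_key_sketch and the regular expression are hypothetical.

```python
import re

# Trailing float (optional sign, optional decimals, optional exponent)
# immediately before the "pp" suffix.
_PP_VALUE = re.compile(r"([+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)pp$")


def parse_pp_key_sketch(key: str) -> float:
    """Return the numeric value encoded at the end of a perturbation key."""
    if not key.endswith("pp"):
        raise ValueError(f"key does not end with 'pp': {key!r}")
    match = _PP_VALUE.search(key)
    if match is None:
        raise ValueError(f"no numeric value before 'pp' in {key!r}")
    return float(match.group(1))


print(parse_pp_key_sketch("GSM_FgEoR_-1e0pp"))     # -1.0
print(parse_pp_key_sketch("GL_FgOnly_1.0e-01pp"))  # 0.1
```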
- valska.utils._pp_key_to_percent_label(key: str, prefix: str, label_prefix: str | None = None) -> str | None
Convert '<prefix><pp>' key into a '<label_prefix> ±X%' label.
Parameters
- key (str)
Full analysis key (e.g. 'GSM_FgEoR_-1e0pp', 'GL_FgEoR_1.0e-01pp').
- prefix (str)
The prefix to strip before the numeric part (e.g. 'GSM_FgEoR_', 'GL_FgEoR_').
- label_prefix (str, optional)
Text to put in front of the percentage (default: derived from prefix).
Returns
- str or None
Readable label (e.g. 'GSM -1%', 'GL +0.1%') or None if the key does not match the expected format.
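The key-to-label conversion can be sketched with a signed, compact number format. This is a hedged illustration, not the package's implementation: the name pp_key_to_percent_label_sketch is hypothetical, and the way the default label prefix is derived here (first underscore-separated token of the prefix) is purely an assumption, since the real derivation is not documented on this page.

```python
from typing import Optional


def pp_key_to_percent_label_sketch(
    key: str, prefix: str, label_prefix: Optional[str] = None
) -> Optional[str]:
    """Turn '<prefix><value>pp' into a label such as 'GSM -1%'."""
    if not (key.startswith(prefix) and key.endswith("pp")):
        return None
    try:
        # Strip the prefix and the trailing "pp", then parse the remainder.
        value = float(key[len(prefix):-2])
    except ValueError:
        return None
    if label_prefix is None:
        # Assumption: fall back to the first token of the prefix.
        label_prefix = prefix.split("_", 1)[0]
    # "+g" keeps the sign and drops trailing zeros: -1.0 -> "-1", 0.1 -> "+0.1".
    return f"{label_prefix} {value:+g}%"


print(pp_key_to_percent_label_sketch("GSM_FgEoR_-1e0pp", "GSM_FgEoR_"))
print(pp_key_to_percent_label_sketch("GL_FgEoR_1.0e-01pp", "GL_FgEoR_", "GL"))
```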
- valska.utils.build_group_labels(groups: dict[str, list[str]]) -> dict[str, str]
Build a simple group_labels dict (identity mapping).
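Reading "identity mapping" as "each group name is its own label", the behaviour can be sketched in one line. The function name build_group_labels_sketch and the example group contents are hypothetical.

```python
from typing import Dict, List


def build_group_labels_sketch(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every group name to itself so it can be used directly as a label."""
    return {name: name for name in groups}


groups = {"GSM -1%": ["GSM_FgEoR_-1e0pp"], "GSM +1%": ["GSM_FgEoR_1e0pp"]}
print(build_group_labels_sketch(groups))
```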
- valska.utils.build_pp_groups_from_paths(prefixes: list[str], custom_paths_file: str | Path | None = None, label_prefixes: dict[str, str] | None = None) -> dict[str, list[str]]
Build groups for perturbation runs from paths.yaml for one or more prefixes.
Examples
prefixes=['GSM_FgEoR_'] -> GSM v5d0 EoR+Fg
prefixes=['GL_FgEoR_'] -> GSM+GLEAM v7d0 EoR+Fg
prefixes=['GSM_FgEoR_', 'GL_FgEoR_'] -> combined
label_prefixes={'GSM_FgEoR_': 'GSM', 'GL_FgEoR_': 'GL'} -> labels like 'GSM -1%', 'GL -1%' instead of both 'GSM …'
- valska.utils.filter_chain_pairs(pairs: Mapping[str, object], min_value: float = -0.1, max_value: float = 0.1) -> dict[str, object]
Filter chain pairs by signed perturbation value in percentage points.
Parameters
- pairs
Mapping from perturbation key (ending with 'pp') to any value (e.g. ChainPair objects).
- min_value
Minimum allowed value (inclusive), in percentage points.
- max_value
Maximum allowed value (inclusive), in percentage points.
Returns
- dict
Filtered mapping containing only keys with min_value <= value <= max_value.
Notes
The numeric value is taken directly from the '<value>pp' suffix, e.g. '-1e0pp' -> -1.0, '1.0e-01pp' -> 0.1.
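The signed filter can be sketched as a dictionary comprehension over the parsed suffix values. This is an illustrative stand-in, not the package's code: filter_chain_pairs_sketch and the helper _value_from_key are hypothetical names, and plain strings stand in for ChainPair objects.

```python
import re
from typing import Dict, Mapping


def _value_from_key(key: str) -> float:
    # Hypothetical helper: parse the float that precedes the "pp" suffix.
    match = re.search(r"([+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)pp$", key)
    if match is None:
        raise ValueError(f"not a perturbation key: {key!r}")
    return float(match.group(1))


def filter_chain_pairs_sketch(
    pairs: Mapping[str, object], min_value: float = -0.1, max_value: float = 0.1
) -> Dict[str, object]:
    """Keep entries whose signed value lies in [min_value, max_value]."""
    return {
        key: value
        for key, value in pairs.items()
        if min_value <= _value_from_key(key) <= max_value
    }


pairs = {
    "GSM_FgEoR_-1e0pp": "pair A",      # -1.0, outside the default range
    "GSM_FgEoR_1.0e-01pp": "pair B",   # 0.1, kept (bounds are inclusive)
    "GSM_FgEoR_-1.0e-02pp": "pair C",  # -0.01, kept
}
print(list(filter_chain_pairs_sketch(pairs)))
```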
- valska.utils.filter_chain_pairs_absolute_range(pairs: Mapping[str, object], min_abs_value: float = 0.001, max_abs_value: float = 0.1) -> dict[str, object]
Filter chain pairs by absolute perturbation value in percentage points.
Parameters
- pairs
Mapping from perturbation key (ending with 'pp') to any value (e.g. ChainPair objects).
- min_abs_value
Minimum allowed absolute value (inclusive), in percentage points.
- max_abs_value
Maximum allowed absolute value (inclusive), in percentage points.
Returns
- dict
Filtered mapping containing only keys with min_abs_value <= |value| <= max_abs_value.
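The absolute-range variant differs only in comparing the magnitude of the parsed value, so positive and negative perturbations of the same size are treated alike. The sketch below is an assumption-based illustration; filter_abs_range_sketch and _abs_value_from_key are hypothetical names.

```python
import re
from typing import Dict, Mapping


def _abs_value_from_key(key: str) -> float:
    # Hypothetical helper: magnitude of the float that precedes "pp".
    match = re.search(r"([+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)pp$", key)
    if match is None:
        raise ValueError(f"not a perturbation key: {key!r}")
    return abs(float(match.group(1)))


def filter_abs_range_sketch(
    pairs: Mapping[str, object],
    min_abs_value: float = 0.001,
    max_abs_value: float = 0.1,
) -> Dict[str, object]:
    """Keep entries with min_abs_value <= |value| <= max_abs_value."""
    return {
        key: value
        for key, value in pairs.items()
        if min_abs_value <= _abs_value_from_key(key) <= max_abs_value
    }


abs_pairs = {
    "GL_FgEoR_-1e0pp": 1,      # |value| = 1.0, too large
    "GL_FgEoR_-1.0e-01pp": 2,  # |value| = 0.1, kept
    "GL_FgEoR_1.0e-02pp": 3,   # |value| = 0.01, kept
    "GL_FgEoR_1.0e-04pp": 4,   # |value| = 0.0001, too small
}
print(list(filter_abs_range_sketch(abs_pairs)))
```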
- valska.utils.get_default_path_manager() -> PathManager
Get a PathManager instance with default settings.
- valska.utils.load_paths(custom_paths_file: str | Path | None = None) -> dict[str, str]
Load analysis paths from a YAML configuration file.
Parameters
- custom_paths_file
Custom paths file to load. If None, loads the default paths file.
Returns
- dict
Dictionary of analysis path keys to relative chain subdirectories.
- valska.utils.load_runtime_paths(base_dir: str | Path | None = None, runtime_paths_file: str | Path | None = None) -> dict[str, object]
Load site/user runtime paths from config/runtime_paths.yaml if present.
This configuration is intended for site/user-specific settings such as results_root, data.root, and default external-tool paths.
Parameters
- base_dir
Repository base directory. If None, inferred from this module location. Only used when runtime_paths_file is None.
- runtime_paths_file
Explicit path to a runtime paths YAML. If None, resolution order is $VALSKA_RUNTIME_PATHS_FILE first, then <base_dir>/config/runtime_paths.yaml, then $PWD/config/runtime_paths.yaml when <base_dir> appears to be an installed site-packages/dist-packages path.
Returns
- dict
Parsed YAML mapping. Returns an empty dict if the file does not exist.
Raises
- ValueError
If the YAML exists but does not contain a mapping at top level.
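The file lookup order used when runtime_paths_file is None can be sketched as a list of candidates in priority order. This sketch only builds the candidate list and does not read any YAML. The function name and its env and cwd parameters are hypothetical, and the test for an "installed" base directory (a site-packages or dist-packages path component) is an assumption about how that check might be done.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def runtime_paths_candidates_sketch(
    base_dir: Path,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
) -> List[Path]:
    """List candidate runtime_paths.yaml locations in priority order."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    candidates: List[Path] = []

    # 1. Explicit override through the environment.
    override = env.get("VALSKA_RUNTIME_PATHS_FILE")
    if override:
        candidates.append(Path(override).expanduser())

    # 2. The repository-local configuration file.
    candidates.append(base_dir / "config" / "runtime_paths.yaml")

    # 3. For installed packages, also look under the working directory.
    if {"site-packages", "dist-packages"} & set(base_dir.parts):
        candidates.append(cwd / "config" / "runtime_paths.yaml")

    return candidates


for candidate in runtime_paths_candidates_sketch(Path("/repo"), env={}):
    print(candidate)
```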
- valska.utils.make_timestamp() -> str
Create a timestamp string for naming files and directories.
Returns
- str
Current timestamp in format YYYY-MM-DD_HHMMSS.
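The documented format corresponds to the strftime pattern "%Y-%m-%d_%H%M%S". A minimal sketch follows; the name make_timestamp_sketch and the optional now parameter are hypothetical, and the use of local time rather than UTC is an assumption.

```python
from datetime import datetime
from typing import Optional


def make_timestamp_sketch(now: Optional[datetime] = None) -> str:
    """Format a datetime as YYYY-MM-DD_HHMMSS (defaults to the current time)."""
    # Assumption: local time is used when no datetime is supplied.
    now = datetime.now() if now is None else now
    return now.strftime("%Y-%m-%d_%H%M%S")


print(make_timestamp_sketch(datetime(2024, 3, 5, 14, 7, 9)))  # 2024-03-05_140709
```

Because the fields run from most to least significant, names built from this string sort chronologically in a directory listing.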
- valska.utils.resolve_data_path(data_path: str | Path, runtime_paths: Mapping[str, object] | None = None) -> Path
Resolve an input dataset path using runtime_paths.yaml defaults.
Rules
If data_path is absolute: return it (expanded + resolved).
If data_path is relative and runtime_paths contains data.root: return <data.root>/<data_path> (expanded + resolved).
Otherwise: resolve relative to the current working directory.
This keeps CLIs explicit (--data is still required) while reducing boilerplate and enabling site-specific mount points.
Parameters
- data_path
Dataset path provided by a user/CLI.
- runtime_paths
Parsed runtime paths mapping (typically from load_runtime_paths()).
Returns
- Path
Fully resolved absolute path.
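The three rules can be sketched with pathlib alone. This is an illustrative re-implementation, not the function's actual code: the name resolve_data_path_sketch is hypothetical, and it assumes that data.root is stored as a nested mapping, i.e. {"data": {"root": ...}}, in the parsed runtime paths.

```python
from collections.abc import Mapping
from pathlib import Path
from typing import Optional, Union


def resolve_data_path_sketch(
    data_path: Union[str, Path],
    runtime_paths: Optional[Mapping] = None,
) -> Path:
    """Resolve a dataset path following the three documented rules."""
    path = Path(data_path).expanduser()

    # Rule 1: absolute paths are returned as given (expanded + resolved).
    if path.is_absolute():
        return path.resolve()

    # Rule 2: relative paths are anchored at data.root when it is configured.
    # Assumption: "data.root" means a nested {"data": {"root": ...}} mapping.
    data_cfg = (runtime_paths or {}).get("data")
    root = data_cfg.get("root") if isinstance(data_cfg, Mapping) else None
    if root:
        return (Path(str(root)).expanduser() / path).resolve()

    # Rule 3: otherwise fall back to the current working directory.
    return (Path.cwd() / path).resolve()


site_root = Path.cwd() / "site_data"
print(resolve_data_path_sketch("obs/vis.uvh5", {"data": {"root": str(site_root)}}))
```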