Configuration Reference


All orchestration happens through YAML configuration files. This page documents every configuration option read by `run_training.py` and `run_eval.py`, and marks which keys are required and which have defaults.

How defaults work

Missing optional keys fall back to the defaults listed below (they are read with `.get(...)`).
Modality blocks (`interpolation`, `extrapolation`, `sparse`, `batch_scaling`, `uncertainty`) are disabled when the entire block is omitted.
Evaluation switches (`losses`, `iterative`, `gradients`, `timing`, `compute`, `compare`) default to `false` when omitted.
Required keys have no default: if one is missing, the run fails.
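The fallback rules above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: `resolve_defaults` and `modality_enabled` are hypothetical helper names, and the config is assumed to be a plain dict such as the one obtained after loading the YAML file.

```python
# Hypothetical helpers; the real scripts may organise this differently.
MODALITIES = ("interpolation", "extrapolation", "sparse", "batch_scaling", "uncertainty")
SWITCHES = ("losses", "iterative", "gradients", "timing", "compute", "compare")


def modality_enabled(config, name):
    """A modality is off when its block is missing, empty, or not enabled."""
    block = config.get(name) or {}
    return bool(block.get("enabled", False))


def resolve_defaults(config):
    """Collect optional settings, falling back to the documented defaults."""
    return {
        "seed": config.get("seed", 42),
        "verbose": config.get("verbose", False),
        "checkpoint": config.get("checkpoint", False),
        "modalities": {name: modality_enabled(config, name) for name in MODALITIES},
        "switches": {name: bool(config.get(name, False)) for name in SWITCHES},
    }


# Only `seed` and one modality block are given; everything else uses defaults.
resolved = resolve_defaults({"seed": 7, "sparse": {"enabled": True, "factors": [2, 4]}})
```

Here `resolved["seed"]` comes from the config, while `verbose`, `checkpoint`, every evaluation switch, and every modality other than `sparse` resolve to their disabled defaults.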

Top-level keys

Key	Required	Default	Used by	Notes
`training_id`	Yes	None	Training, Eval	Folder name under `trained/`, `results/`, and `plots/`.
`surrogates`	Yes	None	Training, Eval	Ordered list of surrogate class names.
`batch_size`	Yes	None	Training, Eval	Int or list aligned with `surrogates`.
`epochs`	Yes	None	Training, Eval	Int or list aligned with `surrogates`.
`devices`	Yes	None	Training, Eval	List of device strings (`cpu`, `cuda:0`, `mps`, …).
`seed`	No	`42`	Training	Random seed for training.
`verbose`	No	`false`	Training, Eval	Extra data-loading logs.
`checkpoint`	No	`false`	Training	Enables best-checkpoint saving per model.
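As a rough sketch of how the required keys and the "int or list aligned with `surrogates`" rule could be checked before a run, consider the following. The function names `check_required` and `per_surrogate` are hypothetical, and the way a single int is expanded to one value per surrogate is an assumption based on the table notes, not a confirmed detail of the scripts.

```python
REQUIRED_KEYS = ("training_id", "surrogates", "batch_size", "epochs", "devices")


def check_required(config):
    """Raise early if any required top-level key is absent."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Missing required config keys: {', '.join(missing)}")


def per_surrogate(value, surrogates, key):
    """Expand an int to one entry per surrogate, or validate a list's length."""
    if isinstance(value, int):
        return [value] * len(surrogates)
    if len(value) != len(surrogates):
        raise ValueError(
            f"`{key}` has {len(value)} entries for {len(surrogates)} surrogates"
        )
    return list(value)


config = {
    "training_id": "example_run",
    "surrogates": ["MultiONet", "OtherSurrogate"],  # second name is a placeholder
    "batch_size": 65536,      # single int, shared by all surrogates
    "epochs": [200, 100],     # list, one entry per surrogate
    "devices": ["cpu"],
}
check_required(config)
batch_sizes = per_surrogate(config["batch_size"], config["surrogates"], "batch_size")
epochs = per_surrogate(config["epochs"], config["surrogates"], "epochs")
```

Validating list lengths up front turns a misaligned `batch_size` or `epochs` list into an immediate error rather than a failure partway through training.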

Dataset block

Key	Required	Default	Used by	Notes
`dataset.name`	Yes	None	Training, Eval	Folder inside `datasets/`.
`dataset.log10_transform`	No	`true`	Training, Eval	Log10 transform the data.
`dataset.log10_transform_params`	No	`true`	Training, Eval	Log10 transform the parameters (if present).
`dataset.normalise`	No	`"minmax"`	Training, Eval	`"minmax"`, `"standardise"`, or `"disable"`.
`dataset.normalise_per_species`	No	`false`	Training, Eval	Normalise each species independently.
`dataset.tolerance`	No	`None`	Training, Eval	Lower bound applied before the log transform (`None`, written `null` in YAML, means no lower bound).
`dataset.subset_factor`	No	`1`	Training	Down-samples data (smoke tests).
`dataset.log_timesteps`	No	`false`	Eval	Used for plotting/log-time axes.
`dataset.use_optimal_params`	No	`true`	Training, Eval	Load surrogate-specific defaults from dataset configs.
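To make the interaction of `tolerance`, `log10_transform`, and `normalise` concrete, here is a small pure-Python sketch. It rests on several assumptions that this page does not confirm: that `tolerance` clips values from below, that `"minmax"` maps to the range [0, 1], and that `"standardise"` uses the population standard deviation. The function names are hypothetical.

```python
import math


def log10_with_floor(values, tolerance=None):
    """Clip values below `tolerance` (if set), then take log10."""
    if tolerance is not None:
        values = [max(v, tolerance) for v in values]
    return [math.log10(v) for v in values]


def normalise(values, mode="minmax"):
    """Textbook versions of the three documented modes (assumed definitions)."""
    if mode == "disable":
        return list(values)
    if mode == "minmax":
        lo, hi = min(values), max(values)
        # Constant inputs (hi == lo) are not handled in this sketch.
        return [(v - lo) / (hi - lo) for v in values]
    if mode == "standardise":
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        return [(v - mean) / std for v in values]
    raise ValueError(f"Unknown normalise mode: {mode!r}")


raw = [1e-30, 1e-4, 1.0]
logged = log10_with_floor(raw, tolerance=1e-20)  # first value is clipped to 1e-20
scaled = normalise(logged, mode="minmax")
```

Without a tolerance, the smallest value would dominate the log-transformed range; with the floor in place, the normalised values spread more evenly.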

Modality blocks (optional)

A modality block is disabled when it is omitted. When a block sets `enabled: true`, the list or value shown in the last column is required.

Block	Required	Default	Keys when enabled
`interpolation`	No	disabled	`intervals` (list of ints)
`extrapolation`	No	disabled	`cutoffs` (list of ints)
`sparse`	No	disabled	`factors` (list of ints)
`batch_scaling`	No	disabled	`sizes` (list of fraction strings, e.g. `["1/2", "1/4"]`)
`uncertainty`	No	disabled	`ensemble_size` (int)
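The table above maps each block to the key it needs once enabled. A minimal sketch of that check, assuming the config is a plain dict, might look like this. `active_modalities` is a hypothetical helper, and parsing the `batch_scaling` strings with `fractions.Fraction` is one possible approach rather than the project's confirmed method.

```python
from fractions import Fraction

# Key that each modality block must provide once it is enabled.
MODALITY_KEYS = {
    "interpolation": "intervals",
    "extrapolation": "cutoffs",
    "sparse": "factors",
    "batch_scaling": "sizes",
    "uncertainty": "ensemble_size",
}


def active_modalities(config):
    """Return {block name: value} for enabled blocks, checking the required key."""
    active = {}
    for name, key in MODALITY_KEYS.items():
        block = config.get(name) or {}
        if not block.get("enabled", False):
            continue  # omitted or disabled blocks are skipped
        if key not in block:
            raise KeyError(f"`{name}.enabled` is true but `{name}.{key}` is missing")
        active[name] = block[key]
    return active


config = {
    "sparse": {"enabled": True, "factors": [2, 4, 8]},
    "batch_scaling": {"enabled": True, "sizes": ["1/4", "1/2"]},
    "uncertainty": {"enabled": False, "ensemble_size": 5},
}
active = active_modalities(config)
# Fraction accepts strings such as "1/4" directly.
scales = [Fraction(size) for size in active["batch_scaling"]]
```

Note that `uncertainty` is skipped even though it carries an `ensemble_size`, because its `enabled` flag is false.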

Evaluation switches

All switches default to false if omitted.

Key	Default	Notes
`losses`	`false`	Plots training and test losses.
`iterative`	`false`	Iterative roll-out evaluation.
`gradients`	`false`	Gradient vs error analysis.
`timing`	`false`	Inference timing benchmarks.
`compute`	`false`	Memory/parameter count benchmarks.
`compare`	`false`	Cross-surrogate comparison plots/tables.

Metric options

Key	Default	Notes
`relative_error_threshold`	`0.0`	Denominator floor for relative error.
`error_percentile`	`99`	Percentile used in error summaries.
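One plausible reading of these two options is sketched below: the threshold acts as a floor on the denominator of the relative error, and the percentile summarises the resulting error distribution. Both formulas are assumptions for illustration (in particular, the nearest-rank percentile is a swapped-in textbook method), and the function names are hypothetical.

```python
import math


def relative_errors(predictions, targets, threshold=0.0):
    """|pred - target| / max(|target|, threshold), element by element.

    With threshold 0.0 a target of exactly zero would divide by zero,
    which this sketch does not guard against.
    """
    errors = []
    for pred, target in zip(predictions, targets):
        denom = max(abs(target), threshold)
        errors.append(abs(pred - target) / denom)
    return errors


def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


predictions = [1.1, 2.0, 0.5]
targets = [1.0, 2.0, 1e-6]
# A floor of 0.1 keeps the near-zero target from inflating its relative error.
errors = relative_errors(predictions, targets, threshold=0.1)
summary = percentile(errors, 99)
```

Raising `relative_error_threshold` above zero is mainly useful when targets can be very close to zero, where an unfloored relative error becomes arbitrarily large.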

Full example config (defaults)

This example lists every key, with optional keys set to their defaults. Required keys are filled with placeholder values, and the modality blocks carry sample values but stay disabled.

```yaml
# Required
training_id: "example_run"
surrogates: ["MultiONet"]
batch_size: [65536]
epochs: [200]
devices: ["cpu"]

# Optional (defaults)
seed: 42
verbose: false
checkpoint: false

dataset:
  name: "osu2008"
  log10_transform: true
  log10_transform_params: true
  normalise: "minmax"
  normalise_per_species: false
  tolerance: null
  subset_factor: 1
  log_timesteps: false
  use_optimal_params: true

# Modalities (disabled unless enabled)
interpolation:
  enabled: false
  intervals: [2, 3, 4]
extrapolation:
  enabled: false
  cutoffs: [50, 60, 70]
sparse:
  enabled: false
  factors: [2, 4, 8]
batch_scaling:
  enabled: false
  sizes: ["1/16", "1/8", "1/4", "1/2"]
uncertainty:
  enabled: false
  ensemble_size: 5

# Evaluation switches (default false)
losses: false
iterative: false
gradients: false
timing: false
compute: false
compare: false

# Metric options
relative_error_threshold: 0.0
error_percentile: 99
```