Running Benchmarks
This hub explains how the benchmark orchestrator works end to end. A typical run looks like this:
1. Hyperparameter tuning — Optuna samples candidate configs per surrogate and stores the trials in a database. You decide which trials become new defaults (single- or multi-objective); see the Optuna sketch after this list.
2. Training — `run_training.py` reads your benchmark config, schedules “main” runs + optional modalities, and produces checkpoints under `trained/<training_id>/`.
3. Evaluation — `run_eval.py` reloads those checkpoints (see the checkpoint sketch below), applies whichever evaluation suites you enabled, and writes structured metrics/plots.
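
The tuning step maps onto a plain Optuna study. The sketch below only illustrates that workflow and is not the CODES tuning code: the objective function, the sampled parameter names, and the study/database names are hypothetical placeholders.

```python
import optuna


def objective(trial: optuna.Trial) -> float:
    # Sample one candidate config for a surrogate (hypothetical parameters).
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    width = trial.suggest_int("hidden_width", 64, 512, step=64)
    # A real objective would train the surrogate with this config and return
    # its validation loss; a synthetic score stands in so the sketch runs.
    return (lr - 1e-3) ** 2 + ((width - 256) * 1e-3) ** 2


# Persisting trials to a database lets studies be resumed and inspected later.
# For multi-objective tuning, pass directions=["minimize", "minimize"] instead
# of direction= and return a tuple of scores from the objective.
study = optuna.create_study(
    study_name="surrogate_tuning",          # hypothetical study name
    storage="sqlite:///optuna_trials.db",   # any storage URL Optuna supports
    direction="minimize",
    load_if_exists=True,
)
study.optimize(objective, n_trials=50)
print(study.best_trial.params)  # candidate values for new defaults
```

Picking which trials become the new defaults remains a manual step: a single-objective study ranks trials via `study.best_trial`, while a multi-objective study exposes its Pareto front via `study.best_trials`.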
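
For the handoff from training to evaluation, the key contract is the checkpoint directory. The sketch below is an assumption-heavy illustration rather than `run_eval.py` itself: it assumes PyTorch checkpoints stored as `*.pt` files somewhere under `trained/<training_id>/`, and the `training_id` is made up.

```python
from pathlib import Path

import torch

training_id = "example_run"          # hypothetical training_id
run_dir = Path("trained") / training_id

# Walk every checkpoint the training step produced and reload it for evaluation.
for ckpt_path in sorted(run_dir.rglob("*.pt")):
    state = torch.load(ckpt_path, map_location="cpu")
    print(f"loaded {ckpt_path} ({len(state)} top-level entries)")
    # ...rebuild the corresponding surrogate from `state`, run the enabled
    # evaluation suites, and write the resulting metrics/plots.
```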
Two supporting sections describe the baseline architectures that ship with CODES and the modalities that expand each training run.
TL;DR
Work through the links below in order when learning the system; jump directly to the modalities or architectures sections when you need implementation specifics.