Getting Started#
CODES Benchmark helps you compare surrogate models for coupled ODE systems by running consistent training, evaluation, and reporting pipelines. This page summarizes the minimum you need to install the project, configure a run, and validate that everything is wired correctly.
Prerequisites#
Python 3.10 with pip (we recommend uv for faster installs, but plain pip works)
(Optional) A CUDA-capable GPU if you want to reproduce the default configurations
Enough disk space to store downloaded datasets and the checkpoints created under trained/ and results/
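If you are unsure whether your machine meets these requirements, a quick check such as the following can help (a minimal sketch; it assumes PyTorch is already installed in your environment):

import shutil
import sys

import torch

print(f"Python: {sys.version.split()[0]}")             # should report 3.10.x
print(f"CUDA available: {torch.cuda.is_available()}")  # False is fine for CPU-only runs
free_gb = shutil.disk_usage(".").free / 1e9
print(f"Free disk space: {free_gb:.1f} GB")            # datasets and checkpoints need headroom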
Installation#
We recommend uv because the repository already ships with pyproject.toml + uv.lock. Cloning and syncing is enough—uv run will automatically create/update the environment the first time you execute a command.
uv workflow (recommended)
git clone https://github.com/robin-janssen/CODES-Benchmark.git
cd CODES-Benchmark
uv sync # creates .venv using uv.lock
uv run python -c "import codes; print('CODES ready!')" # optional smoke test
After this, you can prefix any command with uv run (for example uv run python run_training.py --config config.yaml) and uv will ensure dependencies are in place.
pip / virtualenv fallback
git clone https://github.com/robin-janssen/CODES-Benchmark.git
cd CODES-Benchmark
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install -r requirements.txt
Both approaches expose the codes package locally so scripts such as run_training.py and run_eval.py can import it without extra path hacks. Installing the pinned requirements ensures feature parity with CI and the documentation build.
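If you used the pip/virtualenv fallback, the same kind of smoke test as in the uv workflow confirms the editable install, for example:

import codes

print("CODES ready!", codes.__file__)  # the path should point into your clone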
Configure a run#
Benchmarks usually follow the pattern: hyperparameter tuning → full training → evaluation/reporting. Before diving into the advanced knobs, start from the minimal configuration we ship in config.yaml:
training_id: "my_first_benchmark"
surrogates: ["MultiONet"]
batch_size: [65536]
epochs: [200]
dataset:
  name: "osu2008"
devices: ["cuda:0"] # or ["cpu"] if you lack a GPU
1. Copy config.yaml (or config_full.yaml for inspiration) to a new name inside the repo.
2. Update the surrogate list, dataset, devices, and optional study switches such as interpolation, extrapolation, sparse, batch_scaling, or uncertainty.
3. Keep the file under version control so you can trace results.
See the configuration reference for a complete list of keys, defaults, and tips.
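Before launching a long run, it can be worth checking that your copied config parses and contains the keys you expect. The sketch below only assumes PyYAML is available and uses a placeholder path for wherever you saved your copy:

import yaml

# Placeholder path: point this at your own copy of the config.
with open("my_first_benchmark.yaml") as fh:
    cfg = yaml.safe_load(fh)

print(cfg["training_id"], cfg["surrogates"], cfg["devices"])
assert cfg["dataset"]["name"], "dataset.name must be set"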
Run your first benchmark#
Use the minimal configuration above (or a copy of it) to perform a smoke test:
1. Train the requested surrogate (creates trained/<training_id>):
python run_training.py --config path/to/your_config.yaml
2. Evaluate / benchmark everything that was trained:
python run_eval.py --config path/to/your_config.yaml
3. Inspect the generated tables under results/<training_id> and the plots inside plots/<training_id>.
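To get a quick overview of what a run produced, you can list the artifacts from Python. The sketch below only relies on the results/<training_id> and plots/<training_id> folders mentioned above; the file names inside depend on the studies you enabled:

from pathlib import Path

training_id = "my_first_benchmark"  # must match the training_id in your config
for root in (Path("results") / training_id, Path("plots") / training_id):
    print(f"\n{root}:")
    if not root.is_dir():
        print("  (missing - did the corresponding step run?)")
        continue
    for path in sorted(root.rglob("*")):
        if path.is_file():
            print(" ", path.relative_to(root))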
Every CLI script honours the --config flag and logs progress to the console.
Where to go next#
Running benchmarks: deep-dive into the workflow, multi-device execution, and troubleshooting.
Configuration reference: every knob explained.
Dataset catalog: discover the bundled datasets and download URLs.
Tutorials: notebooks that demonstrate data loading, custom analysis, and plotting pipelines.