
Dataset Catalog#

The repository ships with multiple HDF5 datasets stored under datasets/<name>/. Each folder contains data.hdf5 (with the train/test/val splits), optional parameter sets, and metadata such as the timesteps. Datasets are typically produced via the codes.create_dataset helper described in the guides/extending-benchmark page, which ensures a consistent layout (a quick inspection sketch follows the list):

  • train / test / val: arrays with shape (n_samples, n_timesteps, n_quantities).

  • Optional *_params: per-trajectory parameters (e.g., radiation field, metallicity).

  • timesteps: explicit timeline (logarithmic or linear).

  • Attributes such as n_quantities, n_parameters, and train/test/val counts for quick inspection (older datasets may infer these values on the fly).
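
For a quick sanity check you can open a file directly with h5py. The snippet below is a minimal sketch; the osu2008 path is only an example, and older files may lack some of the attributes or the optional *_params entries described above.

import h5py

# Hypothetical path; substitute any dataset folder under datasets/<name>/.
with h5py.File("datasets/osu2008/data.hdf5", "r") as f:
    for split in ("train", "test", "val"):
        print(split, f[split].shape)      # (n_samples, n_timesteps, n_quantities)
    print("timesteps:", f["timesteps"][:5], "...")
    print(dict(f.attrs))                  # e.g. n_quantities, n_parameters, split counts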

When you add new datasets, follow the same convention so the CLI can auto-discover everything.
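
In practice codes.create_dataset takes care of the file layout for you. If you do need to write a compatible data.hdf5 by hand, the following is a minimal sketch under the assumptions above; the dataset name, sample counts, and the log-spaced timeline are placeholders.

import numpy as np
import h5py
from pathlib import Path

# Hypothetical dataset name and sizes -- adjust to your data.
Path("datasets/my_dataset").mkdir(parents=True, exist_ok=True)
rng = np.random.default_rng(0)
n_t, n_q = 100, 10  # placeholder numbers of timesteps and quantities

with h5py.File("datasets/my_dataset/data.hdf5", "w") as f:
    f.create_dataset("train", data=rng.random((700, n_t, n_q)))
    f.create_dataset("test", data=rng.random((150, n_t, n_q)))
    f.create_dataset("val", data=rng.random((150, n_t, n_q)))
    f.create_dataset("timesteps", data=np.logspace(-3, 3, n_t))  # explicit (here logarithmic) timeline
    f.attrs["n_quantities"] = n_q
    f.attrs["n_parameters"] = 0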

Visualisations#

For each dataset we usually publish a quick set of plots (trajectories, gradients, distributions, and a random example). Generate or refresh them via:

python datasets/_data_analysis/analyse_all_datasets.py

The script iterates over the datasets list defined inside analyse_all_datasets.py. To visualise a new dataset, append its identifier there and add an entry to the dictionary defined in datasets/_data_analysis/dataset_dict.py. Each entry specifies whether to plot on a log scale, how many quantities to show per subplot (app), and the plotting tolerance that controls the axis limits; an illustrative entry is sketched below. The script then writes PNGs into each dataset folder.
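
A new entry in dataset_dict.py might look roughly like the following. The key names other than app are an assumption here, so mirror an existing entry in the file rather than copying this verbatim.

# Hypothetical entry -- copy the structure of an existing dataset instead.
"my_dataset": {
    "log": True,         # plot trajectories on a log scale
    "app": 5,            # quantities per subplot
    "tolerance": 1e-20,  # plotting tolerance controlling the axis limits
},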

Figure: example visualisation (osu2008) showing the log-scale trajectories generated by the helper script.

Download helper#

You rarely need to download anything manually: the training, tuning, and evaluation CLIs call download_data on demand, which uses the URLs defined in datasets/data_sources.yaml. The first time you reference a dataset, its data.hdf5 is fetched and cached under datasets/<name>/. Keep data_sources.yaml up to date if you mirror the data in your own storage.
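
If you mirror a dataset in your own storage, point its entry in datasets/data_sources.yaml at the new location. The exact schema is not reproduced here, so treat the line below as an illustrative sketch and follow the existing entries in the file; the dataset name and URL are placeholders.

# Hypothetical entry -- match the structure of the existing entries.
my_dataset: https://my-storage.example.org/codes/my_dataset/data.hdf5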