Accuracy Metrics#

Chemical-abundance trajectories span many orders of magnitude, which makes standard absolute or relative error metrics misleading. CODES therefore measures accuracy in log space everywhere—during tuning, training, and evaluation.

Log-space absolute error#

  • mLAE (mean log absolute error): the mean of |log10(pred) - log10(target)| across all species, timesteps, and trajectories.

  • LAE99`: the 99th percentile of the same distribution. This captures worst-case behaviour without being dominated by a handful of outliers.

Because the models are trained on log-transformed abundances, these values directly express errors in orders of magnitude (dex).

Why not relative error?#

  • Relative errors explode when dividing by very small abundances and become asymmetric (large overestimations vs. capped underestimations).

  • Adding thresholds to stabilize the denominator implicitly weights species without a clear physical justification.

Log-space metrics avoid these pitfalls: they treat over- and under-predictions symmetrically and remain finite even for tiny abundances.

Where metrics appear#

  • Tuning — single-objective studies minimize LAE99; dual-objective studies minimize both LAE99 and inference time.

  • Training — loss functions (e.g., Smooth L1, MSE) operate on log-transformed outputs, so optimization aligns with the evaluation metrics.

  • Evaluation — core reports always include mLAE and LAE99, alongside timing/compute metrics. Optional diagnostics (loss curves, gradient correlations) are derived from the same log-space signals.