How to: Use Cross-Validation#

Cross-validation (CV) evaluates how well a model generalizes by repeatedly training and validating on rotating subsets of data. ModularML provides CrossValidation and CVBinding to integrate CV directly with the Experiment API.

Prerequisites: This notebook uses Evaluation and EvalLossMetric callbacks. Read How to: Use Callbacks first if you are not familiar with them.

This notebook covers:

Dataset Setup
Model and Experiment
Execution Plan
CVBinding
Running Cross-Validation
Accessing Results
Summary

%matplotlib inline
import numpy as np

from modularml import (
    AppliedLoss,
    EvalPhase,
    Experiment,
    FeatureSet,
    Loss,
    ModelGraph,
    ModelNode,
    Optimizer,
    TrainPhase,
)
from modularml.models.torch import SequentialMLP
from modularml.samplers import SimpleSampler

Dataset Setup#

We create a synthetic dataset that mimics a battery health monitoring scenario: 50 sensors, each providing 10 readings. Every reading is a 50-point voltage vector paired with a scalar state-of-health target (soh). A sensor_id tag column identifies which sensor each sample belongs to.

We split the data in two stages:

Source / Test split: separate the sensors held out for final testing from those used for cross-validation. We keep sensor groups intact (group_by="sensor_id").
Train / Val split within source: randomly divide the source readings into train and val splits. This stage is not grouped, so readings from one sensor can land in both splits. These are the splits that will rotate during CV.

rng = np.random.default_rng(13)
n_sensors = 50
n_readings_per_sensor = 10
n_samples = n_sensors * n_readings_per_sensor  # 500 total

# sensor_id repeats for each reading within a sensor
sensor_ids = np.repeat(np.arange(n_sensors), n_readings_per_sensor).astype(str)

fs = FeatureSet.from_dict(
    label="SensorData",
    data={
        "voltage": list(rng.standard_normal((n_samples, 50))),
        "soh": list(rng.standard_normal((n_samples, 1))),
        "sensor_id": list(sensor_ids),
    },
    feature_keys="voltage",
    target_keys="soh",
    tag_keys="sensor_id",
)
print(f"Total samples: {len(fs)}")
print(f"Tags: {fs.get_tag_keys()}")

fs.clear_splits()

# Stage 1: split by sensor_id - keeps all readings from a sensor together
# source = 40 sensors (80%), test = 10 sensors (20%)
fs.split_random(
    ratios={"source": 0.8, "test": 0.2},
    group_by="sensor_id",
    seed=13,
)

# Stage 2: randomly split source readings into train and val
fs.get_split("source").split_random(
    ratios={"train": 0.7, "val": 0.3},
    seed=42,
)

fs.visualize()
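To make the effect of group_by concrete, here is a minimal pure-Python sketch of a group-aware random split. It is not ModularML's implementation: the helper split_by_group is hypothetical, and it only illustrates that whole sensors, rather than individual readings, are assigned to each side.

```python
import random


def split_by_group(group_ids, ratio, seed):
    """Hypothetical helper: assign whole groups to one of two sides.

    Returns (first_idx, second_idx), two lists of sample indices.
    """
    # Unique groups in first-seen order, then shuffled reproducibly
    groups = list(dict.fromkeys(group_ids))
    random.Random(seed).shuffle(groups)

    # The first `ratio` share of groups goes to the first side
    n_first = round(len(groups) * ratio)
    first_groups = set(groups[:n_first])

    first_idx = [i for i, g in enumerate(group_ids) if g in first_groups]
    second_idx = [i for i, g in enumerate(group_ids) if g not in first_groups]
    return first_idx, second_idx


# 50 sensors x 10 readings, mirroring the dataset above
sensor_ids = [str(s) for s in range(50) for _ in range(10)]
source_idx, test_idx = split_by_group(sensor_ids, ratio=0.8, seed=13)

source_sensors = {sensor_ids[i] for i in source_idx}
test_sensors = {sensor_ids[i] for i in test_idx}
print(len(source_sensors), len(test_sensors), source_sensors & test_sensors)
```

Because assignment happens per sensor, no sensor contributes readings to both sides, which is what prevents leakage between the source and test data.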

Model and Experiment#

We set up a simple MLP model graph and create an experiment. The setup is identical to the approach in How to: Create and Use an Experiment.

fs_ref = fs.reference(features="voltage", targets="soh")

mn_mlp = ModelNode(
    label="MLP",
    model=SequentialMLP(output_shape=(1, 1), n_layers=2, hidden_dim=16),
    upstream_ref=fs_ref,
)

graph = ModelGraph(
    label="SimpleGraph",
    nodes=[mn_mlp],
    optimizer=Optimizer("adam", opt_kwargs={"lr": 1e-3}, backend="torch"),
)
graph.build()
graph.visualize()

exp = Experiment.from_active_context(label="my_experiment")

Execution Plan#

We define the execution plan that will run inside each fold of the cross-validation. This plan consists of:

A TrainPhase on the train split, with an Evaluation callback that monitors validation loss on the val split after every epoch.
A final EvalPhase on the held-out test split.

from modularml.callbacks import EvalLossMetric, Evaluation

mse_loss = AppliedLoss(
    loss=Loss("mse", backend="torch"),
    on="MLP",
    inputs=["outputs", "targets"],
)

# Evaluation callback: run on val split after every epoch
eval_cb = Evaluation.from_split(
    label="eval_val",
    split="val",
    every_n_epochs=1,
    metrics=[
        EvalLossMetric(
            name="val_loss",
            loss=AppliedLoss(
                loss=Loss("mse", backend="torch"),
                on="MLP",
                inputs=["outputs", "targets"],
            ),
        ),
    ],
)

train_phase = TrainPhase.from_split(
    label="train",
    split="train",
    sampler=SimpleSampler(batch_size=4, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=2,
    callbacks=[eval_cb],
)

# Final eval on held-out test split
eval_phase = EvalPhase.from_split(
    label="eval",
    split="test",
    losses=[mse_loss],
)

exp.execution_plan.add_phase(train_phase)
exp.execution_plan.add_phase(eval_phase)

We can verify this plan before running cross-validation with preview_run. Unlike the run_ methods, preview_ methods do not mutate the Experiment state.

This allows us to verify execution plans before running a final phase sequence without worrying about accidentally pre-training the ModelGraph.

# Verify the plan with a non-mutating preview run before starting CV
exp.preview_run()
print("Single experiment run completed.")

CVBinding#

A CVBinding tells CrossValidation which FeatureSet to fold over and which existing splits form the CV pool.

    CVBinding(
        fs: str | FeatureSet,
        source_splits: list[str],
        *,
        group_by: str | list[str] | None = None,
        stratify_by: str | list[str] | None = None,
        train_split_name: str = "train",
        val_split_name: str = "val",
        val_size: float | None = None,
    )

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `fs` | `str \| FeatureSet` | (required) | The `FeatureSet` to fold. |
| `source_splits` | `list[str]` | (required) | Existing splits to pool before folding. |
| `group_by` | `str \| list[str] \| None` | `None` | Keep groups together across fold boundaries. |
| `stratify_by` | `str \| list[str] \| None` | `None` | Balance strata across folds (mutually exclusive with `group_by`). |
| `train_split_name` | `str` | `"train"` | The split name that receives each fold’s training data. |
| `val_split_name` | `str` | `"val"` | The split name that receives each fold’s validation data. |
| `val_size` | `float \| None` | `None` | Explicit validation proportion per fold. If `None`, uses `1 / n_folds`. |

How folding works#

source_splits specifies which existing splits are pooled into the CV data. In our case source_splits=["train", "val"] combines the train and val samples into one pool. This pool is then split into n_folds equal pieces. Each fold uses one piece as validation and the remainder as training, replacing the train and val splits in the FeatureSet for that fold’s execution.

Note that we could just use the source split as our pool, since it is the union of the train and val samples. We pass a list of distinct splits to show that any views can be merged into the CV pool. They do not need to originate from the same parent view, but they do need to belong to the same FeatureSet.

The test split is not included in source_splits, so it remains unchanged across all folds.

from modularml import CrossValidation, CVBinding

cv = CrossValidation(
    bindings=CVBinding(
        fs=fs,
        source_splits=["train", "val"],
        group_by="sensor_id",  # keep all readings from a sensor in the same fold
    ),
    n_folds=5,
    seed=13,
    experiment=exp,
)
print(f"CrossValidation: {cv.n_folds} folds")
print(f"Phase template:  {[e.label for e in cv.phase_template.all]}")
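The folding described under "How folding works" can be sketched in plain Python. This is an illustration under stated assumptions, not ModularML's implementation: the helper grouped_k_fold is hypothetical, the sample indices are made up, and dealing groups round-robin is just one simple way to form the pieces.

```python
import random


def grouped_k_fold(pool, n_folds, seed):
    """Hypothetical helper: yield (train_idx, val_idx) for each fold.

    `pool` is a list of (sample_index, group_id) pairs.
    """
    groups = sorted({g for _, g in pool})
    random.Random(seed).shuffle(groups)

    # Deal groups round-robin into n_folds non-overlapping pieces
    pieces = [set(groups[k::n_folds]) for k in range(n_folds)]

    for held_out in pieces:
        val_idx = [i for i, g in pool if g in held_out]
        train_idx = [i for i, g in pool if g not in held_out]
        yield train_idx, val_idx


# Pool the existing "train" and "val" splits (40 sensors x 10 readings).
# The indices and the split assignment here are made up for illustration.
train_split = [(i, str(i // 10)) for i in range(0, 280)]
val_split = [(i, str(i // 10)) for i in range(280, 400)]
pool = train_split + val_split

folds = list(grouped_k_fold(pool, n_folds=5, seed=13))
for k, (train_idx, val_idx) in enumerate(folds):
    print(f"fold_{k}: train={len(train_idx)} val={len(val_idx)}")
```

Every pooled sample is used for validation exactly once, and because folds are built from whole groups, a sensor never appears on both sides of the same fold.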

Running Cross-Validation#

Call cv.run() to execute all folds. For each fold, CrossValidation:

Partitions the pooled source data into n_folds non-overlapping pieces.
Creates a temporary context where train = all-but-one piece, val = the held-out piece, and test remains unchanged.
Runs the full execution plan inside the temporary context.
Restores the original context, so the FeatureSet and Experiment are in the same state after CV as they were before it.

cv.run() returns a CVResults object containing one PhaseGroupResults per fold.

cv_res = cv.run()
print(cv_res)
print(f"Fold labels: {cv_res.fold_labels}")
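The swap-and-restore behaviour in the steps above follows a common pattern that can be sketched with a context manager. We do not know how ModularML implements its temporary context internally; the helper temporary_splits and the dictionary standing in for a FeatureSet are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def temporary_splits(splits, **overrides):
    """Hypothetical helper: swap entries in `splits`, then restore them."""
    saved = {name: splits[name] for name in overrides}
    splits.update(overrides)
    try:
        yield splits
    finally:
        # Runs even if the fold raises, so the originals always come back
        splits.update(saved)


# Stand-in for a FeatureSet's named splits (sample indices are made up)
splits = {"train": [0, 1, 2, 3], "val": [4, 5], "test": [6, 7]}
original = {name: list(idx) for name, idx in splits.items()}

with temporary_splits(splits, train=[0, 1, 4, 5], val=[2, 3]) as active:
    inside = dict(active)  # snapshot of what a fold would see

print(inside["val"], splits["val"])
```

Only the overridden names change inside the block; test is never touched, and the try/finally guarantees the original splits return even when a fold fails.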

Accessing Results#

Per-fold results#

CVResults extends PhaseGroupResults. Each fold’s results are accessed with get_fold(i) (by index) or get_fold("fold_i") (by label).

# Access the first fold
fold_0 = cv_res.get_fold(0)
print(f"Fold 0 results: {fold_0}")

# Training results for fold 0
train_res_0 = fold_0.get_train_result("train")
print(f"  train results: {train_res_0}")

# Final eval results for fold 0
eval_res_0 = fold_0.get_eval_result("eval")
print(f"  eval results:  {eval_res_0}")

Validation loss tracked during training#

The EvalLossMetric inside the Evaluation callback logged val_loss to the MetricStore each epoch. Access it via TrainResults.metrics().

for fold_label in cv_res.fold_labels:
    fold = cv_res.get_fold(fold_label)
    train_res = fold.get_train_result("train")
    val_loss = train_res.metrics().where(name="val_loss").last(sort_by="epoch").value
    print(f"{fold_label}: final val_loss = {val_loss:.4f}")

Cross-fold training losses#

CVResults.losses() collects training losses across all folds and returns an AxisSeries keyed by (fold, epoch, batch, label). Use .where(), .collapse(), and .at() from the AxisSeries API to filter and aggregate.

# Training losses over all folds and epochs
train_losses = cv_res.losses(node="MLP", phase="train")
print(f"Axes: {train_losses.axes}")

# Mean across batches, then across folds
mean_by_epoch = (
    train_losses
    .collapse(axis="batch", reducer="mean")
    .collapse(axis="fold", reducer="mean")
    .squeeze()
)
print("Mean train loss per epoch (averaged across batches and folds):")
for epoch, loss_record in mean_by_epoch.items():
    print(f"  epoch {epoch}: {loss_record.trainable:.4f}")
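To show what the two collapse steps compute, here is a plain-Python sketch that averages a dictionary keyed by (fold, epoch, batch). It does not use the AxisSeries API; the helper collapse is hypothetical, and the loss values are synthetic numbers chosen for illustration, not results from this notebook.

```python
from statistics import fmean

# Synthetic losses keyed by (fold, epoch, batch); not real results
losses = {
    (fold, epoch, batch): 1.0 / (1 + epoch) + 0.1 * fold + 0.01 * batch
    for fold in range(3)
    for epoch in range(2)
    for batch in range(4)
}


def collapse(series, axis):
    """Hypothetical helper: average out one position of the key tuple."""
    grouped = {}
    for key, value in series.items():
        reduced_key = key[:axis] + key[axis + 1:]
        grouped.setdefault(reduced_key, []).append(value)
    return {key: fmean(values) for key, values in grouped.items()}


per_fold_epoch = collapse(losses, axis=2)      # keys: (fold, epoch)
per_epoch = collapse(per_fold_epoch, axis=0)   # keys: (epoch,)

# "Squeeze" the one-element keys down to plain epoch numbers
mean_by_epoch = {key[0]: value for key, value in per_epoch.items()}

for epoch, value in sorted(mean_by_epoch.items()):
    print(f"epoch {epoch}: {value:.4f}")
```

Every fold has the same number of batches here, so the mean of per-fold means equals the overall mean. With unequal batch counts per fold the two would differ, which is worth keeping in mind when choosing the order of reductions.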

Custom fold extraction with `collect()`#

CVResults.collect() applies an arbitrary extractor to each fold, returning an AxisSeries with a fold axis prepended.

# Collect the final-epoch val_loss scalar from each fold
final_val_losses = cv_res.collect(
    lambda fold: (
        fold.get_train_result("train")
        .metrics()
        .where(name="val_loss")
        .last(sort_by="epoch")
    ),
)
print("Final val_loss per fold:")
for fold_label, metric_entry in final_val_losses.items():
    print(f"  {fold_label}: {metric_entry.value:.4f}")
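Per-fold scalars like these are usually summarized as a mean and a spread across folds. The sketch below uses only the standard library; the values are placeholders chosen for illustration, not output from this notebook, and the dictionary simply stands in for whatever per-fold numbers you extract.

```python
from statistics import mean, stdev

# Placeholder values, NOT results from this notebook
final_val_loss_by_fold = {
    "fold_0": 1.0,
    "fold_1": 1.2,
    "fold_2": 0.8,
    "fold_3": 1.1,
    "fold_4": 0.9,
}

values = list(final_val_loss_by_fold.values())
cv_mean = mean(values)
cv_std = stdev(values)  # sample standard deviation across folds

print(f"val_loss: {cv_mean:.4f} +/- {cv_std:.4f} over {len(values)} folds")
```

Reporting the spread alongside the mean shows how sensitive the model is to which sensors end up in the validation fold.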

Summary#

`CrossValidation` Constructor#

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `bindings` | `CVBinding \| list[CVBinding]` | (required) | Fold configurations per `FeatureSet`. |
| `n_folds` | `int` | `5` | Number of folds. |
| `seed` | `int` | `13` | Random seed for fold generation. |
| `label` | `str` | `"CV"` | Label applied to generated fold groups. |
| `phase` | `TrainPhase \| PhaseGroup \| None` | `None` | Phase to run per fold. If `None`, uses the experiment’s execution plan. |
| `experiment` | `Experiment \| None` | `None` | Experiment to execute. Defaults to the active experiment. |

`CVBinding` Constructor#

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `fs` | `str \| FeatureSet` | (required) | `FeatureSet` to fold over. |
| `source_splits` | `list[str]` | (required) | Splits pooled into the CV data. |
| `group_by` | `str \| list[str] \| None` | `None` | Tag column(s) for group-based folding. |
| `stratify_by` | `str \| list[str] \| None` | `None` | Tag column(s) for stratified folding. |
| `train_split_name` | `str` | `"train"` | Split name replaced with fold training data. |
| `val_split_name` | `str` | `"val"` | Split name replaced with fold validation data. |
| `val_size` | `float \| None` | `None` | Explicit validation size per fold (`1/n_folds` if `None`). |

`CVResults` API#

| Method / Property | Returns | Description |
| --- | --- | --- |
| `n_folds` | `int` | Number of completed folds. |
| `fold_labels` | `list[str]` | Fold labels in execution order. |
| `get_fold(fold)` | `PhaseGroupResults` | Results for a specific fold (by index or label). |
| `losses(node, phase)` | `AxisSeries[(fold, epoch, batch, label)]` | Training losses across all folds. |
| `collect(extractor)` | `AxisSeries` | Apply a function to each fold; merge results with `fold` axis prepended. |

Data Flow During Cross-Validation#

FeatureSet (unchanged after CV completes)
  ├─ source (pooled into CV)
  │     ├─ train  <-- replaced with fold training data
  │     └─ val    <-- replaced with fold validation data
  └─ test         <-- unchanged in all folds (not in source_splits)

Each fold creates a temporary context where train and val are swapped out. The original FeatureSet is never mutated.