How to: Use Cross-Validation#

Cross-validation (CV) evaluates how well a model generalizes by repeatedly training and validating on rotating subsets of data. ModularML provides CrossValidation and CVBinding to integrate CV directly with the Experiment API.

Prerequisites: This notebook uses Evaluation and EvalLossMetric callbacks. Read How to: Use Callbacks first if you are not familiar with them.

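This notebook covers dataset setup, building the model and experiment, defining the execution plan, configuring a CVBinding, running cross-validation, and accessing the results.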

%matplotlib inline
import numpy as np

from modularml import (
    AppliedLoss,
    EvalPhase,
    Experiment,
    FeatureSet,
    Loss,
    ModelGraph,
    ModelNode,
    Optimizer,
    TrainPhase,
)
from modularml.models.torch import SequentialMLP
from modularml.samplers import SimpleSampler

Dataset Setup#

We create a synthetic dataset that mimics a battery health monitoring scenario: 50 sensors, each contributing 10 samples. Every sample holds a 50-point voltage reading (the feature) and a scalar state-of-health target (soh). A sensor_id tag column identifies which sensor each sample belongs to.

We split the data in two stages:

  1. Source / Test split: separate sensors held out for final testing from those used for cross-validation. We keep sensor groups intact (group_by="sensor_id").

  2. Train / Val split within source: randomly divide the source samples into train and val splits. No group_by is applied at this stage, so readings from one sensor may land in both splits. These are the splits that will rotate during CV.
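To make the idea of a group-preserving split concrete, here is a minimal plain-Python sketch. It is not ModularML's implementation; the helper name split_by_group and its rounding rule are assumptions made for illustration. It shuffles the unique sensor ids and hands whole sensors to each split, so no sensor ends up on both sides.

```python
import random
from collections import defaultdict


def split_by_group(group_ids, ratios, seed):
    """Assign whole groups to named splits; returns {split_name: [sample indices]}."""
    # Collect the sample indices belonging to each group.
    members = defaultdict(list)
    for idx, gid in enumerate(group_ids):
        members[gid].append(idx)

    # Shuffle the unique group ids reproducibly.
    groups = sorted(members)
    random.Random(seed).shuffle(groups)

    # Hand out contiguous blocks of groups according to the cumulative ratios.
    names = list(ratios)
    splits, start, cumulative = {}, 0, 0.0
    for pos, name in enumerate(names):
        cumulative += ratios[name]
        is_last = pos == len(names) - 1
        end = len(groups) if is_last else round(cumulative * len(groups))
        splits[name] = sorted(i for g in groups[start:end] for i in members[g])
        start = end
    return splits


# 50 sensors x 10 samples each, mirroring the dataset described above
sensor_ids = [str(s) for s in range(50) for _ in range(10)]
stage1 = split_by_group(sensor_ids, {"source": 0.8, "test": 0.2}, seed=13)

source_sensors = {sensor_ids[i] for i in stage1["source"]}
test_sensors = {sensor_ids[i] for i in stage1["test"]}
print(len(stage1["source"]), len(stage1["test"]))  # 400 100
print(source_sensors & test_sensors)               # set()
```

Splitting at the group level is what prevents leakage: if readings from one sensor appeared in both source and test, the test score would overstate how well the model handles unseen sensors.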

rng = np.random.default_rng(13)
n_sensors = 50
n_readings_per_sensor = 10
n_samples = n_sensors * n_readings_per_sensor  # 500 total

# sensor_id repeats for each reading within a sensor
sensor_ids = np.repeat(np.arange(n_sensors), n_readings_per_sensor).astype(str)

fs = FeatureSet.from_dict(
    label="SensorData",
    data={
        "voltage": list(rng.standard_normal((n_samples, 50))),
        "soh": list(rng.standard_normal((n_samples, 1))),
        "sensor_id": list(sensor_ids),
    },
    feature_keys="voltage",
    target_keys="soh",
    tag_keys="sensor_id",
)
print(f"Total samples: {len(fs)}")
print(f"Tags: {fs.get_tag_keys()}")
fs.clear_splits()

# Stage 1: split by sensor_id - keeps all readings from a sensor together
# source = 40 sensors (80%), test = 10 sensors (20%)
fs.split_random(
    ratios={"source": 0.8, "test": 0.2},
    group_by="sensor_id",
    seed=13,
)

# Stage 2: randomly split source readings into train and val
fs.get_split("source").split_random(
    ratios={"train": 0.7, "val": 0.3},
    seed=42,
)

fs.visualize()

Model and Experiment#

We set up a simple MLP model graph and create an experiment. The setup is identical to the approach in How to: Create and Use an Experiment.

fs_ref = fs.reference(features="voltage", targets="soh")

mn_mlp = ModelNode(
    label="MLP",
    model=SequentialMLP(output_shape=(1, 1), n_layers=2, hidden_dim=16),
    upstream_ref=fs_ref,
)

graph = ModelGraph(
    label="SimpleGraph",
    nodes=[mn_mlp],
    optimizer=Optimizer("adam", opt_kwargs={"lr": 1e-3}, backend="torch"),
)
graph.build()
graph.visualize()

exp = Experiment.from_active_context(label="my_experiment")

Execution Plan#

We define the execution plan that will run inside each fold of the cross-validation. This plan consists of:

  1. A TrainPhase on the train split, with an Evaluation callback that monitors validation loss on the val split after every epoch.

  2. A final EvalPhase on the held-out test split.

from modularml.callbacks import EvalLossMetric, Evaluation

mse_loss = AppliedLoss(
    loss=Loss("mse", backend="torch"),
    on="MLP",
    inputs=["outputs", "targets"],
)

# Evaluation callback: run on val split after every epoch
eval_cb = Evaluation.from_split(
    label="eval_val",
    split="val",
    every_n_epochs=1,
    metrics=[
        EvalLossMetric(
            name="val_loss",
            loss=AppliedLoss(
                loss=Loss("mse", backend="torch"),
                on="MLP",
                inputs=["targets", "outputs"],
            ),
        ),
    ],
)

train_phase = TrainPhase.from_split(
    label="train",
    split="train",
    sampler=SimpleSampler(batch_size=4, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=2,
    callbacks=[eval_cb],
)

# Final eval on held-out test split
eval_phase = EvalPhase.from_split(
    label="eval",
    split="test",
    losses=[mse_loss],
)

exp.execution_plan.add_phase(train_phase)
exp.execution_plan.add_phase(eval_phase)

We can verify this plan before running cross-validation with preview_run. Unlike the run_ methods, preview_ methods do not mutate the Experiment state.

This allows us to verify execution plans before running a final phase sequence without worrying about accidentally pre-training the ModelGraph.

# Dry-run the plan once before starting CV; preview_run() leaves the Experiment unchanged
exp.preview_run()
print("Preview run completed.")

CVBinding#

A CVBinding tells CrossValidation which FeatureSet to fold over and which existing splits form the CV pool.

    CVBinding(
        fs: str | FeatureSet,
        source_splits: list[str],
        *,
        group_by: str | list[str] | None = None,
        stratify_by: str | list[str] | None = None,
        train_split_name: str = "train",
        val_split_name: str = "val",
        val_size: float | None = None,
    )

Parameter         Type                    Default     Description
----------------  ----------------------  ----------  -------------------------------------------------------------------
fs                str | FeatureSet        (required)  The FeatureSet to fold.
source_splits     list[str]               (required)  Existing splits to pool before folding.
group_by          str | list[str] | None  None        Keep groups together across fold boundaries.
stratify_by       str | list[str] | None  None        Balance strata across folds (mutually exclusive with group_by).
train_split_name  str                     "train"     The split name that receives each fold’s training data.
val_split_name    str                     "val"       The split name that receives each fold’s validation data.
val_size          float | None            None        Explicit validation proportion per fold. If None, uses 1 / n_folds.
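As a rough illustration of what "balance strata across folds" means for stratify_by, the sketch below assigns samples to folds so that each stratum is spread as evenly as possible. This is a generic stratified assignment written in plain Python, not ModularML's code; the function stratified_folds and the toy labels "A" and "B" are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds, seed):
    """Return one fold index per sample, spreading every stratum across folds."""
    by_stratum = defaultdict(list)
    for idx, label in enumerate(labels):
        by_stratum[label].append(idx)

    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    offset = 0
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)
        # Deal this stratum's samples round-robin over the folds.
        for j, idx in enumerate(members):
            fold_of[idx] = (offset + j) % n_folds
        # Continue dealing where the previous stratum stopped.
        offset += len(members)
    return fold_of


labels = ["A"] * 10 + ["B"] * 5  # toy stratum label per sample
fold_of = stratified_folds(labels, n_folds=5, seed=13)

for k in range(5):
    counts = Counter(labels[i] for i, f in enumerate(fold_of) if f == k)
    print(f"fold_{k}: {dict(sorted(counts.items()))}")  # {'A': 2, 'B': 1} in every fold
```

Every fold keeps the 2:1 ratio of the full pool. Grouping and stratifying pull in different directions (one keeps samples together, the other spreads them out), which is consistent with the two options being mutually exclusive.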

How folding works#

source_splits specifies which existing splits are pooled into the CV data. In our case, source_splits=["train", "val"] combines the train and val samples into one pool. This pool is then split into n_folds roughly equal pieces (whole groups are kept together when group_by is set). Each fold uses one piece as validation and the remainder as training, replacing the train and val splits in the FeatureSet for that fold’s execution.

Note that we could simply use the source split as our pool, since it is the union of the train and val samples. We pass a list of splits to show that any views can be merged into the CV pool. They do not need to originate from the same parent view, but they do need to belong to the same FeatureSet.

The test split is not included in source_splits, so it remains unchanged across all folds.
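The pooling and folding described above can be sketched in plain Python as follows. This is an illustrative group-aware k-fold, not the library's implementation; group_kfold, the toy index ranges, and the round-robin assignment of sensors to folds are all assumptions.

```python
import random


def group_kfold(pool, group_of, n_folds, seed):
    """Yield (train, val) index lists; each group is validated in exactly one fold."""
    groups = sorted({group_of[i] for i in pool})
    random.Random(seed).shuffle(groups)
    # Deal groups round-robin so fold sizes differ by at most one group.
    fold_groups = [set(groups[k::n_folds]) for k in range(n_folds)]
    for held_out in fold_groups:
        val = [i for i in pool if group_of[i] in held_out]
        train = [i for i in pool if group_of[i] not in held_out]
        yield train, val


# Toy stand-in for the 40 source sensors with 10 samples each
group_of = {i: f"sensor_{i // 10}" for i in range(400)}
train_split = list(range(0, 280))
val_split = list(range(280, 400))
pool = train_split + val_split  # what source_splits=["train", "val"] pools together

folds = list(group_kfold(pool, group_of, n_folds=5, seed=13))
for k, (train, val) in enumerate(folds):
    print(f"fold_{k}: {len(train)} train / {len(val)} val samples")  # 320 / 80 each
```

With 40 sensors and 5 folds, each fold validates on 8 sensors (80 samples) and trains on the other 32, and every sample is used for validation exactly once across the run.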

from modularml import CrossValidation, CVBinding

cv = CrossValidation(
    bindings=CVBinding(
        fs=fs,
        source_splits=["train", "val"],
        group_by="sensor_id",  # keep all readings from a sensor in the same fold
    ),
    n_folds=5,
    seed=13,
    experiment=exp,
)
print(f"CrossValidation: {cv.n_folds} folds")
print(f"Phase template:  {[e.label for e in cv.phase_template.all]}")

Running Cross-Validation#

Call cv.run() to execute all folds. CrossValidation proceeds as follows:

  1. Partitions the pooled source data into n_folds non-overlapping pieces.

  2. For each fold, creates a temporary context where train = all-but-one piece, val = the held-out piece, and test remains unchanged.

  3. Runs the full execution plan inside that temporary context.

  4. Restores the original context, so the FeatureSet and Experiment are in the same state after CV as before.

cv.run() returns a CVResults object containing one PhaseGroupResults per fold.
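The swap-and-restore behavior in the steps above can be pictured with a small context manager. This is only a conceptual sketch using a plain dict of index lists. How ModularML manages its temporary context internally is not shown here, and temporary_splits is a hypothetical helper.

```python
from contextlib import contextmanager


@contextmanager
def temporary_splits(splits, train, val):
    """Swap 'train' and 'val' for the duration of the block, then restore them."""
    saved = {name: splits[name] for name in ("train", "val")}
    splits["train"], splits["val"] = train, val
    try:
        yield splits
    finally:
        # Restore even if the body raised an exception.
        splits.update(saved)


splits = {"train": [0, 1, 2, 3], "val": [4, 5], "test": [6, 7]}
pool = splits["train"] + splits["val"]
n_folds = 3

seen_test = []
for k in range(n_folds):
    val = pool[k::n_folds]
    train = [i for i in pool if i not in val]
    with temporary_splits(splits, train, val) as active:
        # The execution plan would run here against the fold's train/val.
        seen_test.append(list(active["test"]))

print(splits)     # {'train': [0, 1, 2, 3], 'val': [4, 5], 'test': [6, 7]}
print(seen_test)  # [[6, 7], [6, 7], [6, 7]]
```

The try/finally is the important part: the original splits come back even when a fold fails midway, which is what makes it safe to run CV on an object you intend to keep using.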

cv_res = cv.run()
print(cv_res)
print(f"Fold labels: {cv_res.fold_labels}")

Accessing Results#

Per-fold results#

CVResults extends PhaseGroupResults. Each fold’s results are accessed with get_fold(i) (by index) or get_fold("fold_i") (by label).

# Access the first fold
fold_0 = cv_res.get_fold(0)
print(f"Fold 0 results: {fold_0}")

# Training results for fold 0
train_res_0 = fold_0.get_train_result("train")
print(f"  train results: {train_res_0}")

# Final eval results for fold 0
eval_res_0 = fold_0.get_eval_result("eval")
print(f"  eval results:  {eval_res_0}")

Validation loss tracked during training#

The EvalLossMetric inside the Evaluation callback logged val_loss to the MetricStore after each epoch. Access it via TrainResults.metrics().

for fold_label in cv_res.fold_labels:
    fold = cv_res.get_fold(fold_label)
    train_res = fold.get_train_result("train")
    val_loss = train_res.metrics().where(name="val_loss").last(sort_by="epoch").value
    print(f"{fold_label}: final val_loss = {val_loss:.4f}")

Cross-fold training losses#

CVResults.losses() collects training losses across all folds and returns an AxisSeries keyed by (fold, epoch, batch, label). Use .where(), .collapse(), and .at() from the AxisSeries API to filter and aggregate.

# Training losses over all folds and epochs
train_losses = cv_res.losses(node="MLP", phase="train")
print(f"Axes: {train_losses.axes}")

# Mean across batches, then across folds
mean_by_epoch = (
    train_losses
    .collapse(axis="batch", reducer="mean")
    .collapse(axis="fold", reducer="mean")
    .squeeze()
)
print("Mean train loss per epoch (averaged across batches and folds):")
for epoch, loss_record in mean_by_epoch.items():
    print(f"  epoch {epoch}: {loss_record.trainable:.4f}")
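
For intuition about what the two collapse steps compute, here is a plain-Python sketch that reduces a dictionary keyed by (fold, epoch, batch) one axis at a time. It does not use the AxisSeries API; the collapse helper below and the toy loss values are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def collapse(series, axes, axis, reducer=mean):
    """Reduce a {key_tuple: value} mapping over one named axis."""
    pos = axes.index(axis)
    buckets = defaultdict(list)
    for key, value in series.items():
        # Drop the collapsed axis from the key; values sharing the rest are pooled.
        buckets[key[:pos] + key[pos + 1:]].append(value)
    remaining = tuple(name for name in axes if name != axis)
    return {key: reducer(vals) for key, vals in buckets.items()}, remaining


axes = ("fold", "epoch", "batch")
# Toy losses for 2 folds x 2 epochs x 3 batches
losses = {
    (f, e, b): float(10 * e + f + b)
    for f in range(2) for e in range(2) for b in range(3)
}

per_fold_epoch, axes_fe = collapse(losses, axes, "batch")
per_epoch, axes_e = collapse(per_fold_epoch, axes_fe, "fold")
print(axes_e)     # ('epoch',)
print(per_epoch)  # {(0,): 1.5, (1,): 11.5}
```

Averaging over batches first and folds second gives each fold equal weight. That matches a single mean over all batches only when every fold has the same number of batches per epoch, which is worth keeping in mind when group-based folds have uneven sizes.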

Custom fold extraction with collect()#

CVResults.collect() applies an arbitrary extractor to each fold, returning an AxisSeries with a fold axis prepended.

# Collect the final-epoch val_loss scalar from each fold
final_val_losses = cv_res.collect(
    lambda fold: (
        fold.get_train_result("train")
        .metrics()
        .where(name="val_loss")
        .last(sort_by="epoch")
    ),
)
print("Final val_loss per fold:")
for fold_label, metric_entry in final_val_losses.items():
    print(f"  {fold_label}: {metric_entry.value:.4f}")
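
The per-fold extraction pattern is usually followed by a summary across folds. The sketch below mimics that pattern with plain dictionaries. The collect function here is a stand-in written for this example rather than CVResults.collect(), and the loss histories are made-up toy values.

```python
from statistics import mean, stdev


def collect(fold_results, extractor):
    """Apply `extractor` to each fold's results; return {fold_label: extracted}."""
    return {label: extractor(result) for label, result in fold_results.items()}


# Toy per-fold histories of (epoch, val_loss) pairs
fold_results = {
    "fold_0": [(0, 0.90), (1, 0.50)],
    "fold_1": [(0, 0.80), (1, 0.40)],
    "fold_2": [(0, 1.00), (1, 0.60)],
}


def final_val_loss(history):
    # Pick the record with the highest epoch and return its loss value.
    return max(history, key=lambda record: record[0])[1]


final = collect(fold_results, final_val_loss)
scores = list(final.values())
print(f"val_loss = {mean(scores):.3f} +/- {stdev(scores):.3f}")  # val_loss = 0.500 +/- 0.100
```

Reporting the spread alongside the mean is the main payoff of cross-validation: a single train/val split gives one number with no indication of how much it depends on which samples were held out.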

Summary#

CrossValidation Constructor#

Parameter   Type                            Default     Description
----------  ------------------------------  ----------  ----------------------------------------------------------------------
bindings    CVBinding | list[CVBinding]     (required)  Fold configurations per FeatureSet.
n_folds     int                             5           Number of folds.
seed        int                             13          Random seed for fold generation.
label       str                             "CV"        Label applied to generated fold groups.
phase       TrainPhase | PhaseGroup | None  None        Phase to run per fold. If None, uses the experiment’s execution plan.
experiment  Experiment | None               None        Experiment to execute. Defaults to the active experiment.

CVBinding Constructor#

Parameter         Type                    Default     Description
----------------  ----------------------  ----------  -------------------------------------------------------
fs                str | FeatureSet        (required)  FeatureSet to fold over.
source_splits     list[str]               (required)  Splits pooled into the CV data.
group_by          str | list[str] | None  None        Tag column(s) for group-based folding.
stratify_by       str | list[str] | None  None        Tag column(s) for stratified folding.
train_split_name  str                     "train"     Split name replaced with fold training data.
val_split_name    str                     "val"       Split name replaced with fold validation data.
val_size          float | None            None        Explicit validation size per fold (1/n_folds if None).

CVResults API#

Method / Property    Returns                                  Description
-------------------  ---------------------------------------  -----------------------------------------------------------------------
n_folds              int                                      Number of completed folds.
fold_labels          list[str]                                Fold labels in execution order.
get_fold(fold)       PhaseGroupResults                        Results for a specific fold (by index or label).
losses(node, phase)  AxisSeries[(fold, epoch, batch, label)]  Training losses across all folds.
collect(extractor)   AxisSeries                               Apply a function to each fold; merge results with fold axis prepended.

Data Flow During Cross-Validation#

FeatureSet (unchanged after CV completes)
  ├─ source (pooled into CV)
  │     ├─ train  <-- replaced with fold training data
  │     └─ val    <-- replaced with fold validation data
  └─ test         <-- unchanged in all folds (not in source_splits)

Each fold creates a temporary context where train and val are swapped out. The original FeatureSet is never mutated.