How to: Use Cross-Validation#
Cross-validation (CV) evaluates how well a model generalizes by repeatedly training
and validating on rotating subsets of data. ModularML provides CrossValidation
and CVBinding to integrate CV directly with the Experiment API.
Prerequisites: This notebook uses the Evaluation and EvalLossMetric callbacks. Read How to: Use Callbacks first if you are not familiar with them.
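This notebook covers:
- Setting up a FeatureSet with a held-out test split and train/val splits to rotate
- Defining the execution plan that runs inside each fold
- Binding the CV pool with CVBinding and running CrossValidation
- Accessing per-fold and cross-fold results from CVResults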
%matplotlib inline
import numpy as np
from modularml import (
AppliedLoss,
EvalPhase,
Experiment,
FeatureSet,
Loss,
ModelGraph,
ModelNode,
Optimizer,
TrainPhase,
)
from modularml.models.torch import SequentialMLP
from modularml.samplers import SimpleSampler
Dataset Setup#
We create a synthetic dataset that mimics a battery health monitoring scenario:
50 sensors, each providing 10 readings, where each reading is a 50-point voltage
vector paired with a scalar state-of-health target (soh). A sensor_id tag column
identifies which sensor each sample belongs to.
We split the data into two stages:
1. Source / Test split: hold out a set of sensors for final testing, separate from those used for cross-validation. We keep sensor groups intact (group_by="sensor_id").
2. Train / Val split within source: randomly divide the source readings into train and val splits. These are the splits that rotate during CV.
rng = np.random.default_rng(13)
n_sensors = 50
n_readings_per_sensor = 10
n_samples = n_sensors * n_readings_per_sensor # 500 total
# sensor_id repeats for each reading within a sensor
sensor_ids = np.repeat(np.arange(n_sensors), n_readings_per_sensor).astype(str)
fs = FeatureSet.from_dict(
label="SensorData",
data={
"voltage": list(rng.standard_normal((n_samples, 50))),
"soh": list(rng.standard_normal((n_samples, 1))),
"sensor_id": list(sensor_ids),
},
feature_keys="voltage",
target_keys="soh",
tag_keys="sensor_id",
)
print(f"Total samples: {len(fs)}")
print(f"Tags: {fs.get_tag_keys()}")
fs.clear_splits()
# Stage 1: split by sensor_id - keeps all readings from a sensor together
# source = 40 sensors (80%), test = 10 sensors (20%)
fs.split_random(
ratios={"source": 0.8, "test": 0.2},
group_by="sensor_id",
seed=13,
)
# Stage 2: randomly split source readings into train and val
fs.get_split("source").split_random(
ratios={"train": 0.7, "val": 0.3},
seed=42,
)
fs.visualize()
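The Stage 1 split above relies on group_by to keep every reading from a sensor on the same side of the source/test boundary. The NumPy sketch below shows one way a group-aware random split can work. group_random_split is a hypothetical helper written for illustration, not ModularML's split_random implementation, and it assumes the ratios are fractions of groups (sensors) rather than of individual samples, matching the "40 sensors / 10 sensors" comment in the code above.

```python
import numpy as np


def group_random_split(groups, ratios, seed):
    """Hypothetical helper: assign whole groups to splits at random.

    groups: 1-D array with one group label per sample.
    ratios: split name -> fraction of *groups*; fractions should sum to 1.
    Returns split name -> sorted sample indices.
    """
    rng = np.random.default_rng(seed)
    shuffled_groups = rng.permutation(np.unique(groups))
    names = list(ratios)
    splits, start = {}, 0
    for i, name in enumerate(names):
        # The last split takes all remaining groups, so rounding never drops a group.
        if i == len(names) - 1:
            stop = len(shuffled_groups)
        else:
            stop = start + int(round(ratios[name] * len(shuffled_groups)))
        chosen = shuffled_groups[start:stop]
        splits[name] = np.flatnonzero(np.isin(groups, chosen))
        start = stop
    return splits


sensor_ids = np.repeat(np.arange(50), 10).astype(str)  # same layout as the dataset above
splits = group_random_split(sensor_ids, {"source": 0.8, "test": 0.2}, seed=13)
print(len(splits["source"]), len(splits["test"]))  # 400 100

# No sensor appears on both sides of the boundary.
overlap = set(sensor_ids[splits["source"]]) & set(sensor_ids[splits["test"]])
print(overlap)  # set()
```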
Model and Experiment#
We set up a simple MLP model graph and create an experiment. The setup is identical to the approach in How to: Create and Use an Experiment.
fs_ref = fs.reference(features="voltage", targets="soh")
mn_mlp = ModelNode(
label="MLP",
model=SequentialMLP(output_shape=(1, 1), n_layers=2, hidden_dim=16),
upstream_ref=fs_ref,
)
graph = ModelGraph(
label="SimpleGraph",
nodes=[mn_mlp],
optimizer=Optimizer("adam", opt_kwargs={"lr": 1e-3}, backend="torch"),
)
graph.build()
graph.visualize()
exp = Experiment.from_active_context(label="my_experiment")
Execution Plan#
We define the execution plan that will run inside each fold of the cross-validation. This plan consists of:
1. A TrainPhase on the train split, with an Evaluation callback that monitors validation loss on the val split after every epoch.
2. A final EvalPhase on the held-out test split.
from modularml.callbacks import EvalLossMetric, Evaluation
mse_loss = AppliedLoss(
loss=Loss("mse", backend="torch"),
on="MLP",
inputs=["outputs", "targets"],
)
# Evaluation callback: run on val split after every epoch
eval_cb = Evaluation.from_split(
label="eval_val",
split="val",
every_n_epochs=1,
metrics=[
EvalLossMetric(
name="val_loss",
loss=AppliedLoss(
loss=Loss("mse", backend="torch"),
on="MLP",
inputs=["outputs", "targets"],  # same order as mse_loss
),
),
],
)
train_phase = TrainPhase.from_split(
label="train",
split="train",
sampler=SimpleSampler(batch_size=4, shuffle=True, seed=42),
losses=[mse_loss],
n_epochs=2,
callbacks=[eval_cb],
)
# Final eval on held-out test split
eval_phase = EvalPhase.from_split(
label="eval",
split="test",
losses=[mse_loss],
)
exp.execution_plan.add_phase(train_phase)
exp.execution_plan.add_phase(eval_phase)
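Conceptually, every_n_epochs controls how often the Evaluation callback fires during training. The loop below is a hypothetical, framework-free sketch of that scheduling, not how ModularML's TrainPhase is implemented: with every_n_epochs=1 and n_epochs=2, as in the plan above, the validation metric is computed twice per fold.

```python
def train(n_epochs, batches, train_step, callbacks):
    """Hypothetical training loop with epoch-end callbacks.

    callbacks: list of (every_n_epochs, fn) pairs; fn(epoch) returns a metric.
    Returns a list of (epoch, metric) pairs in the order they were produced.
    """
    history = []
    for epoch in range(n_epochs):  # epochs counted from 0 in this sketch
        for batch in batches:
            train_step(batch)
        for every_n_epochs, fn in callbacks:
            # Fire after every n-th completed epoch.
            if (epoch + 1) % every_n_epochs == 0:
                history.append((epoch, fn(epoch)))
    return history


# Mirror the plan above: n_epochs=2 and a callback with every_n_epochs=1.
seen = []
history = train(
    n_epochs=2,
    batches=["b0", "b1", "b2"],
    train_step=seen.append,
    callbacks=[(1, lambda epoch: f"val_loss@{epoch}")],
)
print(len(seen))  # 6: three batches in each of two epochs
print(history)    # [(0, 'val_loss@0'), (1, 'val_loss@1')]
```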
We can verify this plan before running cross-validation with preview_run.
Unlike the run_* methods, the preview_* methods do not mutate the Experiment state,
so we can check an execution plan without accidentally pre-training the ModelGraph.
# Dry-run the plan once before starting CV; preview_run does not mutate exp
exp.preview_run()
print("Preview run completed.")
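One common way to guarantee that a preview leaves state untouched is to execute the work on a deep copy. The sketch below illustrates that idea with a hypothetical ToyExperiment class; whether ModularML's preview_run copies state internally is an assumption, not something this guide states.

```python
import copy


class ToyExperiment:
    """Hypothetical stand-in for an experiment whose run() trains in place."""

    def __init__(self):
        self.weights = [0.0, 0.0]
        self.n_updates = 0

    def _train(self):
        # Pretend training step: nudge every weight and count the update.
        self.weights = [w + 0.1 for w in self.weights]
        self.n_updates += 1

    def run(self):
        self._train()  # mutates this object

    def preview_run(self):
        # Execute the same work on a deep copy; this object is left untouched.
        scratch = copy.deepcopy(self)
        scratch._train()
        return scratch


toy = ToyExperiment()
preview = toy.preview_run()
print(toy.n_updates, preview.n_updates)  # 0 1
toy.run()
print(toy.n_updates)  # 1
```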
CVBinding#
A CVBinding tells CrossValidation which FeatureSet to fold over and which
existing splits form the CV pool.
CVBinding(
fs: str | FeatureSet,
source_splits: list[str],
*,
group_by: str | list[str] | None = None,
stratify_by: str | list[str] | None = None,
train_split_name: str = "train",
val_split_name: str = "val",
val_size: float | None = None,
)
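| Parameter | Type | Default | Description |
|---|---|---|---|
| fs | str \| FeatureSet | (required) | The FeatureSet (or its label) to fold over. |
| source_splits | list[str] | (required) | Existing splits to pool before folding. |
| group_by | str \| list[str] \| None | None | Keep groups together across fold boundaries. |
| stratify_by | str \| list[str] \| None | None | Balance strata across folds (mutually exclusive with group_by). |
| train_split_name | str | "train" | The split name that receives each fold’s training data. |
| val_split_name | str | "val" | The split name that receives each fold’s validation data. |
| val_size | float \| None | None | Explicit validation proportion per fold. If … |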
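Where group_by keeps each group inside a single fold, stratify_by instead aims to spread each stratum evenly across folds. A minimal sketch of that idea, assuming a simple shuffle-then-round-robin assignment within each stratum (stratified_fold_ids is a hypothetical helper, not ModularML's algorithm):

```python
import numpy as np


def stratified_fold_ids(strata, n_folds, seed):
    """Hypothetical helper: deal each stratum's samples round-robin across folds."""
    rng = np.random.default_rng(seed)
    fold_ids = np.empty(len(strata), dtype=int)
    for value in np.unique(strata):
        # Shuffle this stratum's sample indices, then assign folds 0, 1, ..., n_folds-1, 0, ...
        idx = rng.permutation(np.flatnonzero(strata == value))
        fold_ids[idx] = np.arange(len(idx)) % n_folds
    return fold_ids


# Toy tag column: 20 "low" and 10 "high" samples.
strata = np.array(["low"] * 20 + ["high"] * 10)
folds = stratified_fold_ids(strata, n_folds=5, seed=0)
for k in range(5):
    in_fold = strata[folds == k]
    print(k, (in_fold == "low").sum(), (in_fold == "high").sum())  # k 4 2 for every fold
```

Each fold keeps the overall 2:1 low/high ratio, which is the point of stratification; group-based folding, in contrast, keeps each group whole and does not try to balance tag values.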
How folding works#
source_splits specifies which existing splits are pooled into the CV data.
In our case, source_splits=["train", "val"] combines the train and val samples
into one pool. This pool is then split into n_folds equal pieces (with 40 source
sensors grouped by sensor_id and n_folds=5, that is 8 sensors per piece). Each fold
uses one piece as validation and the remainder as training, temporarily replacing the
train and val splits in the FeatureSet for that fold’s execution.
Note that we could simply use the source split as our pool, since it is the union of the train and val samples. We pass an explicit list of splits instead to show that any views can be merged into the CV pool: they do not need to originate from the same parent view, but they do need to belong to the same FeatureSet.
The test split is not included in source_splits, so it remains unchanged
across all folds.
from modularml import CrossValidation, CVBinding
cv = CrossValidation(
bindings=CVBinding(
fs=fs,
source_splits=["train", "val"],
group_by="sensor_id", # keep all readings from a sensor in the same fold
),
n_folds=5,
seed=13,
experiment=exp,
)
print(f"CrossValidation: {cv.n_folds} folds")
print(f"Phase template: {[e.label for e in cv.phase_template.all]}")
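The folding described in How folding works can be sketched in plain NumPy. group_kfold below is a hypothetical helper, not ModularML's implementation; it assumes folds are built by shuffling the unique groups in the pool and cutting them into n_folds nearly equal chunks. With this guide's sizes (40 source sensors of 10 readings each, n_folds=5), every fold validates on 8 sensors (80 readings) and trains on the remaining 32 (320 readings).

```python
import numpy as np


def group_kfold(pool_idx, groups, n_folds, seed):
    """Hypothetical helper: yield (train_idx, val_idx) pairs without splitting a group.

    pool_idx: sample indices in the CV pool (e.g. train ∪ val).
    groups:   group label for every sample in the full dataset.
    """
    rng = np.random.default_rng(seed)
    pool_groups = rng.permutation(np.unique(groups[pool_idx]))
    # Cut the shuffled groups into n_folds nearly equal chunks; each chunk is one val fold.
    for val_groups in np.array_split(pool_groups, n_folds):
        in_val = np.isin(groups[pool_idx], val_groups)
        yield pool_idx[~in_val], pool_idx[in_val]


sensor_ids = np.repeat(np.arange(50), 10).astype(str)
source_idx = np.arange(400)  # illustrative: readings of sensors 0-39 (the real split is random)
shuffled = np.random.default_rng(42).permutation(source_idx)
train_idx, val_idx = shuffled[:280], shuffled[280:]  # 70/30 split of readings, not sensors

pool_idx = np.union1d(train_idx, val_idx)  # pool = train ∪ val = all 400 source readings
folds = list(group_kfold(pool_idx, sensor_ids, n_folds=5, seed=13))
for k, (tr, va) in enumerate(folds):
    print(f"fold_{k}: train={len(tr)} val={len(va)}")  # train=320 val=80 for every fold
```

Because the pool is a plain union of index sets, train and val did not need a common parent; they only need to index the same dataset. Note also that the stage-2 split was made per reading, yet every fold is made per sensor: pooling discards the old train/val boundary entirely.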
Running Cross-Validation#
Call cv.run() to execute all folds. CrossValidation first partitions the pooled source data into n_folds non-overlapping pieces. Then, for each fold, it:
1. Creates a temporary context where train = all-but-one piece, val = the held-out piece, and test remains unchanged.
2. Runs the full execution plan inside the temporary context.
3. Restores the original context (the original FeatureSet and Experiment are identical before and after CV).
cv.run() returns a CVResults object containing one PhaseGroupResults
per fold.
cv_res = cv.run()
print(cv_res)
print(f"Fold labels: {cv_res.fold_labels}")
Accessing Results#
Per-fold results#
CVResults extends PhaseGroupResults. Each fold’s results are accessed with
get_fold(i) (by index) or get_fold("fold_i") (by label).
# Access the first fold
fold_0 = cv_res.get_fold(0)
print(f"Fold 0 results: {fold_0}")
# Training results for fold 0
train_res_0 = fold_0.get_train_result("train")
print(f" train results: {train_res_0}")
# Final eval results for fold 0
eval_res_0 = fold_0.get_eval_result("eval")
print(f" eval results: {eval_res_0}")
Validation loss tracked during training#
The EvalLossMetric inside the Evaluation callback logged val_loss to
the MetricStore each epoch. Access it via TrainResults.metrics().
for fold_label in cv_res.fold_labels:
    fold = cv_res.get_fold(fold_label)
    train_res = fold.get_train_result("train")
    val_loss = train_res.metrics().where(name="val_loss").last(sort_by="epoch").value
    print(f"{fold_label}: final val_loss = {val_loss:.4f}")
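The where(...).last(sort_by="epoch") chain selects the latest-epoch entry for one metric name, regardless of the order in which entries were logged. Assuming those semantics, it behaves like the plain-Python sketch below; last_by and the record dictionaries are hypothetical stand-ins for the MetricStore, and the values are made up.

```python
# Hypothetical metric entries; values are made up and deliberately out of epoch order.
records = [
    {"name": "val_loss", "epoch": 1, "value": 0.91},
    {"name": "train_loss", "epoch": 1, "value": 0.95},
    {"name": "val_loss", "epoch": 0, "value": 1.02},
]


def last_by(entries, name, sort_key):
    """Filter entries by metric name, then return the one with the largest sort_key."""
    matches = [e for e in entries if e["name"] == name]
    return max(matches, key=lambda e: e[sort_key])


final = last_by(records, "val_loss", "epoch")
print(final["value"])  # 0.91, even though it was not the last val_loss logged
```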
Cross-fold training losses#
CVResults.losses() collects training losses across all folds and returns
an AxisSeries keyed by (fold, epoch, batch, label). Use .where(), .collapse(), and
.at() from the AxisSeries API to filter and aggregate.
# Training losses over all folds and epochs
train_losses = cv_res.losses(node="MLP", phase="train")
print(f"Axes: {train_losses.axes}")
# Mean across batches, then across folds
mean_by_epoch = (
train_losses
.collapse(axis="batch", reducer="mean")
.collapse(axis="fold", reducer="mean")
.squeeze()
)
print("Mean train loss per epoch (averaged across batches and folds):")
for epoch, loss_record in mean_by_epoch.items():
    print(f" epoch {epoch}: {loss_record.trainable:.4f}")
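Averaging over batches and then over folds is a mean of means. The NumPy sketch below reproduces that reduction on a random array shaped (fold, epoch, batch), a hypothetical stand-in for the AxisSeries rather than real losses. The 80 batches per epoch assume 320 training readings per fold at batch_size=4 with only full batches.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical losses shaped (fold, epoch, batch): 5 folds, 2 epochs, 80 batches.
losses = rng.random((5, 2, 80))

# Collapse "batch" first, then "fold", leaving one mean per epoch.
per_epoch = losses.mean(axis=2).mean(axis=0)
print(per_epoch.shape)  # (2,)

# Equal batch counts per fold make this match a single pooled mean.
print(np.allclose(per_epoch, losses.mean(axis=(0, 2))))  # True
```

The two reductions agree here only because every fold has the same number of batches. If folds had different batch counts (for example, with uneven groups or a smaller last batch), the batch-then-fold mean of means would weight each fold equally rather than each batch.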
Custom fold extraction with collect()#
CVResults.collect() applies an arbitrary extractor to each fold, returning
an AxisSeries with a fold axis prepended.
# Collect the final-epoch val_loss scalar from each fold
final_val_losses = cv_res.collect(
lambda fold: (
fold.get_train_result("train")
.metrics()
.where(name="val_loss")
.last(sort_by="epoch")
),
)
print("Final val_loss per fold:")
for fold_label, metric_entry in final_val_losses.items():
    print(f" {fold_label}: {metric_entry.value:.4f}")
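Once a per-fold scalar has been collected, the usual CV summary is its mean and spread across folds. The sketch below mimics collect() with a hypothetical plain-dict stand-in and made-up per-fold loss histories (illustrative values, not results from this notebook), then reports the mean ± sample standard deviation.

```python
import statistics


def collect(folds, extractor):
    """Hypothetical stand-in for CVResults.collect(): apply extractor to each fold."""
    return {label: extractor(fold) for label, fold in folds.items()}


# Made-up per-fold val_loss histories (one value per epoch), for illustration only.
folds = {
    "fold_0": {"val_loss": [1.10, 0.95]},
    "fold_1": {"val_loss": [1.05, 1.00]},
    "fold_2": {"val_loss": [1.20, 0.90]},
}

final = collect(folds, lambda fold: fold["val_loss"][-1])
print(final)  # {'fold_0': 0.95, 'fold_1': 1.0, 'fold_2': 0.9}

values = list(final.values())
mean = statistics.mean(values)
spread = statistics.stdev(values)  # sample standard deviation across folds
print(f"val_loss = {mean:.3f} ± {spread:.3f} over {len(values)} folds")  # val_loss = 0.950 ± 0.050 over 3 folds
```

Reporting the spread alongside the mean shows how sensitive the estimate is to which sensors land in the validation fold.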
Summary#
CrossValidation Constructor#
| Parameter | Type | Default | Description |
|---|---|---|---|
| bindings | … | (required) | Fold configurations per … |
| n_folds | … | … | Number of folds. |
| seed | … | … | Random seed for fold generation. |
| … | … | … | Label applied to generated fold groups. |
| … | … | … | Phase to run per fold. If … |
| experiment | … | … | Experiment to execute. Defaults to the active experiment. |
CVBinding Constructor#
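| Parameter | Type | Default | Description |
|---|---|---|---|
| fs | str \| FeatureSet | (required) | The FeatureSet (or its label) to fold over. |
| source_splits | list[str] | (required) | Splits pooled into the CV data. |
| group_by | str \| list[str] \| None | None | Tag column(s) for group-based folding. |
| stratify_by | str \| list[str] \| None | None | Tag column(s) for stratified folding. |
| train_split_name | str | "train" | Split name replaced with fold training data. |
| val_split_name | str | "val" | Split name replaced with fold validation data. |
| val_size | float \| None | None | Explicit validation size per fold (…). |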
CVResults API#
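| Method / Property | Returns | Description |
|---|---|---|
| … | … | Number of completed folds. |
| fold_labels | … | Fold labels in execution order. |
| get_fold(i) | PhaseGroupResults | Results for a specific fold (by index or label). |
| losses(node=..., phase=...) | AxisSeries | Training losses across all folds. |
| collect(...) | AxisSeries | Apply a function to each fold and return the results with a fold axis prepended. |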
Data Flow During Cross-Validation#
FeatureSet (unchanged after CV completes)
├─ source (pooled into CV)
│ ├─ train <-- replaced with fold training data
│ └─ val <-- replaced with fold validation data
└─ test <-- unchanged in all folds (not in source_splits)
Each fold creates a temporary context where train and val are swapped out.
The original FeatureSet is never mutated.