How to: Create and Use an Experiment#

An Experiment is the top-level orchestrator in ModularML. It coordinates:

  • Phases — units of work such as training (TrainPhase), evaluation (EvalPhase), or batch fitting (FitPhase)

  • Phase Groups — named collections of phases that execute in order

  • Callbacks — hooks at phase, group, and experiment boundaries

  • Checkpointing — automatic saving and restoring of experiment state

  • Execution History — records of every run for reproducibility

Note: This notebook covers the Experiment API and how phases are registered, organized, and executed. Phase-specific details (configuration, advanced usage) are covered in dedicated notebooks: $\textcolor{red}{\text{…to be added soon}}$

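This notebook covers:

  • Creating an Experiment and choosing a registration policy

  • Setting up a model graph

  • Defining phases (TrainPhase, EvalPhase, FitPhase)

  • The execution plan

  • Running individual phases and the full execution plan

  • Preview mode and execution history

  • Phase groups

  • Experiment callbacks

  • Checkpointing and serialization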

import numpy as np

from modularml import (
    AppliedLoss,
    EvalPhase,
    Experiment,
    FeatureSet,
    InputBinding,
    Loss,
    ModelGraph,
    ModelNode,
    Optimizer,
    TrainPhase,
)
from modularml.core.experiment.phases.phase_group import PhaseGroup
from modularml.samplers import SimpleSampler

Creating an Experiment#

An Experiment is created with a label and an optional registration_policy that controls how duplicate node names are handled.

    Experiment(
        label: str,
        registration_policy: str | None = None,
        ctx: ExperimentContext | None = None,
        checkpointing: Checkpointing | None = None,
        callbacks: list[ExperimentCallback] | None = None,
    )

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| label | str | (required) | Name for this experiment. |
| registration_policy | str \| None | None | How to handle duplicate node labels: "raise", "overwrite", or "rename". |
| ctx | ExperimentContext \| None | None | Context to associate with. If None, a new context is created. |
| checkpointing | Checkpointing \| None | None | Experiment-level checkpointing configuration. |
| callbacks | list[ExperimentCallback] \| None | None | Experiment-level callbacks for phase/group boundaries. |

exp = Experiment(label="my_experiment", registration_policy="overwrite")
print(f"Experiment: {exp.label}")
print(f"Context:    {exp.ctx}")

Registration Policy#

The registration_policy determines what happens when two nodes share the same label. This is primarily useful in notebook environments where cells may be re-executed.

| Policy | Behavior |
| --- | --- |
| "raise" | Raises an error on duplicate labels (default). |
| "overwrite" | Silently replaces the existing node. |
| "rename" | Assigns a unique suffix to the new node’s label. |
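
The three policies can be pictured with a small stand-alone registry. This is only a sketch of the behavior described above, not ModularML's implementation: NodeRegistry, DuplicateLabelError, and the integer suffix scheme used for "rename" are all hypothetical.

```python
class DuplicateLabelError(ValueError):
    """Raised when a label is registered twice under the "raise" policy."""


class NodeRegistry:
    """Hypothetical label -> node registry (illustration only, not ModularML code)."""

    def __init__(self, policy: str = "raise") -> None:
        if policy not in {"raise", "overwrite", "rename"}:
            raise ValueError(f"Unknown policy: {policy!r}")
        self.policy = policy
        self.nodes: dict[str, object] = {}

    def register(self, label: str, node: object) -> str:
        """Store `node` and return the label it was actually stored under."""
        if label in self.nodes:
            if self.policy == "raise":
                raise DuplicateLabelError(f"Label {label!r} is already registered")
            if self.policy == "rename":
                # Assumed suffix scheme: append the first free integer.
                suffix = 1
                while f"{label}_{suffix}" in self.nodes:
                    suffix += 1
                label = f"{label}_{suffix}"
            # "overwrite" falls through and replaces the existing entry.
        self.nodes[label] = node
        return label


registry = NodeRegistry(policy="rename")
first = registry.register("MLP", object())
second = registry.register("MLP", object())  # same label, e.g. a re-run cell
print(first, second)  # MLP MLP_1
```

With "overwrite" the second call would replace the first entry under the same label, which is why that policy is convenient when notebook cells are re-executed.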

Creating from an Active Context#

If nodes have already been registered in the current ExperimentContext, you can bind a new Experiment to that existing context with from_active_context(). This retains all previously registered nodes.

    exp = Experiment.from_active_context(
        label="my_experiment",
        registration_policy="overwrite",
    )

Setting Up a Model Graph#

Before defining phases, we need a ModelGraph with at least one ModelNode and a FeatureSet to supply data. The Experiment automatically tracks the ModelGraph registered in its context.

For details on creating model graphs, see How to: Create and Use a ModelGraph.

# Create synthetic data
rng = np.random.default_rng(42)

fs = FeatureSet.from_dict(
    label="SensorData",
    data={
        "voltage": list(rng.standard_normal((500, 10))),
        "soh": list(rng.standard_normal((500, 1))),
    },
    feature_keys="voltage",
    target_keys="soh",
)

# Create a train/test split
fs.split_random(
    ratios={
        "train": 0.8,
        "test": 0.2,
    },
    seed=13,
)
print(fs)
print(f"Splits: {fs.available_splits}")
fs.visualize()

from modularml.models.torch import SequentialMLP

# Reference defining which columns feed into the model
fs_ref = fs.reference(features="voltage", targets="soh")

# Create model node
node = ModelNode(
    label="MLP",
    model=SequentialMLP(output_shape=(1, 1), n_layers=2, hidden_dim=32),
    upstream_ref=fs_ref,
)

# Create model graph with a global optimizer
graph = ModelGraph(
    label="SimpleGraph",
    nodes=[node],
    optimizer=Optimizer("adam", opt_kwargs={"lr": 1e-3}, backend="torch"),
)

# Build the graph (infers shapes)
graph.build()
graph.visualize()

print(f"Experiment model_graph: {exp.model_graph}")

Defining Phases#

Phases are the executable units of an Experiment. Each phase type handles a different style of model execution:

| Phase | Purpose | Key Concept |
| --- | --- | --- |
| TrainPhase | Mini-batch gradient training | Requires a Sampler and Loss |
| EvalPhase | Forward-only evaluation | No sampler; runs on full split |
| FitPhase | Batch fitting (e.g., scikit-learn) | Entire dataset passed at once |

All phases require input bindings that connect FeatureSet data to head GraphNodes in the model graph.

Input Bindings#

An InputBinding defines how data flows from a FeatureSet into a head GraphNode during a specific phase. There are two constructors:

  • InputBinding.for_training(...) — requires a Sampler to generate batches

  • InputBinding.for_evaluation(...) — passes data directly (no sampler)

| Parameter | for_training | for_evaluation |
| --- | --- | --- |
| node | required | required |
| sampler | required | n/a |
| upstream | required* | required* |
| split | optional | optional |

* Can be None if the node has exactly one upstream FeatureSet.

# Training binding: requires a sampler
train_binding = InputBinding.for_training(
    node=node,
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    upstream=None,  # auto-resolved (node has one upstream FeatureSet)
    split="train",
)
print(f"Train binding node: {train_binding.node_id[:8]}...")
print(f"Train binding split: {train_binding.split}")

# Evaluation binding: no sampler needed
eval_binding = InputBinding.for_evaluation(
    node=node,
    upstream=None,
    split="test",
)
print(f"Eval binding split: {eval_binding.split}")
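
To make the role of the sampler in a training binding more concrete, the sketch below shows what shuffled mini-batching amounts to for the 400-sample "train" split created earlier. It uses only the standard library and is not SimpleSampler itself; iter_batches is a hypothetical helper, and whether SimpleSampler keeps or drops a final partial batch is not stated in this notebook (the sketch keeps it).

```python
import random


def iter_batches(n_samples: int, batch_size: int, shuffle: bool, seed: int):
    """Yield lists of sample indices, one list per mini-batch."""
    indices = list(range(n_samples))
    if shuffle:
        # A seeded generator makes the shuffled order reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start : start + batch_size]


# 80% of the 500 synthetic samples land in the "train" split.
batches = list(iter_batches(n_samples=400, batch_size=32, shuffle=True, seed=42))
print(len(batches))      # 13 batches: 12 full ones plus a remainder
print(len(batches[-1]))  # 16 samples in the final, partial batch
```

An evaluation binding skips this step entirely: the split is passed to the node as-is.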

Defining a Loss#

Training phases require at least one AppliedLoss, which binds a Loss function to a specific ModelNode and specifies what inputs the loss receives.

    AppliedLoss(
        loss: Loss,
        on: str | ModelNode,
        inputs: list[str] | dict[str, str],
        weight: float = 1.0,
        label: str | None = None,
    )

The inputs argument uses string references to resolve data at runtime:

  • "outputs" — the model node’s predictions

  • "targets" — the target data passed through the model node

mse_loss = AppliedLoss(
    loss=Loss("mse", backend="torch"),
    on=node,
    inputs=["outputs", "targets"],
)
print(f"Loss: {mse_loss.label}")
print(f"Applied on: {mse_loss.node_id[:8]}...")
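
As a rough numerical picture of what an "mse" loss over ["outputs", "targets"] computes, here is a plain-Python sketch. It assumes that weight simply scales the loss value before it contributes to the total; that interpretation is an assumption, and the mse helper below is hypothetical rather than part of ModularML.

```python
def mse(outputs: list[float], targets: list[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


outputs = [0.5, 1.0, 2.0]  # stand-in for the node's predictions
targets = [1.0, 1.0, 1.0]  # stand-in for the target values

raw = mse(outputs, targets)  # (0.25 + 0.0 + 1.0) / 3
weighted = 0.5 * raw         # effect of a hypothetical weight=0.5
print(raw, weighted)
```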

Creating a TrainPhase#

A TrainPhase performs mini-batch gradient training over one or more epochs.

There are two ways to create a TrainPhase:

  1. Default constructor — provide InputBindings explicitly

  2. from_split() convenience — auto-generates bindings from a split name

# Option A: Using explicit InputBindings
train_phase = TrainPhase(
    label="train",
    input_sources=[train_binding],
    losses=[mse_loss],
    n_epochs=3,
)
print(f"TrainPhase: {train_phase.label}")
print(f"  n_epochs: {train_phase.n_epochs}")
print(f"  losses:   {[ls.label for ls in train_phase.losses]}")

train_phase.visualize()

# Option B: Using the from_split() convenience constructor
# This auto-generates InputBindings for all active head nodes
train_phase_b = TrainPhase.from_split(
    label="train_from_split",
    split="train",
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=3,
)
print(f"TrainPhase (from_split): {train_phase_b.label}")

train_phase_b.visualize()

Creating an EvalPhase#

An EvalPhase runs a forward pass over a FeatureSet split without any gradient computation. All graph nodes are automatically frozen during evaluation.

# Using the from_split() convenience constructor
eval_phase = EvalPhase.from_split(
    label="eval",
    split="test",
    losses=[mse_loss],
)
print(f"EvalPhase: {eval_phase.label}")

eval_phase.visualize()

Creating a FitPhase#

A FitPhase fits batch-fit models (like scikit-learn estimators) on the entire dataset at once. It has no epochs or sampling. By default, fitted nodes are frozen after fitting.

    fit_phase = FitPhase.from_split(
        label="fit_rf",
        split="train",
        freeze_after_fit=True,  # default
    )

Note: FitPhase is only relevant when your ModelGraph contains scikit-learn (batch-fit) model nodes. We will not use it in the running examples below since our graph uses PyTorch models.


The Execution Plan#

Every Experiment has an execution_plan property — a PhaseGroup that defines the order in which phases execute when you call experiment.run().

Phases are added with add_phase() and execute in the order they are registered.

# Access the execution plan
plan = exp.execution_plan
print(f"Execution plan: {plan}")
print(f"Currently empty: {len(plan.all) == 0}")

# Register phases in execution order
plan.add_phase(train_phase)
plan.add_phase(eval_phase)

print(f"Plan entries: {len(plan.all)}")
for i, entry in enumerate(plan.all):
    print(f"  [{i}] {entry.label} ({type(entry).__name__})")

Accessing Phases#

Phases can be accessed by position (index) or by label.

# By index
first_phase = plan[0]
print(f"By index:  {first_phase.label}")

# By label
train_ref = plan["train"]
print(f"By label:  {train_ref.label}")

# Type-safe accessors
tp = plan.get_train_phase("train")
ep = plan.get_eval_phase("eval")
print(f"TrainPhase: {tp.label}, EvalPhase: {ep.label}")

Removing Phases#

Phases can be removed by index, label, or instance.

# Remove by label
plan.remove_phase("eval")
print(f"After remove: {[e.label for e in plan.all]}")

# Re-add for later examples
plan.add_phase(eval_phase)
print(f"After re-add: {[e.label for e in plan.all]}")

Convenience Methods#

The execution plan also provides convenience methods to construct and register phases in a single call:

    plan.add_train_phase(
        label="train",
        input_sources=[...],
        losses=[...],
        n_epochs=5,
    )

    plan.add_eval_phase(
        label="eval",
        input_sources=[...],
        losses=[...],
    )

Aliases add_train(), add_training(), add_eval(), and add_evaluation() are also available.


Running Phases#

Phases can be run individually with run_phase(), regardless of whether they are registered on the execution plan. Each run mutates experiment state and records an entry in history.

# Run the training phase
train_results = exp.run_phase(train_phase)
print("Training completed.")
print(f"  History entries: {len(exp.history)}")

# Run the evaluation phase
eval_results = exp.run_phase(eval_phase)
print("Evaluation completed.")
print(f"  History entries: {len(exp.history)}")

Display Options#

Each phase type accepts display-related keyword arguments to control progress bars:

TrainPhase:

| Parameter | Default | Description |
| --- | --- | --- |
| show_sampler_progress | True | Show progress for batch creation |
| show_training_progress | True | Show epoch-level progress bar |
| persist_progress | IN_NOTEBOOK | Keep progress bars visible after completion |
| persist_epoch_progress | IN_NOTEBOOK | Keep per-epoch bars visible |

EvalPhase:

| Parameter | Default | Description |
| --- | --- | --- |
| show_eval_progress | False | Show evaluation progress bar |
| persist_progress | IN_NOTEBOOK | Keep progress bars visible after completion |


Running the Full Execution Plan#

Calling experiment.run() executes all phases registered on the execution plan, in the order they were added. This is the primary entry point for running a complete experiment.

# Run the full execution plan (train -> eval)
results = exp.run()
print("Full run completed.")
print(f"  History entries: {len(exp.history)}")

run() returns a PhaseGroupResults object that contains results from all executed phases. Individual phase results can be accessed by label.

# Inspect results
print(f"Result type: {type(results).__name__}")
print(f"Contained results: {results.flatten()}")

Preview Mode#

Sometimes you want to evaluate a phase without permanently changing experiment state. The preview_phase() and preview_group() methods do exactly this:

  1. Capture the current experiment state

  2. Execute the phase/group

  3. Restore the original state

Preview runs are not recorded in history, and checkpointing is disabled.

history_before = len(exp.history)

# Preview does not mutate state
preview_res = exp.preview_phase(eval_phase)

history_after = len(exp.history)
print(f"History before: {history_before}")
print(f"History after:  {history_after}")
print(f"State was restored: {history_before == history_after}")
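
The capture, execute, restore sequence behind preview mode can be sketched with a toy object. This illustrates the pattern only and is not how ModularML implements preview_phase(): ToyExperiment and its state layout are hypothetical, and copy.deepcopy stands in for whatever state-capture mechanism the library uses.

```python
import copy


class ToyExperiment:
    """Hypothetical stand-in holding mutable state and a run history."""

    def __init__(self) -> None:
        self.state = {"weights": [0.0, 0.0], "epochs_trained": 0}
        self.history: list[str] = []

    def run_phase(self, label: str) -> dict:
        """Mutate state and record the run."""
        self.state["weights"] = [w + 1.0 for w in self.state["weights"]]
        self.state["epochs_trained"] += 1
        self.history.append(label)
        return {"label": label, "epochs_trained": self.state["epochs_trained"]}

    def preview_phase(self, label: str) -> dict:
        """Run a phase, then roll state and history back."""
        saved_state = copy.deepcopy(self.state)  # 1. capture
        saved_history = list(self.history)
        try:
            return self.run_phase(label)         # 2. execute
        finally:
            self.state = saved_state             # 3. restore, even on error
            self.history = saved_history


toy = ToyExperiment()
result = toy.preview_phase("train")
print(result["epochs_trained"], toy.state["epochs_trained"], toy.history)  # 1 0 []
```

The results of the previewed phase are still returned to the caller; only the side effects are rolled back.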

Execution History#

Every call to run_phase(), run_group(), or run() records an ExperimentRun in experiment.history. Each run captures:

  • Label, start/end timestamps, and status

  • Phase results (losses, outputs, etc.)

  • Execution metadata (timing per phase)

for i, run in enumerate(exp.history):
    print(
        f"  Run {i}: label={run.label!r}, "
        f"status={run.status}, "
        f"duration={run.ended_at - run.started_at}",
    )

# Access the most recent run
last = exp.last_run
print(f"Last run: {last.label}")
print(f"  Status:  {last.status}")
print(f"  Results: {type(last.results).__name__}")

Phase Groups#

A PhaseGroup is a named collection that organizes phases into logical blocks. Phase groups can be nested (a group can contain other groups), enabling hierarchical experiment structures.

The experiment’s execution_plan is itself a PhaseGroup.

# Create a sub-group for a train-eval cycle
cycle = PhaseGroup(label="train_eval_cycle")

cycle.add_phase(
    TrainPhase.from_split(
        label="cycle_train",
        split="train",
        sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
        losses=[mse_loss],
        n_epochs=2,
    ),
)
cycle.add_phase(
    EvalPhase.from_split(
        label="cycle_eval",
        split="test",
        losses=[mse_loss],
    ),
)

print(f"Group: {cycle}")
print(f"Entries: {[e.label for e in cycle.all]}")

# Run the group directly
group_results = exp.run_group(cycle)
print(f"Group results: {group_results.flatten()}")

Nesting Groups#

Groups can be nested within the execution plan or within other groups. Use add_group() to nest a PhaseGroup inside another.

# Build a nested plan
outer = PhaseGroup(label="outer")

inner = PhaseGroup(label="inner")
inner.add_phase(
    TrainPhase.from_split(
        label="inner_train",
        split="train",
        sampler=SimpleSampler(batch_size=64, shuffle=True, seed=0),
        losses=[mse_loss],
        n_epochs=1,
    ),
)

outer.add_group(inner)
outer.add_phase(
    EvalPhase.from_split(
        label="outer_eval",
        split="test",
        losses=[mse_loss],
    ),
)

# flatten() unrolls all nested groups into execution order
print(f"Flattened: {[p.label for p in outer.flatten()]}")
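
Conceptually, flattening a nested group is a depth-first walk that emits phases in the order they would run. The stand-alone sketch below mirrors the outer/inner structure built above; ToyPhase and ToyGroup are hypothetical classes used only for illustration and say nothing about PhaseGroup's internals.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPhase:
    label: str


@dataclass
class ToyGroup:
    label: str
    entries: list = field(default_factory=list)

    def flatten(self) -> list:
        """Return all phases in execution order, descending into sub-groups."""
        phases = []
        for entry in self.entries:
            if isinstance(entry, ToyGroup):
                phases.extend(entry.flatten())  # recurse into the nested group
            else:
                phases.append(entry)
        return phases


toy_inner = ToyGroup("inner", [ToyPhase("inner_train")])
toy_outer = ToyGroup("outer", [toy_inner, ToyPhase("outer_eval")])
print([p.label for p in toy_outer.flatten()])  # ['inner_train', 'outer_eval']
```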

PhaseGroup API#

| Method | Description |
| --- | --- |
| add_phase(phase) | Register a phase. |
| add_group(group) | Register a nested group. |
| add_train_phase(...) | Construct and register a TrainPhase. |
| add_eval_phase(...) | Construct and register an EvalPhase. |
| remove_phase(key) | Remove a phase by index, label, or instance. |
| remove_group(key) | Remove a group by index, label, or instance. |
| clear() | Remove all entries. |
| flatten() | Unroll all nested groups into a flat list of phases. |
| get_phase(key) | Get a phase by index or label. |
| get_train_phase(key) | Get a TrainPhase by index or label. |
| get_eval_phase(key) | Get an EvalPhase by index or label. |
| get_group(key) | Get a nested PhaseGroup by index or label. |
| items() | Iterate over (label, entry) pairs. |


Experiment Callbacks#

Experiment-level callbacks (ExperimentCallback) fire at phase and group boundaries during run(). They are distinct from phase-level Callbacks that fire at batch/epoch boundaries within a single phase.

| Hook | Trigger |
| --- | --- |
| on_experiment_start(experiment) | Before the execution plan begins |
| on_experiment_end(experiment) | After the execution plan completes |
| on_phase_start(experiment, phase) | Before each phase executes |
| on_phase_end(experiment, phase) | After each phase completes |
| on_group_start(experiment, group) | Before each group executes |
| on_group_end(experiment, group) | After each group completes |
| on_exception(experiment, phase, exception) | On unhandled exception |

Callbacks are registered via the constructor or add_callback():

    exp = Experiment(
        label="my_exp",
        callbacks=[my_callback],
    )

    # Or add later
    exp.add_callback(another_callback)
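
The order in which the hooks fire around a two-phase plan can be illustrated with a toy driver. Everything here is a stand-in: LoggingCallback is a plain class that only borrows the hook names from the table above (a real callback would presumably subclass ExperimentCallback), run_plan is a hypothetical driver rather than Experiment.run(), and group hooks and on_exception are left out for brevity.

```python
class LoggingCallback:
    """Hypothetical callback that records which hooks fired, in order."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def on_experiment_start(self, experiment) -> None:
        self.events.append("experiment_start")

    def on_phase_start(self, experiment, phase) -> None:
        self.events.append(f"phase_start:{phase}")

    def on_phase_end(self, experiment, phase) -> None:
        self.events.append(f"phase_end:{phase}")

    def on_experiment_end(self, experiment) -> None:
        self.events.append("experiment_end")


def run_plan(phases: list[str], callbacks: list) -> None:
    """Toy driver: fire hooks around each phase, in registration order."""
    for cb in callbacks:
        cb.on_experiment_start(None)
    for phase in phases:
        for cb in callbacks:
            cb.on_phase_start(None, phase)
        # ... the phase itself would execute here ...
        for cb in callbacks:
            cb.on_phase_end(None, phase)
    for cb in callbacks:
        cb.on_experiment_end(None)


logger = LoggingCallback()
run_plan(["train", "eval"], [logger])
print(logger.events)
```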

Checkpointing#

Experiment-level checkpointing automatically saves the full experiment state to disk at configurable lifecycle hooks. This is useful for fault tolerance and resumption.

Experiment checkpointing only supports mode="disk" (in-memory snapshots of the full experiment state would be too large).

Configuring Checkpointing#

Checkpointing is configured via the Checkpointing class and passed at construction time or via set_checkpointing().

Valid save_on hooks for experiment-level checkpointing:

| Hook | When |
| --- | --- |
| "phase_start" | Before each phase |
| "phase_end" | After each phase |
| "group_start" | Before each group |
| "group_end" | After each group |
| "experiment_start" | Before run() begins |
| "experiment_end" | After run() completes |

    from modularml import Checkpointing

    exp = Experiment(
        label="checkpointed_exp",
        checkpointing=Checkpointing(
            mode="disk",
            save_on=["phase_end"],
            directory="./checkpoints",
        ),
    )

Manual Checkpointing#

You can also save and restore checkpoints manually.

from pathlib import Path
from tempfile import TemporaryDirectory

CKPT_DIR = TemporaryDirectory()

# Set the checkpoint directory
exp.set_checkpoint_dir(Path(CKPT_DIR.name))

# Save a checkpoint
ckpt_path = exp.save_checkpoint("after_training", overwrite=True)
print(f"Checkpoint saved to: {ckpt_path}")
print(f"Available checkpoints: {list(exp.available_checkpoints.keys())}")

# Restore from a checkpoint (by name or path)
exp.restore_checkpoint("after_training")
print("Checkpoint restored.")

Disabling Checkpointing#

Use the disable_checkpointing() context manager to temporarily suppress all checkpointing (both experiment-level and TrainPhase-level).

    with exp.disable_checkpointing():
        exp.run_phase(train_phase)  # No checkpoints saved
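
A context manager like this typically flips a flag on entry and restores the previous value on exit, including when the body raises. The following is a generic sketch of that pattern using contextlib; ToyCheckpointer and its methods are hypothetical and do not reflect how ModularML implements disable_checkpointing().

```python
from contextlib import contextmanager


class ToyCheckpointer:
    """Hypothetical checkpointer with a temporary off switch."""

    def __init__(self) -> None:
        self.enabled = True
        self.saved: list[str] = []

    def maybe_save(self, name: str) -> None:
        """Record a checkpoint name only while checkpointing is enabled."""
        if self.enabled:
            self.saved.append(name)

    @contextmanager
    def disabled(self):
        previous = self.enabled
        self.enabled = False
        try:
            yield self
        finally:
            # Restore the earlier setting even if the body raised.
            self.enabled = previous


ckpt = ToyCheckpointer()
ckpt.maybe_save("before")
with ckpt.disabled():
    ckpt.maybe_save("inside")  # skipped
ckpt.maybe_save("after")
print(ckpt.saved)  # ['before', 'after']
```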

Serialization#

An Experiment can be fully serialized to disk via save() and reloaded with load(). This includes the model graph state, execution plan, and execution history.

SAVE_DIR = TemporaryDirectory()

# Save the experiment
save_path = exp.save(Path(SAVE_DIR.name) / "my_experiment", overwrite=True)
print(f"Experiment saved to: {save_path}")

# Load the experiment
loaded_exp = Experiment.load(save_path, overwrite=True)
print(f"Loaded experiment: {loaded_exp.label}")
print(f"  Model graph: {loaded_exp.model_graph}")

The get_config() and get_state() methods provide lower-level access to the experiment’s structure and mutable state for custom serialization workflows.

    config = exp.get_config()   # Structure (label, plan, policy)
    state = exp.get_state()     # Mutable state (context, history, checkpoints)

    # Restore
    exp.set_state(state)

Summary#

Experiment Constructor#

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| label | str | (required) | Name for this experiment. |
| registration_policy | str \| None | None | "raise", "overwrite", or "rename". |
| ctx | ExperimentContext \| None | None | Context to bind to. |
| checkpointing | Checkpointing \| None | None | Auto-checkpoint configuration. |
| callbacks | list[ExperimentCallback] \| None | None | Experiment-level callbacks. |

Experiment Properties#

| Property | Type | Description |
| --- | --- | --- |
| ctx | ExperimentContext | The associated context. |
| model_graph | ModelGraph \| None | The registered model graph. |
| execution_plan | PhaseGroup | Phases to run on run(). |
| history | list[ExperimentRun] | All completed runs. |
| last_run | ExperimentRun \| None | Most recent run. |
| checkpointing | Checkpointing \| None | Checkpoint configuration. |
| available_checkpoints | dict[str, Path] | Saved checkpoint registry. |
| exp_callbacks | list[ExperimentCallback] | Registered callbacks. |

Experiment Methods#

| Method | Description |
| --- | --- |
| run() | Execute the full execution plan. |
| run_phase(phase) | Execute a single phase (records history). |
| run_group(group) | Execute a phase group (records history). |
| preview_phase(phase) | Execute a phase without mutating state. |
| preview_group(group) | Execute a group without mutating state. |
| add_callback(cb) | Register an experiment-level callback. |
| set_checkpointing(ckpt) | Attach/replace checkpointing configuration. |
| set_checkpoint_dir(path) | Set the checkpoint save directory. |
| save_checkpoint(name) | Manually save a checkpoint. |
| restore_checkpoint(name) | Restore from a saved checkpoint. |
| disable_checkpointing() | Context manager to suppress checkpointing. |
| save(filepath) | Serialize experiment to disk. |
| load(filepath) | Load experiment from disk. |
| get_config() / from_config() | Config serialization. |
| get_state() / set_state() | State serialization. |

Phase Types#

| Phase | Module | Use Case |
| --- | --- | --- |
| TrainPhase | modularml | Mini-batch gradient training with epochs and sampling. |
| EvalPhase | modularml | Forward-only evaluation on a data split. |
| FitPhase | modularml | Batch fitting for scikit-learn models. |

Next Steps#

  • TrainPhase: Detailed training configuration, batch scheduling, and TrainPhase-level checkpointing — see $\textcolor{red}{\text{…to be added soon}}$

  • EvalPhase: Evaluation strategies, batched evaluation, and metrics — see $\textcolor{red}{\text{…to be added soon}}$

  • FitPhase: Batch-fit workflows for scikit-learn models — see $\textcolor{red}{\text{…to be added soon}}$