How to: Create and Use an Experiment#

An Experiment is the top-level orchestrator in ModularML. It coordinates:

Phases - units of work such as training (TrainPhase), evaluation (EvalPhase), or batch fitting (FitPhase)
Phase Groups - named collections of phases that execute in order
Callbacks - hooks at phase, group, and experiment boundaries
Checkpointing - automatic saving and restoring of experiment state
Execution History - records of every run for reproducibility

Note: This notebook covers the Experiment API and how phases are registered, organized, and executed. Phase-specific details (configuration, advanced usage) are covered in dedicated notebooks: $\textcolor{red}{\text{…to be added soon}}$

This notebook covers:

Creating an Experiment
Setting Up a Model Graph
Defining Phases
The Execution Plan
Running Phases
Running the Full Execution Plan
Preview Mode
Execution History
Results Storage and Recording
Phase Groups
Experiment Callbacks
Checkpointing
Serialization
Summary

%matplotlib inline
import numpy as np

from modularml import (
    AppliedLoss,
    EvalPhase,
    Experiment,
    FeatureSet,
    InputBinding,
    Loss,
    ModelGraph,
    ModelNode,
    Optimizer,
    TrainPhase,
)
from modularml.core.experiment.phases.phase_group import PhaseGroup
from modularml.samplers import SimpleSampler

Creating an Experiment#

An Experiment is created with a label and an optional registration_policy that controls how duplicate node names are handled.

    Experiment(
        label: str,
        registration_policy: str | None = None,
        ctx: ExperimentContext | None = None,
        checkpointing: Checkpointing | None = None,
        callbacks: list[ExperimentCallback] | None = None,
        results_config: ResultsConfig | None = None,
    )

Parameter	Type	Default	Description
`label`	`str`	(required)	Name for this experiment.
`registration_policy`	`str \| None`	`None`	How to handle duplicate node labels: `"raise"`, `"overwrite"`, or `"rename"`.
`ctx`	`ExperimentContext \| None`	`None`	Context to associate with. If `None`, a new context is created.
`checkpointing`	`Checkpointing \| None`	`None`	Experiment-level checkpointing configuration.
`callbacks`	`list[ExperimentCallback] \| None`	`None`	Experiment-level callbacks for phase/group boundaries.
`results_config`	`ResultsConfig \| None`	`None`	Controls where phase results are stored (RAM vs disk). See Results Storage and Recording.

exp = Experiment(label="my_experiment", registration_policy="overwrite")
print(f"Experiment: {exp.label}")
print(f"Context:    {exp.ctx}")

Registration Policy#

The registration_policy determines what happens when two nodes share the same label. This is primarily useful in notebook environments where cells may be re-executed.

Policy	Behavior
`"raise"`	Raises an error on duplicate labels (default).
`"overwrite"`	Silently replaces the existing node.
`"rename"`	Assigns a unique suffix to the new node’s label.
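The three policies can be illustrated with a small, self-contained sketch. This is not ModularML code: `ToyRegistry`, its `register()` method, and the `_1`, `_2` suffix scheme are hypothetical names chosen only to show how each policy might resolve a label collision.

```python
class ToyRegistry:
    """Toy label registry; NOT the ModularML implementation."""

    def __init__(self, policy="raise"):
        if policy not in {"raise", "overwrite", "rename"}:
            raise ValueError(f"Unknown policy: {policy!r}")
        self.policy = policy
        self.nodes = {}

    def register(self, label, node):
        """Store `node` under `label`, resolving collisions per the policy."""
        if label in self.nodes:
            if self.policy == "raise":
                raise ValueError(f"Duplicate label: {label!r}")
            if self.policy == "rename":
                # Hypothetical suffix scheme: append the first free counter.
                i = 1
                while f"{label}_{i}" in self.nodes:
                    i += 1
                label = f"{label}_{i}"
            # "overwrite" falls through and replaces the existing entry.
        self.nodes[label] = node
        return label


overwrite_reg = ToyRegistry("overwrite")
overwrite_reg.register("MLP", "first")
overwrite_reg.register("MLP", "second")
print(overwrite_reg.nodes)  # {'MLP': 'second'}

rename_reg = ToyRegistry("rename")
rename_reg.register("MLP", "first")
print(rename_reg.register("MLP", "second"))  # MLP_1
```

In a notebook, `"overwrite"` mirrors the effect of re-running a cell that defines the same node, while `"rename"` keeps both copies under distinct labels.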

Creating from an Active Context#

If nodes have already been registered in the current ExperimentContext, you can bind a new Experiment to that existing context with from_active_context(). This retains all previously registered nodes.

    exp = Experiment.from_active_context(
        label="my_experiment",
        registration_policy="overwrite",
    )

Setting Up a Model Graph#

Before defining phases, we need a ModelGraph with at least one ModelNode and a FeatureSet to supply data. The Experiment automatically tracks the ModelGraph registered in its context.

For details on creating model graphs, see How to: Create and Use a ModelGraph.

# Create synthetic data
rng = np.random.default_rng(42)

fs = FeatureSet.from_dict(
    label="SensorData",
    data={
        "voltage": list(rng.standard_normal((500, 10))),
        "soh": list(rng.standard_normal((500, 1))),
    },
    feature_keys="voltage",
    target_keys="soh",
)

# Create a train/test split
fs.split_random(
    ratios={
        "train": 0.8,
        "test": 0.2,
    },
    seed=13,
)
print(fs)
print(f"Splits: {fs.available_splits}")
fs.visualize()

from modularml.models.torch import SequentialMLP

# Reference defining which columns feed into the model
fs_ref = fs.reference(features="voltage", targets="soh")

# Create model node
node = ModelNode(
    label="MLP",
    model=SequentialMLP(output_shape=(1, 1), n_layers=2, hidden_dim=32),
    upstream_ref=fs_ref,
)

# Create model graph with a global optimizer
graph = ModelGraph(
    label="SimpleGraph",
    nodes=[node],
    optimizer=Optimizer("adam", opt_kwargs={"lr": 1e-3}, backend="torch"),
)

# Build the graph (infers shapes)
graph.build()
graph.visualize()

Defining Phases#

Phases are the executable units of an Experiment. Each phase type handles a different style of model execution:

Phase	Purpose	Key Concept
`TrainPhase`	Mini-batch gradient training	Requires a `Sampler` and `Loss`
`EvalPhase`	Forward-only evaluation	No sampler; runs on full split
`FitPhase`	Batch fitting (e.g., scikit-learn)	Entire dataset passed at once

All phases require input bindings that connect FeatureSet data to head GraphNodes in the model graph.

Input Bindings#

An InputBinding defines how data flows from a FeatureSet into a head GraphNode during a specific phase. There are two constructors:

InputBinding.for_training(...) - requires a Sampler to generate batches
InputBinding.for_evaluation(...) - passes data directly (no sampler)

Parameter	`for_training`	`for_evaluation`
`node`	required	required
`sampler`	required	-
`upstream`	required*	required*
`split`	optional	optional

* Can be None if the node has exactly one upstream FeatureSet.

# Training binding: requires a sampler
train_binding = InputBinding.for_training(
    node=node,
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    upstream=None,  # auto-resolved (node has one upstream FeatureSet)
    split="train",
)
print(f"Train binding node: {train_binding.node_id[:8]}...")
print(f"Train binding split: {train_binding.split}")

# Evaluation binding: no sampler needed
eval_binding = InputBinding.for_evaluation(
    node=node,
    upstream=None,
    split="test",
)
print(f"Eval binding split: {eval_binding.split}")

Defining a Loss#

Training phases require at least one AppliedLoss, which binds a Loss function to a specific ModelNode and specifies what inputs the loss receives.

    AppliedLoss(
        loss: Loss,
        on: str | ModelNode,
        inputs: list[str] | dict[str, str],
        weight: float = 1.0,
        label: str | None = None,
    )

The inputs argument uses string references to resolve data at runtime:

"outputs" - the model node’s predictions
"targets" - the target data passed through the model node

mse_loss = AppliedLoss(
    loss=Loss("mse", backend="torch"),
    on=node,
    inputs=["outputs", "targets"],
)
print(f"Loss: {mse_loss.label}")
print(f"Applied on: {mse_loss.node_id[:8]}...")

Creating a TrainPhase#

A TrainPhase performs mini-batch gradient training over one or more epochs.

There are two ways to create a TrainPhase:

Default constructor - provide InputBindings explicitly
from_split() convenience - auto-generates bindings from a split name

# Option A: Using explicit InputBindings
train_phase = TrainPhase(
    label="train",
    input_sources=[train_binding],
    losses=[mse_loss],
    n_epochs=3,
)
print(f"TrainPhase: {train_phase.label}")
print(f"  n_epochs: {train_phase.n_epochs}")
print(f"  losses:   {[ls.label for ls in train_phase.losses]}")

train_phase.visualize()

# Option B: Using the from_split() convenience constructor
# This auto-generates InputBindings for all active head nodes
train_phase_b = TrainPhase.from_split(
    label="train_from_split",
    split="train",
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=3,
)
print(f"TrainPhase (from_split): {train_phase_b.label}")

train_phase_b.visualize()

Creating an EvalPhase#

An EvalPhase runs a forward pass over a FeatureSet split without any gradient computation. All graph nodes are automatically frozen during evaluation.

# Using the from_split() convenience constructor
eval_phase = EvalPhase.from_split(
    label="eval",
    split="test",
    losses=[mse_loss],
)
print(f"EvalPhase: {eval_phase.label}")

eval_phase.visualize()

Creating a FitPhase#

A FitPhase fits batch-fit models (like scikit-learn estimators) on the entire dataset at once. It has no epochs or sampling. By default, fitted nodes are frozen after fitting.

    fit_phase = FitPhase.from_split(
        label="fit_rf",
        split="train",
        freeze_after_fit=True,  # default
    )

Note: FitPhase is only relevant when your ModelGraph contains scikit-learn (batch-fit) model nodes. We will not use it in the running examples below since our graph uses PyTorch models.

The Execution Plan#

Every Experiment has an execution_plan property - a PhaseGroup that defines the order in which phases execute when you call experiment.run().

Phases are added with add_phase() and execute in the order they are registered.

# Access the execution plan
plan = exp.execution_plan
print(f"Execution plan: {plan}")
print(f"Currently empty: {len(plan.all) == 0}")

# Register phases in execution order
plan.add_phase(train_phase)
plan.add_phase(eval_phase)

print(f"Plan entries: {len(plan.all)}")
for i, entry in enumerate(plan.all):
    print(f"  [{i}] {entry.label} ({type(entry).__name__})")

Accessing Phases#

Phases can be accessed by position (index) or by label.

# By index
first_phase = plan[0]
print(f"By index:  {first_phase.label}")

# By label
train_ref = plan["train"]
print(f"By label:  {train_ref.label}")

# Type-safe accessors
tp = plan.get_train_phase("train")
ep = plan.get_eval_phase("eval")
print(f"TrainPhase: {tp.label}, EvalPhase: {ep.label}")

Removing Phases#

Phases can be removed by index, label, or instance.

# Remove by label
plan.remove_phase("eval")
print(f"After remove: {[e.label for e in plan.all]}")

# Re-add for later examples
plan.add_phase(eval_phase)
print(f"After re-add: {[e.label for e in plan.all]}")

Convenience Methods#

The execution plan also provides convenience methods to construct and register phases in a single call:

    plan.add_train_phase(
        label="train",
        input_sources=[...],
        losses=[...],
        n_epochs=5,
    )

    plan.add_eval_phase(
        label="eval",
        input_sources=[...],
        losses=[...],
    )

Aliases add_train(), add_training(), add_eval(), and add_evaluation() are also available.

Running Phases#

Phases can be run individually with run_phase(), regardless of whether they are registered on the execution plan. Each run mutates experiment state and records an entry in history.

# Run the training phase
train_results = exp.run_phase(train_phase)
print("Training completed.")
print(f"  History entries: {len(exp.history)}")

# Run the evaluation phase
eval_results = exp.run_phase(eval_phase)

print("Evaluation completed.")
print(f"  History entries: {len(exp.history)}")

Display Options#

Each phase type accepts display-related keyword arguments to control progress bars:

TrainPhase:

Parameter	Default	Description
`show_sampler_progress`	`True`	Show progress for batch creation
`show_training_progress`	`True`	Show epoch-level progress bar
`persist_progress`	`IN_NOTEBOOK`	Keep progress bars visible after completion
`persist_epoch_progress`	`IN_NOTEBOOK`	Keep per-epoch bars visible

EvalPhase:

Parameter	Default	Description
`show_eval_progress`	`False`	Show evaluation progress bar
`persist_progress`	`IN_NOTEBOOK`	Keep progress bars visible after completion

Running the Full Execution Plan#

Calling experiment.run() executes all phases registered on the execution plan, in the order they were added. This is the primary entry point for running a complete experiment.

# Run the full execution plan (train -> eval)
results = exp.run()
print("Full run completed.")
print(f"History entries: {len(exp.history)}")

run() returns a PhaseGroupResults object that contains results from all executed phases.

results

Preview Mode#

Sometimes you want to evaluate a phase without permanently changing experiment state. The preview_phase() and preview_group() methods do exactly this:

Capture the current experiment state
Execute the phase/group
Restore the original state

Preview runs are not recorded in history, and checkpointing is disabled.

history_before = len(exp.history)

# Preview does not mutate state
preview_res = exp.preview_phase(eval_phase)

history_after = len(exp.history)
print(f"History before: {history_before}")
print(f"History after:  {history_after}")
print(f"State was restored: {history_before == history_after}")
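The capture, execute, restore sequence described above can be sketched in plain Python. This is a toy model under stated assumptions, not how ModularML implements preview mode: `ToyExperiment` and its attributes are hypothetical, and a real experiment would need to snapshot far more state (model weights, optimizer state, and so on).

```python
import copy


class ToyExperiment:
    """Toy stand-in for an experiment; NOT the ModularML class."""

    def __init__(self):
        self.weights = [0.0, 0.0]
        self.history = []

    def run_phase(self, label):
        """Mutate the weights and record a history entry."""
        self.weights = [w + 1.0 for w in self.weights]
        self.history.append(label)
        return list(self.weights)

    def preview_phase(self, label):
        """Run a phase, then restore the state captured beforehand."""
        snapshot = copy.deepcopy(self.__dict__)  # 1. capture
        try:
            return self.run_phase(label)         # 2. execute
        finally:
            self.__dict__.update(snapshot)       # 3. restore (even on error)


toy = ToyExperiment()
toy.run_phase("train")
preview = toy.preview_phase("eval")
print(preview)      # [2.0, 2.0]  results computed during the preview
print(toy.weights)  # [1.0, 1.0]  state as it was before the preview
print(toy.history)  # ['train']   the preview left no history entry
```

Restoring inside `finally` means the original state comes back even if the previewed phase raises.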

Execution History#

Every call to run_phase(), run_group(), or run() records an ExperimentRun in experiment.history. Each run captures:

Label, start/end timestamps, and status
Phase results (losses, outputs, etc.)
Execution metadata (timing per phase)

for i, run in enumerate(exp.history):
    print(
        f"  Run {i}: label={run.label!r}, "
        f"status={run.status}, "
        f"duration={run.ended_at - run.started_at}",
    )

# Access the most recent run
last = exp.last_run
print(f"Last run: {last.label}")
print(f"  Status:  {last.status}")
print(f"  Results: {type(last.results).__name__}")

Results Storage and Recording#

Every phase run produces a PhaseResults object that holds three kinds of data:

Store	Holds	Always in memory?
`MetricStore`	Scalar metrics (`val_loss`, `train_loss`, …)	Yes
`ArtifactStore`	Rich objects (figures, arrays, DataFrames)	Optional
`ExecutionStore`	Per-batch forward-pass tensors and losses	Optional

By default everything is kept in memory. For long runs or large datasets, output tensors can consume significant RAM. Two arguments let you manage this:

ResultsConfig - controls where results are stored (RAM vs disk)
result_recording on TrainPhase - controls how much is kept

`ResultsConfig` - Where Results Are Stored#

ResultsConfig is passed to Experiment (or Experiment.from_active_context()) and controls the storage backend for each result kind.

    ResultsConfig(
        results_dir: Path | None = None,
        save_execution: bool = True,
        save_metrics: bool = False,
        save_artifacts: bool = True,
    )

Parameter	Type	Default	Effect
`results_dir`	`Path \| None`	`None`	Root directory for on-disk storage. `None` = all in memory.
`save_execution`	`bool`	`True`	Whether to persist `ExecutionStore` to disk.
`save_metrics`	`bool`	`False`	Whether to persist `MetricStore` to disk.
`save_artifacts`	`bool`	`True`	Whether to persist `ArtifactStore` to disk.

Accessing results is identical regardless of storage backend. results.artifacts(), results.tensors(), results.losses(), results.metrics() all work transparently whether data is in RAM or on disk.

from pathlib import Path
from tempfile import TemporaryDirectory

from modularml.core.experiment.results.results_config import ResultsConfig

# We're wrapping this block in a temporary ctx just to preserve the prior Experiment
with exp.ctx.temporary():

    # Default: everything in RAM (no ResultsConfig needed)
    exp_mem = Experiment(label="exp_mem")

    # Offload all results under a run directory
    run_dir = TemporaryDirectory()
    cfg_full = ResultsConfig(results_dir=Path(run_dir.name))
    print(f"Save artifacts on disk? {cfg_full.save_artifacts}")
    print(f"Save execution data on disk? {cfg_full.save_execution}")
    print(f"Save metrics on disk? {cfg_full.save_metrics}")
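The idea that results are accessed the same way whether they live in RAM or on disk can be illustrated with a toy store. The sketch below is not ModularML code: `ToyStore`, `put()`, and `get()` are hypothetical names, and `pickle` is used purely as a convenient stand-in for whatever on-disk format the library actually uses.

```python
import pickle
from pathlib import Path
from tempfile import TemporaryDirectory


class ToyStore:
    """Toy key-value store: values stay in RAM unless a directory is given."""

    def __init__(self, directory=None):
        self._dir = Path(directory) if directory is not None else None
        self._mem = {}

    def put(self, key, value):
        if self._dir is None:
            self._mem[key] = value
        else:
            (self._dir / f"{key}.pkl").write_bytes(pickle.dumps(value))

    def get(self, key):
        if self._dir is None:
            return self._mem[key]
        return pickle.loads((self._dir / f"{key}.pkl").read_bytes())


fetched = []
with TemporaryDirectory() as tmp:
    # Same calling code for both backends; only the constructor differs.
    for store in (ToyStore(), ToyStore(tmp)):
        store.put("batch_0", [0.1, 0.2])
        fetched.append(store.get("batch_0"))

print(fetched)  # [[0.1, 0.2], [0.1, 0.2]]
```

The calling code never branches on the backend, which is the property the accessor methods above are described as providing.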

`result_recording` - How Much Training Data to Keep#

TrainPhase has a result_recording parameter that controls which execution contexts (per-batch forward-pass results) are retained in TrainResults, and therefore which model output tensors are recorded (e.g., only those from the last epoch).

    TrainPhase(
        ...
        result_recording: ResultRecording | str = ResultRecording.ALL,
    )

Mode	String	What is kept	Use when
`ResultRecording.ALL`	`"all"`	Every batch of every epoch (default)	You need per-batch outputs or losses for analysis
`ResultRecording.LAST`	`"last"`	Only the final epoch’s batches	Long runs; you only care about the end state
`ResultRecording.NONE`	`"none"`	Nothing - tensors are discarded after each batch	Metric-only runs; maximum memory savings

LAST + EarlyStopping(restore_best=True): when early stopping is active, "last" is automatically interpreted as the best epoch - the model state and its corresponding execution contexts are restored before the results object is returned.

Combine result_recording and ResultsConfig for fine-grained control:

    # Minimal RAM: keep only scalars during training, offload artifacts to disk
    train_phase = TrainPhase.from_split(
        ...,
        result_recording="none",   # drop per-batch tensors entirely
    )
    exp = Experiment(
        ...,
        results_config=ResultsConfig(results_dir=Path("./runs/exp_01")),
    )

from modularml import ResultRecording

# ALL (default): every batch of every epoch
train_all = TrainPhase.from_split(
    label="train_all",
    split="train",
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=2,
    result_recording=ResultRecording.ALL,
)
results_all = exp.preview_phase(train_all)
print(f"ALL -> execution contexts: {len(results_all.execution_contexts())}")

# LAST: only the final epoch's batches
train_last = TrainPhase.from_split(
    label="train_last",
    split="train",
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=2,
    result_recording=ResultRecording.LAST,
)
results_last = exp.preview_phase(train_last)
print(f"LAST -> execution contexts: {len(results_last.execution_contexts())}")

# NONE: no contexts kept; scalar metrics still logged
train_none = TrainPhase.from_split(
    label="train_none",
    split="train",
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=2,
    result_recording=ResultRecording.NONE,
)
results_none = exp.preview_phase(train_none)
print(f"NONE -> execution contexts: {len(results_none.execution_contexts())}")
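To make the three modes concrete without relying on library internals, the following toy loop counts how many per-batch records each mode would retain. It is only a sketch: `ToyRecording` and `toy_train()` are hypothetical names, and the batch count of 13 per epoch is an illustrative assumption rather than a value taken from the run above.

```python
from enum import Enum


class ToyRecording(Enum):
    """Toy mirror of the three recording modes; NOT the ModularML enum."""

    ALL = "all"
    LAST = "last"
    NONE = "none"


def toy_train(n_epochs, n_batches, recording):
    """Return the (epoch, batch) records a toy training loop would retain."""
    recording = ToyRecording(recording)  # accepts a member or its string value
    kept = []
    for epoch in range(n_epochs):
        if recording is ToyRecording.LAST:
            kept.clear()  # drop the previous epoch before recording this one
        for batch in range(n_batches):
            if recording is not ToyRecording.NONE:
                kept.append((epoch, batch))
    return kept


for mode in ("all", "last", "none"):
    print(mode, len(toy_train(n_epochs=2, n_batches=13, recording=mode)))
# all 26
# last 13
# none 0
```

Memory use in this toy model grows with `n_epochs * n_batches` for `ALL`, with `n_batches` for `LAST`, and not at all for `NONE`.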

Phase Groups#

A PhaseGroup is a named collection that organizes phases into logical blocks. Phase groups can be nested (a group can contain other groups), enabling hierarchical experiment structures.

The experiment’s execution_plan is itself a PhaseGroup.

# Create a sub-group for a train-eval cycle
cycle = PhaseGroup(label="train_eval_cycle")

cycle.add_phase(
    TrainPhase.from_split(
        label="cycle_train",
        split="train",
        sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
        losses=[mse_loss],
        n_epochs=2,
    ),
)
cycle.add_phase(
    EvalPhase.from_split(
        label="cycle_eval",
        split="test",
        losses=[mse_loss],
    ),
)

print(f"Group: {cycle}")
print(f"Entries: {[e.label for e in cycle.all]}")

# Run the group directly
group_results = exp.run_group(cycle)
print(f"Group results: {group_results.flatten()}")

Nesting Groups#

Groups can be nested within the execution plan or within other groups. Use add_group() to nest a PhaseGroup inside another.

# Build a nested plan
outer = PhaseGroup(label="outer")

inner = PhaseGroup(label="inner")
inner.add_phase(
    TrainPhase.from_split(
        label="inner_train",
        split="train",
        sampler=SimpleSampler(batch_size=64, shuffle=True, seed=0),
        losses=[mse_loss],
        n_epochs=1,
    ),
)

outer.add_group(inner)
outer.add_phase(
    EvalPhase.from_split(
        label="outer_eval",
        split="test",
        losses=[mse_loss],
    ),
)

# flatten() unrolls all nested groups into execution order
print(f"Flattened: {[p.label for p in outer.flatten()]}")
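Conceptually, flattening is a depth-first walk over the nested entries. The sketch below shows that idea with toy classes; `ToyPhase` and `ToyGroup` are hypothetical stand-ins and do not reflect how PhaseGroup is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPhase:
    label: str


@dataclass
class ToyGroup:
    label: str
    entries: list = field(default_factory=list)

    def flatten(self):
        """Depth-first walk that returns phases in execution order."""
        phases = []
        for entry in self.entries:
            if isinstance(entry, ToyGroup):
                phases.extend(entry.flatten())  # recurse into nested groups
            else:
                phases.append(entry)
        return phases


toy_inner = ToyGroup("inner", [ToyPhase("inner_train")])
toy_outer = ToyGroup("outer", [toy_inner, ToyPhase("outer_eval")])
flat_labels = [p.label for p in toy_outer.flatten()]
print(flat_labels)  # ['inner_train', 'outer_eval']
```

Because nested groups are expanded in place, a phase inside an earlier group always precedes any entry registered after that group.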

PhaseGroup API#

Method	Description
`add_phase(phase)`	Register a phase.
`add_group(group)`	Register a nested group.
`add_train_phase(...)`	Construct and register a `TrainPhase`.
`add_eval_phase(...)`	Construct and register an `EvalPhase`.
`remove_phase(key)`	Remove a phase by index, label, or instance.
`remove_group(key)`	Remove a group by index, label, or instance.
`clear()`	Remove all entries.
`flatten()`	Unroll all nested groups into a flat list of phases.
`get_phase(key)`	Get a phase by index or label.
`get_train_phase(key)`	Get a `TrainPhase` by index or label.
`get_eval_phase(key)`	Get an `EvalPhase` by index or label.
`get_group(key)`	Get a nested `PhaseGroup` by index or label.
`items()`	Iterate over `(label, entry)` pairs.

Experiment Callbacks#

Experiment-level callbacks (ExperimentCallback) fire at phase and group boundaries during run(). They are distinct from phase-level Callbacks that fire at batch/epoch boundaries within a single phase.

Hook	Trigger
`on_experiment_start(experiment)`	Before the execution plan begins
`on_experiment_end(experiment)`	After the execution plan completes
`on_phase_start(experiment, phase)`	Before each phase executes
`on_phase_end(experiment, phase)`	After each phase completes
`on_group_start(experiment, group)`	Before each group executes
`on_group_end(experiment, group)`	After each group completes
`on_exception(experiment, phase, exception)`	On unhandled exception

Callbacks are registered via the constructor or add_callback():

    exp = Experiment(
        label="my_exp",
        callbacks=[my_callback],
    )

    # Or add later
    exp.add_callback(another_callback)

More details on callback usage are provided in: How to: Use Callbacks
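The order in which boundary hooks fire can be sketched with a toy runner. The hook names below are taken from the table above, but everything else is an assumption made for illustration: `LoggingCallback` does not subclass the real ExperimentCallback, and `toy_run()` is a hypothetical stand-in for the experiment's run loop (group and exception hooks are omitted).

```python
class LoggingCallback:
    """Toy callback that records the order in which hooks fire."""

    def __init__(self):
        self.events = []

    def on_experiment_start(self, experiment):
        self.events.append("experiment_start")

    def on_phase_start(self, experiment, phase):
        self.events.append(f"phase_start:{phase}")

    def on_phase_end(self, experiment, phase):
        self.events.append(f"phase_end:{phase}")

    def on_experiment_end(self, experiment):
        self.events.append("experiment_end")


def toy_run(phases, callbacks):
    """Hypothetical runner: fire hooks around each phase label, in order."""
    experiment = object()  # stand-in for the experiment instance
    for cb in callbacks:
        cb.on_experiment_start(experiment)
    for phase in phases:
        for cb in callbacks:
            cb.on_phase_start(experiment, phase)
        # ... the phase body would execute here ...
        for cb in callbacks:
            cb.on_phase_end(experiment, phase)
    for cb in callbacks:
        cb.on_experiment_end(experiment)


log = LoggingCallback()
toy_run(["train", "eval"], [log])
print(log.events)
# ['experiment_start', 'phase_start:train', 'phase_end:train',
#  'phase_start:eval', 'phase_end:eval', 'experiment_end']
```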

Checkpointing#

Experiment-level checkpointing automatically saves the full experiment state to disk at configurable lifecycle hooks. This is useful for fault tolerance and resumption.

Experiment checkpointing only supports mode="disk" (in-memory snapshots of the full experiment state would be too large).

Configuring Checkpointing#

Checkpointing is configured via the Checkpointing class and passed at construction time or via set_checkpointing().

Valid save_on hooks for experiment-level checkpointing:

Hook	When
`"phase_start"`	Before each phase
`"phase_end"`	After each phase
`"group_start"`	Before each group
`"group_end"`	After each group
`"experiment_start"`	Before `run()` begins
`"experiment_end"`	After `run()` completes

    from modularml import Checkpointing

    exp = Experiment(
        label="checkpointed_exp",
        checkpointing=Checkpointing(
            mode="disk",
            save_on=["phase_end"],
            directory="./checkpoints",
        ),
    )

Manual Checkpointing#

You can also save and restore checkpoints manually.

from pathlib import Path
from tempfile import TemporaryDirectory

CKPT_DIR = TemporaryDirectory()

# Set the checkpoint directory
exp.set_checkpoint_dir(Path(CKPT_DIR.name))

# Save a checkpoint
ckpt_path = exp.save_checkpoint("after_training", overwrite=True)
print(f"Checkpoint saved to: {ckpt_path}")
print(f"Available checkpoints: {list(exp.available_checkpoints.keys())}")

# Restore from a checkpoint (by name or path)
exp.restore_checkpoint("after_training")
print("Checkpoint restored.")
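The save-by-name and restore-by-name pattern can be sketched independently of the library. The class below is a toy under stated assumptions: `ToyCheckpointer`, its `save()` and `restore()` methods, and the use of `pickle` files are all hypothetical and are not the format or API that ModularML uses.

```python
import pickle
from pathlib import Path
from tempfile import TemporaryDirectory


class ToyCheckpointer:
    """Toy named-checkpoint registry; NOT the ModularML implementation."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.available = {}  # checkpoint name -> file path

    def save(self, name, state, overwrite=False):
        """Write `state` to disk under `name` and register the path."""
        path = self.directory / f"{name}.pkl"
        if path.exists() and not overwrite:
            raise FileExistsError(path)
        path.write_bytes(pickle.dumps(state))
        self.available[name] = path
        return path

    def restore(self, name):
        """Load the state previously saved under `name`."""
        return pickle.loads(self.available[name].read_bytes())


with TemporaryDirectory() as tmp:
    ckpt = ToyCheckpointer(tmp)
    state = {"epoch": 3}
    ckpt.save("after_training", state)
    state["epoch"] = 10  # later changes do not affect the saved copy
    restored = ckpt.restore("after_training")
    names = list(ckpt.available)

print(restored)  # {'epoch': 3}
print(names)     # ['after_training']
```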

Disabling Checkpointing#

Use the disable_checkpointing() context manager to temporarily suppress all checkpointing (both experiment-level and phase-level).

    with exp.disable_checkpointing():
        exp.run_phase(train_phase)  # No checkpoints saved

Serialization#

An Experiment can be fully serialized to disk via save() and reloaded with load(). This includes the model graph state, execution plan, and execution history. All results, even if written to disk, are captured in this single '.mml' file.

SAVE_DIR = TemporaryDirectory()

# Save the experiment
save_path = exp.save(Path(SAVE_DIR.name) / "my_experiment", overwrite=True)
print(f"Experiment saved to: {save_path}")

Since we are working in a notebook, reloading the saved files will recreate all nodes and FeatureSets defined in the serialized Experiment. These nodes will have overlapping IDs with the nodes previously defined in the notebook.

To let the contents of the serialized file replace the nodes that are already active, pass overwrite=True. Warnings are printed for any collisions and overwrites.

Additionally, if the experiment had results or checkpoints that were written to disk, loading needs a new location into which the copied files can be extracted. Provide these locations with the results_dir and checkpoint_dir arguments. If a path is not provided, the corresponding results or checkpoints will not be reloaded.

# Load the experiment
loaded_exp = Experiment.load(save_path, overwrite=True)
print(f"Loaded experiment: {loaded_exp.label}")
print(f"  Model graph: {loaded_exp.model_graph}")

The get_config() and get_state() methods provide lower-level access to the experiment’s structure and mutable state for custom serialization workflows.

    config = exp.get_config()   # Structure (label, plan, policy)
    state = exp.get_state()     # Mutable state (context, history, checkpoints)

    # Restore
    exp.set_state(state)

Saving Without FeatureSet Data#

For large datasets, bundling the full FeatureSet into the experiment archive can be prohibitively expensive. Pass include_featuresets=False to omit raw data from the save. The experiment still records enough structural metadata (schema, split configs, scaler configs, and the FeatureSet UUID) to validate and reattach the data on load.

The FeatureSet must be saved separately beforehand so it can be provided at load time.

FS_SAVE_DIR = TemporaryDirectory()
EXP_SAVE_DIR = TemporaryDirectory()

# Save the FeatureSet independently
fs_path = fs.save(Path(FS_SAVE_DIR.name) / "SensorData", overwrite=True)
print(f"FeatureSet saved to: {fs_path}")

# Save the experiment without bundling the FeatureSet raw data
slim_exp_path = exp.save(
    Path(EXP_SAVE_DIR.name) / "my_experiment_slim",
    include_featuresets=False,
    overwrite=True,
)
print(f"Experiment (slim) saved to: {slim_exp_path}")

To reload a slim experiment, pass the FeatureSet back via the featuresets argument. Each entry can be a FeatureSet instance or a path to a saved FeatureSet artifact.

The framework matches each stub by label, validates structural compatibility (columns, dtypes, shapes, sample count, split labels), and resets the FeatureSet’s UUID to match what the saved experiment graph expects so all model graph references resolve correctly.

# Load with a FeatureSet path — the framework validates schema and reattaches
loaded_slim = Experiment.load(
    slim_exp_path,
    featuresets=[fs_path],  # can also pass a FeatureSet instance directly
    overwrite=True,
)
print(f"Loaded experiment: {loaded_slim.label}")
print(f"  FeatureSet: {loaded_slim.featureset!r}")
print(f"  Model graph: {loaded_slim.model_graph!r}")
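The structural validation described above amounts to comparing a saved description against the FeatureSet supplied at load time. The sketch below illustrates that comparison with toy types; `ToyStub` and `find_mismatches()` are hypothetical, cover only a subset of the listed checks, and do not reflect the library's actual validation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyStub:
    """Toy structural description of a FeatureSet (no raw data)."""

    label: str
    columns: tuple
    n_samples: int
    splits: frozenset


def find_mismatches(saved, candidate):
    """Return the names of the fields on which the two descriptions differ."""
    return [
        name
        for name in ("label", "columns", "n_samples", "splits")
        if getattr(saved, name) != getattr(candidate, name)
    ]


saved = ToyStub("SensorData", ("voltage", "soh"), 500, frozenset({"train", "test"}))
same = ToyStub("SensorData", ("voltage", "soh"), 500, frozenset({"train", "test"}))
truncated = ToyStub("SensorData", ("voltage", "soh"), 400, frozenset({"train", "test"}))

print(find_mismatches(saved, same))       # []
print(find_mismatches(saved, truncated))  # ['n_samples']
```

An empty result means the candidate is structurally compatible, while a non-empty one names exactly what would need to be fixed before reattaching the data.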

Summary#

Experiment Constructor#

Parameter	Type	Default	Description
`label`	`str`	(required)	Name for this experiment.
`registration_policy`	`str \| None`	`None`	`"raise"`, `"overwrite"`, or `"rename"`.
`ctx`	`ExperimentContext \| None`	`None`	Context to bind to.
`checkpointing`	`Checkpointing \| None`	`None`	Auto-checkpoint configuration.
`callbacks`	`list[ExperimentCallback] \| None`	`None`	Experiment-level callbacks.
`results_config`	`ResultsConfig \| None`	`None`	Storage backend for phase results.

Experiment Properties#

Property	Type	Description
`ctx`	`ExperimentContext`	The associated context.
`model_graph`	`ModelGraph \| None`	The registered model graph.
`execution_plan`	`PhaseGroup`	Phases to run on `run()`.
`history`	`list[ExperimentRun]`	All completed runs.
`last_run`	`ExperimentRun \| None`	Most recent run.
`checkpointing`	`Checkpointing \| None`	Checkpoint configuration.
`available_checkpoints`	`dict[str, Path]`	Saved checkpoint registry.
`exp_callbacks`	`list[ExperimentCallback]`	Registered callbacks.

Experiment Methods#

Method	Description
`run()`	Execute the full execution plan.
`run_phase(phase)`	Execute a single phase (records history).
`run_group(group)`	Execute a phase group (records history).
`preview_phase(phase)`	Execute a phase without mutating state.
`preview_group(group)`	Execute a group without mutating state.
`add_callback(cb)`	Register an experiment-level callback.
`set_checkpointing(ckpt)`	Attach/replace checkpointing configuration.
`set_checkpoint_dir(path)`	Set the checkpoint save directory.
`save_checkpoint(name)`	Manually save a checkpoint.
`restore_checkpoint(name)`	Restore from a saved checkpoint.
`disable_checkpointing()`	Context manager to suppress checkpointing.
`save(filepath)`	Serialize experiment to disk.
`load(filepath)`	Load experiment from disk.
`get_config()` / `from_config()`	Config serialization.
`get_state()` / `set_state()`	State serialization.

Phase Types#

Phase	Module	Use Case
`TrainPhase`	`modularml`	Mini-batch gradient training with epochs and sampling.
`EvalPhase`	`modularml`	Forward-only evaluation on a data split.
`FitPhase`	`modularml`	Batch fitting for scikit-learn models.

Results Storage: `ResultsConfig`#

Parameter	Type	Default	Effect
`results_dir`	`Path \| None`	`None`	Root directory for on-disk storage. `None` = all in memory.
`save_execution`	`bool`	`True`	Whether to persist `ExecutionStore` to disk.
`save_metrics`	`bool`	`False`	Whether to persist `MetricStore` to disk.
`save_artifacts`	`bool`	`True`	Whether to persist `ArtifactStore` to disk.

Training Recording: `ResultRecording`#

Mode	String	Contexts kept	Notes
`ResultRecording.ALL`	`"all"`	All epochs × batches	Default. Full post-run analysis.
`ResultRecording.LAST`	`"last"`	Final epoch only	With `EarlyStopping(restore_best=True)`: best epoch.
`ResultRecording.NONE`	`"none"`	None	Scalars still logged; maximum memory savings.

Next Steps#

TrainPhase: Detailed training configuration, batch scheduling, and TrainPhase-level checkpointing - see $\textcolor{red}{\text{…to be added soon}}$
EvalPhase: Evaluation strategies, batched evaluation, and metrics - see $\textcolor{red}{\text{…to be added soon}}$
FitPhase: Batch-fit workflows for scikit-learn models - see $\textcolor{red}{\text{…to be added soon}}$