How to: Create and Use an Experiment#

An Experiment is the top-level orchestrator in ModularML. It coordinates:

  • Phases - units of work such as training (TrainPhase), evaluation (EvalPhase), or batch fitting (FitPhase)

  • Phase Groups - named collections of phases that execute in order

  • Callbacks - hooks at phase, group, and experiment boundaries

  • Checkpointing - automatic saving and restoring of experiment state

  • Execution History - records of every run for reproducibility

Note: This notebook covers the Experiment API and how phases are registered, organized, and executed. Phase-specific details (configuration, advanced usage) are covered in dedicated notebooks: $\textcolor{red}{\text{…to be added soon}}$

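This notebook covers:

  • Creating an Experiment

  • Setting Up a Model Graph

  • Defining Phases

  • The Execution Plan

  • Running Phases

  • Preview Mode

  • Execution History

  • Results Storage and Recording

  • Phase Groups

  • Experiment Callbacks

  • Checkpointing

  • Serialization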

%matplotlib inline
import numpy as np

from modularml import (
    AppliedLoss,
    EvalPhase,
    Experiment,
    FeatureSet,
    InputBinding,
    Loss,
    ModelGraph,
    ModelNode,
    Optimizer,
    TrainPhase,
)
from modularml.core.experiment.phases.phase_group import PhaseGroup
from modularml.samplers import SimpleSampler

Creating an Experiment#

An Experiment is created with a label and an optional registration_policy that controls how duplicate node names are handled.

    Experiment(
        label: str,
        registration_policy: str | None = None,
        ctx: ExperimentContext | None = None,
        checkpointing: Checkpointing | None = None,
        callbacks: list[ExperimentCallback] | None = None,
        results_config: ResultsConfig | None = None,
    )

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| label | str | (required) | Name for this experiment. |
| registration_policy | str \| None | None | How to handle duplicate node labels: "raise", "overwrite", or "rename". |
| ctx | ExperimentContext \| None | None | Context to associate with. If None, a new context is created. |
| checkpointing | Checkpointing \| None | None | Experiment-level checkpointing configuration. |
| callbacks | list[ExperimentCallback] \| None | None | Experiment-level callbacks for phase/group boundaries. |
| results_config | ResultsConfig \| None | None | Controls where phase results are stored (RAM vs disk). See Results Storage and Recording. |

exp = Experiment(label="my_experiment", registration_policy="overwrite")
print(f"Experiment: {exp.label}")
print(f"Context:    {exp.ctx}")

Registration Policy#

The registration_policy determines what happens when two nodes share the same label. This is primarily useful in notebook environments where cells may be re-executed.

| Policy | Behavior |
| --- | --- |
| "raise" | Raises an error on duplicate labels (default). |
| "overwrite" | Silently replaces the existing node. |
| "rename" | Assigns a unique suffix to the new node’s label. |
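As a rough illustration of the three policies, the sketch below applies them to a plain dictionary registry. The `register` helper and the `_1`, `_2` suffix scheme are hypothetical stand-ins, not ModularML's implementation; the suffix format that "rename" actually produces may differ.

```python
def register(registry: dict, label: str, node: object, policy: str = "raise") -> str:
    """Toy registry illustrating duplicate-label handling (not ModularML code)."""
    if label not in registry:
        registry[label] = node
        return label
    if policy == "raise":
        raise ValueError(f"Duplicate label: {label!r}")
    if policy == "overwrite":
        registry[label] = node  # replace the existing entry
        return label
    if policy == "rename":
        # Hypothetical suffix scheme: append the first unused integer
        i = 1
        while f"{label}_{i}" in registry:
            i += 1
        new_label = f"{label}_{i}"
        registry[new_label] = node
        return new_label
    raise ValueError(f"Unknown policy: {policy!r}")


nodes = {}
register(nodes, "MLP", "first")
register(nodes, "MLP", "second", policy="overwrite")
renamed = register(nodes, "MLP", "third", policy="rename")
print(nodes)    # {'MLP': 'second', 'MLP_1': 'third'}
print(renamed)  # MLP_1
```

Re-running a notebook cell corresponds to the second and third calls here: the same label arrives again, and the policy decides whether that is an error, a replacement, or a new entry.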

Creating from an Active Context#

If nodes have already been registered in the current ExperimentContext, you can bind a new Experiment to that existing context with from_active_context(). This retains all previously registered nodes.

    exp = Experiment.from_active_context(
        label="my_experiment",
        registration_policy="overwrite",
    )

Setting Up a Model Graph#

Before defining phases, we need a ModelGraph with at least one ModelNode and a FeatureSet to supply data. The Experiment automatically tracks the ModelGraph registered in its context.

For details on creating model graphs, see How to: Create and Use a ModelGraph.

# Create synthetic data
rng = np.random.default_rng(42)

fs = FeatureSet.from_dict(
    label="SensorData",
    data={
        "voltage": list(rng.standard_normal((500, 10))),
        "soh": list(rng.standard_normal((500, 1))),
    },
    feature_keys="voltage",
    target_keys="soh",
)

# Create a train/test split
fs.split_random(
    ratios={
        "train": 0.8,
        "test": 0.2,
    },
    seed=13,
)
print(fs)
print(f"Splits: {fs.available_splits}")
fs.visualize()

from modularml.models.torch import SequentialMLP

# Reference defining which columns feed into the model
fs_ref = fs.reference(features="voltage", targets="soh")

# Create model node
node = ModelNode(
    label="MLP",
    model=SequentialMLP(output_shape=(1, 1), n_layers=2, hidden_dim=32),
    upstream_ref=fs_ref,
)

# Create model graph with a global optimizer
graph = ModelGraph(
    label="SimpleGraph",
    nodes=[node],
    optimizer=Optimizer("adam", opt_kwargs={"lr": 1e-3}, backend="torch"),
)

# Build the graph (infers shapes)
graph.build()
graph.visualize()

Defining Phases#

Phases are the executable units of an Experiment. Each phase type handles a different style of model execution:

| Phase | Purpose | Key Concept |
| --- | --- | --- |
| TrainPhase | Mini-batch gradient training | Requires a Sampler and Loss |
| EvalPhase | Forward-only evaluation | No sampler; runs on full split |
| FitPhase | Batch fitting (e.g., scikit-learn) | Entire dataset passed at once |

All phases require input bindings that connect FeatureSet data to head GraphNodes in the model graph.

Input Bindings#

An InputBinding defines how data flows from a FeatureSet into a head GraphNode during a specific phase. There are two constructors:

  • InputBinding.for_training(...) - requires a Sampler to generate batches

  • InputBinding.for_evaluation(...) - passes data directly (no sampler)

| Parameter | for_training | for_evaluation |
| --- | --- | --- |
| node | required | required |
| sampler | required | - |
| upstream | required* | required* |
| split | optional | optional |

* Can be None if the node has exactly one upstream FeatureSet.

# Training binding: requires a sampler
train_binding = InputBinding.for_training(
    node=node,
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    upstream=None,  # auto-resolved (node has one upstream FeatureSet)
    split="train",
)
print(f"Train binding node: {train_binding.node_id[:8]}...")
print(f"Train binding split: {train_binding.split}")

# Evaluation binding: no sampler needed
eval_binding = InputBinding.for_evaluation(
    node=node,
    upstream=None,
    split="test",
)
print(f"Eval binding split: {eval_binding.split}")

Defining a Loss#

Training phases require at least one AppliedLoss, which binds a Loss function to a specific ModelNode and specifies what inputs the loss receives.

    AppliedLoss(
        loss: Loss,
        on: str | ModelNode,
        inputs: list[str] | dict[str, str],
        weight: float = 1.0,
        label: str | None = None,
    )

The inputs argument uses string references to resolve data at runtime:

  • "outputs" - the model node’s predictions

  • "targets" - the target data passed through the model node

mse_loss = AppliedLoss(
    loss=Loss("mse", backend="torch"),
    on=node,
    inputs=["outputs", "targets"],
)
print(f"Loss: {mse_loss.label}")
print(f"Applied on: {mse_loss.node_id[:8]}...")
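To make the string references concrete, here is a minimal pure-Python sketch of how a list of input names could be resolved against a batch and scaled by `weight`. The `resolve_and_apply` function, the `batch` dictionary layout, and the `mse` helper are hypothetical; ModularML's actual resolution logic is not shown in this notebook.

```python
def mse(outputs: list[float], targets: list[float]) -> float:
    """Mean squared error over two equal-length sequences."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def resolve_and_apply(loss_fn, inputs: list[str], batch: dict, weight: float = 1.0) -> float:
    """Look up each named input in the batch, call the loss, apply the weight."""
    args = [batch[name] for name in inputs]
    return weight * loss_fn(*args)


# Hypothetical batch: node predictions and the matching targets
batch = {"outputs": [1.0, 2.0, 3.0], "targets": [1.0, 2.0, 5.0]}
value = resolve_and_apply(mse, ["outputs", "targets"], batch, weight=0.5)
print(value)
```

The order of `inputs` matters in this sketch: names are resolved left to right and passed positionally, so `["outputs", "targets"]` calls the loss as `mse(outputs, targets)`.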

Creating a TrainPhase#

A TrainPhase performs mini-batch gradient training over one or more epochs.

There are two ways to create a TrainPhase:

  1. Default constructor - provide InputBindings explicitly

  2. from_split() convenience - auto-generates bindings from a split name

# Option A: Using explicit InputBindings
train_phase = TrainPhase(
    label="train",
    input_sources=[train_binding],
    losses=[mse_loss],
    n_epochs=3,
)
print(f"TrainPhase: {train_phase.label}")
print(f"  n_epochs: {train_phase.n_epochs}")
print(f"  losses:   {[ls.label for ls in train_phase.losses]}")

train_phase.visualize()

# Option B: Using the from_split() convenience constructor
# This auto-generates InputBindings for all active head nodes
train_phase_b = TrainPhase.from_split(
    label="train_from_split",
    split="train",
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=3,
)
print(f"TrainPhase (from_split): {train_phase_b.label}")

train_phase_b.visualize()

Creating an EvalPhase#

An EvalPhase runs a forward pass over a FeatureSet split without any gradient computation. All graph nodes are automatically frozen during evaluation.

# Using the from_split() convenience constructor
eval_phase = EvalPhase.from_split(
    label="eval",
    split="test",
    losses=[mse_loss],
)
print(f"EvalPhase: {eval_phase.label}")

eval_phase.visualize()

Creating a FitPhase#

A FitPhase fits batch-fit models (like scikit-learn estimators) on the entire dataset at once. It has no epochs or sampling. By default, fitted nodes are frozen after fitting.

    fit_phase = FitPhase.from_split(
        label="fit_rf",
        split="train",
        freeze_after_fit=True,  # default
    )

Note: FitPhase is only relevant when your ModelGraph contains scikit-learn (batch-fit) model nodes. We will not use it in the running examples below since our graph uses PyTorch models.


The Execution Plan#

Every Experiment has an execution_plan property - a PhaseGroup that defines the order in which phases execute when you call experiment.run().

Phases are added with add_phase() and execute in the order they are registered.

# Access the execution plan
plan = exp.execution_plan
print(f"Execution plan: {plan}")
print(f"Currently empty: {len(plan.all) == 0}")

# Register phases in execution order
plan.add_phase(train_phase)
plan.add_phase(eval_phase)

print(f"Plan entries: {len(plan.all)}")
for i, entry in enumerate(plan.all):
    print(f"  [{i}] {entry.label} ({type(entry).__name__})")

Accessing Phases#

Phases can be accessed by position (index) or by label.

# By index
first_phase = plan[0]
print(f"By index:  {first_phase.label}")

# By label
train_ref = plan["train"]
print(f"By label:  {train_ref.label}")

# Type-safe accessors
tp = plan.get_train_phase("train")
ep = plan.get_eval_phase("eval")
print(f"TrainPhase: {tp.label}, EvalPhase: {ep.label}")
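The lookup behaviour shown above can be pictured with a small stand-in container. This is only a sketch: `ToyPlan`, `ToyTrain`, `ToyEval`, and `get_typed` are hypothetical names, and the real PhaseGroup may implement lookup differently.

```python
from dataclasses import dataclass


@dataclass
class ToyTrain:
    label: str


@dataclass
class ToyEval:
    label: str


class ToyPlan:
    """Ordered container with index, label, and type-checked lookup."""

    def __init__(self):
        self._entries = []  # list order is execution order

    def add_phase(self, phase):
        self._entries.append(phase)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._entries[key]  # positional lookup
        for entry in self._entries:
            if entry.label == key:  # label lookup
                return entry
        raise KeyError(key)

    def get_typed(self, key, expected_type):
        """Like __getitem__, but fail loudly if the entry has the wrong type."""
        entry = self[key]
        if not isinstance(entry, expected_type):
            raise TypeError(f"{key!r} is {type(entry).__name__}, not {expected_type.__name__}")
        return entry


toy_plan = ToyPlan()
toy_plan.add_phase(ToyTrain("train"))
toy_plan.add_phase(ToyEval("eval"))

print(toy_plan[0].label)                             # train
print(toy_plan["eval"].label)                        # eval
print(toy_plan.get_typed("train", ToyTrain).label)   # train
```

A typed accessor is useful mainly for catching mistakes early: asking for an eval phase under a label that belongs to a train phase raises immediately instead of failing later during execution.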

Removing Phases#

Phases can be removed by index, label, or instance.

# Remove by label
plan.remove_phase("eval")
print(f"After remove: {[e.label for e in plan.all]}")

# Re-add for later examples
plan.add_phase(eval_phase)
print(f"After re-add: {[e.label for e in plan.all]}")

Convenience Methods#

The execution plan also provides convenience methods to construct and register phases in a single call:

    plan.add_train_phase(
        label="train",
        input_sources=[...],
        losses=[...],
        n_epochs=5,
    )

    plan.add_eval_phase(
        label="eval",
        input_sources=[...],
        losses=[...],
    )

Aliases add_train(), add_training(), add_eval(), and add_evaluation() are also available.


Running Phases#

Phases can be run individually with run_phase(), regardless of whether they are registered on the execution plan. Each run mutates experiment state and records an entry in history.

# Run the training phase
train_results = exp.run_phase(train_phase)
print("Training completed.")
print(f"  History entries: {len(exp.history)}")

# Run the evaluation phase
eval_results = exp.run_phase(eval_phase)
print("Evaluation completed.")
print(f"  History entries: {len(exp.history)}")

Display Options#

Each phase type accepts display-related keyword arguments to control progress bars:

TrainPhase:

| Parameter | Default | Description |
| --- | --- | --- |
| show_sampler_progress | True | Show progress for batch creation |
| show_training_progress | True | Show epoch-level progress bar |
| persist_progress | IN_NOTEBOOK | Keep progress bars visible after completion |
| persist_epoch_progress | IN_NOTEBOOK | Keep per-epoch bars visible |

EvalPhase:

| Parameter | Default | Description |
| --- | --- | --- |
| show_eval_progress | False | Show evaluation progress bar |
| persist_progress | IN_NOTEBOOK | Keep progress bars visible after completion |

Running the Full Execution Plan#

Calling experiment.run() executes all phases registered on the execution plan, in the order they were added. This is the primary entry point for running a complete experiment.

# Run the full execution plan (train -> eval)
results = exp.run()
print("Full run completed.")
print(f"History entries: {len(exp.history)}")

run() returns a PhaseGroupResults object that contains results from all executed phases.

results

Preview Mode#

Sometimes you want to evaluate a phase without permanently changing experiment state. The preview_phase() and preview_group() methods do exactly this:

  1. Capture the current experiment state

  2. Execute the phase/group

  3. Restore the original state

Preview runs are not recorded in history, and checkpointing is disabled.

history_before = len(exp.history)

# Preview does not mutate state
preview_res = exp.preview_phase(eval_phase)

history_after = len(exp.history)
print(f"History before: {history_before}")
print(f"History after:  {history_after}")
print(f"State was restored: {history_before == history_after}")
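The capture, execute, restore sequence can be sketched in a few lines. This is an assumption-laden toy, not the library's mechanism: `ToyExperiment` and `train_step` are hypothetical, and a deep copy stands in for whatever state capture ModularML performs internally.

```python
import copy


class ToyExperiment:
    """Stand-in with a mutable state dict and a run history."""

    def __init__(self):
        self.state = {"weights": [0.0, 0.0]}
        self.history = []

    def run_phase(self, phase):
        result = phase(self.state)  # the phase may mutate state in place
        self.history.append(result)
        return result

    def preview_phase(self, phase):
        snapshot = copy.deepcopy(self.state)  # 1. capture
        try:
            return phase(self.state)  # 2. execute (no history entry)
        finally:
            self.state = snapshot  # 3. restore, even if the phase raised


def train_step(state):
    """Toy phase: bump every weight and report their sum."""
    state["weights"] = [w + 1.0 for w in state["weights"]]
    return sum(state["weights"])


toy = ToyExperiment()
preview_result = toy.preview_phase(train_step)
print(preview_result, toy.state, len(toy.history))
```

Putting the restore step in a `finally` clause means a failing preview also leaves the state untouched.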

Execution History#

Every call to run_phase(), run_group(), or run() records an ExperimentRun in experiment.history. Each run captures:

  • Label, start/end timestamps, and status

  • Phase results (losses, outputs, etc.)

  • Execution metadata (timing per phase)

for i, run in enumerate(exp.history):
    print(
        f"  Run {i}: label={run.label!r}, "
        f"status={run.status}, "
        f"duration={run.ended_at - run.started_at}",
    )

# Access the most recent run
last = exp.last_run
print(f"Last run: {last.label}")
print(f"  Status:  {last.status}")
print(f"  Results: {type(last.results).__name__}")
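For intuition about what a history entry carries, the sketch below models a run record as a dataclass. `ToyRun` and the "completed" status string are hypothetical; only the field names label, status, started_at, ended_at, and results are taken from the cells above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ToyRun:
    """Minimal stand-in for a history record."""

    label: str
    started_at: datetime
    ended_at: datetime
    status: str = "completed"  # assumed status value
    results: dict = field(default_factory=dict)

    @property
    def duration(self) -> timedelta:
        return self.ended_at - self.started_at


start = datetime(2024, 1, 1, 12, 0, 0)
toy_history = [
    ToyRun("train", start, start + timedelta(seconds=90)),
    ToyRun("eval", start + timedelta(seconds=90), start + timedelta(seconds=95)),
]

# Total wall-clock time across all recorded runs
total = sum((r.duration for r in toy_history), timedelta())
print(total)  # 0:01:35
```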

Results Storage and Recording#

Every phase run produces a PhaseResults object that holds three kinds of data:

| Store | Holds | Always in memory? |
| --- | --- | --- |
| MetricStore | Scalar metrics (val_loss, train_loss, …) | Yes |
| ArtifactStore | Rich objects (figures, arrays, DataFrames) | Optional |
| ExecutionStore | Per-batch forward-pass tensors and losses | Optional |

By default everything is kept in memory. For long runs or large datasets, output tensors can consume significant RAM. Two arguments let you manage this:

  1. ResultsConfig - controls where results are stored (RAM vs disk)

  2. result_recording on TrainPhase - controls how much is kept

ResultsConfig - Where Results Are Stored#

ResultsConfig is passed to Experiment (or Experiment.from_active_context()) and controls the storage backend for each result kind.

    ResultsConfig(
        results_dir: Path | None = None,
        save_execution: bool = True,
        save_metrics: bool = False,
        save_artifacts: bool = True,
    )

| Parameter | Type | Default | Effect |
| --- | --- | --- | --- |
| results_dir | Path \| None | None | Root directory for on-disk storage. None = all in memory. |
| save_execution | bool | True | Whether to persist ExecutionStore to disk. |
| save_metrics | bool | False | Whether to persist MetricStore to disk. |
| save_artifacts | bool | True | Whether to persist ArtifactStore to disk. |

Accessing results is identical regardless of storage backend. results.artifacts(), results.tensors(), results.losses(), results.metrics() all work transparently whether data is in RAM or on disk.

from pathlib import Path
from tempfile import TemporaryDirectory

from modularml.core.experiment.results.results_config import ResultsConfig

# We're wrapping this block in a temporary ctx just to preserve the prior Experiment
with exp.ctx.temporary():

    # Default: everything in RAM (no ResultsConfig needed)
    exp_mem = Experiment(label="exp_mem")

    # Offload all results under a run directory
    run_dir = TemporaryDirectory()
    cfg_full = ResultsConfig(results_dir=Path(run_dir.name))
    print(f"Save artifacts on disk? {cfg_full.save_artifacts}")
    print(f"Save execution data on disk? {cfg_full.save_execution}")
    print(f"Save metrics on disk? {cfg_full.save_metrics}")

result_recording - How Much Training Data to Keep#

TrainPhase has a result_recording parameter that controls which execution contexts (per-batch forward-pass results) are retained in TrainResults. This adjusts which model output tensors to record (e.g., only from the last epoch).

    TrainPhase(
        ...
        result_recording: ResultRecording | str = ResultRecording.ALL,
    )

| Mode | String | What is kept | Use when |
| --- | --- | --- | --- |
| ResultRecording.ALL | "all" | Every batch of every epoch (default) | You need per-batch outputs or losses for analysis |
| ResultRecording.LAST | "last" | Only the final epoch’s batches | Long runs; you only care about the end state |
| ResultRecording.NONE | "none" | Nothing - tensors are discarded after each batch | Metric-only runs; maximum memory savings |

LAST + EarlyStopping(restore_best=True): when early stopping is active, "last" is automatically interpreted as the best epoch - the model state and its corresponding execution contexts are restored before the results object is returned.

Combine result_recording and ResultsConfig for fine-grained control:

    # Minimal RAM: keep only scalars during training, offload artifacts to disk
    train_phase = TrainPhase.from_split(
        ...,
        result_recording="none",   # drop per-batch tensors entirely
    )
    exp = Experiment(
        ...,
        results_config=ResultsConfig(results_dir=Path("./runs/exp_01")),
    )

from modularml import ResultRecording

# ALL (default): every batch of every epoch
train_all = TrainPhase.from_split(
    label="train_all",
    split="train",
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=2,
    result_recording=ResultRecording.ALL,
)
results_all = exp.preview_phase(train_all)
print(f"ALL -> execution contexts: {len(results_all.execution_contexts())}")

# LAST: only the final epoch's batches
train_last = TrainPhase.from_split(
    label="train_last",
    split="train",
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=2,
    result_recording=ResultRecording.LAST,
)
results_last = exp.preview_phase(train_last)
print(f"LAST -> execution contexts: {len(results_last.execution_contexts())}")

# NONE: no contexts kept; scalar metrics still logged
train_none = TrainPhase.from_split(
    label="train_none",
    split="train",
    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
    losses=[mse_loss],
    n_epochs=2,
    result_recording=ResultRecording.NONE,
)
results_none = exp.preview_phase(train_none)
print(f"NONE -> execution contexts: {len(results_none.execution_contexts())}")
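The effect of each mode on memory can be pictured with a toy training loop that only counts what it retains. This is a sketch under stated assumptions: `train_loop` is hypothetical, the batch count of 13 is arbitrary, and the real TrainPhase stores tensors and losses rather than index tuples.

```python
def train_loop(n_epochs: int, n_batches: int, recording: str = "all") -> list[tuple[int, int]]:
    """Return the (epoch, batch) contexts a toy training loop would retain."""
    kept = []
    for epoch in range(n_epochs):
        if recording == "last":
            kept = []  # drop earlier epochs; only the current one survives
        for b in range(n_batches):
            context = (epoch, b)  # stand-in for per-batch tensors and losses
            if recording != "none":
                kept.append(context)
    return kept


for mode in ("all", "last", "none"):
    print(mode, len(train_loop(n_epochs=2, n_batches=13, recording=mode)))
# all 26
# last 13
# none 0
```

In this toy, retained contexts grow with `n_epochs * n_batches` under "all", stay bounded by `n_batches` under "last", and are always empty under "none".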

Phase Groups#

A PhaseGroup is a named collection that organizes phases into logical blocks. Phase groups can be nested (a group can contain other groups), enabling hierarchical experiment structures.

The experiment’s execution_plan is itself a PhaseGroup.

# Create a sub-group for a train-eval cycle
cycle = PhaseGroup(label="train_eval_cycle")

cycle.add_phase(
    TrainPhase.from_split(
        label="cycle_train",
        split="train",
        sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
        losses=[mse_loss],
        n_epochs=2,
    ),
)
cycle.add_phase(
    EvalPhase.from_split(
        label="cycle_eval",
        split="test",
        losses=[mse_loss],
    ),
)

print(f"Group: {cycle}")
print(f"Entries: {[e.label for e in cycle.all]}")

# Run the group directly
group_results = exp.run_group(cycle)
print(f"Group results: {group_results.flatten()}")

Nesting Groups#

Groups can be nested within the execution plan or within other groups. Use add_group() to nest a PhaseGroup inside another.

# Build a nested plan
outer = PhaseGroup(label="outer")

inner = PhaseGroup(label="inner")
inner.add_phase(
    TrainPhase.from_split(
        label="inner_train",
        split="train",
        sampler=SimpleSampler(batch_size=64, shuffle=True, seed=0),
        losses=[mse_loss],
        n_epochs=1,
    ),
)

outer.add_group(inner)
outer.add_phase(
    EvalPhase.from_split(
        label="outer_eval",
        split="test",
        losses=[mse_loss],
    ),
)

# flatten() unrolls all nested groups into execution order
print(f"Flattened: {[p.label for p in outer.flatten()]}")
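Flattening a nested structure like this is a depth-first traversal. The sketch below shows the idea with stand-in classes; `ToyPhase` and `ToyGroup` are hypothetical and only mirror the shape of the example above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPhase:
    label: str


@dataclass
class ToyGroup:
    label: str
    entries: list = field(default_factory=list)

    def flatten(self) -> list:
        """Unroll nested groups depth-first, preserving registration order."""
        flat = []
        for entry in self.entries:
            if isinstance(entry, ToyGroup):
                flat.extend(entry.flatten())  # recurse into nested groups
            else:
                flat.append(entry)
        return flat


toy_inner = ToyGroup("inner", [ToyPhase("inner_train")])
toy_outer = ToyGroup("outer", [toy_inner, ToyPhase("outer_eval")])
print([p.label for p in toy_outer.flatten()])  # ['inner_train', 'outer_eval']
```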

PhaseGroup API#

| Method | Description |
| --- | --- |
| add_phase(phase) | Register a phase. |
| add_group(group) | Register a nested group. |
| add_train_phase(...) | Construct and register a TrainPhase. |
| add_eval_phase(...) | Construct and register an EvalPhase. |
| remove_phase(key) | Remove a phase by index, label, or instance. |
| remove_group(key) | Remove a group by index, label, or instance. |
| clear() | Remove all entries. |
| flatten() | Unroll all nested groups into a flat list of phases. |
| get_phase(key) | Get a phase by index or label. |
| get_train_phase(key) | Get a TrainPhase by index or label. |
| get_eval_phase(key) | Get an EvalPhase by index or label. |
| get_group(key) | Get a nested PhaseGroup by index or label. |
| items() | Iterate over (label, entry) pairs. |


Experiment Callbacks#

Experiment-level callbacks (ExperimentCallback) fire at phase and group boundaries during run(). They are distinct from phase-level Callbacks that fire at batch/epoch boundaries within a single phase.

| Hook | Trigger |
| --- | --- |
| on_experiment_start(experiment) | Before the execution plan begins |
| on_experiment_end(experiment) | After the execution plan completes |
| on_phase_start(experiment, phase) | Before each phase executes |
| on_phase_end(experiment, phase) | After each phase completes |
| on_group_start(experiment, group) | Before each group executes |
| on_group_end(experiment, group) | After each group completes |
| on_exception(experiment, phase, exception) | On unhandled exception |
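To show the order in which such hooks would fire, here is a self-contained sketch of a dispatcher. `ToyCallback`, `EventLog`, and `run_plan` are hypothetical; the hook names follow the table above, but the sketch does not subclass or import ExperimentCallback, and it covers only the experiment and phase hooks.

```python
class ToyCallback:
    """No-op base class; hook names follow the table above."""

    def on_experiment_start(self, experiment): pass
    def on_experiment_end(self, experiment): pass
    def on_phase_start(self, experiment, phase): pass
    def on_phase_end(self, experiment, phase): pass


class EventLog(ToyCallback):
    """Records the order in which hooks fire."""

    def __init__(self):
        self.events = []

    def on_experiment_start(self, experiment):
        self.events.append("experiment_start")

    def on_experiment_end(self, experiment):
        self.events.append("experiment_end")

    def on_phase_start(self, experiment, phase):
        self.events.append(f"phase_start:{phase}")

    def on_phase_end(self, experiment, phase):
        self.events.append(f"phase_end:{phase}")


def run_plan(experiment, phases, callbacks):
    """Fire hooks around each phase; the phases are just labels in this toy."""
    for cb in callbacks:
        cb.on_experiment_start(experiment)
    for phase in phases:
        for cb in callbacks:
            cb.on_phase_start(experiment, phase)
        # ... the phase itself would execute here ...
        for cb in callbacks:
            cb.on_phase_end(experiment, phase)
    for cb in callbacks:
        cb.on_experiment_end(experiment)


log = EventLog()
run_plan("toy_experiment", ["train", "eval"], [log])
print(log.events)
```

Overriding only the hooks you need and inheriting no-ops for the rest keeps individual callbacks short.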

Callbacks are registered via the constructor or add_callback():

    exp = Experiment(
        label="my_exp",
        callbacks=[my_callback],
    )

    # Or add later
    exp.add_callback(another_callback)

More details on callback usage are provided in: How to: Use Callbacks


Checkpointing#

Experiment-level checkpointing automatically saves the full experiment state to disk at configurable lifecycle hooks. This is useful for fault tolerance and resumption.

Experiment checkpointing only supports mode="disk" (in-memory snapshots of the full experiment state would be too large).

Configuring Checkpointing#

Checkpointing is configured via the Checkpointing class and passed at construction time or via set_checkpointing().

Valid save_on hooks for experiment-level checkpointing:

| Hook | When |
| --- | --- |
| "phase_start" | Before each phase |
| "phase_end" | After each phase |
| "group_start" | Before each group |
| "group_end" | After each group |
| "experiment_start" | Before run() begins |
| "experiment_end" | After run() completes |

    from modularml import Checkpointing

    exp = Experiment(
        label="checkpointed_exp",
        checkpointing=Checkpointing(
            mode="disk",
            save_on=["phase_end"],
            directory="./checkpoints",
        ),
    )

Manual Checkpointing#

You can also save and restore checkpoints manually.

from pathlib import Path
from tempfile import TemporaryDirectory

CKPT_DIR = TemporaryDirectory()

# Set the checkpoint directory
exp.set_checkpoint_dir(Path(CKPT_DIR.name))

# Save a checkpoint
ckpt_path = exp.save_checkpoint("after_training", overwrite=True)
print(f"Checkpoint saved to: {ckpt_path}")
print(f"Available checkpoints: {list(exp.available_checkpoints.keys())}")

# Restore from a checkpoint (by name or path)
exp.restore_checkpoint("after_training")
print("Checkpoint restored.")
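Conceptually, a manual checkpoint is a named snapshot written to disk that can later replace the live state. The sketch below illustrates that round trip with the standard library only. The `save_checkpoint` and `restore_checkpoint` functions here are hypothetical free functions, and pickle with a `.pkl` extension is an assumption made for illustration; ModularML's on-disk format is not described in this notebook.

```python
import pickle
from pathlib import Path
from tempfile import TemporaryDirectory


def save_checkpoint(state: dict, directory: Path, name: str, overwrite: bool = False) -> Path:
    """Write a named snapshot; refuse to clobber unless overwrite=True."""
    path = directory / f"{name}.pkl"
    if path.exists() and not overwrite:
        raise FileExistsError(path)
    path.write_bytes(pickle.dumps(state))
    return path


def restore_checkpoint(path: Path) -> dict:
    """Load a snapshot previously written by save_checkpoint."""
    return pickle.loads(path.read_bytes())


with TemporaryDirectory() as tmp:
    toy_state = {"epoch": 3, "weights": [0.1, 0.2]}
    toy_path = save_checkpoint(toy_state, Path(tmp), "after_training", overwrite=True)
    toy_state["epoch"] = 99                   # state drifts after the save
    restored = restore_checkpoint(toy_path)   # roll back to the saved copy

print(restored)  # {'epoch': 3, 'weights': [0.1, 0.2]}
```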

Disabling Checkpointing#

Use the disable_checkpointing() context manager to temporarily suppress all checkpointing (both experiment-level and phase-level).

    with exp.disable_checkpointing():
        exp.run_phase(train_phase)  # No checkpoints saved
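A context manager of this kind typically flips a flag on entry and restores the previous value on exit. The following sketch shows that pattern with `contextlib`; `ToyCheckpointer`, `maybe_save`, and `disabled` are hypothetical names, and the real implementation may differ.

```python
from contextlib import contextmanager


class ToyCheckpointer:
    """Saves only while enabled; records the names it 'saved'."""

    def __init__(self):
        self.enabled = True
        self.saved = []

    def maybe_save(self, name: str) -> None:
        if self.enabled:
            self.saved.append(name)

    @contextmanager
    def disabled(self):
        previous = self.enabled
        self.enabled = False
        try:
            yield self
        finally:
            self.enabled = previous  # restore the prior setting, even on error


ckpt = ToyCheckpointer()
ckpt.maybe_save("before")
with ckpt.disabled():
    ckpt.maybe_save("inside")  # skipped
ckpt.maybe_save("after")
print(ckpt.saved)  # ['before', 'after']
```

Restoring the previous value, rather than unconditionally setting the flag back to True, keeps nested uses correct.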

Serialization#

An Experiment can be fully serialized to disk via save() and reloaded with load(). This includes the model graph state, execution plan, and execution history. All results, even if written to disk, are captured in this single '.mml' file.

SAVE_DIR = TemporaryDirectory()

# Save the experiment
save_path = exp.save(Path(SAVE_DIR.name) / "my_experiment", overwrite=True)
print(f"Experiment saved to: {save_path}")

Since we are working in a notebook, reloading the saved files will recreate all nodes and FeatureSets defined in the serialized Experiment. These nodes will have overlapping IDs with the nodes previously defined in the notebook.

To allow the serialized file to replace all currently active nodes, we need to set overwrite=True. Warnings will be printed for any collisions and overwrites.

Additionally, if the experiment had results or checkpoints that were written to disk, load() needs a new location to extract those files to. Provide it with the results_dir and checkpoint_dir arguments. If no paths are provided, the results and checkpoints will not be reloaded.

# Load the experiment
loaded_exp = Experiment.load(save_path, overwrite=True)
print(f"Loaded experiment: {loaded_exp.label}")
print(f"  Model graph: {loaded_exp.model_graph}")

The get_config() and get_state() methods provide lower-level access to the experiment’s structure and mutable state for custom serialization workflows.

    config = exp.get_config()   # Structure (label, plan, policy)
    state = exp.get_state()     # Mutable state (context, history, checkpoints)

    # Restore
    exp.set_state(state)
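The split between config (structure) and state (mutable data) can be illustrated with a toy class that round-trips both through JSON. Everything here is an assumption for illustration: `ToyExperiment`, the dictionary keys, and the use of JSON are hypothetical, and the real config and state objects contain far more than this.

```python
import json


class ToyExperiment:
    """Separates structure (config) from mutable data (state)."""

    def __init__(self, label: str, plan: list[str]):
        self.label = label
        self.plan = plan
        self.history: list[str] = []

    def get_config(self) -> dict:
        return {"label": self.label, "plan": list(self.plan)}

    @classmethod
    def from_config(cls, config: dict) -> "ToyExperiment":
        return cls(label=config["label"], plan=config["plan"])

    def get_state(self) -> dict:
        return {"history": list(self.history)}

    def set_state(self, state: dict) -> None:
        self.history = list(state["history"])


original = ToyExperiment("my_experiment", ["train", "eval"])
original.history.append("run-0")

# Round-trip through JSON to mimic writing to and reading from disk
toy_config = json.loads(json.dumps(original.get_config()))
toy_state = json.loads(json.dumps(original.get_state()))

clone = ToyExperiment.from_config(toy_config)  # rebuild the structure
clone.set_state(toy_state)                     # then reapply mutable state
print(clone.label, clone.plan, clone.history)
```

Rebuilding from config first and applying state second mirrors the order in the snippet above: the structure must exist before state can be restored into it.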

Saving Without FeatureSet Data#

For large datasets, bundling the full FeatureSet into the experiment archive can be prohibitively expensive. Pass include_featuresets=False to omit raw data from the save. The experiment still records enough structural metadata (schema, split configs, scaler configs, and the FeatureSet UUID) to validate and reattach the data on load.

The FeatureSet must be saved separately beforehand so it can be provided at load time.

FS_SAVE_DIR = TemporaryDirectory()
EXP_SAVE_DIR = TemporaryDirectory()

# Save the FeatureSet independently
fs_path = fs.save(Path(FS_SAVE_DIR.name) / "SensorData", overwrite=True)
print(f"FeatureSet saved to: {fs_path}")

# Save the experiment without bundling the FeatureSet raw data
slim_exp_path = exp.save(
    Path(EXP_SAVE_DIR.name) / "my_experiment_slim",
    include_featuresets=False,
    overwrite=True,
)
print(f"Experiment (slim) saved to: {slim_exp_path}")

To reload a slim experiment, pass the FeatureSet back via the featuresets argument. Each entry can be a FeatureSet instance or a path to a saved FeatureSet artifact.

The framework matches each stub by label, validates structural compatibility (columns, dtypes, shapes, sample count, split labels), and resets the FeatureSet’s UUID to match what the saved experiment graph expects so all model graph references resolve correctly.

# Load with a FeatureSet path — the framework validates schema and reattaches
loaded_slim = Experiment.load(
    slim_exp_path,
    featuresets=[fs_path],  # can also pass a FeatureSet instance directly
    overwrite=True,
)
print(f"Loaded experiment: {loaded_slim.label}")
print(f"  FeatureSet: {loaded_slim.featureset!r}")
print(f"  Model graph: {loaded_slim.model_graph!r}")
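The validate-then-reattach step can be sketched with plain dictionaries. This is a simplified, hypothetical model: `validate_stub`, `reattach`, and the dictionary keys are invented for illustration, and the real check also covers dtypes and shapes.

```python
def validate_stub(stub: dict, candidate: dict) -> list[str]:
    """Compare a saved stub with a candidate dataset; return any mismatches."""
    problems = []
    for key in ("columns", "n_samples", "splits"):
        if stub[key] != candidate[key]:
            problems.append(f"{key}: expected {stub[key]!r}, got {candidate[key]!r}")
    return problems


def reattach(stub: dict, candidate: dict) -> dict:
    """Return a copy of the candidate that adopts the stub's ID if it validates."""
    problems = validate_stub(stub, candidate)
    if problems:
        raise ValueError("; ".join(problems))
    return {**candidate, "uuid": stub["uuid"]}


stub = {"label": "SensorData", "uuid": "saved-id", "columns": ["voltage", "soh"],
        "n_samples": 500, "splits": ["train", "test"]}
good = {"label": "SensorData", "uuid": "fresh-id", "columns": ["voltage", "soh"],
        "n_samples": 500, "splits": ["train", "test"]}
bad = {**good, "n_samples": 400}

print(reattach(stub, good)["uuid"])  # saved-id
print(validate_stub(stub, bad))      # ['n_samples: expected 500, got 400']
```

Adopting the saved ID only after validation passes is what keeps graph references from silently pointing at incompatible data.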

Summary#

Experiment Constructor#

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| label | str | (required) | Name for this experiment. |
| registration_policy | str \| None | None | "raise", "overwrite", or "rename". |
| ctx | ExperimentContext \| None | None | Context to bind to. |
| checkpointing | Checkpointing \| None | None | Auto-checkpoint configuration. |
| callbacks | list[ExperimentCallback] \| None | None | Experiment-level callbacks. |
| results_config | ResultsConfig \| None | None | Storage backend for phase results. |

Experiment Properties#

| Property | Type | Description |
| --- | --- | --- |
| ctx | ExperimentContext | The associated context. |
| model_graph | ModelGraph \| None | The registered model graph. |
| execution_plan | PhaseGroup | Phases to run on run(). |
| history | list[ExperimentRun] | All completed runs. |
| last_run | ExperimentRun \| None | Most recent run. |
| checkpointing | Checkpointing \| None | Checkpoint configuration. |
| available_checkpoints | dict[str, Path] | Saved checkpoint registry. |
| exp_callbacks | list[ExperimentCallback] | Registered callbacks. |

Experiment Methods#

| Method | Description |
| --- | --- |
| run() | Execute the full execution plan. |
| run_phase(phase) | Execute a single phase (records history). |
| run_group(group) | Execute a phase group (records history). |
| preview_phase(phase) | Execute a phase without mutating state. |
| preview_group(group) | Execute a group without mutating state. |
| add_callback(cb) | Register an experiment-level callback. |
| set_checkpointing(ckpt) | Attach/replace checkpointing configuration. |
| set_checkpoint_dir(path) | Set the checkpoint save directory. |
| save_checkpoint(name) | Manually save a checkpoint. |
| restore_checkpoint(name) | Restore from a saved checkpoint. |
| disable_checkpointing() | Context manager to suppress checkpointing. |
| save(filepath) | Serialize experiment to disk. |
| load(filepath) | Load experiment from disk. |
| get_config() / from_config() | Config serialization. |
| get_state() / set_state() | State serialization. |

Phase Types#

| Phase | Module | Use Case |
| --- | --- | --- |
| TrainPhase | modularml | Mini-batch gradient training with epochs and sampling. |
| EvalPhase | modularml | Forward-only evaluation on a data split. |
| FitPhase | modularml | Batch fitting for scikit-learn models. |

Results Storage: ResultsConfig#

| Parameter | Type | Default | Effect |
| --- | --- | --- | --- |
| results_dir | Path \| None | None | Root directory for on-disk storage. None = all in memory. |
| save_execution | bool | True | Whether to persist ExecutionStore to disk. |
| save_metrics | bool | False | Whether to persist MetricStore to disk. |
| save_artifacts | bool | True | Whether to persist ArtifactStore to disk. |

Training Recording: ResultRecording#

| Mode | String | Contexts kept | Notes |
| --- | --- | --- | --- |
| ResultRecording.ALL | "all" | All epochs × batches | Default. Full post-run analysis. |
| ResultRecording.LAST | "last" | Final epoch only | With EarlyStopping(restore_best=True): best epoch. |
| ResultRecording.NONE | "none" | None | Scalars still logged; maximum memory savings. |

Next Steps#

  • TrainPhase: Detailed training configuration, batch scheduling, and TrainPhase-level checkpointing - see $\textcolor{red}{\text{…to be added soon}}$

  • EvalPhase: Evaluation strategies, batched evaluation, and metrics - see $\textcolor{red}{\text{…to be added soon}}$

  • FitPhase: Batch-fit workflows for scikit-learn models - see $\textcolor{red}{\text{…to be added soon}}$