How to: Create and Use an Experiment#
An Experiment is the top-level orchestrator in ModularML. It coordinates:
Phases - units of work such as training (TrainPhase), evaluation (EvalPhase), or batch fitting (FitPhase)
Phase Groups - named collections of phases that execute in order
Callbacks - hooks at phase, group, and experiment boundaries
Checkpointing - automatic saving and restoring of experiment state
Execution History - records of every run for reproducibility
Note: This notebook covers the Experiment API and how phases are registered, organized, and executed. Phase-specific details (configuration, advanced usage) are covered in dedicated notebooks (to be added soon).
%matplotlib inline
import numpy as np
from modularml import (
AppliedLoss,
EvalPhase,
Experiment,
FeatureSet,
InputBinding,
Loss,
ModelGraph,
ModelNode,
Optimizer,
TrainPhase,
)
from modularml.core.experiment.phases.phase_group import PhaseGroup
from modularml.samplers import SimpleSampler
Creating an Experiment#
An Experiment is created with a label and an optional registration_policy that
controls how duplicate node names are handled.
Experiment(
label: str,
registration_policy: str | None = None,
ctx: ExperimentContext | None = None,
checkpointing: Checkpointing | None = None,
callbacks: list[ExperimentCallback] | None = None,
results_config: ResultsConfig | None = None,
)
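| Parameter | Type | Default | Description |
|---|---|---|---|
| label | str | (required) | Name for this experiment. |
| registration_policy | str \| None | None | How to handle duplicate node labels (see Registration Policy). |
| ctx | ExperimentContext \| None | None | Context to associate with. |
| checkpointing | Checkpointing \| None | None | Experiment-level checkpointing configuration. |
| callbacks | list[ExperimentCallback] \| None | None | Experiment-level callbacks for phase/group boundaries. |
| results_config | ResultsConfig \| None | None | Controls where phase results are stored (RAM vs disk). See Results Storage and Recording. |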
exp = Experiment(label="my_experiment", registration_policy="overwrite")
print(f"Experiment: {exp.label}")
print(f"Context: {exp.ctx}")
Registration Policy#
The registration_policy determines what happens when two nodes share the same label.
This is primarily useful in notebook environments where cells may be re-executed.
| Policy | Behavior |
|---|---|
| | Raises an error on duplicate labels (default). |
| "overwrite" | Silently replaces the existing node. |
| | Assigns a unique suffix to the new node’s label. |
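To make the three behaviors above concrete, here is a minimal, self-contained sketch of a label registry. It is a toy model, not ModularML's implementation: the ToyRegistry class, the "error" and "rename" policy names, and the _1-style suffix are assumptions made for illustration (only "overwrite" appears in this guide).

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical names; only "overwrite" is taken from this guide.
    ERROR = "error"
    OVERWRITE = "overwrite"
    RENAME = "rename"


class ToyRegistry:
    """Toy stand-in for a node registry keyed by label."""

    def __init__(self, policy: Policy = Policy.ERROR) -> None:
        self.policy = policy
        self.nodes = {}  # label -> node

    def register(self, label: str, node: object) -> str:
        """Register `node` and return the label it was stored under."""
        if label in self.nodes:
            if self.policy is Policy.ERROR:
                raise ValueError(f"Duplicate node label: {label!r}")
            if self.policy is Policy.RENAME:
                # Append the first free numeric suffix.
                i = 1
                while f"{label}_{i}" in self.nodes:
                    i += 1
                label = f"{label}_{i}"
            # Policy.OVERWRITE falls through and replaces the entry.
        self.nodes[label] = node
        return label


reg = ToyRegistry(Policy.RENAME)
first = reg.register("MLP", object())
second = reg.register("MLP", object())
print(first, second)  # MLP MLP_1
```

Re-running a notebook cell corresponds to calling register() twice with the same label, which is why a replacing policy is convenient during interactive work.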
Creating from an Active Context#
If nodes have already been registered in the current ExperimentContext,
you can bind a new Experiment to that existing context with from_active_context().
This retains all previously registered nodes.
exp = Experiment.from_active_context(
label="my_experiment",
registration_policy="overwrite",
)
Setting Up a Model Graph#
Before defining phases, we need a ModelGraph with at least one ModelNode and a
FeatureSet to supply data. The Experiment automatically tracks the ModelGraph
registered in its context.
For details on creating model graphs, see How to: Create and Use a ModelGraph.
# Create synthetic data
rng = np.random.default_rng(42)
fs = FeatureSet.from_dict(
label="SensorData",
data={
"voltage": list(rng.standard_normal((500, 10))),
"soh": list(rng.standard_normal((500, 1))),
},
feature_keys="voltage",
target_keys="soh",
)
# Create a train/test split
fs.split_random(
ratios={
"train": 0.8,
"test": 0.2,
},
seed=13,
)
print(fs)
print(f"Splits: {fs.available_splits}")
fs.visualize()
from modularml.models.torch import SequentialMLP
# Reference defining which columns feed into the model
fs_ref = fs.reference(features="voltage", targets="soh")
# Create model node
node = ModelNode(
label="MLP",
model=SequentialMLP(output_shape=(1, 1), n_layers=2, hidden_dim=32),
upstream_ref=fs_ref,
)
# Create model graph with a global optimizer
graph = ModelGraph(
label="SimpleGraph",
nodes=[node],
optimizer=Optimizer("adam", opt_kwargs={"lr": 1e-3}, backend="torch"),
)
# Build the graph (infers shapes)
graph.build()
graph.visualize()
Defining Phases#
Phases are the executable units of an Experiment. Each phase type handles a
different style of model execution:
| Phase | Purpose | Key Concept |
|---|---|---|
| TrainPhase | Mini-batch gradient training | Requires a Sampler |
| EvalPhase | Forward-only evaluation | No sampler; runs on full split |
| FitPhase | Batch fitting (e.g., scikit-learn) | Entire dataset passed at once |
All phases require input bindings that connect FeatureSet data to head
GraphNodes in the model graph.
Input Bindings#
An InputBinding defines how data flows from a FeatureSet into a head GraphNode
during a specific phase. There are two constructors:
InputBinding.for_training(...) - requires a Sampler to generate batches
InputBinding.for_evaluation(...) - passes data directly (no sampler)
| Parameter | for_training | for_evaluation |
|---|---|---|
| node | required | required |
| sampler | required | - |
| upstream | required* | required* |
| split | optional | optional |
* Can be None if the node has exactly one upstream FeatureSet.
# Training binding: requires a sampler
train_binding = InputBinding.for_training(
node=node,
sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
upstream=None, # auto-resolved (node has one upstream FeatureSet)
split="train",
)
print(f"Train binding node: {train_binding.node_id[:8]}...")
print(f"Train binding split: {train_binding.split}")
# Evaluation binding: no sampler needed
eval_binding = InputBinding.for_evaluation(
node=node,
upstream=None,
split="test",
)
print(f"Eval binding split: {eval_binding.split}")
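The upstream=None shortcut used in both bindings relies on the rule stated in the footnote: it is only unambiguous when the node has exactly one upstream FeatureSet. The following sketch illustrates that rule in isolation. The resolve_upstream function and its error messages are hypothetical and are not part of ModularML.

```python
from typing import Optional


def resolve_upstream(available: list, upstream: Optional[str] = None) -> str:
    """Pick the FeatureSet label a binding should read from (toy logic)."""
    if upstream is not None:
        if upstream not in available:
            raise ValueError(f"{upstream!r} is not upstream of this node")
        return upstream
    if len(available) == 1:
        # Exactly one candidate, so None is unambiguous.
        return available[0]
    raise ValueError(
        f"upstream is ambiguous: expected exactly one FeatureSet, found {len(available)}"
    )


print(resolve_upstream(["SensorData"]))  # SensorData
print(resolve_upstream(["A", "B"], upstream="B"))  # B
```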
Defining a Loss#
Training phases require at least one AppliedLoss, which binds a Loss function to
a specific ModelNode and specifies what inputs the loss receives.
AppliedLoss(
loss: Loss,
on: str | ModelNode,
inputs: list[str] | dict[str, str],
weight: float = 1.0,
label: str | None = None,
)
The inputs argument uses string references to resolve data at runtime:
"outputs" - the model node’s predictions
"targets" - the target data passed through the model node
mse_loss = AppliedLoss(
loss=Loss("mse", backend="torch"),
on=node,
inputs=["outputs", "targets"],
)
print(f"Loss: {mse_loss.label}")
print(f"Applied on: {mse_loss.node_id[:8]}...")
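As a rough mental model of how string references such as "outputs" and "targets" could be turned into loss arguments, consider the sketch below. It is plain Python with made-up data. The apply_loss helper is hypothetical and only mirrors the inputs and weight arguments described above; it is not how AppliedLoss is implemented.

```python
def mse(outputs: list, targets: list) -> float:
    """Mean squared error over two equal-length sequences."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def apply_loss(loss_fn, inputs: list, batch: dict, weight: float = 1.0) -> float:
    """Resolve each string reference against the batch, then weight the loss."""
    args = [batch[name] for name in inputs]
    return weight * loss_fn(*args)


# Toy batch: keys play the role of the "outputs" / "targets" references.
batch = {"outputs": [0.5, 1.0], "targets": [0.0, 1.0]}
value = apply_loss(mse, ["outputs", "targets"], batch, weight=2.0)
print(value)  # 0.25
```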
Creating a TrainPhase#
A TrainPhase performs mini-batch gradient training over one or more epochs.
There are two ways to create a TrainPhase:
Default constructor - provide InputBindings explicitly
from_split() convenience - auto-generates bindings from a split name
# Option A: Using explicit InputBindings
train_phase = TrainPhase(
label="train",
input_sources=[train_binding],
losses=[mse_loss],
n_epochs=3,
)
print(f"TrainPhase: {train_phase.label}")
print(f" n_epochs: {train_phase.n_epochs}")
print(f" losses: {[ls.label for ls in train_phase.losses]}")
train_phase.visualize()
# Option B: Using the from_split() convenience constructor
# This auto-generates InputBindings for all active head nodes
train_phase_b = TrainPhase.from_split(
label="train_from_split",
split="train",
sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
losses=[mse_loss],
n_epochs=3,
)
print(f"TrainPhase (from_split): {train_phase_b.label}")
train_phase_b.visualize()
Creating an EvalPhase#
An EvalPhase runs a forward pass over a FeatureSet split without any gradient
computation. All graph nodes are automatically frozen during evaluation.
# Using the from_split() convenience constructor
eval_phase = EvalPhase.from_split(
label="eval",
split="test",
losses=[mse_loss],
)
print(f"EvalPhase: {eval_phase.label}")
eval_phase.visualize()
Creating a FitPhase#
A FitPhase fits batch-fit models (like scikit-learn estimators) on the entire
dataset at once. It has no epochs or sampling. By default, fitted nodes are frozen
after fitting.
fit_phase = FitPhase.from_split(
label="fit_rf",
split="train",
freeze_after_fit=True, # default
)
Note: FitPhase is only relevant when your ModelGraph contains scikit-learn (batch-fit) model nodes. We will not use it in the running examples below since our graph uses PyTorch models.
The Execution Plan#
Every Experiment has an execution_plan property - a PhaseGroup that defines the
order in which phases execute when you call experiment.run().
Phases are added with add_phase() and execute in the order they are registered.
# Access the execution plan
plan = exp.execution_plan
print(f"Execution plan: {plan}")
print(f"Currently empty: {len(plan.all) == 0}")
# Register phases in execution order
plan.add_phase(train_phase)
plan.add_phase(eval_phase)
print(f"Plan entries: {len(plan.all)}")
for i, entry in enumerate(plan.all):
    print(f" [{i}] {entry.label} ({type(entry).__name__})")
Accessing Phases#
Phases can be accessed by position (index) or by label.
# By index
first_phase = plan[0]
print(f"By index: {first_phase.label}")
# By label
train_ref = plan["train"]
print(f"By label: {train_ref.label}")
# Type-safe accessors
tp = plan.get_train_phase("train")
ep = plan.get_eval_phase("eval")
print(f"TrainPhase: {tp.label}, EvalPhase: {ep.label}")
Removing Phases#
Phases can be removed by index, label, or instance.
# Remove by label
plan.remove_phase("eval")
print(f"After remove: {[e.label for e in plan.all]}")
# Re-add for later examples
plan.add_phase(eval_phase)
print(f"After re-add: {[e.label for e in plan.all]}")
Convenience Methods#
The execution plan also provides convenience methods to construct and register phases in a single call:
plan.add_train_phase(
label="train",
input_sources=[...],
losses=[...],
n_epochs=5,
)
plan.add_eval_phase(
label="eval",
input_sources=[...],
losses=[...],
)
Aliases add_train(), add_training(), add_eval(), and add_evaluation() are also available.
Running Phases#
Phases can be run individually with run_phase(), regardless of whether they
are registered on the execution plan. Each run mutates experiment state and
records an entry in history.
# Run the training phase
train_results = exp.run_phase(train_phase)
print("Training completed.")
print(f" History entries: {len(exp.history)}")
# Run the evaluation phase
eval_results = exp.run_phase(eval_phase)
print("Evaluation completed.")
print(f" History entries: {len(exp.history)}")
Display Options#
Each phase type accepts display-related keyword arguments to control progress bars:
TrainPhase:
| Parameter | Default | Description |
|---|---|---|
| | | Show progress for batch creation |
| | | Show epoch-level progress bar |
| | | Keep progress bars visible after completion |
| | | Keep per-epoch bars visible |
EvalPhase:
| Parameter | Default | Description |
|---|---|---|
| | | Show evaluation progress bar |
| | | Keep progress bars visible after completion |
Running the Full Execution Plan#
Calling experiment.run() executes all phases registered on the execution plan,
in the order they were added. This is the primary entry point for running a
complete experiment.
# Run the full execution plan (train -> eval)
results = exp.run()
print("Full run completed.")
print(f"History entries: {len(exp.history)}")
run() returns a PhaseGroupResults object that contains results from all
executed phases.
results
Preview Mode#
Sometimes you want to evaluate a phase without permanently changing experiment
state. The preview_phase() and preview_group() methods do exactly this:
Capture the current experiment state
Execute the phase/group
Restore the original state
Preview runs are not recorded in history, and checkpointing is disabled.
history_before = len(exp.history)
# Preview does not mutate state
preview_res = exp.preview_phase(eval_phase)
history_after = len(exp.history)
print(f"History before: {history_before}")
print(f"History after: {history_after}")
print(f"State was restored: {history_before == history_after}")
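The capture, execute, restore sequence described above can be sketched with a deep copy and a try/finally block. This is an illustrative toy, not ModularML's mechanism: ToyExperiment, its state dictionary, and the fake weight update are all assumptions.

```python
import copy


class ToyExperiment:
    """Toy stand-in with mutable state and a run history."""

    def __init__(self) -> None:
        self.state = {"weights": [0.0, 0.0]}
        self.history = []

    def run_phase(self, label: str, record: bool = True) -> list:
        # Pretend the phase updates the model weights.
        self.state["weights"] = [w + 1.0 for w in self.state["weights"]]
        if record:
            self.history.append(label)
        return list(self.state["weights"])

    def preview_phase(self, label: str) -> list:
        snapshot = copy.deepcopy(self.state)  # 1. capture
        try:
            return self.run_phase(label, record=False)  # 2. execute
        finally:
            self.state = snapshot  # 3. restore, even if the phase raised


toy = ToyExperiment()
preview = toy.preview_phase("eval")
print(preview, toy.state["weights"], len(toy.history))  # [1.0, 1.0] [0.0, 0.0] 0
```

Using finally means the original state comes back even when the previewed phase fails partway through.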
Execution History#
Every call to run_phase(), run_group(), or run() records an ExperimentRun
in experiment.history. Each run captures:
Label, start/end timestamps, and status
Phase results (losses, outputs, etc.)
Execution metadata (timing per phase)
for i, run in enumerate(exp.history):
    print(
        f" Run {i}: label={run.label!r}, "
        f"status={run.status}, "
        f"duration={run.ended_at - run.started_at}",
    )
# Access the most recent run
last = exp.last_run
print(f"Last run: {last.label}")
print(f" Status: {last.status}")
print(f" Results: {type(last.results).__name__}")
Results Storage and Recording#
Every phase run produces a PhaseResults object that holds three kinds of data:
| Store | Holds | Always in memory? |
|---|---|---|
| | Scalar metrics | Yes |
| | Rich objects (figures, arrays, DataFrames) | Optional |
| | Per-batch forward-pass tensors and losses | Optional |
By default everything is kept in memory. For long runs or large datasets, output tensors can consume significant RAM. Two arguments let you manage this:
ResultsConfig - controls where results are stored (RAM vs disk)
result_recording on TrainPhase - controls how much is kept
ResultsConfig - Where Results Are Stored#
ResultsConfig is passed to Experiment (or Experiment.from_active_context()) and
controls the storage backend for each result kind.
ResultsConfig(
results_dir: Path | None = None,
save_execution: bool = True,
save_metrics: bool = False,
save_artifacts: bool = True,
)
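| Parameter | Type | Default | Effect |
|---|---|---|---|
| results_dir | Path \| None | None | Root directory for on-disk storage. |
| save_execution | bool | True | Whether to persist execution data to disk. |
| save_metrics | bool | False | Whether to persist metrics to disk. |
| save_artifacts | bool | True | Whether to persist artifacts to disk. |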
Accessing results is identical regardless of storage backend.
results.artifacts(), results.tensors(), results.losses(), results.metrics() all work transparently whether data is in RAM or on disk.
from pathlib import Path
from tempfile import TemporaryDirectory
from modularml.core.experiment.results.results_config import ResultsConfig
# We're wrapping this block in a temporary ctx just to preserve the prior Experiment
with exp.ctx.temporary():
    # Default: everything in RAM (no ResultsConfig needed)
    exp_mem = Experiment(label="exp_mem")
    # Offload all results under a run directory
    run_dir = TemporaryDirectory()
    cfg_full = ResultsConfig(results_dir=Path(run_dir.name))
    print(f"Save artifacts on disk? {cfg_full.save_artifacts}")
    print(f"Save execution data on disk? {cfg_full.save_execution}")
    print(f"Save metrics on disk? {cfg_full.save_metrics}")
result_recording - How Much Training Data to Keep#
TrainPhase has a result_recording parameter that controls which execution
contexts (per-batch forward-pass results) are retained in TrainResults.
This adjusts which model output tensors to record (e.g., only from the last epoch).
TrainPhase(
...
result_recording: ResultRecording | str = ResultRecording.ALL,
)
| Mode | String | What is kept | Use when |
|---|---|---|---|
| ResultRecording.ALL | "all" | Every batch of every epoch (default) | You need per-batch outputs or losses for analysis |
| ResultRecording.LAST | "last" | Only the final epoch’s batches | Long runs; you only care about the end state |
| ResultRecording.NONE | "none" | Nothing - tensors are discarded after each batch | Metric-only runs; maximum memory savings |
LAST + EarlyStopping(restore_best=True): when early stopping is active, "last"
is automatically interpreted as the best epoch - the model state and its corresponding
execution contexts are restored before the results object is returned.
Combine result_recording and ResultsConfig for fine-grained control:
# Minimal RAM: keep only scalars during training, offload artifacts to disk
train_phase = TrainPhase.from_split(
...,
result_recording="none", # drop per-batch tensors entirely
)
exp = Experiment(
...,
results_config=ResultsConfig(results_dir=Path("./runs/exp_01")),
)
from modularml import ResultRecording
# ALL (default): every batch of every epoch
train_all = TrainPhase.from_split(
label="train_all",
split="train",
sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
losses=[mse_loss],
n_epochs=2,
result_recording=ResultRecording.ALL,
)
results_all = exp.preview_phase(train_all)
print(f"ALL -> execution contexts: {len(results_all.execution_contexts())}")
# LAST: only the final epoch's batches
train_last = TrainPhase.from_split(
label="train_last",
split="train",
sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
losses=[mse_loss],
n_epochs=2,
result_recording=ResultRecording.LAST,
)
results_last = exp.preview_phase(train_last)
print(f"LAST -> execution contexts: {len(results_last.execution_contexts())}")
# NONE: no contexts kept; scalar metrics still logged
train_none = TrainPhase.from_split(
label="train_none",
split="train",
sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
losses=[mse_loss],
n_epochs=2,
result_recording=ResultRecording.NONE,
)
results_none = exp.preview_phase(train_none)
print(f"NONE -> execution contexts: {len(results_none.execution_contexts())}")
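The difference between the three modes comes down to which (epoch, batch) records survive the training loop. The sketch below reproduces that bookkeeping with plain tuples. It is a simplified model under the assumption that "last" simply discards earlier epochs; record_contexts is a hypothetical helper, and early-stopping interactions are ignored.

```python
def record_contexts(n_epochs: int, n_batches: int, mode: str) -> list:
    """Return the (epoch, batch) records a given mode would retain (toy model)."""
    if mode not in {"all", "last", "none"}:
        raise ValueError(f"Unknown mode: {mode!r}")
    kept = []
    for epoch in range(n_epochs):
        if mode == "last":
            kept.clear()  # drop everything from earlier epochs
        for batch in range(n_batches):
            if mode != "none":
                kept.append((epoch, batch))
    return kept


counts = {m: len(record_contexts(2, 3, m)) for m in ("all", "last", "none")}
print(counts)  # {'all': 6, 'last': 3, 'none': 0}
```

Memory use in this model scales with n_epochs * n_batches for "all" but only with n_batches for "last", which is the motivation for switching modes on long runs.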
Phase Groups#
A PhaseGroup is a named collection that organizes phases into logical blocks.
Phase groups can be nested (a group can contain other groups), enabling
hierarchical experiment structures.
The experiment’s execution_plan is itself a PhaseGroup.
# Create a sub-group for a train-eval cycle
cycle = PhaseGroup(label="train_eval_cycle")
cycle.add_phase(
TrainPhase.from_split(
label="cycle_train",
split="train",
sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),
losses=[mse_loss],
n_epochs=2,
),
)
cycle.add_phase(
EvalPhase.from_split(
label="cycle_eval",
split="test",
losses=[mse_loss],
),
)
print(f"Group: {cycle}")
print(f"Entries: {[e.label for e in cycle.all]}")
# Run the group directly
group_results = exp.run_group(cycle)
print(f"Group results: {group_results.flatten()}")
Nesting Groups#
Groups can be nested within the execution plan or within other groups.
Use add_group() to nest a PhaseGroup inside another.
# Build a nested plan
outer = PhaseGroup(label="outer")
inner = PhaseGroup(label="inner")
inner.add_phase(
TrainPhase.from_split(
label="inner_train",
split="train",
sampler=SimpleSampler(batch_size=64, shuffle=True, seed=0),
losses=[mse_loss],
n_epochs=1,
),
)
outer.add_group(inner)
outer.add_phase(
EvalPhase.from_split(
label="outer_eval",
split="test",
losses=[mse_loss],
),
)
# flatten() unrolls all nested groups into execution order
print(f"Flattened: {[p.label for p in outer.flatten()]}")
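For intuition, flattening a nested group amounts to a depth-first walk that keeps phases in registration order. A minimal sketch follows, using toy classes rather than the real PhaseGroup; ToyPhase and ToyGroup are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPhase:
    label: str


@dataclass
class ToyGroup:
    label: str
    entries: list = field(default_factory=list)

    def add(self, entry) -> "ToyGroup":
        """Append a phase or a nested group, preserving insertion order."""
        self.entries.append(entry)
        return self

    def flatten(self) -> list:
        """Depth-first unroll of nested groups into a flat list of phases."""
        flat = []
        for entry in self.entries:
            if isinstance(entry, ToyGroup):
                flat.extend(entry.flatten())
            else:
                flat.append(entry)
        return flat


inner = ToyGroup("inner").add(ToyPhase("inner_train"))
outer = ToyGroup("outer").add(inner).add(ToyPhase("outer_eval"))
labels = [p.label for p in outer.flatten()]
print(labels)  # ['inner_train', 'outer_eval']
```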
PhaseGroup API#
| Method | Description |
|---|---|
| add_phase() | Register a phase. |
| add_group() | Register a nested group. |
| add_train_phase() | Construct and register a TrainPhase. |
| add_eval_phase() | Construct and register an EvalPhase. |
| remove_phase() | Remove a phase by index, label, or instance. |
| | Remove a group by index, label, or instance. |
| | Remove all entries. |
| flatten() | Unroll all nested groups into a flat list of phases. |
| | Get a phase by index or label. |
| get_train_phase() | Get a TrainPhase. |
| get_eval_phase() | Get an EvalPhase. |
| | Get a nested PhaseGroup. |
| | Iterate over |
Experiment Callbacks#
Experiment-level callbacks (ExperimentCallback) fire at phase and group
boundaries during run(). They are distinct from phase-level Callbacks that
fire at batch/epoch boundaries within a single phase.
| Hook | Trigger |
|---|---|
| | Before the execution plan begins |
| | After the execution plan completes |
| | Before each phase executes |
| | After each phase completes |
| | Before each group executes |
| | After each group completes |
| | On unhandled exception |
Callbacks are registered via the constructor or add_callback():
exp = Experiment(
label="my_exp",
callbacks=[my_callback],
)
# Or add later
exp.add_callback(another_callback)
More details on callback usage are provided in: How to: Use Callbacks
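To picture the order in which boundary hooks fire, the following toy runner emits an event before and after each phase and around the whole plan. The hook names used here ("experiment_start", "phase_start", and so on), the RecordingCallback class, and run_plan are hypothetical placeholders; consult the ExperimentCallback API for the real method names.

```python
class RecordingCallback:
    """Toy callback that records every hook it receives."""

    def __init__(self) -> None:
        self.events = []

    def handle(self, hook: str, label: str) -> None:
        self.events.append(f"{hook}:{label}")


def run_plan(phases: list, callbacks: list) -> None:
    """Run (label, callable) pairs in order, notifying callbacks at boundaries."""

    def emit(hook: str, label: str) -> None:
        for cb in callbacks:
            cb.handle(hook, label)

    emit("experiment_start", "plan")
    try:
        for label, fn in phases:
            emit("phase_start", label)
            fn()
            emit("phase_end", label)
    except Exception:
        emit("exception", "plan")
        raise
    emit("experiment_end", "plan")


cb = RecordingCallback()
run_plan([("train", lambda: None), ("eval", lambda: None)], [cb])
print(cb.events)
```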
Checkpointing#
Experiment-level checkpointing automatically saves the full experiment state to disk at configurable lifecycle hooks. This is useful for fault tolerance and resumption.
Experiment checkpointing only supports mode="disk" (in-memory snapshots of the
full experiment state would be too large).
Configuring Checkpointing#
Checkpointing is configured via the Checkpointing class and passed at
construction time or via set_checkpointing().
Valid save_on hooks for experiment-level checkpointing:
| Hook | When |
|---|---|
| | Before each phase |
| "phase_end" | After each phase |
| | Before each group |
| | After each group |
| | Before |
| | After |
from modularml import Checkpointing
exp = Experiment(
label="checkpointed_exp",
checkpointing=Checkpointing(
mode="disk",
save_on=["phase_end"],
directory="./checkpoints",
),
)
Manual Checkpointing#
You can also save and restore checkpoints manually.
from pathlib import Path
from tempfile import TemporaryDirectory
CKPT_DIR = TemporaryDirectory()
# Set the checkpoint directory
exp.set_checkpoint_dir(Path(CKPT_DIR.name))
# Save a checkpoint
ckpt_path = exp.save_checkpoint("after_training", overwrite=True)
print(f"Checkpoint saved to: {ckpt_path}")
print(f"Available checkpoints: {list(exp.available_checkpoints.keys())}")
# Restore from a checkpoint (by name or path)
exp.restore_checkpoint("after_training")
print("Checkpoint restored.")
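The save and restore-by-name-or-path pattern shown above can be imitated with the standard library alone. The sketch below is a toy: ToyCheckpointer, the .pkl file naming, and the use of pickle are assumptions for illustration and say nothing about ModularML's on-disk format.

```python
import pickle
import tempfile
from pathlib import Path


class ToyCheckpointer:
    """Toy named-checkpoint store backed by pickle files."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.available = {}  # checkpoint name -> file path

    def save(self, name: str, state: dict, overwrite: bool = False) -> Path:
        path = self.directory / f"{name}.pkl"
        if path.exists() and not overwrite:
            raise FileExistsError(path)
        path.write_bytes(pickle.dumps(state))
        self.available[name] = path
        return path

    def restore(self, name_or_path) -> dict:
        # Accept either a registered name or an explicit path.
        path = self.available.get(name_or_path, Path(name_or_path))
        return pickle.loads(path.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    ckpt = ToyCheckpointer(Path(tmp))
    state = {"epoch": 3, "weights": [0.1, 0.2]}
    saved = ckpt.save("after_training", state, overwrite=True)
    state["epoch"] = 99  # later mutation does not affect the saved copy
    restored = ckpt.restore("after_training")
    restored_by_path = ckpt.restore(saved)

print(restored["epoch"])  # 3
```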
Disabling Checkpointing#
Use the disable_checkpointing() context manager to temporarily suppress all
checkpointing (both experiment-level and phase-level).
with exp.disable_checkpointing():
    exp.run_phase(train_phase)  # No checkpoints saved
Serialization#
An Experiment can be fully serialized to disk via save() and reloaded with load().
This includes the model graph state, execution plan, and execution history. All results, even if written to disk, are captured in this single '.mml' file.
SAVE_DIR = TemporaryDirectory()
# Save the experiment
save_path = exp.save(Path(SAVE_DIR.name) / "my_experiment", overwrite=True)
print(f"Experiment saved to: {save_path}")
Since we are working in a notebook, reloading the saved files will recreate all nodes and FeatureSets defined in the serialized Experiment. These nodes will have overlapping IDs with the nodes previously defined in the notebook.
To allow the serialized file to replace all already active nodes, we need to set overwrite=True. Warnings will be printed for any collisions and overwrites.
Additionally, if the experiment had results or checkpoints that were written to disk, they require a new path to extract the copied disk files to. This is provided with the results_dir and checkpoint_dir arguments. If paths are not provided, the results and checkpoints will not be reloaded.
# Load the experiment
loaded_exp = Experiment.load(save_path, overwrite=True)
print(f"Loaded experiment: {loaded_exp.label}")
print(f" Model graph: {loaded_exp.model_graph}")
The get_config() and get_state() methods provide lower-level access to the
experiment’s structure and mutable state for custom serialization workflows.
config = exp.get_config() # Structure (label, plan, policy)
state = exp.get_state() # Mutable state (context, history, checkpoints)
# Restore
exp.set_state(state)
Saving Without FeatureSet Data#
For large datasets, bundling the full FeatureSet into the experiment archive can be
prohibitively expensive. Pass include_featuresets=False to omit raw data from the
save. The experiment still records enough structural metadata (schema, split configs,
scaler configs, and the FeatureSet UUID) to validate and reattach the data on load.
The FeatureSet must be saved separately beforehand so it can be provided at load time.
FS_SAVE_DIR = TemporaryDirectory()
EXP_SAVE_DIR = TemporaryDirectory()
# Save the FeatureSet independently
fs_path = fs.save(Path(FS_SAVE_DIR.name) / "SensorData", overwrite=True)
print(f"FeatureSet saved to: {fs_path}")
# Save the experiment without bundling the FeatureSet raw data
slim_exp_path = exp.save(
Path(EXP_SAVE_DIR.name) / "my_experiment_slim",
include_featuresets=False,
overwrite=True,
)
print(f"Experiment (slim) saved to: {slim_exp_path}")
To reload a slim experiment, pass the FeatureSet back via the featuresets argument.
Each entry can be a FeatureSet instance or a path to a saved FeatureSet artifact.
The framework matches each stub by label, validates structural compatibility (columns, dtypes, shapes, sample count, split labels), and resets the FeatureSet’s UUID to match what the saved experiment graph expects so all model graph references resolve correctly.
# Load with a FeatureSet path — the framework validates schema and reattaches
loaded_slim = Experiment.load(
slim_exp_path,
featuresets=[fs_path], # can also pass a FeatureSet instance directly
overwrite=True,
)
print(f"Loaded experiment: {loaded_slim.label}")
print(f" FeatureSet: {loaded_slim.featureset!r}")
print(f" Model graph: {loaded_slim.model_graph!r}")
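The match, validate, and re-ID steps described before this example can be summarized in a few lines. The sketch below is a deliberately simplified model: ToyFeatureSet, reattach, and the fields compared are assumptions, and the real framework checks more properties (dtypes, shapes, split labels) than this toy does.

```python
from dataclasses import dataclass


@dataclass
class ToyFeatureSet:
    label: str
    columns: tuple
    n_samples: int
    uid: str


def reattach(stub: ToyFeatureSet, candidates: list) -> ToyFeatureSet:
    """Match a saved stub by label, validate structure, and adopt the stub's ID."""
    by_label = {fs.label: fs for fs in candidates}
    if stub.label not in by_label:
        raise KeyError(f"No FeatureSet provided for stub {stub.label!r}")
    fs = by_label[stub.label]
    if fs.columns != stub.columns or fs.n_samples != stub.n_samples:
        raise ValueError(f"FeatureSet {fs.label!r} does not match the saved schema")
    fs.uid = stub.uid  # graph references were saved against the stub's ID
    return fs


stub = ToyFeatureSet("SensorData", ("voltage", "soh"), 500, uid="saved-id")
provided = ToyFeatureSet("SensorData", ("voltage", "soh"), 500, uid="fresh-id")
attached = reattach(stub, [provided])
print(attached.uid)  # saved-id
```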
Summary#
Experiment Constructor#
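| Parameter | Type | Default | Description |
|---|---|---|---|
| label | str | (required) | Name for this experiment. |
| registration_policy | str \| None | None | |
| ctx | ExperimentContext \| None | None | Context to bind to. |
| checkpointing | Checkpointing \| None | None | Auto-checkpoint configuration. |
| callbacks | list[ExperimentCallback] \| None | None | Experiment-level callbacks. |
| results_config | ResultsConfig \| None | None | Storage backend for phase results. |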
Experiment Properties#
| Property | Type | Description |
|---|---|---|
| ctx | | The associated context. |
| model_graph | | The registered model graph. |
| execution_plan | PhaseGroup | Phases to run on run(). |
| history | | All completed runs. |
| last_run | | Most recent run. |
| | | Checkpoint configuration. |
| available_checkpoints | | Saved checkpoint registry. |
| | | Registered callbacks. |
Experiment Methods#
| Method | Description |
|---|---|
| run() | Execute the full execution plan. |
| run_phase() | Execute a single phase (records history). |
| run_group() | Execute a phase group (records history). |
| preview_phase() | Execute a phase without mutating state. |
| preview_group() | Execute a group without mutating state. |
| add_callback() | Register an experiment-level callback. |
| set_checkpointing() | Attach/replace checkpointing configuration. |
| set_checkpoint_dir() | Set the checkpoint save directory. |
| save_checkpoint() | Manually save a checkpoint. |
| restore_checkpoint() | Restore from a saved checkpoint. |
| disable_checkpointing() | Context manager to suppress checkpointing. |
| save() | Serialize experiment to disk. |
| load() | Load experiment from disk. |
| get_config() | Config serialization. |
| get_state() / set_state() | State serialization. |
Phase Types#
| Phase | Module | Use Case |
|---|---|---|
| TrainPhase | | Mini-batch gradient training with epochs and sampling. |
| EvalPhase | | Forward-only evaluation on a data split. |
| FitPhase | | Batch fitting for scikit-learn models. |
Results Storage: ResultsConfig#
| Parameter | Type | Default | Effect |
|---|---|---|---|
| | | | Default disk root. |
| | | | Override for artifact storage. |
| | | | Override for metric storage (scalars, usually in memory). |
| | | | Override for execution context (tensor) storage. |
Training Recording: ResultRecording#
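| Mode | String | Contexts kept | Notes |
|---|---|---|---|
| ResultRecording.ALL | "all" | All epochs × batches | Default. Full post-run analysis. |
| ResultRecording.LAST | "last" | Final epoch only | With EarlyStopping(restore_best=True), interpreted as the best epoch. |
| ResultRecording.NONE | "none" | None | Scalars still logged; maximum memory savings. |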
Next Steps#
TrainPhase: Detailed training configuration, batch scheduling, and TrainPhase-level checkpointing (to be added soon)
EvalPhase: Evaluation strategies, batched evaluation, and metrics (to be added soon)
FitPhase: Batch-fit workflows for scikit-learn models (to be added soon)