{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# How to: Use Callbacks\n",
    "\n",
    "Callbacks let you inject custom logic at well-defined points in a phase lifecycle\n",
    "without subclassing the phase itself. They are attached to a `TrainPhase`,\n",
    "`EvalPhase`, or `FitPhase` and are called automatically at each lifecycle boundary.\n",
    "\n",
    "> **Note:** This notebook covers **phase-level** callbacks (`Callback` subclasses)\n",
    "> that fire at batch/epoch boundaries inside a single phase. Experiment-level\n",
    "> callbacks (`ExperimentCallback`) that fire at phase/group boundaries are covered\n",
    "> in {doc}`05_create_experiment`.\n",
    "\n",
    "This notebook covers:\n",
    "\n",
    "- {ref}`08-callbacks-lifecycle`\n",
    "- {ref}`08-callbacks-custom`\n",
    "- {ref}`08-callbacks-evaluation`\n",
    "- {ref}`08-callbacks-eval-loss-metric`\n",
    "- {ref}`08-callbacks-accessing-results`\n",
    "- {ref}`08-callbacks-artifact-result`\n",
    "- {ref}`08-callbacks-summary`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "\n",
    "from modularml import (\n",
    "    AppliedLoss,\n",
    "    EvalPhase,\n",
    "    Experiment,\n",
    "    FeatureSet,\n",
    "    Loss,\n",
    "    ModelGraph,\n",
    "    ModelNode,\n",
    "    Optimizer,\n",
    "    TrainPhase,\n",
    ")\n",
    "from modularml.models.torch import SequentialMLP\n",
    "from modularml.samplers import SimpleSampler"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Experiment Setup\n",
    "\n",
    "We set up a small synthetic experiment used throughout this notebook.\n",
    "The setup follows the pattern from {doc}`05_create_experiment`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "rng = np.random.default_rng(42)\n",
    "\n",
    "# Synthetic data: 500 samples, 20-d feature, 1-d target\n",
    "fs = FeatureSet.from_dict(\n",
    "    label=\"SensorData\",\n",
    "    data={\n",
    "        \"voltage\": list(rng.standard_normal((500, 20))),\n",
    "        \"soh\": list(rng.standard_normal((500, 1))),\n",
    "    },\n",
    "    feature_keys=\"voltage\",\n",
    "    target_keys=\"soh\",\n",
    ")\n",
    "fs.split_random(ratios={\"train\": 0.7, \"val\": 0.15, \"test\": 0.15}, seed=13)\n",
    "fs_ref = fs.reference(features=\"voltage\", targets=\"soh\")\n",
    "\n",
    "mn_mlp = ModelNode(\n",
    "    label=\"MLP\",\n",
    "    model=SequentialMLP(output_shape=(1, 1), n_layers=2, hidden_dim=16),\n",
    "    upstream_ref=fs_ref,\n",
    ")\n",
    "graph = ModelGraph(\n",
    "    label=\"SimpleGraph\",\n",
    "    nodes=[mn_mlp],\n",
    "    optimizer=Optimizer(\"adam\", opt_kwargs={\"lr\": 1e-3}, backend=\"torch\"),\n",
    ")\n",
    "graph.build()\n",
    "\n",
    "exp = Experiment.from_active_context(label=\"my_experiment\")\n",
    "\n",
    "mse_loss = AppliedLoss(\n",
    "    loss=Loss(\"mse\", backend=\"torch\"),\n",
    "    on=\"MLP\",\n",
    "    inputs=[\"outputs\", \"targets\"],\n",
    ")\n",
    "print(f\"Splits: {fs.available_splits}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "(08-callbacks-lifecycle)=\n",
    "## The Callback Lifecycle\n",
    "\n",
    "Every `Callback` subclass can override up to seven hook methods. They are called\n",
    "automatically by the phase in the order shown below:\n",
    "\n",
    "```\n",
    "on_phase_start\n",
    "  for each epoch:\n",
    "    on_epoch_start\n",
    "      for each batch:\n",
    "        on_batch_start\n",
    "        [forward / backward pass]\n",
    "        on_batch_end\n",
    "    on_epoch_end\n",
    "on_phase_end\n",
    "\n",
    "on_exception  # called if any hook or step raises\n",
    "```\n",
    "\n",
    "| Hook | Signature | When it fires |\n",
    "|------|-----------|---------------|\n",
    "| `on_phase_start` | `(experiment, phase, results)` | Once before the first epoch |\n",
    "| `on_phase_end` | `(experiment, phase, results)` | Once after the last epoch |\n",
    "| `on_epoch_start` | `(experiment, phase, exec_ctx, results)` | Before each epoch's first batch |\n",
    "| `on_epoch_end` | `(experiment, phase, exec_ctx, results)` | After each epoch's last batch |\n",
    "| `on_batch_start` | `(experiment, phase, exec_ctx, results)` | Before each batch |\n",
    "| `on_batch_end` | `(experiment, phase, exec_ctx, results)` | After each batch |\n",
    "| `on_exception` | `(experiment, phase, exec_ctx, exception, results)` | On any unhandled exception |\n",
    "\n",
    "Each hook may return any value (or `None`). Non-`None` return values are automatically\n",
    "wrapped in a `CallbackResult` and stored in the phase's `PhaseResults` container,\n",
    "keyed by the callback's label."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "(08-callbacks-custom)=\n",
    "## Writing a Custom Callback\n",
    "\n",
    "Subclass `Callback` and override only the hooks you need. The only abstract\n",
    "requirement is implementing `get_config()` and `from_config()` for serialization.\n",
    "\n",
    "The example below prints the mean training loss at the end of every epoch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Any\n",
    "\n",
    "from modularml.core.experiment.callbacks.callback import Callback\n",
    "\n",
    "\n",
    "class LossLogger(Callback):\n",
    "    \"\"\"Prints the mean training loss at the end of each epoch.\"\"\"\n",
    "\n",
    "    def __init__(self, node: str = \"MLP\", label: str | None = None):\n",
    "        super().__init__(label=label or \"LossLogger\")\n",
    "        self.node = node\n",
    "        self._epoch_losses: list[float] = []\n",
    "\n",
    "    def on_epoch_end(self, *, experiment, phase, exec_ctx, results=None):\n",
    "        epoch_idx = exec_ctx.epoch_idx\n",
    "        # Use the results provided to this hook\n",
    "        if results is not None:\n",
    "            # Grab the recorded losses for this epoch\n",
    "            epoch_losses = results.losses(node=self.node).where(epoch=epoch_idx)\n",
    "\n",
    "            # Since we likely have a loss recorded for each batch in the epoch,\n",
    "            # we can collapse losses via averaging\n",
    "            if len(epoch_losses.values()) > 0:\n",
    "                collapsed = epoch_losses.collapse(axis=\"batch\", reducer=\"mean\").one()\n",
    "                # Grab the trainable component of this loss\n",
    "                loss_val = collapsed.trainable\n",
    "                self._epoch_losses.append(loss_val)\n",
    "                print(f\"  epoch {epoch_idx:>3}: train_loss = {loss_val:.4f}\")\n",
    "\n",
    "    def get_config(self) -> dict[str, Any]:\n",
    "        return {\"callback_type\": self.__class__.__name__, \"node\": self.node, \"label\": self.label}\n",
    "\n",
    "    @classmethod\n",
    "    def from_config(cls, config: dict) -> \"LossLogger\":\n",
    "        return cls(node=config[\"node\"], label=config[\"label\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "Callbacks are registered on a phase via `add_callback()` or at construction time\n",
    "via the `callbacks=` parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "loss_logger = LossLogger(node=\"MLP\")\n",
    "\n",
    "train_phase = TrainPhase.from_split(\n",
    "    label=\"train\",\n",
    "    split=\"train\",\n",
    "    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),\n",
    "    losses=[mse_loss],\n",
    "    n_epochs=3,\n",
    "    callbacks=[loss_logger],\n",
    ")\n",
    "print(f\"Attached callbacks: {[cb.label for cb in train_phase.callbacks]}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "train_results = exp.run_phase(train_phase)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"\\nRecorded epoch losses: {loss_logger._epoch_losses}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "### `ExecutionContext`\n",
    "\n",
    "In addition to the current phase results, epoch- and batch-level hooks\n",
    "receive an `ExecutionContext` (`exec_ctx`) with the following fields:\n",
    "\n",
    "| Field | Type | Description |\n",
    "|-------|------|-------------|\n",
    "| `phase_label` | `str` | Label of the currently executing phase |\n",
    "| `epoch_idx` | `int \\| None` | Zero-based epoch index (`None` in `EvalPhase`) |\n",
    "| `batch_idx` | `int` | Zero-based batch index within the epoch |\n",
    "| `inputs` | `dict` | Pre-materialized input batches for all head nodes |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "(08-callbacks-evaluation)=\n",
    "## The `Evaluation` Built-in Callback\n",
    "\n",
    "`Evaluation` runs a full `EvalPhase` at configurable epoch intervals and stores\n",
    "the resulting `EvalResults` as an `EvaluationCallbackResult`. It is the primary\n",
    "tool for tracking held-out performance during training.\n",
    "\n",
    "```python\n",
    "    Evaluation(\n",
    "        eval_phase: EvalPhase,\n",
    "        every_n_epochs: int = 1,\n",
    "        run_on_start: bool = False,\n",
    "        label: str | None = None,\n",
    "        metrics: list[EvaluationMetric] | None = None,\n",
    "    )\n",
    "```\n",
    "\n",
    "| Parameter | Type | Default | Description |\n",
    "|-----------|------|---------|-------------|\n",
    "| `eval_phase` | `EvalPhase` | (required) | The evaluation phase to run. |\n",
    "| `every_n_epochs` | `int` | `1` | Run evaluation every N completed epochs. |\n",
    "| `run_on_start` | `bool` | `False` | If `True`, also evaluate before epoch 0. |\n",
    "| `label` | `str \\| None` | `None` | Result key (defaults to `eval_phase.label`). |\n",
    "| `metrics` | `list[EvaluationMetric]` | `None` | Metric extractors to run after each evaluation. |\n",
    "\n",
    "A convenience constructor `Evaluation.from_split()` creates the inner `EvalPhase`\n",
    "automatically:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "from modularml.callbacks import Evaluation\n",
    "\n",
    "eval_cb = Evaluation.from_split(\n",
    "    label=\"eval_val\",\n",
    "    split=\"val\",\n",
    "    every_n_epochs=1,\n",
    ")\n",
    "\n",
    "train_phase_with_eval = TrainPhase.from_split(\n",
    "    label=\"train_with_eval\",\n",
    "    split=\"train\",\n",
    "    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),\n",
    "    losses=[mse_loss],\n",
    "    n_epochs=3,\n",
    "    callbacks=[eval_cb],\n",
    ")\n",
    "\n",
    "train_results = exp.run_phase(train_phase_with_eval)\n",
    "print(\"Training with Evaluation callback complete.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "### How `Evaluation` Works\n",
    "\n",
    "`Evaluation` fires in `on_epoch_end`. At each selected epoch it calls\n",
    "`experiment.preview_phase(phase=self.eval_phase)`, a stateless forward pass\n",
    "that does **not** mutate experiment history or optimizer state.\n",
    "\n",
    "The evaluation result is wrapped in an `EvaluationCallbackResult` and stored\n",
    "in the `TrainResults` of the parent `TrainPhase`, keyed by the callback label."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(train_results.callbacks())\n",
    "\n",
    "eval_cbs = train_results.callbacks(kind=\"evaluation\").values()\n",
    "print(eval_cbs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {},
   "outputs": [],
   "source": [
    "# And the eval results in those callbacks can be accessed with\n",
    "[eval_cb.eval_results for eval_cb in eval_cbs]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20",
   "metadata": {},
   "source": [
    "While running evaluation on some set of data during callbacks can be useful,\n",
    "a more common case is to record metrics on an evaluation forward pass.\n",
    "\n",
    "This is where `EvaluationMetric` and `EvalLossMetric` come in.\n",
    "It is equivalent to defining an `Evaluation` callback with an attached loss; however,\n",
    "`MetricCallback`s provide more-convenient access to scalar values produced by callback hooks."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21",
   "metadata": {},
   "source": [
    "(08-callbacks-eval-loss-metric)=\n",
    "## The `EvalLossMetric` Built-in Metric\n",
    "\n",
    "`EvalLossMetric` is an `EvaluationMetric` that extracts a scalar loss value\n",
    "from an `Evaluation` result and logs it to the `MetricStore` under a chosen name\n",
    "(default: `\"val_loss\"`). Pass it to `Evaluation` via the `metrics=` argument.\n",
    "\n",
    "```python\n",
    "    EvalLossMetric(\n",
    "        loss: AppliedLoss,\n",
    "        reducer: Literal[\"sum\", \"mean\"] = \"mean\",\n",
    "        name: str = \"val_loss\",\n",
    "    )\n",
    "```\n",
    "\n",
    "| Parameter | Type | Default | Description |\n",
    "|-----------|------|---------|-------------|\n",
    "| `loss` | `AppliedLoss` | (required) | Applied loss to track. Appended to the inner `EvalPhase` automatically if not already present. |\n",
    "| `reducer` | `str` | `\"mean\"` | How to aggregate per-batch losses into a scalar. |\n",
    "| `name` | `str` | `\"val_loss\"` | Metric name logged to the `MetricStore`. |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "from modularml.callbacks import EvalLossMetric, Evaluation\n",
    "\n",
    "# We define a loss to be applied during our Evaluation callback\n",
    "val_loss_metric = EvalLossMetric(\n",
    "    loss=AppliedLoss(\n",
    "        loss=Loss(\"mse\", backend=\"torch\"),\n",
    "        on=\"MLP\",\n",
    "        inputs=[\"targets\", \"outputs\"],\n",
    "    ),\n",
    "    reducer=\"mean\",\n",
    "    name=\"val_loss\",\n",
    ")\n",
    "# ^ note that metrics with name \"val_loss\" will be shown in the progress bars during training\n",
    "\n",
    "eval_cb_with_metric = Evaluation.from_split(\n",
    "    label=\"eval_val\",\n",
    "    split=\"val\",\n",
    "    every_n_epochs=1,\n",
    "    metrics=[val_loss_metric],\n",
    ")\n",
    "\n",
    "train_phase_tracked = TrainPhase.from_split(\n",
    "    label=\"train_tracked\",\n",
    "    split=\"train\",\n",
    "    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),\n",
    "    losses=[mse_loss],\n",
    "    n_epochs=5,\n",
    "    callbacks=[eval_cb_with_metric],\n",
    ")\n",
    "\n",
    "train_results = exp.run_phase(train_phase_tracked)\n",
    "print(\"Training complete.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24",
   "metadata": {},
   "source": [
    "(08-callbacks-accessing-results)=\n",
    "## Accessing Callback Results\n",
    "\n",
    "After training, callback results are stored in the parent phase under a `callbacks` attribute.\n",
    "\n",
    "Metrics are available under the `metrics` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here we grab the training results directly from the experiment's history\n",
    "last_train_results = exp.last_run.results\n",
    "\n",
    "# All result attributes return AxisSeries; it's essentially a queriable, multi-keyed dict\n",
    "# More on those in a later notebook\n",
    "val_losses = last_train_results.metrics().where(name=\"val_loss\").values()\n",
    "print(\"val_loss per epoch:\")\n",
    "for entry in val_losses:\n",
    "    print(f\"  epoch {entry.epoch_idx}: {entry.value:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26",
   "metadata": {},
   "source": [
    "Full Evaluation results are a little more annoying to access as all values returned by a callback hook are wrapped in a CallbackResult container.\n",
    "\n",
    "For callbacks that run an EvalPhase, our data structure is along the lines of:\n",
    "\n",
    "> CallbackResults --> EvalResults --> [tensors, losses, metrics]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(last_train_results.callbacks().axes)\n",
    "\n",
    "print(\"Axis values:\")\n",
    "for k, vs in last_train_results.callbacks().axes_values().items():\n",
    "    print(f\"  {k}: {vs}\")\n",
    "\n",
    "# Grab the underlying EvalResults from epoch 3\n",
    "cb_res = last_train_results.callbacks(kind=\"evaluation\").where(epoch=3).one()\n",
    "print(cb_res.eval_results)\n",
    "\n",
    "# Output tensors accessed directly\n",
    "pred = cb_res.stacked_tensors(node=\"MLP\", domain=\"outputs\", fmt=\"np\").reshape(-1)\n",
    "true = cb_res.stacked_tensors(node=\"MLP\", domain=\"targets\", fmt=\"np\").reshape(-1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.figure(figsize=(3,3))\n",
    "plt.scatter(pred, true)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29",
   "metadata": {},
   "source": [
    "*Pretty terrible results :(*\n",
    "\n",
    "But at least we can run callbacks."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31",
   "metadata": {},
   "source": [
    "(08-callbacks-artifact-result)=\n",
    "## The `ArtifactResult` - logging rich non-scalar objects\n",
    "\n",
    "`MetricResult` is restricted to scalar return types.\n",
    "For richer objects (e.g., matplotlib figures, pandas DataFrames, numpy arrays, text), wrap your custom callback returns in an `ArtifactResult`.\n",
    "\n",
    "```python\n",
    "    ArtifactResult(\n",
    "        artifact_name: str,   # stable name, e.g. \"val_scatter\"\n",
    "        artifact: Any,        # any Python object\n",
    "    )\n",
    "```\n",
    "\n",
    "When an `ArtifactResult` is returned, the framework:\n",
    "1. Stores the result in `PhaseResults._callbacks` (accessible via `callbacks(kind=\"artifact\")`)\n",
    "2. **Also** stores it in `PhaseResults._artifacts` (accessible via `results.artifacts()`)\n",
    "\n",
    "This provides more convenient access to artifacts, similar to the access path of metrics.\n",
    "\n",
    "If the `Experiment` was created with a `ResultsConfig(results_dir=...)`, the artifact is\n",
    "serialized to disk automatically. `entry.artifact` transparently deserializes it on access."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32",
   "metadata": {},
   "source": [
    "Let's create a custom callback that produces matplotlib figures at the end of each epoch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from modularml.callbacks import ArtifactResult\n",
    "\n",
    "\n",
    "class ScatterPlotCallback(Callback):\n",
    "    \"\"\"Produces a pred-vs-true scatter plot at the end of every epoch.\"\"\"\n",
    "\n",
    "    def __init__(self, node: str = \"MLP\", label: str | None = None):\n",
    "        super().__init__(label=label or \"ScatterPlot\")\n",
    "        self.node = node\n",
    "\n",
    "    def on_epoch_end(self, *, experiment, phase, exec_ctx, results=None):\n",
    "        # Only run every other epoch to keep the demo quick\n",
    "        if exec_ctx.epoch_idx % 2 != 0:\n",
    "            return None\n",
    "\n",
    "        # Pull predictions from the most recent eval callback result\n",
    "        if results is None:\n",
    "            return None\n",
    "        eval_cbs = results.callbacks(kind=\"evaluation\").values()\n",
    "        if not eval_cbs:\n",
    "            return None\n",
    "        last_eval = eval_cbs[-1]\n",
    "\n",
    "        pred = last_eval.stacked_tensors(node=self.node, domain=\"outputs\", fmt=\"np\").reshape(-1)\n",
    "        true = last_eval.stacked_tensors(node=self.node, domain=\"targets\", fmt=\"np\").reshape(-1)\n",
    "\n",
    "        fig, ax = plt.subplots(figsize=(3, 3))\n",
    "        ax.scatter(pred, true, alpha=0.4, s=10)\n",
    "        ax.set_xlabel(\"Predicted\")\n",
    "        ax.set_ylabel(\"True\")\n",
    "        ax.set_title(f\"Epoch {exec_ctx.epoch_idx}\")\n",
    "        plt.tight_layout()\n",
    "        plt.close()\n",
    "\n",
    "        return ArtifactResult(artifact_name=\"val_scatter\", artifact=fig)\n",
    "\n",
    "    def get_config(self):\n",
    "        return {\"callback_type\": self.__class__.__name__, \"node\": self.node, \"label\": self.label}\n",
    "\n",
    "    @classmethod\n",
    "    def from_config(cls, config):\n",
    "        return cls(node=config[\"node\"], label=config[\"label\"])\n",
    "\n",
    "\n",
    "scatter_cb = ScatterPlotCallback(node=\"MLP\")\n",
    "eval_cb_for_scatter = Evaluation.from_split(\n",
    "    label=\"eval_val\",\n",
    "    split=\"val\",\n",
    "    every_n_epochs=1,\n",
    ")\n",
    "\n",
    "train_phase_with_artifacts = TrainPhase.from_split(\n",
    "    label=\"train_with_artifacts\",\n",
    "    split=\"train\",\n",
    "    sampler=SimpleSampler(batch_size=32, shuffle=True, seed=42),\n",
    "    losses=[mse_loss],\n",
    "    n_epochs=4,\n",
    "    callbacks=[eval_cb_for_scatter, scatter_cb],\n",
    ")\n",
    "\n",
    "train_results = exp.run_phase(train_phase_with_artifacts)\n",
    "print(\"Artifact names:\", train_results.artifact_names())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34",
   "metadata": {},
   "outputs": [],
   "source": [
    "# List all artifact names produced during the phase\n",
    "print(\"Artifact names:\", train_results.artifact_names())\n",
    "\n",
    "# Query via the artifact store; keyed by (name, epoch, batch)\n",
    "scatter_series = train_results.artifacts().where(name=\"val_scatter\")\n",
    "print(f\"Scatter plots recorded: {len(scatter_series)}\")\n",
    "for entry in scatter_series.values():\n",
    "    print(f\"  epoch {entry.epoch_idx}: {type(entry.artifact).__name__}\")\n",
    "\n",
    "# entry.artifact transparently loads from disk when ResultsConfig(results_dir=...) is set\n",
    "entry = scatter_series.where(epoch=0).one()\n",
    "entry.artifact"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35",
   "metadata": {},
   "source": [
    "(08-callbacks-summary)=\n",
    "## Summary\n",
    "\n",
    "### `Callback` Base Class\n",
    "\n",
    "| Method | Override to... |\n",
    "|--------|----------------|\n",
    "| `on_phase_start(experiment, phase, results)` | Run once before the first epoch. |\n",
    "| `on_phase_end(experiment, phase, results)` | Run once after the last epoch. |\n",
    "| `on_epoch_start(experiment, phase, exec_ctx, results)` | Run before each epoch. |\n",
    "| `on_epoch_end(experiment, phase, exec_ctx, results)` | Run after each epoch. |\n",
    "| `on_batch_start(experiment, phase, exec_ctx, results)` | Run before each batch. |\n",
    "| `on_batch_end(experiment, phase, exec_ctx, results)` | Run after each batch. |\n",
    "| `on_exception(experiment, phase, exec_ctx, exception, results)` | Run on unhandled exception. |\n",
    "| `get_config()` | Serialize callback config (required). |\n",
    "| `from_config(config)` | Reconstruct from config (required). |\n",
    "\n",
    "### `Evaluation` Callback\n",
    "\n",
    "| Parameter | Type | Default | Description |\n",
    "|-----------|------|---------|-------------|\n",
    "| `eval_phase` | `EvalPhase` | (required) | Phase to run on selected epochs. |\n",
    "| `every_n_epochs` | `int` | `1` | Evaluation frequency. |\n",
    "| `run_on_start` | `bool` | `False` | Evaluate before training begins. |\n",
    "| `label` | `str \\| None` | `None` | Result key (defaults to phase label). |\n",
    "| `metrics` | `list[EvaluationMetric]` | `None` | Metric extractors run after each evaluation. |\n",
    "\n",
    "Convenience constructor: `Evaluation.from_split(label, split, losses, every_n_epochs, metrics, ...)`\n",
    "\n",
    "### `EvalLossMetric`\n",
    "\n",
    "| Parameter | Type | Default | Description |\n",
    "|-----------|------|---------|-------------|\n",
    "| `loss` | `AppliedLoss` | (required) | Applied loss to track. |\n",
    "| `reducer` | `str` | `\"mean\"` | Batch aggregation method. |\n",
    "| `name` | `str` | `\"val_loss\"` | Metric name in the `MetricStore`. |\n",
    "\n",
    "### `ArtifactResult`\n",
    "\n",
    "Return an `ArtifactResult` from any callback hook to log a rich non-scalar object\n",
    "(matplotlib figure, DataFrame, array, text, …).\n",
    "\n",
    "| Field | Type | Description |\n",
    "|-------|------|-------------|\n",
    "| `artifact_name` | `str` | Stable name for querying (e.g. `\"val_scatter\"`). |\n",
    "| `artifact` | `Any` | The object to store. |\n",
    "\n",
    "### Accessing Results\n",
    "\n",
    "| Expression | Returns | Notes |\n",
    "|------------|---------|-------|\n",
    "| `results.metrics().where(name=\"val_loss\")` | `AxisSeries[MetricEntry]` | Scalar metrics keyed by `(name, epoch, batch)`. |\n",
    "| `results.callbacks()` | `CallbackDataSeries` | All callback results keyed by `(kind, label, epoch, batch, edge)`. |\n",
    "| `results.callbacks(kind=\"evaluation\")` | `AxisSeries[EvaluationCallbackResult]` | Type-narrowed to evaluation results. |\n",
    "| `results.callbacks(kind=\"metric\")` | `AxisSeries[MetricResult]` | Type-narrowed to metric results. |\n",
    "| `results.callbacks(kind=\"artifact\")` | `AxisSeries[ArtifactResult]` | Type-narrowed to artifact results. |\n",
    "| `results.callbacks(kind=\"payload\")` | `AxisSeries[PayloadResult]` | Type-narrowed to arbitrary payload results. |\n",
    "| `results.artifacts()` | `ArtifactDataSeries` | Artifact entries keyed by `(name, epoch, batch)`. |\n",
    "| `results.artifact_names()` | `list[str]` | All unique artifact names recorded. |\n",
    "| `entry.artifact` | `Any` | The artifact object; auto-loads from disk if serialized via `ResultsConfig`. |\n",
    "| `cb.stacked_tensors(node, domain, fmt)` | `np.ndarray \\| Tensor` | Concatenated tensors across all eval batches. |\n",
    "| `cb.aggregated_losses(node, reducer)` | `dict[str, float]` | Per-loss scalar dict. |\n",
    "\n",
    "### Results Storage\n",
    "\n",
    "`ResultsConfig` controls where artifacts and execution contexts are stored (RAM vs disk).\n",
    "See {doc}`05_create_experiment` for the full `ResultsConfig` and `ResultRecording` reference.\n",
    "\n",
    "### Next Steps\n",
    "\n",
    "- **Cross-validation:** Use `Evaluation` and `EvalLossMetric` together with\n",
    "  `CrossValidation` and `CVBinding`; see {doc}`09_use_cross_validation`"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv (3.13.5)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}