Design and stage a Thesis empirical experiment node

Before any code runs on GPU, Thesis requires a structured experiment blueprint that makes the hypothesis falsifiable, the method reproducible, and the success criteria explicit. This guide walks through staging an empirical node, reviewing the proposal card, and understanding what happens when you approve or reject it.

Node kinds in Thesis

Thesis has three node kinds:

Kind	When to use
`untyped`	Planning nodes, literature summaries, root anchors
`empirical`	Testable experiments with a hypothesis and a method
`insight`	Synthesis nodes that summarize conclusions across experiments

An empirical node is the only kind that can receive a compute grant. It carries a blueprint.json artifact with an 8-section experiment design and a status field that progresses from drafted → awaiting_approval → in_progress → completed (or failed).
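The status progression can be read as a small state machine. The sketch below is illustrative only: the transition table is inferred from the sentence above, the function names are hypothetical, and it assumes failed is reachable only from in_progress. How a dismissed proposal affects status is not covered here.

```python
# Hypothetical sketch of the empirical-node status lifecycle.
# The transition table is inferred from this guide, not taken from Thesis itself.
ALLOWED_TRANSITIONS = {
    "drafted": {"awaiting_approval"},
    "awaiting_approval": {"in_progress"},
    "in_progress": {"completed", "failed"},  # assumption: failed only follows in_progress
    "completed": set(),
    "failed": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the status may move from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if not can_transition(current, target):
        raise ValueError(f"illegal status transition: {current} -> {target}")
    return target
```

Under these assumptions a node cannot jump from drafted straight to in_progress, which mirrors the rule that nothing runs before approval.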

The 8-section blueprint

When the agent stages an empirical node, it fills eight free-text sections:

Section	What belongs here
`hypothesis_restatement`	A single falsifiable claim derived from the parent node
`falsification_criterion`	The exact condition that would disprove the hypothesis
`method_section`	Narrative description of the procedure; the independent variable lives here
`expected_outcomes_section`	What you expect to observe if the hypothesis is correct
`baseline_control_section`	The control condition or reference result to compare against
`metrics_section`	Primary, secondary, and tertiary metrics to record
`data_config_section`	Dataset paths, sampling strategy, random seeds
`rigor_review_section`	A PASS or FAIL verdict against research standards, with reasoning

The rigor review is the agent’s self-critique pass. A FAIL there does not block the proposal, but it surfaces concerns for you to evaluate before approving.
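If you want to sanity-check a blueprint before reading it in full, a small script can flag empty sections and a FAIL verdict. This is a minimal sketch, not part of Thesis: it assumes blueprint.json is a flat JSON object with one string per section name from the table above, and that the rigor section begins with the word PASS or FAIL. The function name check_blueprint is hypothetical.

```python
import json

# Section names come from the table in this guide; the flat one-string-per-key
# layout of blueprint.json is an assumption made for illustration.
BLUEPRINT_SECTIONS = (
    "hypothesis_restatement",
    "falsification_criterion",
    "method_section",
    "expected_outcomes_section",
    "baseline_control_section",
    "metrics_section",
    "data_config_section",
    "rigor_review_section",
)


def check_blueprint(raw: str) -> list[str]:
    """Return a list of concerns; an empty list means none were found."""
    blueprint = json.loads(raw)
    if not isinstance(blueprint, dict):
        return ["blueprint is not a JSON object"]
    concerns = []
    for name in BLUEPRINT_SECTIONS:
        text = blueprint.get(name)
        if not isinstance(text, str) or not text.strip():
            concerns.append(f"missing or empty section: {name}")
    rigor = blueprint.get("rigor_review_section")
    # Assumption: the verdict is the first word of the rigor review text.
    if isinstance(rigor, str) and rigor.strip().upper().startswith("FAIL"):
        concerns.append("rigor review verdict is FAIL; read the reasoning before approving")
    return concerns
```

A FAIL verdict is reported as a concern rather than raised as an error, matching the rule that it does not block the proposal.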

Ask the agent to stage an empirical node

In the agent chat, describe the experiment you want designed. The agent will call thesis_stage_node_create with kind: "empirical" and produce a blueprint.json artifact on the new node.

You can also call the MCP tool directly to create a minimal empirical node, then let the agent fill in the blueprint:

{
  "name": "thesis_stage_node_create",
  "arguments": {
    "title": "Cosine vs. linear LR decay, 1B GRPO convergence",
    "summary": "Test whether cosine decay reaches 90% of peak reward 20% faster than linear decay under identical GRPO hyperparameters.",
    "kind": "empirical",
    "parent_ids": ["<root-node-id>"]
  }
}

The agent then populates the full blueprint, writes it as artifacts/blueprint.json, and sets the node status to awaiting_approval.
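For a direct call, the tool name and arguments shown above are typically wrapped in a JSON-RPC 2.0 tools/call request, which is the standard MCP shape. The sketch below only builds that request body. It assumes the Thesis MCP server follows the standard envelope; the helper name build_tool_call is hypothetical, and transport and authentication are left out.

```python
import json


def build_tool_call(request_id: int, title: str, summary: str, parent_ids: list[str]) -> str:
    """Serialize a tools/call request for thesis_stage_node_create."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # standard MCP method for invoking a tool
        "params": {
            "name": "thesis_stage_node_create",
            "arguments": {
                "title": title,
                "summary": summary,
                "kind": "empirical",  # only empirical nodes can receive a compute grant
                "parent_ids": parent_ids,
            },
        },
    }
    return json.dumps(request, indent=2)


if __name__ == "__main__":
    print(build_tool_call(
        1,
        "Cosine vs. linear LR decay, 1B GRPO convergence",
        "Test whether cosine decay reaches 90% of peak reward 20% faster than linear decay.",
        ["<root-node-id>"],  # placeholder, as in the example above
    ))
```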

The main agent does not run arbitrary code. Staging an empirical node only creates the blueprint; no compute is acquired at this step.

Review the proposal card in the UI

Once the agent emits a proposal event, a blueprint card appears in the agent chat. Switch to the Agents tab in the right pane to see all staged experiments in the drafted or awaiting_approval state.

On the card, review each section of the blueprint. Pay particular attention to:

Falsification criterion: is it specific enough to produce a clear result?
Metrics: are the primary metrics measurable without ambiguity?
Rigor review: if the agent marked FAIL, read its reasoning before approving.

The card also shows the proposed compute configuration: GPU SKU, provider, timeout, and budget.

Approve or reject the proposal

Click Run on the proposal card to approve, or Dismiss to reject.

If you approve:
The backend mints a short-lived approval token and injects a synthesized message into the chat. The agent receives the token, calls spawn_experiment_agent, and the compute acquisition flow begins. See Run a sub-agent on an approved experiment for the full execution flow.

If you reject:
The rejection is logged as a dead end in the research log: a permanent, timestamped record that this direction was considered and declined. The node remains in the graph so future work can trace the reasoning, but no compute is spent. You can ask the agent to revise the blueprint and re-propose.

Approval is irreversible in the short term. Once the approval token is consumed and a compute lease is acquired, the sub-agent begins executing. Use Dismiss if you want to revise the blueprint first.
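Both outcomes leave a timestamped trace in the research log. The sketch below models that with an in-memory list. It is an assumption-laden illustration: the entry fields, the function name record_decision, and the "dead_end" event name are hypothetical, while "proposal_approved" is the entry name this guide uses.

```python
from datetime import datetime, timezone


def record_decision(research_log: list, node_id: str, user_id: str, approved: bool) -> dict:
    """Append a timestamped decision entry to an in-memory research log."""
    entry = {
        # "proposal_approved" is named in this guide; "dead_end" is a placeholder.
        "event": "proposal_approved" if approved else "dead_end",
        "node_id": node_id,
        "user_id": user_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    research_log.append(entry)  # entries are only appended, never rewritten
    return entry
```

Appending rather than updating in place reflects the idea that a declined direction stays on record.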

What the approval mints

When you approve a proposal, Thesis writes two records:

compute_approval.json: the approval receipt, including the token, SKU, provider, budget, and timeout.
A proposal_approved entry in the research log, timestamped and tied to your user ID.

The approval token is the only thing that allows spawn_experiment_agent to proceed. A prompt alone cannot start compute; the token must be present and valid.
