Knowledge layer: unified search across all sources

When you search in Thesis, you’re not searching just one place. The knowledge layer fans out across two branches simultaneously: your own committed research graph and a curated set of external sources (academic papers, GitHub repositories, documentation sites, and datasets). Results from both branches are deduplicated, scored, and merged before they reach you, so you get a single ranked list instead of two separate piles to reconcile.

Two branches, one query

Graph as knowledge base. Your committed research nodes are embedded and indexed automatically. When you search, Thesis retrieves chunks from your own prior work (hypotheses you’ve stated, insights you’ve recorded, summaries you’ve written), ranked by semantic similarity to your query.
External sources via Nia. Papers indexed from arXiv, repositories from GitHub, documentation sites, and datasets from HuggingFace are all searchable through the Nia proxy. You can index a source explicitly, or search across sources already available in the global Nia index.

The router fans out to both branches in parallel, applies per-user access control to graph results, deduplicates results by content hash, re-ranks by cosine similarity and relevance score, and returns a consistent result shape, regardless of whether a hit came from your graph or an external source.
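The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the Thesis router itself: the `Hit` shape, the two stand-in branch functions, the `shared_with` set, and the single combined `score` field are all assumptions made for the example. The text mentions both cosine similarity and a relevance score; how the two are combined is not specified, so the sketch sorts on one number.

```python
import asyncio
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One result in a shared shape, whichever branch produced it (hypothetical)."""
    source: str   # "graph" or "external"; labels assumed for this sketch
    owner: str    # owning user for graph hits, empty for external hits
    text: str
    score: float  # single combined score, higher is better (assumption)


async def search_graph(query: str) -> list[Hit]:
    # Stand-in for the graph branch; a real branch would query a vector index.
    return [
        Hit("graph", "alice", "Dropout reduces overfitting.", 0.82),
        Hit("graph", "bob", "Private note on dropout.", 0.95),
    ]


async def search_external(query: str) -> list[Hit]:
    # Stand-in for the external branch.
    return [
        Hit("external", "", "Dropout reduces overfitting.", 0.77),
        Hit("external", "", "Batch norm stabilizes training.", 0.64),
    ]


def content_hash(text: str) -> str:
    # Normalize lightly so trivially different copies hash the same.
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()


async def unified_search(query: str, user: str, shared_with: set[str]) -> list[Hit]:
    # Fan out to both branches in parallel.
    graph_hits, external_hits = await asyncio.gather(
        search_graph(query), search_external(query)
    )
    # Access control applies to graph hits only: keep nodes the user owns
    # or nodes whose owner has shared with the user.
    visible = [h for h in graph_hits if h.owner == user or h.owner in shared_with]
    # Deduplicate by content hash, keeping the best-scoring copy.
    best: dict[str, Hit] = {}
    for hit in visible + external_hits:
        key = content_hash(hit.text)
        if key not in best or hit.score > best[key].score:
            best[key] = hit
    # Re-rank the merged set, highest score first.
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


results = asyncio.run(unified_search("dropout", user="alice", shared_with=set()))
for hit in results:
    print(hit.source, hit.score, hit.text)
```

In this toy run the duplicate external copy of the dropout sentence collapses into the higher-scoring graph hit, and the node owned by another user never reaches the merged list.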

Access control applies to graph results: you only see chunks from nodes you own or that have been shared with you. External source results follow Nia’s tenancy model, scoped to your organization’s API key.

Indexable source types

Before you can search an external source in depth, you index it. Thesis supports four source types:

Papers. Index arXiv papers by arXiv ID. Once indexed, Thesis embeds the paper’s sections as chunks and makes them searchable alongside your graph content. Use thesis_index_paper from the MCP tools or the agent to index a paper.

Repositories. Index a GitHub repository by owner and repo name. Code files, READMEs, and documentation are chunked and indexed. Use thesis_index_repo. Requires a GitHub token configured in your integration settings.

Documentation sites. Index a documentation website by URL. Useful for indexing library docs, API references, or any web-accessible knowledge base. Use thesis_index_docs.

Datasets. Index a HuggingFace dataset by dataset identifier. Dataset cards and metadata are indexed as searchable chunks. Use thesis_index_dataset.

Once indexed, you can also read individual files from a source (thesis_read_source), run regex searches across source content (thesis_grep_source), and browse the source file tree (thesis_explore_source).
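As a rough aid for choosing the right indexing tool, the mapping from source type to tool name can be written as a small lookup with basic identifier checks. The tool names are taken from the text above; the `plan_index_call` helper, the `kind` labels, and the `identifier` argument name are hypothetical and do not describe the real tool signatures.

```python
import re
from urllib.parse import urlparse

# Tool names come from the documentation; everything else here is assumed.
INDEX_TOOLS = {
    "paper": "thesis_index_paper",
    "repository": "thesis_index_repo",
    "docs": "thesis_index_docs",
    "dataset": "thesis_index_dataset",
}

# New-style arXiv identifiers such as 1706.03762, optionally with a version suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def plan_index_call(kind: str, identifier: str) -> dict:
    """Return a hypothetical tool-call description after light validation."""
    if kind not in INDEX_TOOLS:
        raise ValueError(f"unknown source type: {kind!r}")
    if kind == "paper" and not ARXIV_ID.match(identifier):
        raise ValueError("expected an arXiv ID such as 1706.03762")
    if kind == "repository":
        owner, _, repo = identifier.partition("/")
        if not owner or not repo or "/" in repo:
            raise ValueError("expected owner/repo")
    if kind == "docs" and urlparse(identifier).scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URL")
    return {"tool": INDEX_TOOLS[kind], "arguments": {"identifier": identifier}}


print(plan_index_call("paper", "1706.03762"))
print(plan_index_call("repository", "example-owner/example-repo"))
```

Validating the identifier before dispatch catches the most common mistake, passing a URL where an ID is expected, without needing to contact any service.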

Research modes

Beyond basic search, Thesis offers three research tools for deeper synthesis:

Oracle jobs

Oracle jobs answer a research question by querying indexed sources and synthesizing a response with citations. You choose the depth:

Quick: fast synthesis from top search hits, suitable for a first-pass answer.
Deep: broader retrieval with more source coverage; takes longer.
Oracle: maximum-depth synthesis, designed for comprehensive literature questions.

Start a job with thesis_oracle_start_job. Results land as a structured response with citations and can be attached to a node as an artifact.

Tracer

Tracer runs a GitHub code search across one or more repositories and returns a structured brief: which files, functions, or patterns match your query, and how they relate. Use Tracer when your question is about implementation: how a method is used in practice, where a concept appears in a codebase, or what dependencies a library relies on. Start with thesis_tracer_search.

Deep Research

Deep Research is a separate pipeline backed by Exa. It queries Exa for relevant papers and web sources, fetches their full contents, and asks the Thesis agent to synthesize a cited markdown report. The report is saved as a file in your project volume and can be surfaced in the chat or attached to a node. Use Deep Research when you need broad web and paper discovery on a topic rather than point queries against indexed sources.

Use unified search for fast retrieval against sources you’ve already indexed. Use Oracle jobs when you want a synthesized answer with citations. Use Deep Research when you want to discover new sources you haven’t indexed yet.

How the graph becomes searchable

Thesis indexes your committed nodes automatically in the background. When you commit a node, the indexer picks it up within seconds, embeds the content using a 1536-dimension embedding model, and stores the chunks in the vector index. From that point forward, your own findings are retrievable through unified search: the agent can find relevant prior work in your graph just by searching, without needing to read every node manually.

Staged nodes are not indexed; only committed nodes enter the knowledge base. This keeps the retrievable layer stable and authoritative: it reflects what you’ve confirmed, not what’s still in progress.
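The committed-only rule and similarity ranking can be illustrated with a toy index. This is a sketch under loud assumptions: `Node`, `build_index`, and `search` are hypothetical names, and `toy_embed` is a bag-of-words stand-in, not the 1536-dimension embedding model mentioned above. Only the shape of the flow (filter to committed nodes, embed, rank by cosine similarity) mirrors the description.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    text: str
    committed: bool  # staged nodes have committed=False


def toy_embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: sparse word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(nodes: list[Node]) -> dict[str, Counter]:
    # Only committed nodes enter the index; staged nodes are skipped.
    return {n.node_id: toy_embed(n.text) for n in nodes if n.committed}


def search(index: dict[str, Counter], query: str, top_k: int = 5) -> list[tuple[str, float]]:
    q = toy_embed(query)
    scored = [(node_id, cosine(q, vec)) for node_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


nodes = [
    Node("n1", "dropout reduces overfitting in small models", committed=True),
    Node("n2", "learning rate warmup stabilizes training", committed=True),
    Node("n3", "dropout reduces overfitting draft idea", committed=False),
]
index = build_index(nodes)
ranked = search(index, "does dropout reduce overfitting")
print(ranked)
```

The staged node `n3` would match the query well, but it never appears in the ranking because it was filtered out before embedding.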

What you can do after finding a source

Once you’ve found relevant sources through search, you have several options:

Read a source file

Use thesis_read_source to read a specific file or section from an indexed source. This is useful for reading the method section of a paper or a specific module in a repository.

Grep for patterns

Use thesis_grep_source to run a regex search within a source. Useful for finding specific function names, argument patterns, or terminology in large codebases or documents.
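To get a feel for what a regex search over source content returns, here is a local sketch using Python’s standard `re` module on an in-memory set of files. The `grep_source` function and the file contents are hypothetical; the sketch does not reproduce the signature or output format of thesis_grep_source.

```python
import re


def grep_source(files: dict[str, str], pattern: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every line matching the pattern."""
    rx = re.compile(pattern)
    matches = []
    for path in sorted(files):
        for lineno, line in enumerate(files[path].splitlines(), start=1):
            if rx.search(line):
                matches.append((path, lineno, line.strip()))
    return matches


# Hypothetical file contents standing in for an indexed repository.
files = {
    "model/train.py": "def train_step(batch):\n    loss = compute_loss(batch)\n    return loss\n",
    "README.md": "Call train_step once per batch.\n",
}

hits = grep_source(files, r"def \w+_step\(")
for path, lineno, line in hits:
    print(f"{path}:{lineno}: {line}")
```

Anchoring the pattern on `def` keeps the match on the function definition and skips the prose mention in the README, which is the kind of precision regex search offers over semantic search.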

Browse the source tree

Use thesis_explore_source to navigate the file structure of an indexed source. This is helpful when you want to understand how a repository or documentation site is organized before reading specific files.

Ask a targeted question

Use thesis_document_query to ask a natural-language question against a single indexed source. This is document-scoped Q&A: the answer is grounded in that specific source, not the broader index.

Attach findings to nodes

Ask the agent to turn search results or Oracle reports into graph nodes, extracting claims, methods, and assumptions into structured insight or empirical nodes that become part of your durable research record.
