Graph Embeddings

Terraphim uses a fundamentally different approach to semantic search compared to traditional vector embeddings. Instead of dense numerical vectors, Terraphim leverages graph structure embeddings where ranking is determined by the number of synonyms and related concepts connected to a query term in the knowledge graph.

What are Graph Embeddings in Terraphim?

Unlike vector embeddings that represent concepts as points in a high-dimensional semantic space, Terraphim represents concepts as nodes in a knowledge graph. Each node is a normalized term, and edges represent co-occurrence relationships between terms found in documents.

The key insight is that rank is defined by the number of synonyms connected to a concept. When you search for a term, Terraphim expands your query to include all synonyms and related concepts from the knowledge graph, then traverses the graph to find documents that mention these connected concepts.

Graph Structure

    "raft" ----(edge)---- "consensus"
       |                    |
    (edge)               (edge)
       |                    |
    "leader" ----(edge)---- "election"

When you search for "consensus algorithms", the graph traverses from the matched node to connected nodes, finding documents that mention related concepts like "raft", "leader election", and so on.

How It Works

The Terraphim Graph (scorer) uses unique graph embeddings with the following ranking algorithm:

total_rank = node.rank + edge.rank + document_rank

Where:

node.rank: Number of connections to other concepts in the graph
edge.rank: Number of documents containing both connected concepts
document.rank: Base ranking score of the document

Query Expansion

When you search, Terraphim:

Matches your query terms against the thesaurus (normalized terms and synonyms)
Expands the query to include all connected synonyms and related concepts
Traverses the graph to find documents containing these terms
Ranks documents by aggregating scores from multiple graph paths
Returns results with explainable match reasons

Technical Details

Core Implementation

The graph embedding system is implemented in crates/terraphim_rolegraph/src/lib.rs:

RoleGraph: The core data structure representing concepts and their relationships
TriggerIndex: TF-IDF fallback for semantic search when exact matches aren't found
Node/Edge: Graph primitives representing concepts and their connections

Symbolic Embeddings

For domain-specific embeddings (e.g., medical), the SymbolicEmbeddingIndex in crates/terraphim_rolegraph/src/medical.rs builds embeddings from IS-A hierarchies, allowing for hierarchical concept relationships.

Configuration

The system is configured via config/atomic_graph_embeddings_config.json:

{
  "roles": {
    "Atomic Graph Embeddings": {
      "relevance_function": "terraphim-graph"
    }
  }
}

The terraphim-graph relevance function enables graph-based ranking.

Use Cases

Semantic Search with Relationship Awareness

Graph embeddings excel when content relationships matter more than simple keyword matching:

Finding related concepts automatically: Search for "distributed systems" also returns results about "consensus", "raft", "CAP theorem"
Role-based search: Domain-specific knowledge graphs for engineers, medical professionals, etc.
Explainable results: Each result shows which graph paths led to the match

Example: Learning Assistant

With a learning knowledge graph:

Search "active recall" returns notes about "spaced repetition", "flashcards", "memory"
Search "consensus algorithms" returns notes about "raft", "paxos", "leader election"
Results are ranked by graph connectivity, not just keyword density

Comparison with Vector Embeddings

Why Graph Embeddings?

Feature	Vector Embeddings	Graph Embeddings
Representation	Dense vectors	Graph structure
Explainability	Black box	Full traceability
Queries expand	Implicit via distance	Explicit via synonyms
Relationship capture	Learns patterns	Encodes relationships
Domain adaptation	Requires retraining	Add to thesaurus

The Transparency Advantage

Unlike vector embeddings where you don't know WHY a document matched, Terraphim's graph embeddings show:

Which terms matched: The exact thesaurus entries that triggered
Graph paths: The path from query term through the graph to the document
Ranking breakdown: How node.rank + edge.rank + document_rank was computed

This makes results fully auditable and debuggable.

Example Usage

CLI Search

# Search with the Engineer role
terraphim-agent search "graph embeddings" --role engineer

This returns results for:

"graph embeddings" (exact match)
"terraphim-graph" (synonym)
"knowledge graph based embeddings" (related concept)
"symbolic embeddings" (connected concept)

Programmatic Usage

use terraphim_rolegraph::RoleGraph;
use terraphim_types::{Thesaurus, RoleName};

// Create rolegraph with domain knowledge
let thesaurus = build_domain_thesaurus();
let role_name = RoleName::new("Engineer");
let mut graph = RoleGraph::new(role_name, thesaurus).await?;

// Index documents
for doc in documents {
    graph.insert_document(&doc.id, doc);
}

// Query - automatically expands to synonyms
let results = graph.query_graph("distributed systems", None, Some(10))?;

Configuration

Enable graph embeddings in your config:

{
  "roles": {
    "Your Role": {
      "relevance_function": "terraphim-graph",
      "kg": {
        "knowledge_graph_local": {
          "path": "./docs/your-domain"
        }
      }
    }
  }
}

Next Steps

Quickstart Guide - Get started with Terraphim
Configuration Guide - Configure roles and knowledge graphs
Installation Guide - Install Terraphim on your system