Terraphim
v1.16.0

When it comes to semantic search, there are fundamentally different architectural approaches. Alibaba's zvec and Terraphim represent two distinct philosophies: neural embeddings vs. knowledge graphs, scale vs. interpretability, dense vectors vs. co-occurrence relationships.

The Core Philosophy

zvec: Neural Embeddings at Scale

zvec is a lightweight, in-process vector database built on Alibaba's battle-tested Proxima engine. It transforms documents into high-dimensional vectors using neural embedding models (BERT, OpenAI, etc.), then uses Approximate Nearest Neighbour (ANN) algorithms like HNSW to find similar documents.

Key Characteristics:

  • Dense vectors (typically 384-1536 dimensions)
  • ANN indexing (HNSW, IVF, Flat)
  • Built-in embedding models (OpenAI, Qwen, SentenceTransformers)
  • Billions of vectors, millisecond query times
  • Limited interpretability (opaque, black-box vectors)
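The embed-then-compare pipeline above can be sketched in a few lines of Python. A toy bag-of-words encoder stands in for a neural model, and a brute-force scan stands in for the HNSW index; none of this is the zvec API, only an illustration of the idea:

```python
import math

# Toy bag-of-words "embedding" -- a real system would call a neural
# encoder (BERT, OpenAI, SentenceTransformers) here instead.
VOCAB = ["async", "code", "frameworks", "programming",
         "python", "rust", "tokio", "web"]

def embed(text):
    vec = [0.0] * len(VOCAB)
    for token in text.lower().split():
        if token in VOCAB:
            vec[VOCAB.index(token)] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def cosine(a, b):
    # Vectors are unit-length, so cosine similarity is a plain dot product.
    return sum(x * y for x, y in zip(a, b))

docs = ["rust async programming",
        "python web frameworks",
        "tokio async runtime"]
doc_vecs = [embed(d) for d in docs]

# An ANN index (HNSW, IVF) approximates this argmax without a full scan.
query = embed("async code in rust")
best = max(range(len(docs)), key=lambda i: cosine(doc_vecs[i], query))
print(docs[best])  # -> rust async programming
```

The full scan is O(corpus size) per query; HNSW's layered graph is what makes the same argmax feasible over billions of vectors.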

Terraphim: Knowledge Graphs for Understanding

Terraphim takes a radically different approach. Instead of converting documents to opaque vectors, it builds a knowledge graph from term co-occurrences. Each concept becomes a node, relationships become edges, and relevance is calculated by traversing this graph structure.

Key Characteristics:

  • Co-occurrence graph embeddings
  • Aho-Corasick automata for fast pattern matching
  • Domain-specific thesauri for synonym expansion
  • Role-based graphs for persona-driven search
  • Fully explainable relevance scoring
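The co-occurrence idea behind these characteristics can be illustrated with a toy graph builder. The documents and data structures here are hypothetical; Terraphim's actual RoleGraph is implemented in Rust:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical corpus -- only a sketch of the co-occurrence idea.
docs = {
    "doc1": "async programming in rust with tokio",
    "doc2": "tokio async runtime internals",
    "doc3": "python web frameworks",
}
STOPWORDS = {"in", "with"}

node_rank = defaultdict(int)   # concept frequency across documents
edge_rank = defaultdict(int)   # co-occurrence count per term pair

for doc_id, text in docs.items():
    terms = sorted({t for t in text.lower().split() if t not in STOPWORDS})
    for t in terms:
        node_rank[t] += 1
    # Every pair of terms sharing a document gets an (undirected) edge.
    for a, b in combinations(terms, 2):
        edge_rank[(a, b)] += 1

print(edge_rank[("async", "tokio")])  # -> 2  (co-occur in doc1 and doc2)
```

Every count is traceable back to the documents that produced it, which is where the explainability comes from.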

Architectural Comparison

+-------------------------------------------------------------+
|                         zvec                                 |
+-------------------------------------------------------------+
|  Document -> Neural Encoder -> Dense Vector -> HNSW Index    |
|                                                    |         |
|  Query -> Neural Encoder -> Query Vector -> ANN Search -> K  |
+-------------------------------------------------------------+
                              vs
+-------------------------------------------------------------+
|                       Terraphim                              |
+-------------------------------------------------------------+
|  Document -> Term Extraction -> Co-occurrence -> Graph       |
|                                                   |          |
|  Query -> Aho-Corasick Match -> Graph Traversal -> Ranked    |
+-------------------------------------------------------------+

Data Structures

Component        | zvec                            | Terraphim
-----------------+---------------------------------+----------------------------
Storage Unit     | Collection (table-like)         | RoleGraph (knowledge graph)
Document ID      | String                          | String
Representations  | Dense/Sparse vectors (768-dim+) | Nodes, Edges, Thesaurus
Index Types      | HNSW, IVF, Flat, Inverted       | Hash maps + Aho-Corasick
Persistence      | Disk-based collections          | JSON serialisation

Query Semantics

zvec Query:

import zvec

# Semantic similarity via vector comparison
results = collection.query(
    zvec.VectorQuery("embedding", vector=[0.1, -0.3, ...]),
    topk=10,
    filter="category == 'tech'"
)
# Returns: documents with similar vectors (cosine similarity)

Terraphim Query:

// Graph traversal with term expansion
let results = role_graph.query_graph(
    "async programming",
    Some(0),   // offset
    Some(10),  // limit
)?;
// Returns: documents ranked by graph connectivity
// Matched nodes: "async", "programming", "concurrency", "tokio"

Feature Matrix

Feature              | zvec                | Terraphim
---------------------+---------------------+----------------------
Dense Embeddings     | Native              | Not used
Sparse Vectors       | BM25 supported      | BM25/BM25F/BM25Plus
Knowledge Graph      | No                  | Core architecture
ANN Search           | HNSW/IVF/Flat       | Not applicable
SQL-like Filters     | SQL engine          | Graph-based filtering
Explainability       | Low (black box)     | High (shows path)
Synonym Expansion    | Via embedding model | Via thesaurus
Role/Persona Support | No                  | RoleGraphs
Multi-Haystack       | Single collection   | Multiple sources
Built-in Rerankers   | RRF, Weighted       | Graph ranks directly
Quantisation         | INT8/FP16           | Not needed
Hybrid Search        | Vectors + Filters   | Graph + Haystacks

Performance Characteristics

zvec (benchmarks on a 10M-vector dataset)

  • Throughput: 2,000-8,000 QPS depending on configuration
  • Recall: 96-97% with HNSW
  • Latency: Milliseconds for 10M vectors
  • Memory: Compressed vectors (INT8/FP16)
  • Scale: Billions of vectors

Terraphim (Observed Performance)

  • Throughput: In-memory graph traversal (very fast)
  • Recall: Exact — graph ranking is deterministic, with no ANN approximation
  • Latency: Sub-millisecond for typical graphs
  • Memory: Entire graph in memory
  • Scale: Thousands to tens of thousands of documents

When to Use Which

Choose zvec When:

  1. You need to search billions of documents

    • ANN algorithms scale to massive datasets
    • Production workloads at Alibaba scale
  2. You are building RAG systems with LLMs

    • Dense embeddings align with LLM representations
    • Built-in OpenAI/SentenceTransformer support
  3. You need image/audio similarity search

    • Requires dense embeddings
    • CLIP-style multimodal search
  4. Exact semantic similarity matters

    • "King - Man + Woman = Queen" works
    • Captures semantic relationships beyond keywords
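The classic analogy can be checked directly with hand-picked toy vectors, where axis 0 encodes "royalty" and axis 1 "gender". Real models learn such directions from data; here they are assigned by hand purely for illustration:

```python
import math

# Hand-picked 2-D embeddings: axis 0 = "royalty", axis 1 = "gender".
vecs = {
    "king":  (0.9,  0.7),
    "queen": (0.9, -0.7),
    "man":   (0.1,  0.7),
    "woman": (0.1, -0.7),
}

def cosine(a, b):
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman, computed component-wise:
target = tuple(k - m + w for k, m, w in
               zip(vecs["king"], vecs["man"], vecs["woman"]))

nearest = max((word for word in vecs if word != "king"),
              key=lambda word: cosine(vecs[word], target))
print(nearest)  # -> queen
```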

Choose Terraphim When:

  1. You need explainable results

    • "Why did this document rank high?"
    • Graph path shows: matched node X via edge Y to document Z
  2. You have domain-specific knowledge

    • Custom thesauri for technical terms
    • Synonym relationships: "async" = "asynchronous" = "non-blocking"
  3. You are building personal knowledge management

    • Note-taking apps, research assistants
    • Domain expert systems
  4. You need role-based search

    • Different personas see different results
    • Engineer vs. Scientist vs. Writer views
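Thesaurus-driven synonym expansion can be sketched with a plain dictionary. The entries below are hypothetical examples, not Terraphim's shipped data; the point is that every surface form collapses to one canonical concept:

```python
# Hypothetical thesaurus entries mapping surface forms to one concept;
# Terraphim ships such mappings as role-specific thesauri.
thesaurus = {
    "async": "asynchronous programming",
    "asynchronous": "asynchronous programming",
    "non-blocking": "asynchronous programming",
    "tokio": "tokio runtime",
}

def expand(query):
    # Each matched surface form collapses to its canonical concept,
    # so "async" and "non-blocking" reach the same graph node.
    return {thesaurus[t] for t in query.lower().split() if t in thesaurus}

print(expand("non-blocking async io"))  # -> {'asynchronous programming'}
```

Swapping in a different thesaurus per role is what lets an Engineer and a Writer see different results for the same query.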

Code Comparison

Document Indexing

zvec (Python):

import zvec

schema = zvec.CollectionSchema(
    name="docs",
    vectors=zvec.VectorSchema("emb", zvec.DataType.VECTOR_FP32, 768),
)

collection = zvec.create_and_open(path="./data", schema=schema)

# Documents must have pre-computed embeddings
collection.insert([
    zvec.Doc(
        id="doc1",
        vectors={"emb": embedding_model.encode("Rust async programming")},
        fields={"title": "Async in Rust"}
    ),
])

Terraphim (Rust):

use terraphim_rolegraph::RoleGraph;
use terraphim_types::{Document, RoleName};

let mut graph = RoleGraph::new(
    RoleName::new("engineer"),
    thesaurus
).await?;

// Documents are indexed into the graph
graph.index_documents(vec![
    Document {
        id: "doc1".into(),
        title: "Async in Rust".into(),
        body: "Rust's async/await syntax...".into(),
        // Graph extracts terms automatically
        ..Default::default()
    },
]).await?;

Searching

zvec:

# Vector similarity search
query_vec = embedding_model.encode("how to write async code")
results = collection.query(
    zvec.VectorQuery("emb", vector=query_vec),
    topk=5
)
# Results ranked by cosine similarity

Terraphim:

// Graph-based search
let results = graph.query_graph("async code", None, Some(5))?;
// Results ranked by:
// 1. Node rank (concept frequency)
// 2. Edge rank (relationship strength)
// 3. Document rank (occurrence count)
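A toy aggregation of those three signals might look like this. The figures and the plain sum are illustrative; Terraphim's actual scoring formula lives in the terraphim_rolegraph crate:

```python
# Illustrative scoring sketch -- a stand-in, not Terraphim's exact formula.
def score(node_ranks, edge_ranks, doc_rank):
    return sum(node_ranks) + sum(edge_ranks) + doc_rank

matched_nodes = [3, 2]   # ranks of the "async" and "code" nodes
matched_edges = [2]      # rank of the ("async", "code") edge
doc_rank = 4             # occurrence count for the document

print(score(matched_nodes, matched_edges, doc_rank))  # -> 11
```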

Can They Work Together?

Absolutely. Here are some integration patterns:

1. Hybrid Retrieval

Use zvec for initial broad retrieval, Terraphim for reranking:

# Step 1: zvec ANN for candidate retrieval
candidates = zvec_collection.query(query_vector, topk=100)

# Step 2: Terraphim graph reranking
# Load candidates into temporary graph
# Re-rank based on knowledge graph connectivity
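One concrete way to fuse the two stages is Reciprocal Rank Fusion, which also appears among zvec's built-in rerankers. The candidate lists below are hypothetical:

```python
def rrf(rankings, k=60):
    # Reciprocal Rank Fusion: each list contributes 1 / (k + rank),
    # so documents ranked highly by several lists win.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of the two stages:
vector_ranking = ["doc3", "doc1", "doc2"]   # zvec ANN order
graph_ranking = ["doc1", "doc2", "doc3"]    # Terraphim graph order

print(rrf([vector_ranking, graph_ranking]))  # -> ['doc1', 'doc3', 'doc2']
```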

2. Explainability Layer

Use Terraphim's graph to explain zvec results:

User: "Why did this document match?"
System:
  - zvec: "Vector similarity: 0.92"
  - Terraphim: "Matched via concepts: async -> tokio -> concurrency"

Conclusion

zvec and Terraphim solve semantic search with fundamentally different approaches:

  • zvec scales neural embeddings to billions of documents using ANN algorithms. It is the right choice for large-scale RAG systems, e-commerce search, and any application requiring dense vector similarity.

  • Terraphim builds interpretable knowledge graphs from term relationships. It excels at personal knowledge management, domain-specific expert systems, and any application where understanding why a document matched is as important as finding it.

The exciting possibility is combining both: zvec's scale with Terraphim's explainability. The future of semantic search might just be hybrid.

Have you used zvec or Terraphim? We would love to hear about your experiences on GitHub Issues.