
Overview

Vector search is the core retrieval mechanism in the Nadoo AI Knowledge Base. It finds documents by semantic similarity — matching the meaning of a query against the meaning of stored chunks, even when the exact words differ. This is what enables your AI agents to answer natural language questions using your own documents.

How Vector Similarity Search Works

Vector search operates in three steps: embed the query, search the index, and rank results by distance.
Step 1: Embed the query

The user’s query is converted into a vector (a list of floating-point numbers) using the same embedding model that was used to index the documents. This ensures the query lives in the same vector space as the stored chunks.
Step 2: Search the vector index

The query vector is compared against all indexed chunk vectors using an approximate nearest neighbor (ANN) algorithm. The vector store returns the closest matches based on the configured distance metric.
Step 3: Rank and filter

Results are ranked by similarity score. Chunks scoring below the score_threshold are discarded, and the top_k highest-scoring chunks are returned with their content, metadata, and scores.
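As a rough sketch of these three steps, here is a brute-force version in Python. The names are hypothetical, and a real vector store replaces the linear scan with an ANN index; the query is assumed to be already embedded:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def vector_search(query_vector, indexed_chunks, top_k=5, score_threshold=0.5):
    """Score every chunk against the query, drop anything below the
    threshold, and return the top_k highest-scoring chunks."""
    scored = [
        (cosine_similarity(query_vector, chunk["embedding"]), chunk)
        for chunk in indexed_chunks
    ]
    scored = [(score, chunk) for score, chunk in scored if score >= score_threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The brute-force scan is O(n) in the number of chunks; ANN indexes exist precisely to avoid this cost at scale.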

Embedding Providers

Nadoo AI supports multiple embedding providers through a pluggable architecture. The embedding model is configured at the knowledge base level and must remain consistent between indexing and querying.
| Provider | Example Models | Dimensions | Notes |
|---|---|---|---|
| OpenAI | text-embedding-3-small, text-embedding-3-large | 1536 / 3072 | Most widely used. Excellent general-purpose performance. |
| HuggingFace | sentence-transformers/all-MiniLM-L6-v2 | 384 | Open-source models. Run locally or via HuggingFace Inference API. |
| Local | Custom ONNX or PyTorch models | Varies | Self-hosted models for air-gapped or privacy-sensitive deployments. |
| Azure OpenAI | text-embedding-ada-002 (Azure deployment) | 1536 | Enterprise OpenAI through Azure with data residency controls. |
| AWS Bedrock | amazon.titan-embed-text-v1 | 1536 | AWS-managed embeddings with Bedrock integration. |
| Google | text-embedding-004 | 768 | Google AI Studio and Vertex AI embedding models. |
| vLLM | Any embedding model served by vLLM | Varies | High-throughput embedding via vLLM inference server. |
| Ollama | nomic-embed-text, mxbai-embed-large | 768 / 1024 | Local embedding models running through Ollama. |

Configuration Example

{
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "dimensions": 1536,
    "batch_size": 100
  }
}
Changing the embedding model after documents have been indexed requires reprocessing all documents. Vectors from different models are not compatible — they exist in different vector spaces.
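The batch_size setting above controls how many chunks are sent per embedding request during indexing. A minimal sketch of that batching, assuming a list of chunk texts (the provider's embedding call itself is omitted):

```python
def batch_chunks(chunks, batch_size=100):
    """Yield successive slices of at most batch_size chunks, so each
    embedding API call stays within the configured request size."""
    for start in range(0, len(chunks), batch_size):
        yield chunks[start:start + batch_size]
```

Each yielded batch would be passed to the configured provider's embedding endpoint; a partial final batch is yielded as-is.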

Vector Store

Nadoo AI uses a pluggable vector store architecture through the VectorStoreFactory. This abstraction allows you to swap the underlying storage engine without changing your application code.

pgvector (Default)

The default vector store uses pgvector, a PostgreSQL extension for vector similarity search. It runs inside the same PostgreSQL instance used by the rest of the platform, simplifying deployment and operations.

Advantages:
  • No additional infrastructure required
  • Transactional consistency with other application data
  • Supports metadata filtering alongside vector search
  • Production-proven at moderate scale (millions of vectors)

Planned Providers

| Provider | Status | Use Case |
|---|---|---|
| Milvus | Planned | High-scale deployments with billions of vectors |
| Qdrant | Planned | Specialized vector database with advanced filtering |
The VectorStoreFactory selects the vector store implementation based on your configuration. To use pgvector (the default), no additional setup is needed beyond the standard PostgreSQL deployment.

Distance Metrics

The distance metric determines how similarity between vectors is calculated. Different metrics are suited to different embedding models.
| Metric | Description | When to Use |
|---|---|---|
| Cosine | Measures the angle between two vectors, normalized for magnitude. Score range: 0 to 1 (higher is more similar). | Default and recommended. Works with most embedding models. Insensitive to vector length. |
| Euclidean | Measures the straight-line distance between two vectors. Lower values indicate higher similarity. | Use when vector magnitude carries meaningful information. |
| Dot Product | Computes the inner product of two vectors. Higher values indicate higher similarity. | Use with models that produce normalized vectors where dot product equals cosine similarity. |
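The three metrics can be written out directly. This is a plain-Python sketch for intuition, not how the vector store computes them; note that for unit-length vectors the dot product and cosine similarity coincide, which is why dot product is safe with normalized embeddings:

```python
import math

def cosine(a, b):
    """Angle-based similarity, normalized for magnitude (higher = closer)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean(a, b):
    """Straight-line distance (lower = closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    """Inner product (higher = closer); equals cosine for unit vectors."""
    return sum(x * y for x, y in zip(a, b))
```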

Configuration

{
  "vector_store": {
    "provider": "pgvector",
    "distance_metric": "cosine"
  }
}

Index Types

Vector indexes accelerate search by avoiding brute-force comparison against every stored vector. Nadoo AI supports two index types through pgvector.

HNSW (Hierarchical Navigable Small World)

HNSW builds a multi-layer graph of vectors that allows fast approximate nearest neighbor search.
  • Build time: Slower to build than IVFFlat
  • Query speed: Faster queries, especially for high-dimensional vectors
  • Accuracy: Higher recall at the same speed compared to IVFFlat
  • Memory: Uses more memory than IVFFlat
Best for: Production workloads where query speed and accuracy are priorities.
-- Created automatically by Nadoo AI
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

IVFFlat (Inverted File with Flat Compression)

IVFFlat partitions vectors into clusters and searches only the most relevant clusters at query time.
  • Build time: Faster to build than HNSW
  • Query speed: Good, but depends on the number of probes
  • Accuracy: Depends on the nprobe setting — more probes means higher accuracy but slower queries
  • Memory: Uses less memory than HNSW
Best for: Large datasets where memory efficiency matters, or when you need fast index rebuilds.
-- Created automatically by Nadoo AI
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
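The lists = 100 value above is a starting point, not a fixed constant. pgvector's documentation suggests sizing lists as roughly rows / 1000 for up to about a million vectors and sqrt(rows) beyond that; a small helper capturing that rule of thumb:

```python
import math

def ivfflat_lists(row_count):
    """Rule-of-thumb IVFFlat `lists` setting from pgvector's docs:
    rows / 1000 up to ~1M rows, sqrt(rows) above that."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return round(math.sqrt(row_count))
```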
For most deployments, HNSW is the recommended index type. It provides better query performance and higher recall without requiring parameter tuning.

Search Configuration

Configure vector search behavior through these parameters.
| Parameter | Default | Description |
|---|---|---|
| top_k | 5 | Number of nearest chunks to return |
| score_threshold | 0.5 | Minimum similarity score (0.0 to 1.0). Chunks below this score are excluded. |
| distance_metric | cosine | Distance function: cosine, euclidean, or dot_product |

Example Query

{
  "query": "How does the authentication system handle token refresh?",
  "search_mode": "vector",
  "top_k": 5,
  "score_threshold": 0.7
}

Response

{
  "results": [
    {
      "chunk_id": "chk_001",
      "content": "The authentication system uses short-lived access tokens (15 minutes) paired with longer-lived refresh tokens (7 days). When an access token expires...",
      "score": 0.92,
      "metadata": {
        "document_id": "doc_abc123",
        "filename": "auth-architecture.md",
        "heading": "Token Lifecycle",
        "page_number": null
      }
    },
    {
      "chunk_id": "chk_002",
      "content": "Refresh tokens are rotated on each use. The server issues a new refresh token and invalidates the previous one...",
      "score": 0.87,
      "metadata": {
        "document_id": "doc_abc123",
        "filename": "auth-architecture.md",
        "heading": "Refresh Token Rotation",
        "page_number": null
      }
    }
  ],
  "total": 2,
  "search_mode": "vector"
}

Performance Tuning

Choosing an Embedding Model

  • General purpose: text-embedding-3-small (OpenAI) offers a strong balance of quality and cost.
  • Higher accuracy: text-embedding-3-large (OpenAI) or fine-tuned HuggingFace models for domain-specific content.
  • Cost-sensitive / air-gapped: Local models via Ollama or ONNX reduce dependency on external APIs.
  • Multilingual: Models like multilingual-e5-large perform well across languages.

Tuning Retrieval Parameters

  • Start with top_k: 5 and score_threshold: 0.5 as a baseline.
  • If responses include irrelevant content, raise score_threshold to 0.7 or higher.
  • If responses miss relevant content, lower score_threshold or increase top_k.
  • For reranking pipelines, set a higher top_k (e.g., 20) for the initial retrieval and let the reranker narrow it down.

Choosing an Index Type

  • Use HNSW for most workloads — it provides the best query performance.
  • Use IVFFlat if you have memory constraints or need frequent index rebuilds (e.g., during rapid document ingestion).
  • For datasets under 100,000 vectors, index type has minimal impact on query speed.
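The over-fetch-then-rerank pattern mentioned above (retrieve with a higher top_k, then narrow) can be sketched as follows. Here score_fn stands in for whichever reranker the pipeline uses, such as a cross-encoder, and is a hypothetical placeholder:

```python
def rerank(candidates, score_fn, final_k=5):
    """Re-score an over-fetched candidate list (e.g. top_k=20 from the
    initial vector search) with a more expensive relevance function,
    then keep only the best final_k."""
    rescored = sorted(candidates, key=score_fn, reverse=True)
    return rescored[:final_k]
```

The design intent is to spend the cheap vector search on recall and the expensive reranker only on a small candidate set.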

Next Steps