
Overview

Embeddings convert text into dense numerical vectors that capture semantic meaning. In Nadoo AI, embeddings are the foundation of the knowledge base and RAG (Retrieval-Augmented Generation) pipeline. When you upload documents to a knowledge base, the system chunks them and generates embedding vectors for each chunk. At query time, the user’s question is embedded and compared against the stored vectors to find the most relevant content.
Document → Chunking → Embedding Model → Vector Store (pgvector)

User Query → Embedding Model → Similarity Search → Top-K Results → LLM Context
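Both paths meet at the similarity comparison: the query vector is scored against every stored chunk vector, most commonly by cosine similarity. A minimal sketch of that comparison (illustrative only, not Nadoo AI's internal implementation; real vectors have 384–3072 dimensions, not 4):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" for illustration.
query = [0.1, 0.8, 0.3, 0.0]
chunk_about_same_topic = [0.2, 0.7, 0.4, 0.1]
chunk_about_other_topic = [0.9, 0.0, 0.1, 0.8]

print(cosine_similarity(query, chunk_about_same_topic))   # close to 1.0
print(cosine_similarity(query, chunk_about_other_topic))  # much lower
```

pgvector exposes this same comparison as a SQL distance operator, so the ranking happens inside PostgreSQL rather than in application code.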

Supported Embedding Providers

Nadoo AI supports 9+ embedding providers, giving you flexibility to choose based on quality, cost, privacy, and infrastructure requirements.
| Provider | Models | Dimensions | Hosting |
|---|---|---|---|
| OpenAI | text-embedding-3-small, text-embedding-3-large | 1536 / 3072 | Cloud API |
| Azure OpenAI | text-embedding-ada-002, text-embedding-3-* | 1536 / 3072 | Cloud (Azure) |
| AWS Bedrock | amazon.titan-embed-text-v2 | 1024 | Cloud (AWS) |
| Google AI Studio | text-embedding-004 | 768 | Cloud API |
| Google Vertex AI | text-embedding-004, text-multilingual-embedding-002 | 768 | Cloud (GCP) |
| HuggingFace | sentence-transformers/*, BGE, E5 | Varies | Cloud API |
| Ollama | nomic-embed-text, mxbai-embed-large, all-minilm | 384 – 1024 | Local |
| vLLM | Any HuggingFace embedding model | Varies | Self-hosted |
| Local (Built-in) | all-MiniLM-L6-v2 | 384 | Local (CPU) |

How Embeddings Work in RAG

The embedding model plays two critical roles in the RAG pipeline:

1. Indexing (Write Path)

When documents are uploaded to a knowledge base:
  1. Documents are split into chunks (configurable chunk size and overlap)
  2. Each chunk is passed through the embedding model
  3. The resulting vectors are stored in PostgreSQL with pgvector
  4. BM25 keyword indexes are built in parallel for hybrid search
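Step 1 of the write path can be sketched as a sliding character window; the function name and defaults below are illustrative, and production chunkers typically split on tokens, sentences, or document structure rather than raw characters:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Each window shares `overlap` characters with the previous one, so a
    sentence cut at a chunk boundary still appears whole in one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "x" * 1200
chunks = chunk_text(doc, chunk_size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 3 chunks: 500, 500, 300 chars
```

Each returned chunk would then be sent to the embedding model (step 2) and its vector written to pgvector (step 3).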

2. Retrieval (Read Path)

When a user sends a query:
  1. The query text is embedded using the same embedding model
  2. A similarity search finds the closest vectors in the knowledge base
  3. Optionally, BM25 keyword search runs in parallel (hybrid mode)
  4. Results are combined, deduplicated, and optionally reranked
  5. Top-K chunks are injected into the LLM’s context alongside the user’s message
The embedding model used for indexing must match the model used for retrieval. Changing the embedding model after indexing requires re-embedding all documents in the knowledge base.
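The read path above can be sketched end to end with an in-memory stand-in for pgvector. Everything here is illustrative: `embed` is a toy placeholder for the knowledge base's embedding model, and the store is a plain dict rather than a database table.

```python
import math

def embed(text: str, dims: int = 8) -> list[float]:
    """Toy placeholder embedding: buckets character bigrams into a fixed-size,
    unit-length vector. Stands in for a real model; not usable for actual RAG."""
    vec = [0.0] * dims
    for i in range(len(text) - 1):
        vec[(ord(text[i]) * 31 + ord(text[i + 1])) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def top_k(query: str, store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Embed the query with the SAME model used at index time (step 1),
    rank chunks by similarity (step 2), and return the k best (step 5).
    Vectors are unit-length, so the dot product equals cosine similarity."""
    q = embed(query)
    scored = sorted(
        store.items(),
        key=lambda item: sum(a * b for a, b in zip(q, item[1])),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]

store = {c: embed(c) for c in [
    "postgres stores vectors",
    "bananas are yellow",
    "pgvector similarity search",
]}
print(top_k("vector search in postgres", store, k=2))
```

In the real pipeline the hybrid (BM25) results from step 3 would be merged and deduplicated with this ranked list before the top chunks are injected into the LLM's context.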

Configuring Embeddings

Embedding models are configured at the knowledge base level. Each knowledge base can use a different embedding provider and model.
1. Create or Edit a Knowledge Base

Navigate to Knowledge Base in your workspace and create a new one or edit an existing one.
2. Select Embedding Provider

In the Embedding section, choose the provider:
| Field | Description |
|---|---|
| Provider | The embedding service (OpenAI, Ollama, HuggingFace, etc.) |
| Model | The specific embedding model from that provider |
3. Verify Provider Credentials

Ensure the selected provider is configured in Admin > Model Providers with valid credentials.
4. Save and Index

Save the knowledge base configuration. When you upload documents, they will be embedded using the selected model.
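As a mental model, a saved knowledge-base configuration boils down to a provider/model pair plus chunking settings. The structure below is purely illustrative — the field names mirror the table in step 2, but the actual Nadoo AI schema is internal and may differ:

```python
# Hypothetical knowledge-base configuration, for illustration only.
kb_config = {
    "name": "product-docs",
    "embedding": {
        "provider": "openai",              # must exist in Admin > Model Providers
        "model": "text-embedding-3-small",
    },
    "chunking": {
        "chunk_size": 500,  # configurable chunk size (see Indexing above)
        "overlap": 50,      # configurable overlap
    },
}

# A sanity check one might run before saving: the provider must be one of
# the supported services (identifiers here are illustrative, not official).
SUPPORTED = {"openai", "azure_openai", "bedrock", "google_ai_studio",
             "vertex_ai", "huggingface", "ollama", "vllm", "local"}
assert kb_config["embedding"]["provider"] in SUPPORTED
print("config ok")
```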

Provider Details

OpenAI Embeddings

The most popular choice for cloud-based RAG.
| Model | Dimensions | Max Tokens | Cost (per 1M tokens) |
|---|---|---|---|
| text-embedding-3-small | 1536 | 8,191 | Low |
| text-embedding-3-large | 3072 | 8,191 | Medium |
Best for: General-purpose RAG with the best quality-to-cost ratio.

Azure OpenAI Embeddings

Same models as OpenAI, deployed in your Azure region for data residency.
| Model | Dimensions | Max Tokens |
|---|---|---|
| text-embedding-ada-002 | 1536 | 8,191 |
| text-embedding-3-small | 1536 | 8,191 |
| text-embedding-3-large | 3072 | 8,191 |
Best for: Enterprise environments requiring Azure compliance and regional data control.

AWS Bedrock Embeddings

| Model | Dimensions | Max Tokens |
|---|---|---|
| amazon.titan-embed-text-v2:0 | 1024 | 8,192 |
Best for: AWS-centric infrastructure with native IAM and VPC integration.

Google Embeddings

Available through both AI Studio and Vertex AI.
| Model | Dimensions | Max Tokens |
|---|---|---|
| text-embedding-004 | 768 | 2,048 |
| text-multilingual-embedding-002 | 768 | 2,048 |
Best for: Multilingual applications and Google Cloud environments.

HuggingFace Embeddings

Access open-source embedding models through the HuggingFace Inference API.
| Model | Dimensions | Best For |
|---|---|---|
| BAAI/bge-large-en-v1.5 | 1024 | High-quality English embeddings |
| intfloat/e5-large-v2 | 1024 | Instruction-tuned embeddings |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Fast, lightweight embeddings |
Best for: Access to cutting-edge open-source embedding models without self-hosting.

Ollama Embeddings (Local)

Run embedding models entirely on your hardware.
| Model | Dimensions | Best For |
|---|---|---|
| nomic-embed-text | 768 | Best quality local embeddings |
| mxbai-embed-large | 1024 | High-dimensional local embeddings |
| all-minilm | 384 | Fast, lightweight local embeddings |
| snowflake-arctic-embed | 1024 | Strong retrieval performance |
Best for: Privacy-sensitive environments and offline deployments.

vLLM Embeddings (Self-Hosted)

Serve any HuggingFace embedding model at scale with vLLM.
Best for: High-volume production embedding with custom or fine-tuned models.

Local Built-in Embeddings

Nadoo AI includes a built-in lightweight embedding model that runs on CPU without external dependencies.
| Model | Dimensions |
|---|---|
| all-MiniLM-L6-v2 | 384 |
Best for: Quick setup, development, and testing without configuring an external provider.

Choosing the Right Embedding Model

The choice of embedding model affects retrieval quality, speed, and cost. Here is a decision framework:
| Priority | Recommended Provider | Model | Why |
|---|---|---|---|
| Best quality | OpenAI | text-embedding-3-large | Highest-dimensional embeddings with strong semantic capture |
| Best value | OpenAI | text-embedding-3-small | Strong quality at low cost |
| Enterprise compliance | Azure OpenAI | text-embedding-3-small | Same quality with Azure data governance |
| Complete privacy | Ollama | nomic-embed-text | No data leaves your network |
| Multilingual | Google | text-multilingual-embedding-002 | Built for cross-language retrieval |
| Lowest latency | Local built-in | all-MiniLM-L6-v2 | No network call, runs on CPU |
| High volume | vLLM | Any HuggingFace model | GPU-accelerated batch embedding |
| AWS ecosystem | AWS Bedrock | Titan Embed v2 | Native AWS integration |

Dimensions and Quality

Embedding dimensions affect both quality and storage:
| Dimensions | Storage per Vector | Quality | Speed |
|---|---|---|---|
| 384 | ~1.5 KB | Good | Fastest |
| 768 | ~3 KB | Better | Fast |
| 1024 | ~4 KB | Very Good | Moderate |
| 1536 | ~6 KB | Excellent | Moderate |
| 3072 | ~12 KB | Best | Slower |
For most use cases, 1536 dimensions (OpenAI text-embedding-3-small) provides the best balance of quality, speed, and storage cost. Only use 3072 dimensions when retrieval accuracy is the top priority and storage cost is not a concern.
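The "Storage per Vector" column follows directly from 4 bytes per float32 dimension. A quick check, ignoring pgvector's small per-vector header and any index overhead:

```python
def vector_storage_bytes(dims: int, bytes_per_float: int = 4) -> int:
    """Raw float32 payload of one embedding vector (excludes pgvector's
    per-row header and index overhead)."""
    return dims * bytes_per_float

for dims in (384, 768, 1024, 1536, 3072):
    kb = vector_storage_bytes(dims) / 1024
    print(f"{dims} dims -> {kb:.1f} KB")  # matches the table: 1.5 ... 12.0 KB
```

Scaled up, at 1536 dimensions one million chunks is roughly 6 GB of raw vector data before indexes, which is worth factoring into storage planning.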

Migrating Embedding Models

Changing the embedding model for an existing knowledge base requires re-embedding all documents, since vectors from different models are not compatible.
1. Update the Embedding Configuration

Change the embedding provider and model in the knowledge base settings.
2. Trigger Re-indexing

Click Re-index All Documents in the knowledge base dashboard. This queues all documents for re-embedding with the new model.
3. Monitor Progress

Track re-indexing progress in the dashboard. The knowledge base remains queryable with old embeddings until re-indexing completes.
Re-indexing large knowledge bases can take significant time and may incur API costs (for cloud embedding providers). Plan the migration during a maintenance window for production knowledge bases.
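Conceptually, re-indexing amounts to the loop below; this is only a sketch with a stub embedder, while Nadoo AI's actual queue handles batching, retries, and progress tracking:

```python
def reindex(chunks: list[str], new_embed) -> dict[str, list[float]]:
    """Re-embed every chunk with the new model. The old vectors stay
    queryable until the full replacement store is swapped in."""
    new_store = {}
    for chunk in chunks:
        new_store[chunk] = new_embed(chunk)
    return new_store

# Stub standing in for the new embedding model.
def new_embed(text: str) -> list[float]:
    return [float(len(text)), 1.0]

old_store = {"chunk a": [0.0], "longer chunk b": [0.0]}  # 1-dimensional "old" vectors
new_store = reindex(list(old_store), new_embed)
print(new_store)
```

Note that the stub's old and new vectors don't even share a dimensionality, which is exactly why vectors from different models cannot be mixed in one knowledge base.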

Next Steps