## Overview
Embeddings convert text into dense numerical vectors that capture semantic meaning. In Nadoo AI, embeddings are the foundation of the knowledge base and RAG (Retrieval-Augmented Generation) pipeline. When you upload documents to a knowledge base, the system chunks them and generates embedding vectors for each chunk. At query time, the user’s question is embedded and compared against the stored vectors to find the most relevant content.
```
Document → Chunking → Embedding Model → Vector Store (pgvector)
                                              ↑
User Query → Embedding Model → Similarity Search → Top-K Results → LLM Context
```
## Supported Embedding Providers

Nadoo AI supports nine embedding providers, giving you flexibility to choose based on quality, cost, privacy, and infrastructure requirements.
| Provider | Models | Dimensions | Hosting |
|---|---|---|---|
| OpenAI | text-embedding-3-small, text-embedding-3-large | 1536 / 3072 | Cloud API |
| Azure OpenAI | text-embedding-ada-002, text-embedding-3-* | 1536 / 3072 | Cloud (Azure) |
| AWS Bedrock | amazon.titan-embed-text-v2 | 1024 | Cloud (AWS) |
| Google AI Studio | text-embedding-004 | 768 | Cloud API |
| Google Vertex AI | text-embedding-004, text-multilingual-embedding-002 | 768 | Cloud (GCP) |
| HuggingFace | sentence-transformers/*, BGE, E5 | Varies | Cloud API |
| Ollama | nomic-embed-text, mxbai-embed-large, all-minilm | 384–1024 | Local |
| vLLM | Any HuggingFace embedding model | Varies | Self-hosted |
| Local (Built-in) | all-MiniLM-L6-v2 | 384 | Local (CPU) |
## How Embeddings Work in RAG
The embedding model plays two critical roles in the RAG pipeline:
### 1. Indexing (Write Path)
When documents are uploaded to a knowledge base:
- Documents are split into chunks (configurable chunk size and overlap)
- Each chunk is passed through the embedding model
- The resulting vectors are stored in PostgreSQL with pgvector
- BM25 keyword indexes are built in parallel for hybrid search
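The write path above can be sketched in a few lines of Python. This is a conceptual illustration only, not Nadoo AI's internal implementation: `chunk_text` uses simple character windows (real chunkers typically split on tokens or sentences), and `embed` stands in for a call to whichever embedding provider is configured.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def index_document(text: str, embed, store: dict) -> None:
    """Embed each chunk and record (chunk, vector) pairs.
    In Nadoo AI the vectors land in a pgvector column in PostgreSQL."""
    for i, chunk in enumerate(chunk_text(text)):
        store[i] = (chunk, embed(chunk))

# Toy embedder standing in for a real provider call
fake_embed = lambda s: [len(s), s.count(" ")]
store = {}
index_document("word " * 100, fake_embed, store)
```

The overlap between consecutive chunks is what preserves context that would otherwise be cut at a chunk boundary, at the cost of some duplicated storage.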
### 2. Retrieval (Read Path)
When a user sends a query:
- The query text is embedded using the same embedding model
- A similarity search finds the closest vectors in the knowledge base
- Optionally, BM25 keyword search runs in parallel (hybrid mode)
- Results are combined, deduplicated, and optionally reranked
- Top-K chunks are injected into the LLM’s context alongside the user’s message
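The similarity search in the read path reduces to nearest-neighbor ranking over the stored vectors. Here is a minimal in-memory sketch using cosine similarity; in Nadoo AI the equivalent comparison runs inside PostgreSQL via pgvector rather than as a Python full scan.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec: list[float], store: list, k: int = 3) -> list[str]:
    """store: list of (chunk_text, vector) pairs; returns the k most similar chunks."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy store with 2-dimensional vectors for illustration
store = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
```

Calling `top_k([1.0, 0.0], store, k=2)` ranks `"a"` first (identical direction, similarity 1.0) and `"c"` second. A production vector store replaces this exhaustive sort with an approximate nearest-neighbor index so search time does not grow linearly with the knowledge base.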
The embedding model used for indexing must match the model used for retrieval. Changing the embedding model after indexing requires re-embedding all documents in the knowledge base.
## Configuring Embeddings
Embedding models are configured at the knowledge base level. Each knowledge base can use a different embedding provider and model.
### Create or Edit a Knowledge Base
Navigate to Knowledge Base in your workspace and create a new one or edit an existing one.
### Select Embedding Provider

In the Embedding section, choose the provider:

| Field | Description |
|---|---|
| Provider | The embedding service (OpenAI, Ollama, HuggingFace, etc.) |
| Model | The specific embedding model from that provider |
### Verify Provider Credentials
Ensure the selected provider is configured in Admin > Model Providers with valid credentials.
### Save and Index
Save the knowledge base configuration. When you upload documents, they will be embedded using the selected model.
## Provider Details
### OpenAI Embeddings

A popular default for cloud-based RAG.
| Model | Dimensions | Max Tokens | Cost (per 1M tokens) |
|---|---|---|---|
| text-embedding-3-small | 1536 | 8,191 | Low |
| text-embedding-3-large | 3072 | 8,191 | Medium |
Best for: General-purpose RAG with the best quality-to-cost ratio.
### Azure OpenAI Embeddings
Same models as OpenAI, deployed in your Azure region for data residency.
| Model | Dimensions | Max Tokens |
|---|---|---|
| text-embedding-ada-002 | 1536 | 8,191 |
| text-embedding-3-small | 1536 | 8,191 |
| text-embedding-3-large | 3072 | 8,191 |
Best for: Enterprise environments requiring Azure compliance and regional data control.
### AWS Bedrock Embeddings
| Model | Dimensions | Max Tokens |
|---|---|---|
| amazon.titan-embed-text-v2:0 | 1024 | 8,192 |
Best for: AWS-centric infrastructure with native IAM and VPC integration.
### Google Embeddings
Available through both AI Studio and Vertex AI.
| Model | Dimensions | Max Tokens |
|---|---|---|
| text-embedding-004 | 768 | 2,048 |
| text-multilingual-embedding-002 | 768 | 2,048 |
Best for: Multilingual applications and Google Cloud environments.
### HuggingFace Embeddings
Access open-source embedding models through the HuggingFace Inference API.
| Model | Dimensions | Best For |
|---|---|---|
| BAAI/bge-large-en-v1.5 | 1024 | High-quality English embeddings |
| intfloat/e5-large-v2 | 1024 | Instruction-tuned embeddings |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Fast, lightweight embeddings |
Best for: Access to cutting-edge open-source embedding models without self-hosting.
### Ollama Embeddings (Local)
Run embedding models entirely on your hardware.
| Model | Dimensions | Best For |
|---|---|---|
| nomic-embed-text | 768 | Best quality local embeddings |
| mxbai-embed-large | 1024 | High-dimensional local embeddings |
| all-minilm | 384 | Fast, lightweight local embeddings |
| snowflake-arctic-embed | 1024 | Strong retrieval performance |
Best for: Privacy-sensitive environments and offline deployments.
### vLLM Embeddings (Self-Hosted)
Serve any HuggingFace embedding model at scale with vLLM.
Best for: High-volume production embedding with custom or fine-tuned models.
### Local Built-in Embeddings
Nadoo AI includes a built-in lightweight embedding model that runs on CPU without external dependencies.
| Model | Dimensions |
|---|---|
| all-MiniLM-L6-v2 | 384 |
Best for: Quick setup, development, and testing without configuring an external provider.
## Choosing the Right Embedding Model
The choice of embedding model affects retrieval quality, speed, and cost. Here is a decision framework:
| Priority | Recommended Provider | Model | Why |
|---|---|---|---|
| Best quality | OpenAI | text-embedding-3-large | Highest dimensional embeddings with strong semantic capture |
| Best value | OpenAI | text-embedding-3-small | Strong quality at low cost |
| Enterprise compliance | Azure OpenAI | text-embedding-3-small | Same quality with Azure data governance |
| Complete privacy | Ollama | nomic-embed-text | No data leaves your network |
| Multilingual | Google | text-multilingual-embedding-002 | Built for cross-language retrieval |
| Lowest latency | Local built-in | all-MiniLM-L6-v2 | No network call, runs on CPU |
| High volume | vLLM | Any HuggingFace model | GPU-accelerated batch embedding |
| AWS ecosystem | AWS Bedrock | Titan Embed v2 | Native AWS integration |
### Dimensions and Quality
Embedding dimensions affect both quality and storage:
| Dimensions | Storage per Vector | Quality | Speed |
|---|---|---|---|
| 384 | ~1.5 KB | Good | Fastest |
| 768 | ~3 KB | Better | Fast |
| 1024 | ~4 KB | Very Good | Moderate |
| 1536 | ~6 KB | Excellent | Moderate |
| 3072 | ~12 KB | Best | Slower |
For most use cases, 1536 dimensions (OpenAI text-embedding-3-small) provides the best balance of quality, speed, and storage cost. Only use 3072 dimensions when retrieval accuracy is the top priority and storage cost is not a concern.
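The storage figures in the table follow directly from vector width: pgvector stores each component as a 4-byte float. A quick back-of-the-envelope helper (raw vector payload only; it ignores row and index overhead):

```python
def vector_kb(dims: int) -> float:
    """Raw size of one float32 vector in KB (4 bytes per dimension)."""
    return dims * 4 / 1024

for dims in (384, 768, 1536, 3072):
    print(f"{dims} dims -> {vector_kb(dims)} KB")  # 1.5, 3.0, 6.0, 12.0 KB
```

Multiplying by chunk count gives a useful capacity estimate: a million 3072-dimensional vectors is roughly 12 GB of raw vector data before index overhead, versus about 6 GB at 1536 dimensions.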
## Migrating Embedding Models
Changing the embedding model for an existing knowledge base requires re-embedding all documents, since vectors from different models are not compatible.
### Update the Embedding Configuration
Change the embedding provider and model in the knowledge base settings.
### Trigger Re-indexing
Click Re-index All Documents in the knowledge base dashboard. This queues all documents for re-embedding with the new model.
### Monitor Progress
Track re-indexing progress in the dashboard. The knowledge base remains queryable with old embeddings until re-indexing completes.
Re-indexing large knowledge bases can take significant time and may incur API costs (for cloud embedding providers). Plan the migration during a maintenance window for production knowledge bases.
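Conceptually, re-indexing is a bulk rewrite of every stored vector with the new model while the chunk text stays untouched. A hypothetical sketch (function names are illustrative, not a Nadoo AI API), with toy embedders to show the dimension change:

```python
def reindex(store: dict, new_embed) -> None:
    """Re-embed every chunk with the new model, replacing the old vectors."""
    for key, (chunk, _old_vec) in store.items():
        store[key] = (chunk, new_embed(chunk))

old_embed = lambda s: [float(len(s))]        # stand-in 1-dim "old" model
new_embed = lambda s: [float(len(s))] * 4    # stand-in 4-dim "new" model

store = {0: ("alpha", old_embed("alpha")), 1: ("beta", old_embed("beta"))}
reindex(store, new_embed)
```

The cost of a migration therefore scales with total chunk count: every chunk makes one pass through the new embedding model, which for cloud providers means one billed API call per chunk.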
## Next Steps