
Overview

Ollama is a local model runtime that lets you run open-source AI models on your own hardware. With Ollama, your data never leaves your network — there are no API costs, no rate limits, and no external dependencies. This makes it the ideal provider for privacy-sensitive environments, air-gapped deployments, and development workflows where you want fast, free model access.

Key benefits:
  • Complete privacy — No data sent to external APIs; all inference runs locally
  • Zero cost — No per-token charges; pay only for your hardware
  • Offline capable — Works without internet once models are downloaded
  • Wide model selection — Run Llama 3, Mistral, CodeLlama, Phi, Gemma, and many more
  • Embedding support — Generate embeddings locally for knowledge base indexing

Setup

1. Install Ollama

Download and install Ollama from ollama.com.

macOS:
brew install ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows: Download the installer from ollama.com/download.
2. Start Ollama

Start the Ollama service:
ollama serve
By default, Ollama listens on http://localhost:11434.
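To confirm the service is reachable before moving on, a quick health check against the base URL is enough. The sketch below uses only the Python standard library; the function name is illustrative, not part of Nadoo or Ollama:

```python
import urllib.request
import urllib.error

def is_ollama_up(base_url="http://localhost:11434", timeout=2):
    """Return True if an Ollama server responds at base_url."""
    try:
        # Ollama's root endpoint answers with "Ollama is running"
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A 200 response from the root endpoint is sufficient to confirm the server is up; a connection error means Ollama is not running or is listening on a different address.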
3. Pull Models

Download the models you want to use:
# Pull a chat model
ollama pull llama3.1

# Pull an embedding model
ollama pull nomic-embed-text

# List downloaded models
ollama list
4. Configure in Nadoo

Go to Admin > Model Providers > Ollama and enter:
| Field | Required | Description |
| --- | --- | --- |
| Base URL | Yes | The Ollama server URL (default: http://localhost:11434) |
5. Test Connection

Click Test to verify the connection. Nadoo will discover all models available on your Ollama instance and list them for selection.
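Model discovery relies on Ollama's /api/tags endpoint, which lists the locally installed models. A minimal sketch of the same lookup, assuming a default local instance (helper names are illustrative):

```python
import json
import urllib.request

def model_names(tags_response):
    """Extract model names from a decoded /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_models(base_url="http://localhost:11434"):
    """Ask a running Ollama instance which models it has installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.load(resp))
```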

Available Models

Ollama supports hundreds of open-source models. Here are the most commonly used ones:

Chat / LLM

| Model | Parameters | Context Window | Best For |
| --- | --- | --- | --- |
| llama3.1 | 8B / 70B | 128K tokens | General-purpose chat and reasoning |
| llama3.2 | 1B / 3B | 128K tokens | Lightweight, fast responses |
| mistral | 7B | 32K tokens | Strong performance for its size |
| mixtral | 8x7B | 32K tokens | Mixture-of-experts, wide knowledge |
| codellama | 7B / 13B / 34B | 16K tokens | Code generation and understanding |
| phi3 | 3.8B / 14B | 128K tokens | Compact yet capable |
| gemma2 | 9B / 27B | 8K tokens | Google’s open model family |
| qwen2.5 | 7B / 72B | 128K tokens | Strong multilingual and coding |
| deepseek-r1 | 7B / 70B | 64K tokens | Reasoning and math |

Embedding

| Model | Dimensions | Best For |
| --- | --- | --- |
| nomic-embed-text | 768 | General-purpose embeddings |
| mxbai-embed-large | 1024 | High-quality embeddings |
| all-minilm | 384 | Fast, lightweight embeddings |
| snowflake-arctic-embed | 1024 | Strong retrieval performance |
Browse all available models at ollama.com/library. Pull any model with ollama pull <model-name>.

Capabilities

Chat Completion

Conversational AI with streaming support for all chat-capable models.
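Under the hood, streaming chat goes through Ollama's /api/chat endpoint, which emits one JSON object per line until a final object with "done": true. A minimal client sketch (standard library only; this is not Nadoo's actual implementation):

```python
import json
import urllib.request

def chunk_text(line):
    """Pull the text delta out of one NDJSON line of a streaming response."""
    obj = json.loads(line)
    return "" if obj.get("done") else obj["message"]["content"]

def stream_chat(base_url, model, messages):
    """Yield content chunks from /api/chat as they arrive."""
    payload = json.dumps(
        {"model": model, "messages": messages, "stream": True}
    ).encode()
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            obj = json.loads(line)
            if obj.get("done"):
                break
            yield obj["message"]["content"]
```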

Embeddings

Local embedding generation for knowledge base indexing and semantic search.
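Embeddings come from Ollama's /api/embeddings endpoint. The sketch below (illustrative, standard library only) fetches a vector and includes the cosine similarity typically used to compare embeddings for semantic search:

```python
import json
import urllib.request

def embed(text, model="nomic-embed-text", base_url="http://localhost:11434"):
    """Generate an embedding locally via Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # e.g. a list of 768 floats for nomic-embed-text
        return json.load(resp)["embedding"]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)
```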

Privacy

All data stays on your hardware. No network calls to external services.

No Rate Limits

Run as many requests as your hardware can handle — no quotas or throttling.

Hardware Requirements

Model performance depends on your hardware. Here are general guidelines:
| Model Size | Minimum RAM | Recommended GPU | Inference Speed |
| --- | --- | --- | --- |
| 1B — 3B | 4 GB | Not required (CPU) | Fast |
| 7B — 8B | 8 GB | 8 GB VRAM | Moderate |
| 13B — 14B | 16 GB | 16 GB VRAM | Moderate |
| 34B | 32 GB | 24 GB VRAM | Slower |
| 70B | 64 GB | 48 GB VRAM (or 2x 24 GB) | Slow |
For development and testing, Llama 3.1 8B or Mistral 7B on a machine with 16 GB RAM provides a good balance of quality and speed. For production use with demanding workloads, consider the 70B models on GPU-equipped servers.
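The figures above can be approximated with a simple rule of thumb: memory is roughly parameter count times bytes per weight, plus overhead for the KV cache and runtime. This back-of-the-envelope helper is an estimate, not an official requirement:

```python
def approx_model_ram_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Rough memory footprint for a quantized model.

    Ollama's default downloads are typically 4-bit quantized, so weights
    cost about half a byte per parameter; add ~20% for KV cache and runtime.
    """
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

print(round(approx_model_ram_gb(8), 1))   # about 4.8 GB for an 8B model at 4 bits
print(round(approx_model_ram_gb(70), 1))  # about 42.0 GB for a 70B model
```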

Connecting Remote Ollama

If Ollama runs on a different machine (e.g., a GPU server), set the base URL to that machine’s address:
Base URL: http://gpu-server.internal:11434
Ensure the Ollama server is configured to accept external connections by setting:
OLLAMA_HOST=0.0.0.0:11434
When exposing Ollama over a network, ensure it is only accessible within your trusted network or behind a reverse proxy with authentication. Ollama does not have built-in authentication.
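On Linux installs that use the bundled systemd service, one way to make the OLLAMA_HOST setting persistent is a drop-in override (the path and unit name below assume the default install script):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
```

Then apply it with systemctl daemon-reload followed by systemctl restart ollama.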

Environment Variable

When self-hosting Nadoo AI, configure Ollama via environment variable:
OLLAMA_BASE_URL=http://localhost:11434
If both the environment variable and the admin UI configuration are set, the admin UI value takes precedence.
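That precedence can be sketched as a simple fallback chain (the function name is illustrative, not Nadoo's actual code):

```python
import os

def resolve_ollama_base_url(admin_ui_value=None,
                            default="http://localhost:11434"):
    """Admin UI value wins, then OLLAMA_BASE_URL, then the default."""
    return admin_ui_value or os.environ.get("OLLAMA_BASE_URL") or default
```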
Model Recommendations

| Use Case | Recommended Model | Reason |
| --- | --- | --- |
| General chatbot | llama3.1:8b | Best all-around open-source model |
| Complex reasoning | llama3.1:70b | Highest-quality open-source option |
| Code assistant | codellama:34b | Purpose-built for code tasks |
| Fast responses | phi3:3.8b | Small, fast, and capable |
| Knowledge base search | nomic-embed-text | Strong embedding quality |
| Multilingual | qwen2.5:7b | Excellent multilingual support |
| Development / testing | llama3.2:3b | Fast iteration, low resource usage |

Troubleshooting

Connection refused: Ollama is not running or is listening on a different port. Start it with ollama serve and verify the base URL matches the configured address.

Model not found: The model has not been pulled yet. Run ollama pull <model-name> to download it, and check available models with ollama list.

Slow responses: The model may be too large for your hardware. Try a smaller model (e.g., 8B instead of 70B) or ensure GPU acceleration is enabled. Check ollama ps to see resource usage.

Out-of-memory errors: The model requires more RAM or VRAM than is available. Use a smaller model or a quantized variant (e.g., llama3.1:8b-q4_0 for 4-bit quantization).