Overview
Nadoo AI provides built-in monitoring, health checks, and metrics collection to ensure platform reliability. The observability stack includes Prometheus-compatible metrics, structured logging, health probes for Kubernetes, and real-time system resource tracking.

- Health Checks: liveness, readiness, and comprehensive health endpoints for orchestration platforms.
- Prometheus Metrics: HTTP request counters, latency histograms, connection pool gauges, and business metrics.
- Structured Logging: colored, leveled logging with configurable output and SQL query tracing.
Health Check Endpoints
Nadoo AI exposes three levels of health checks, designed for use with Kubernetes probes, load balancers, or external monitoring tools:

- Liveness Probe
- Readiness Probe
- Comprehensive Health

The liveness probe returns immediately to confirm that the process is running. Use it for the Kubernetes `livenessProbe`; it responds with 200 OK.

Health Status Values
| Status | Meaning |
|---|---|
| `healthy` | All components are functioning normally |
| `degraded` | The system is operational, but one or more resources exceed warning thresholds (CPU, memory, or disk > 90%) |
| `unhealthy` | One or more critical components (database, Redis) are unavailable |
The comprehensive health check triggers real-time system metric collection. Avoid polling it more than once every 15 seconds to minimize overhead.
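The status rules in the table can be expressed as a small classification function, which is handy when reproducing the semantics in an external monitor or in tests. This is a minimal client-side sketch of the documented rules, not Nadoo AI's own implementation; the `HealthSnapshot` input shape and the `classify_health` name are hypothetical, and treating a missing component as unavailable is an assumption.

```python
from dataclasses import dataclass, field

# Resource usage strictly above this percentage marks the system as degraded (per the table).
WARNING_THRESHOLD_PERCENT = 90.0

# Components whose unavailability makes the system unhealthy (per the table).
CRITICAL_COMPONENTS = ("database", "redis")


@dataclass
class HealthSnapshot:
    """Hypothetical input shape: component availability plus resource usage in percent."""
    components: dict[str, bool] = field(default_factory=dict)
    cpu_percent: float = 0.0
    memory_percent: float = 0.0
    disk_percent: float = 0.0


def classify_health(snapshot: HealthSnapshot) -> str:
    """Return 'healthy', 'degraded', or 'unhealthy' following the documented rules."""
    # An unavailable (or unreported, by assumption) critical component outranks resource warnings.
    for name in CRITICAL_COMPONENTS:
        if not snapshot.components.get(name, False):
            return "unhealthy"
    # Any resource above the warning threshold degrades the status.
    usage = (snapshot.cpu_percent, snapshot.memory_percent, snapshot.disk_percent)
    if any(value > WARNING_THRESHOLD_PERCENT for value in usage):
        return "degraded"
    return "healthy"


if __name__ == "__main__":
    ok = HealthSnapshot(components={"database": True, "redis": True}, cpu_percent=35.0)
    print(classify_health(ok))  # healthy
```

Checking critical components first mirrors the table's ordering of severity: a database outage is reported as `unhealthy` even if CPU is also above 90%.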
Prometheus Metrics
Nadoo AI exposes a `/metrics` endpoint in the Prometheus exposition format. A background task collects the metrics automatically every 30 seconds.
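A periodic background collector of this kind can be sketched with `asyncio`. Only the 30-second default interval comes from the text; the `collect` callback is a hypothetical stand-in for whatever gathers the metric values, and using an `asyncio.Event` for prompt shutdown is one common design choice rather than Nadoo AI's confirmed implementation.

```python
import asyncio
from typing import Awaitable, Callable

# Default collection interval from the documentation (seconds).
COLLECTION_INTERVAL_SECONDS = 30.0


async def run_collector(
    collect: Callable[[], Awaitable[None]],
    stop: asyncio.Event,
    interval: float = COLLECTION_INTERVAL_SECONDS,
) -> int:
    """Call `collect` every `interval` seconds until `stop` is set.

    Returns the number of successful collections.
    """
    runs = 0
    while not stop.is_set():
        try:
            await collect()
            runs += 1
        except Exception as exc:  # keep the loop alive if a single collection fails
            print(f"metric collection failed: {exc!r}")
        try:
            # Sleep for the interval, but wake up immediately if a stop is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
    return runs
```

Waiting on the stop event instead of calling `asyncio.sleep` lets application shutdown end the loop without waiting out a full 30-second interval, and `CancelledError` still propagates because it is not a subclass of `Exception`.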
Available Metrics
Metrics are grouped into five categories:

- HTTP Metrics
- Database Metrics
- Cache Metrics
- Business Metrics
- System Metrics

HTTP metrics:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `http_requests_total` | Counter | `method`, `endpoint`, `status` | Total HTTP requests received |
| `http_request_duration_seconds` | Histogram | `method`, `endpoint` | Request latency distribution |
| `active_connections` | Gauge | (none) | Number of currently active connections |
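Counters and histograms are cumulative, so meaningful figures come from the difference between two scrapes. The sketch below derives request throughput and mean latency over a window from the `_sum` and `_count` series that Prometheus histograms expose, here for `http_request_duration_seconds`. The `Sample` type and `window_stats` function are hypothetical helpers; in a real Prometheus setup, PromQL's `rate()` does this work and also handles counter resets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Cumulative histogram totals read from one scrape (hypothetical helper type)."""
    timestamp: float       # scrape time in seconds
    duration_sum: float    # http_request_duration_seconds_sum
    duration_count: float  # http_request_duration_seconds_count


def window_stats(earlier: Sample, later: Sample) -> tuple[float, float]:
    """Return (requests per second, mean latency in seconds) between two scrapes."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("later sample must be taken after the earlier one")
    requests = later.duration_count - earlier.duration_count
    if requests < 0:
        # A drop means the process restarted and its counters reset; discard this window.
        raise ValueError("counter reset detected")
    total_seconds = later.duration_sum - earlier.duration_sum
    # With no requests in the window there is no latency to average; report 0.0.
    mean_latency = total_seconds / requests if requests else 0.0
    return requests / elapsed, mean_latency


if __name__ == "__main__":
    a = Sample(timestamp=0.0, duration_sum=12.0, duration_count=100)
    b = Sample(timestamp=30.0, duration_sum=18.0, duration_count=160)
    print(window_stats(a, b))  # (2.0, 0.1)
```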
Scraping Metrics
Configure Prometheus to scrape the `/metrics` endpoint.
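Before wiring up a scrape job, it can help to confirm that the endpoint responds and to inspect what it returns. This sketch fetches the exposition text with the standard library and parses simple sample lines. The base URL `http://localhost:8000` is a placeholder assumption, and the parser skips `# HELP`/`# TYPE` comments and does not unescape label values, so treat it as a debugging aid rather than a complete exposition-format parser.

```python
import re
import urllib.request

# Placeholder address (assumption); point this at your own Nadoo AI backend.
METRICS_URL = "http://localhost:8000/metrics"

# Matches: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+\S+)?$")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = tuple[str, dict[str, str], float]


def parse_metrics(text: str) -> list[Sample]:
    """Parse sample lines into (name, labels, value); comment lines are skipped."""
    samples: list[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))  # float() accepts NaN and +Inf
    return samples


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> list[Sample]:
    """Download the exposition text from the metrics endpoint and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running instance returns `(name, labels, value)` tuples; filtering on `name == "http_requests_total"` shows per-method, per-endpoint, per-status request counts.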
Logging Configuration
Nadoo AI uses Python’s standard `logging` library with colored output and configurable log levels.

Log Levels
| Level | Use Case |
|---|---|
| `DEBUG` | Verbose output, including system metric snapshots and SQL queries (when `LOG_SQL_QUERIES=true`) |
| `INFO` | Standard operational messages (default) |
| `WARNING` | Non-critical issues such as rate limit warnings or deprecated API usage |
| `ERROR` | Failures that affect individual requests |
| `CRITICAL` | System-level failures requiring immediate attention |
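The level scheme above maps directly onto Python's `logging` module. The following is a minimal sketch of a comparable setup: colored level names, timestamps and module names, quieter third-party loggers, and an SQL toggle driven by `LOG_SQL_QUERIES`. Several details are assumptions rather than Nadoo AI's actual configuration: the `LOG_LEVEL` variable name, the ANSI color palette, the exact logger names silenced, and the way the SQL flag interacts with the level (here it simply raises `sqlalchemy.engine` to `INFO`, the level at which SQLAlchemy logs statements).

```python
import logging
import os
import sys

# ANSI color per level (assumed palette).
LEVEL_COLORS = {
    logging.DEBUG: "\033[36m",       # cyan
    logging.INFO: "\033[32m",        # green
    logging.WARNING: "\033[33m",     # yellow
    logging.ERROR: "\033[31m",       # red
    logging.CRITICAL: "\033[1;31m",  # bold red
}
RESET = "\033[0m"

# Third-party loggers to quiet (names assumed from the categories the docs mention).
NOISY_LOGGERS = ("uvicorn.access", "watchfiles", "sqlalchemy")


class ColorFormatter(logging.Formatter):
    """Formatter that wraps the level name in an ANSI color."""

    def format(self, record: logging.LogRecord) -> str:
        original = record.levelname
        color = LEVEL_COLORS.get(record.levelno, "")
        record.levelname = f"{color}{original}{RESET}" if color else original
        try:
            return super().format(record)
        finally:
            record.levelname = original  # avoid leaking color codes to other handlers


def configure_logging() -> None:
    # LOG_LEVEL is a hypothetical variable name; INFO is the documented default level.
    level = getattr(logging, os.getenv("LOG_LEVEL", "INFO").upper(), None)
    if not isinstance(level, int):
        level = logging.INFO

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(ColorFormatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)

    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(logging.WARNING)

    # SQLAlchemy emits SQL statements on "sqlalchemy.engine" at INFO level.
    if os.getenv("LOG_SQL_QUERIES", "false").lower() == "true":
        logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
```

Because a child logger's own level takes precedence over its parent's, enabling the SQL flag re-opens only `sqlalchemy.engine` while the rest of SQLAlchemy stays at `WARNING`.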
Configuration
Control logging via environment variables such as `LOG_SQL_QUERIES`.

Log Format
All log entries follow a structured format with timestamps, module names, and colored severity levels.

Noisy loggers (uvicorn access logs, watchfiles, SQLAlchemy internals) are automatically suppressed to keep output clean. Enable `LOG_SQL_QUERIES=true` only when debugging database performance issues.

PostHog Analytics
Nadoo AI optionally integrates with PostHog for product analytics and error tracking.

| Variable | Default | Description |
|---|---|---|
| `NADOO_POSTHOG_ENABLED` | `null` | Explicitly enable or disable PostHog. If unset, PostHog is auto-enabled in demo mode. |
| `NADOO_POSTHOG_API_KEY` | (none) | Your PostHog project API key |
| `NADOO_POSTHOG_HOST` | `https://us.i.posthog.com` | PostHog ingestion host |
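The enablement rule for `NADOO_POSTHOG_ENABLED` (an explicit value wins; if unset, follow demo mode) can be sketched as a small resolver. This is an illustration of the documented behavior, not Nadoo AI's code: the set of accepted boolean spellings is an assumption, and the demo-mode variable is read as `DEMO_MODE`, the name used in this section's note.

```python
import os
from typing import Mapping, Optional

# Accepted spellings are an assumption; adjust to match your configuration loader.
TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


def parse_bool(value: Optional[str]) -> Optional[bool]:
    """Interpret an environment string; None or empty means 'unset'."""
    if value is None or value.strip() == "":
        return None
    lowered = value.strip().lower()
    if lowered in TRUE_VALUES:
        return True
    if lowered in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def posthog_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Resolve whether PostHog analytics should be active."""
    explicit = parse_bool(env.get("NADOO_POSTHOG_ENABLED"))
    if explicit is not None:
        return explicit  # an explicit setting always wins
    # Unset: enabled only in demo mode, so self-hosted deployments stay off by default.
    return parse_bool(env.get("DEMO_MODE")) is True
```

Setting the variable to `false` therefore disables analytics even when demo mode is on.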
PostHog is disabled by default for self-hosted (air-gapped) deployments; it is auto-enabled only when `DEMO_MODE=true`. Set `NADOO_POSTHOG_ENABLED=false` explicitly to ensure that no external analytics calls are made.

System Statistics
The system management API provides aggregate platform statistics.

Audit Logging
Nadoo AI records all significant actions in an audit log, including:

- Resource creation, updates, and deletions
- Access control denials (domain, IP, rate limit)
- Role changes and member management
- API key lifecycle events
Kubernetes Integration
Configure Kubernetes liveness and readiness probes in your deployment to target the corresponding health check endpoints.

Monitoring Best Practices
Set up alerting on health status
Configure alerts in your monitoring system (Grafana, PagerDuty, etc.) to trigger when the comprehensive health check returns `degraded` or `unhealthy` for more than 2 consecutive checks.
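The consecutive-check rule can be implemented on the polling side with a small streak counter. This is a minimal sketch with a hypothetical `HealthAlerter` class; firing once per incident (on the third bad result in a row) rather than on every subsequent check is a design choice to avoid repeated pages.

```python
ALERT_STATUSES = {"degraded", "unhealthy"}


class HealthAlerter:
    """Fires once a non-healthy status persists for more than `threshold` consecutive checks."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.streak = 0

    def observe(self, status: str) -> bool:
        """Record one health check result; return True when an alert should fire."""
        if status in ALERT_STATUSES:
            self.streak += 1
        else:
            self.streak = 0  # any healthy result ends the incident
        # "More than 2 consecutive checks" means the third bad result in a row fires,
        # and only that one, so a long outage pages once.
        return self.streak == self.threshold + 1
```

Feed it one result per poll of the comprehensive health check, keeping to the 15-second minimum polling interval noted earlier.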
Monitor database connection pool saturation
Watch the `database_pool_checked_out` metric against `database_pool_size`. Pool exhaustion leads to request timeouts. Default pool settings: size=5, max overflow=10.
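With the default settings, the pool can hand out up to 15 connections: 5 persistent ones plus 10 overflow connections. The sketch below turns the two metrics into a saturation figure, assuming a SQLAlchemy `QueuePool`-style pool (where the hard limit is `pool_size + max_overflow`); the `PoolReading` type is hypothetical, and it assumes `max_overflow` comes from your configuration because it is not among the metrics named above.

```python
from dataclasses import dataclass

# Defaults from the documentation.
DEFAULT_POOL_SIZE = 5
DEFAULT_MAX_OVERFLOW = 10


@dataclass(frozen=True)
class PoolReading:
    """One observation of the connection pool (hypothetical helper type)."""
    checked_out: int                          # value of database_pool_checked_out
    pool_size: int = DEFAULT_POOL_SIZE        # value of database_pool_size
    max_overflow: int = DEFAULT_MAX_OVERFLOW  # from configuration (assumed not exported)

    @property
    def capacity(self) -> int:
        # A QueuePool-style pool hands out at most pool_size + max_overflow connections.
        return self.pool_size + self.max_overflow

    @property
    def saturation(self) -> float:
        """Fraction of the hard connection limit currently in use."""
        return self.checked_out / self.capacity

    @property
    def using_overflow(self) -> bool:
        """True once demand exceeds the persistent pool and overflow connections open."""
        return self.checked_out > self.pool_size
```

For example, `PoolReading(checked_out=12)` reports a saturation of 0.8 and `using_overflow` as `True`, an early sign that requests may soon wait for a free connection.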
Track cache hit ratios
A healthy cache hit ratio is above 70%. If `cache_misses_total` grows faster than `cache_hits_total`, review your caching strategy or increase Redis memory.
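Because `cache_hits_total` and `cache_misses_total` are cumulative counters, the hit ratio is most informative over a recent window rather than since process start. This sketch computes it from two readings of each counter; the function names are hypothetical, and treating a window with no cache traffic as healthy is an assumption.

```python
from typing import Optional

HEALTHY_HIT_RATIO = 0.70  # threshold suggested in the documentation


def window_hit_ratio(
    hits_start: float, misses_start: float, hits_end: float, misses_end: float
) -> Optional[float]:
    """Hit ratio between two readings of cache_hits_total and cache_misses_total."""
    hits = hits_end - hits_start
    misses = misses_end - misses_start
    if hits < 0 or misses < 0:
        raise ValueError("counter reset detected; discard this window")
    lookups = hits + misses
    if lookups == 0:
        return None  # no cache traffic in the window
    return hits / lookups


def cache_is_healthy(ratio: Optional[float]) -> bool:
    # An idle window is treated as healthy rather than alerting on missing data.
    return ratio is None or ratio > HEALTHY_HIT_RATIO
```

When misses grow faster than hits within a window, the ratio falls below 0.5, well under the 70% target.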
Review audit logs regularly
Schedule weekly reviews of audit logs to catch unusual access patterns, repeated access denials, or unexpected administrative actions.
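Part of a weekly review can be automated by flagging actors with repeated access denials. The sketch below assumes audit records have been exported into a hypothetical `AuditEntry` shape with an `"access_denied"` action name; the field names, action string, and threshold of 5 denials per week are illustrative assumptions, not Nadoo AI's audit log schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical shape of an exported audit record."""
    timestamp: datetime
    actor: str   # e.g. user ID, API key ID, or client IP
    action: str  # e.g. "access_denied", "role_changed"


def repeated_denials(
    entries: list[AuditEntry],
    now: datetime,
    window: timedelta = timedelta(days=7),
    threshold: int = 5,
) -> dict[str, int]:
    """Return actors with at least `threshold` access denials inside the review window."""
    cutoff = now - window
    counts = Counter(
        entry.actor
        for entry in entries
        if entry.action == "access_denied" and entry.timestamp >= cutoff
    )
    return {actor: count for actor, count in counts.items() if count >= threshold}
```

The flagged actors are a starting point for the manual review, which should still cover unexpected administrative actions such as role changes.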