
Overview

Nadoo AI provides built-in monitoring, health checks, and metrics collection to ensure platform reliability. The observability stack includes Prometheus-compatible metrics, structured logging, health probes for Kubernetes, and real-time system resource tracking.

Health Checks

Liveness, readiness, and comprehensive health endpoints for orchestration platforms.

Prometheus Metrics

HTTP request counters, latency histograms, connection pool gauges, and business metrics.

Structured Logging

Colored, leveled logging with configurable output and SQL query tracing.

Health Check Endpoints

Nadoo AI exposes three levels of health checks, designed for use with Kubernetes probes, load balancers, or external monitoring tools.
Returns immediately to confirm the process is running. Use this for Kubernetes livenessProbe.
GET /health/liveness
Response (200 OK):
{
  "status": "alive",
  "timestamp": "2026-03-09T12:00:00Z"
}
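A client that consumes this response might validate it roughly as follows. This is a minimal sketch based only on the sample body shown above; the helper name `parse_liveness` is hypothetical and not part of Nadoo AI.

```python
import json
from datetime import datetime


def parse_liveness(body: str) -> tuple[bool, datetime]:
    """Return (is_alive, timestamp) from a liveness response body."""
    payload = json.loads(body)
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so normalize it to an explicit UTC offset first.
    stamp = datetime.fromisoformat(payload["timestamp"].replace("Z", "+00:00"))
    return payload.get("status") == "alive", stamp


# Sample body copied from the documented 200 OK response.
sample = '{"status": "alive", "timestamp": "2026-03-09T12:00:00Z"}'
alive, stamp = parse_liveness(sample)
print(alive, stamp.isoformat())
```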

Health Status Values

| Status | Meaning |
| --- | --- |
| healthy | All components are functioning normally |
| degraded | The system is operational but one or more resources exceed warning thresholds (CPU, memory, or disk > 90%) |
| unhealthy | One or more critical components (database, Redis) are unavailable |
The comprehensive health check triggers real-time system metric collection. Avoid polling it more frequently than every 15 seconds to minimize overhead.
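The three status values can be read as a simple decision rule. The sketch below restates that rule in Python for illustration only; it is not the platform's implementation, and the function name and parameters are assumptions.

```python
WARNING_THRESHOLD_PERCENT = 90.0  # warning threshold stated for CPU, memory, and disk


def derive_status(cpu: float, memory: float, disk: float,
                  database_ok: bool, redis_ok: bool) -> str:
    """Map resource usage (in percent) and component availability to a status."""
    # A missing critical component outranks any resource warning.
    if not (database_ok and redis_ok):
        return "unhealthy"
    # Any single resource above the threshold degrades the system.
    if max(cpu, memory, disk) > WARNING_THRESHOLD_PERCENT:
        return "degraded"
    return "healthy"


print(derive_status(23.5, 61.2, 45.0, database_ok=True, redis_ok=True))
```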

Prometheus Metrics

Nadoo AI exposes a /metrics endpoint in Prometheus exposition format. The platform collects metrics automatically via a background task that runs every 30 seconds.

Available Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| http_requests_total | Counter | method, endpoint, status | Total HTTP requests received |
| http_request_duration_seconds | Histogram | method, endpoint | Request latency distribution |
| active_connections | Gauge | | Number of currently active connections |

Scraping Metrics

Configure Prometheus to scrape the /metrics endpoint:
# prometheus.yml
scrape_configs:
  - job_name: 'nadoo-ai'
    scrape_interval: 30s
    static_configs:
      - targets: ['nadoo-backend:8000']
    metrics_path: '/metrics'
The /metrics endpoint is excluded from the metrics middleware itself to prevent recursive metric collection.
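For a quick check without a Prometheus server, the exposition text can be parsed directly. The sketch below is a simplified parser for illustration, not a full implementation of the format. The sample payload and its values are made up; only the metric and label names come from the metrics table.

```python
import re

# name, optional {labels}, then the sample value
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Illustrative payload; a real one would come from GET /metrics.
SAMPLE = """\
# HELP http_requests_total Total HTTP requests received
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api/v1/system/statistics",status="200"} 40
http_requests_total{method="GET",endpoint="/api/v1/system/audit-logs",status="200"} 8
http_requests_total{method="GET",endpoint="/api/v1/system/audit-logs",status="403"} 2
active_connections 3
"""


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse sample lines into (metric name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        samples.append((name, dict(LABEL.findall(raw_labels or "")), float(value)))
    return samples


def count_by_status(samples: list[tuple[str, dict[str, str], float]]) -> dict[str, float]:
    """Sum http_requests_total across all series, grouped by the status label."""
    totals: dict[str, float] = {}
    for name, labels, value in samples:
        if name == "http_requests_total":
            status = labels.get("status", "unknown")
            totals[status] = totals.get(status, 0.0) + value
    return totals


print(count_by_status(parse_samples(SAMPLE)))
```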

Logging Configuration

Nadoo AI uses Python’s standard logging library with colored output and configurable log levels.

Log Levels

| Level | Use Case |
| --- | --- |
| DEBUG | Verbose output including system metric snapshots and SQL queries (when NADOO_LOG_SQL_QUERIES=true) |
| INFO | Standard operational messages (default) |
| WARNING | Non-critical issues such as rate limit warnings or deprecated API usage |
| ERROR | Failures that affect individual requests |
| CRITICAL | System-level failures requiring immediate attention |

Configuration

Control logging via environment variables:
# Set the global log level
NADOO_LOG_LEVEL=INFO

# Write logs to a file (in addition to stdout)
NADOO_LOG_FILE=/var/log/nadoo/app.log

# Enable verbose SQL query logging
NADOO_LOG_SQL_QUERIES=false

Log Format

All log entries follow a structured format with timestamps, module names, and colored severity levels:
2026-03-09 12:00:00,123 - src.api.v1.chat_router - INFO - Chat message processed
2026-03-09 12:00:00,456 - src.core.metrics - DEBUG - System metrics collected - CPU: 23.5%, Memory: 61.2%, Disk: 45.0%
2026-03-09 12:00:01,789 - src.core.access_control - WARNING - Rate limit exceeded for user:abc123
Noisy loggers (uvicorn access logs, watchfiles, SQLAlchemy internals) are automatically suppressed to keep output clean. Enable NADOO_LOG_SQL_QUERIES=true only when debugging database performance issues.
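The format shown above matches what Python's standard `logging` module produces with a `%(asctime)s - %(name)s - %(levelname)s - %(message)s` format string. The sketch below reproduces the layout and the environment-driven level. It is not Nadoo AI's actual setup: the function names, the fallback to INFO for unknown levels, and the exact logger names being quieted are assumptions, and coloring is omitted.

```python
import logging
import os
from typing import Mapping

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}
# Assumed logger names for the "noisy" libraries mentioned in the text.
NOISY_LOGGERS = ("uvicorn.access", "watchfiles", "sqlalchemy.engine")


def resolve_level(name: str) -> int:
    """Translate a level name to its numeric value, defaulting to INFO."""
    return LEVELS.get(name.strip().upper(), logging.INFO)


def configure_logging(env: Mapping[str, str] = os.environ) -> None:
    """Configure root logging from NADOO_LOG_LEVEL and NADOO_LOG_FILE."""
    handlers: list[logging.Handler] = [logging.StreamHandler()]
    log_file = env.get("NADOO_LOG_FILE")
    if log_file:
        # File output is added alongside the stream handler, not instead of it.
        handlers.append(logging.FileHandler(log_file))
    logging.basicConfig(
        level=resolve_level(env.get("NADOO_LOG_LEVEL", "INFO")),
        format=LOG_FORMAT,
        handlers=handlers,
        force=True,  # replace any handlers installed earlier
    )
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(logging.WARNING)


configure_logging({"NADOO_LOG_LEVEL": "INFO"})
logging.getLogger("src.api.v1.chat_router").info("Chat message processed")
```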

PostHog Analytics

Nadoo AI optionally integrates with PostHog for product analytics and error tracking.
| Variable | Default | Description |
| --- | --- | --- |
| NADOO_POSTHOG_ENABLED | null | Explicitly enable or disable PostHog. If unset, auto-enabled in demo mode. |
| NADOO_POSTHOG_API_KEY | | Your PostHog project API key |
| NADOO_POSTHOG_HOST | https://us.i.posthog.com | PostHog ingestion host |
PostHog is disabled by default for self-hosted (air-gapped) deployments. It is auto-enabled only when DEMO_MODE=true. Set NADOO_POSTHOG_ENABLED=false explicitly to ensure no external analytics calls are made.
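The enablement rule described above (an explicit setting wins, otherwise demo mode decides) can be sketched as follows. The helper names and the accepted boolean spellings are assumptions. `DEMO_MODE` is read under the name used in the text, which may differ from the real environment variable.

```python
from typing import Mapping, Optional


def parse_optional_bool(raw: Optional[str]) -> Optional[bool]:
    """Return None when unset or empty, otherwise a boolean reading of the value."""
    if raw is None or raw.strip() == "":
        return None
    # Accepted truthy spellings are an assumption for this sketch.
    return raw.strip().lower() in ("1", "true", "yes", "on")


def posthog_enabled(env: Mapping[str, str]) -> bool:
    """Explicit NADOO_POSTHOG_ENABLED wins; otherwise follow demo mode."""
    explicit = parse_optional_bool(env.get("NADOO_POSTHOG_ENABLED"))
    if explicit is not None:
        return explicit
    return parse_optional_bool(env.get("DEMO_MODE")) is True


print(posthog_enabled({"DEMO_MODE": "true", "NADOO_POSTHOG_ENABLED": "false"}))
```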

System Statistics

The system management API provides aggregate platform statistics:
GET /api/v1/system/statistics
Authorization: Bearer {admin-access-token}
This endpoint returns counts and trends for users, workspaces, applications, chat messages, and document processing across the platform.

Audit Logging

Nadoo AI records all significant actions in an audit log, including:
  • Resource creation, updates, and deletions
  • Access control denials (domain, IP, rate limit)
  • Role changes and member management
  • API key lifecycle events
Query audit logs via the system API:
GET /api/v1/system/audit-logs?page=1&size=50
Authorization: Bearer {admin-access-token}
Each audit entry captures the acting user, action type, resource details, IP address, user agent, and timestamp.
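A paginated audit-log request can be built with the standard library as in the sketch below. The base URL reuses the `nadoo-backend:8000` host from the Prometheus example and the token is a placeholder; both are assumptions. The response schema is not documented here, so the sketch builds the request without sending or parsing it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_audit_log_request(base_url: str, token: str,
                            page: int = 1, size: int = 50) -> Request:
    """Build a GET request for one page of audit logs."""
    query = urlencode({"page": page, "size": size})
    url = f"{base_url.rstrip('/')}/api/v1/system/audit-logs?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Placeholder host and token; substitute values for your deployment.
request = build_audit_log_request("http://nadoo-backend:8000", "example-token")
print(request.get_method(), request.full_url)
```

Pass the returned object to `urllib.request.urlopen` to send it, and increment `page` to walk through later pages.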

Kubernetes Integration

Use the following probe configuration in your Kubernetes deployment:
# deployment.yaml
spec:
  containers:
    - name: nadoo-backend
      livenessProbe:
        httpGet:
          path: /health/liveness
          port: 8000
        initialDelaySeconds: 10
        periodSeconds: 15
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /health/readiness
          port: 8000
        initialDelaySeconds: 20
        periodSeconds: 10
        failureThreshold: 5
Set initialDelaySeconds high enough for your environment. The readiness probe will return not_ready until the database connection pool is established, Redis is reachable, and all startup tasks have completed.

Monitoring Best Practices

Configure alerts in your monitoring system (Grafana, PagerDuty, etc.) to trigger when the comprehensive health check returns degraded or unhealthy for more than 2 consecutive checks.
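If your monitoring tool cannot express a consecutive-check condition, the rule is small enough to implement in a custom poller. The class below is a hypothetical sketch of that logic, with "more than 2 consecutive checks" read as firing on the third non-healthy result in a row.

```python
class ConsecutiveFailureAlert:
    """Fire once non-healthy results exceed `threshold` consecutive checks."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.streak = 0

    def observe(self, status: str) -> bool:
        """Record one health check result and return True if the alert should fire."""
        if status in ("degraded", "unhealthy"):
            self.streak += 1
        else:
            self.streak = 0  # any healthy result resets the streak
        return self.streak > self.threshold


alert = ConsecutiveFailureAlert()
results = [alert.observe(s) for s in
           ["healthy", "degraded", "unhealthy", "degraded", "healthy", "degraded"]]
print(results)
```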
Watch the database_pool_checked_out metric against database_pool_size. Pool exhaustion leads to request timeouts. Default pool settings: size=5, max overflow=10.
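As a rough illustration of the arithmetic, the sketch below computes pool utilization from the default settings. It assumes the usual SQLAlchemy QueuePool semantics, where the maximum number of connections is the pool size plus the max overflow; the function name is hypothetical.

```python
def pool_utilization(checked_out: int, pool_size: int = 5,
                     max_overflow: int = 10) -> tuple[float, bool]:
    """Return (fraction of total capacity in use, whether overflow is in use)."""
    capacity = pool_size + max_overflow  # assumed hard limit on connections
    return checked_out / capacity, checked_out > pool_size


utilization, overflowing = pool_utilization(checked_out=12)
print(f"{utilization:.0%} of capacity, overflow in use: {overflowing}")
```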
A healthy cache hit ratio should be above 70%. If cache_misses_total grows faster than cache_hits_total, review your caching strategy or increase Redis memory.
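The hit ratio is hits divided by total lookups. A minimal sketch follows; the function name is an assumption. Because both metrics are cumulative counters, computing the ratio over the increase within a time window is usually more informative than using lifetime totals.

```python
from typing import Optional

HEALTHY_HIT_RATIO = 0.70  # guideline from the text above


def cache_hit_ratio(hits: float, misses: float) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# Illustrative counter increases over one window, not real measurements.
ratio = cache_hit_ratio(hits=640, misses=360)
print(ratio, ratio is not None and ratio > HEALTHY_HIT_RATIO)
```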
Schedule weekly reviews of audit logs to catch unusual access patterns, repeated access denials, or unexpected administrative actions.