Knowledge Observability

Monitor ingestion, chunking, and retrieval with structured logs, metrics, and traces.

Telemetry Fundamentals

All ingestion and retrieval code paths must derive telemetry from the active request context:

  • logger.FromContext(ctx) emits structured JSON logs with correlation IDs.
  • config.FromContext(ctx) exposes feature toggles (e.g., batching limits) so you can surface them in diagnostics.
  • OpenTelemetry spans wrap embedder calls and vector DB operations for end-to-end tracing.
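
A minimal sketch of an ingestion step wired this way follows, assuming Go and the OpenTelemetry SDK; the import paths for the project helpers, the cfg.BatchSize field, and the key/value logging signature are assumptions for illustration, not the exact Compozy API.

package knowledge

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"

    // Illustrative import paths for the helpers described above.
    "github.com/compozy/compozy/pkg/config"
    "github.com/compozy/compozy/pkg/logger"
)

// ingestSource sketches one ingestion step that derives all telemetry from ctx.
func ingestSource(ctx context.Context, kbID, source string) error {
    log := logger.FromContext(ctx) // structured JSON logger carrying the request correlation ID
    cfg := config.FromContext(ctx) // feature toggles such as batching limits

    // Wrap the step in a span so ingestion shows up in end-to-end traces.
    ctx, span := otel.Tracer("knowledge").Start(ctx, "knowledge.ingest.source")
    defer span.End()
    span.SetAttributes(
        attribute.String("kb_id", kbID),
        attribute.String("source", source),
        attribute.Int("batch_size", cfg.BatchSize), // assumed field name
    )

    // Key/value logging signature is assumed; adjust to the project's logger.
    log.Info("knowledge.ingest.start", "kb_id", kbID, "source", source)
    // ... download, chunk, embed, and persist ...
    return nil
}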

Metrics

Each metric is exported with the type and labels listed below:

  • knowledge_ingest_duration_seconds – Histogram; labels: kb_id, status. Total ingestion duration, including download, chunking, and persistence.
  • knowledge_chunks_total – Counter; labels: kb_id, source_kind. Number of chunks written per ingestion attempt.
  • knowledge_ingest_failures_total – Counter; labels: kb_id, reason. Failure counts grouped by error category.
  • knowledge_query_latency_seconds – Histogram; labels: kb_id, result_count. Retrieval latency from query embed to vector DB response.

Set alert thresholds in Grafana or your preferred monitoring stack. Example: trigger an alert if knowledge_ingest_failures_total increases by more than 5 within 10 minutes (in PromQL, increase(knowledge_ingest_failures_total[10m]) > 5).
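
If custom ingestion code needs to emit compatible metrics, a sketch using prometheus/client_golang could look like the following; the package name and the ObserveIngest helper are illustrative, not part of Compozy, while the metric names, types, and labels match the list above.

package knowledgemetrics

import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    // Histogram labeled by kb_id and status, as listed above.
    ingestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name: "knowledge_ingest_duration_seconds",
        Help: "Total ingestion duration including download, chunking, and persistence.",
    }, []string{"kb_id", "status"})

    // Counter labeled by kb_id and source_kind.
    chunksTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "knowledge_chunks_total",
        Help: "Number of chunks written per ingestion attempt.",
    }, []string{"kb_id", "source_kind"})
)

// ObserveIngest records one ingestion attempt against both metrics.
func ObserveIngest(kbID, status, sourceKind string, started time.Time, chunks int) {
    ingestDuration.WithLabelValues(kbID, status).Observe(time.Since(started).Seconds())
    chunksTotal.WithLabelValues(kbID, sourceKind).Add(float64(chunks))
}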

Structured Logs

Key log events emitted during ingestion:

Ingestion Start
{
  "severity": "INFO",
  "message": "knowledge.ingest.start",
  "kb_id": "quickstart_docs",
  "sources": 3,
  "batch_size": 32
}

Ingestion Failure
{
  "severity": "ERROR",
  "message": "knowledge.ingest.failure",
  "kb_id": "quickstart_docs",
  "source": "docs/policies.md",
  "error": "provider_throttled",
  "retry_in": "2s"
}

During retrieval, look for knowledge.query.success and knowledge.query.miss to understand scoring behavior and match counts.
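
The exact field set on these events isn't documented here, but a successful retrieval log might look roughly like the following (field names beyond message and kb_id are illustrative assumptions):

{
  "severity": "INFO",
  "message": "knowledge.query.success",
  "kb_id": "quickstart_docs",
  "result_count": 4,
  "top_score": 0.87
}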

Tracing

  • Wrap embedder calls in spans with attributes for provider, model, and batch_size.
  • Tag vector DB spans with db.system=postgres or db.system=qdrant.
  • Use span links to tie ingestion retries back to the original request.
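
A sketch of these conventions with the OpenTelemetry Go SDK is shown below; the function names and example attribute values are assumptions, while the span options themselves are standard OpenTelemetry API.

package knowledge

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

// embedBatch wraps an embedder call in a span carrying provider, model, and batch_size.
func embedBatch(ctx context.Context, texts []string) {
    ctx, span := otel.Tracer("knowledge").Start(ctx, "embedder.embed_batch",
        trace.WithAttributes(
            attribute.String("provider", "openai"),              // example value
            attribute.String("model", "text-embedding-3-small"), // example value
            attribute.Int("batch_size", len(texts)),
        ))
    defer span.End()
    // ... call the embedding provider with ctx ...
}

// upsertChunks tags the vector DB span and links a retry back to the original request's span.
func upsertChunks(ctx context.Context, original trace.SpanContext) {
    ctx, span := otel.Tracer("knowledge").Start(ctx, "vectordb.upsert",
        trace.WithAttributes(attribute.String("db.system", "postgres")),
        trace.WithLinks(trace.Link{SpanContext: original}))
    defer span.End()
    // ... write vectors with ctx ...
}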

Dashboards & Alerts

  1. Ingestion reliability – Chart success rate using knowledge_ingest_failures_total and knowledge_ingest_duration_seconds.
  2. Query latency – Track p50/p95 from knowledge_query_latency_seconds. Investigate latency spikes, which often point to provider rate limiting.
  3. Chunk growth – Monitor knowledge_chunks_total to ensure vector DB storage scales gradually and to spot runaway ingestion loops.

CLI & API Diagnostics

# Stream logs for ingestion jobs
compozy knowledge ingest --id quickstart_docs --verbose

# Fetch metrics locally (requires Prometheus endpoint)
curl http://localhost:5001/metrics | grep knowledge_

Incident Response Checklist

  • Verify provider credentials and rate limits.
  • Check vector DB health (indexes, disk utilization, pool size).
  • Re-run ingestion with smaller batches if providers throttle requests.
  • Use tracing to pinpoint slow spans (e.g., remote PDF download vs. embedding latency).

Additional Reading