# Knowledge Observability
Monitor ingestion, chunking, and retrieval with structured logs, metrics, and traces.
## Telemetry Fundamentals

All ingestion and retrieval code paths must derive telemetry from the active request context, as sketched after this list:

- `logger.FromContext(ctx)` emits structured JSON logs with correlation IDs.
- `config.FromContext(ctx)` exposes feature toggles (e.g., batching limits) so you can surface them in diagnostics.
- OpenTelemetry spans wrap embedder calls and vector DB operations for end-to-end tracing.
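The snippet below is a minimal sketch of that pattern. The Compozy import paths, the logger's `Info` signature, the `cfg.BatchSize` field, and the injected `embed` callback are assumptions for illustration; only the OpenTelemetry calls come from the upstream SDK.

```go
// Sketch only: the compozy import paths, the logger's Info signature, and
// cfg.BatchSize are assumptions; the OpenTelemetry calls are the real API.
package ingest

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"

	"github.com/compozy/compozy/pkg/config" // assumed path
	"github.com/compozy/compozy/pkg/logger" // assumed path
)

func embedBatch(ctx context.Context, kbID string, chunks []string,
	embed func(context.Context, []string) error) error {
	log := logger.FromContext(ctx) // structured JSON logs with correlation IDs
	cfg := config.FromContext(ctx) // feature toggles, e.g. batching limits

	// Wrap the embedder call in a span so it shows up in end-to-end traces.
	ctx, span := otel.Tracer("knowledge").Start(ctx, "knowledge.embed_batch")
	span.SetAttributes(attribute.Int("batch_size", len(chunks)))
	defer span.End()

	log.Info("knowledge.ingest.start", "kb_id", kbID, "batch_size", cfg.BatchSize)
	return embed(ctx, chunks) // the real embedder client goes here
}
```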
## Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `knowledge_ingest_duration_seconds` | Histogram | `kb_id`, `status` | Total ingestion duration including download, chunking, and persist |
| `knowledge_chunks_total` | Counter | `kb_id`, `source_kind` | Number of chunks written per ingestion attempt |
| `knowledge_ingest_failures_total` | Counter | `kb_id`, `reason` | Failure counts grouped by error category |
| `knowledge_query_latency_seconds` | Histogram | `kb_id`, `result_count` | Retrieval latency from query embed to vector DB response |
Set alert thresholds in Grafana or your preferred monitoring stack. For example, trigger an alert if `knowledge_ingest_failures_total` increases by more than 5 within 10 minutes.
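For reference, this is roughly how two of those instruments could be declared with `prometheus/client_golang`. It is an illustrative sketch rather than Compozy's actual registration code; only the metric names and labels are taken from the table above.

```go
// Illustrative sketch using prometheus/client_golang; not Compozy's actual
// registration code, but names and labels mirror the table above.
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	ingestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "knowledge_ingest_duration_seconds",
		Help: "Total ingestion duration including download, chunking, and persist.",
	}, []string{"kb_id", "status"})

	ingestFailures = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "knowledge_ingest_failures_total",
		Help: "Failure counts grouped by error category.",
	}, []string{"kb_id", "reason"})
)

func init() {
	prometheus.MustRegister(ingestDuration, ingestFailures)
}

// The alert described above maps to a Prometheus rule expression such as:
//   increase(knowledge_ingest_failures_total[10m]) > 5
```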
## Structured Logs
Key log events emitted during ingestion:
```json
{
  "severity": "INFO",
  "message": "knowledge.ingest.start",
  "kb_id": "quickstart_docs",
  "sources": 3,
  "batch_size": 32
}
```

```json
{
  "severity": "ERROR",
  "message": "knowledge.ingest.failure",
  "kb_id": "quickstart_docs",
  "source": "docs/policies.md",
  "error": "provider_throttled",
  "retry_in": "2s"
}
```

During retrieval, look for `knowledge.query.success` and `knowledge.query.miss` to understand scoring behavior and match counts.
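If you want to see where those retrieval events would originate, the shape is roughly the following. The `Match` type, the logger method signatures, and the `result_count`/`top_score` fields are assumptions; only the event names match the ones described above.

```go
// Hypothetical sketch: the Match type and field names are assumptions; the
// event names are the ones documented above.
package retrieval

import (
	"context"

	"github.com/compozy/compozy/pkg/logger" // assumed path
)

type Match struct {
	Score float64
}

func logQueryOutcome(ctx context.Context, kbID string, matches []Match) {
	log := logger.FromContext(ctx)
	if len(matches) == 0 {
		log.Warn("knowledge.query.miss", "kb_id", kbID)
		return
	}
	log.Info("knowledge.query.success",
		"kb_id", kbID,
		"result_count", len(matches),
		"top_score", matches[0].Score, // assumes results arrive sorted by score
	)
}
```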
## Tracing
- Wrap embedder calls in spans with attributes for `provider`, `model`, and `batch_size` (see the sketch after this list).
- Tag vector DB spans with `db.system=postgres` or `db.system=qdrant`.
- Use span links to tie ingestion retries back to the original request.
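A minimal sketch of those conventions with the OpenTelemetry Go SDK; the span names and the example provider/model values are illustrative, and an exporter is assumed to be configured elsewhere.

```go
// Minimal OpenTelemetry sketch: span names and provider/model values are
// illustrative; the attribute keys follow the conventions listed above.
package tracing

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

func embedWithSpan(ctx context.Context, texts []string, original trace.SpanContext) {
	tracer := otel.Tracer("compozy/knowledge")

	// Embedder span: provider, model, and batch_size make throttling visible.
	// The span link ties a retried ingestion back to the original request.
	ctx, span := tracer.Start(ctx, "embedder.embed",
		trace.WithLinks(trace.Link{SpanContext: original}))
	defer span.End()
	span.SetAttributes(
		attribute.String("provider", "openai"), // example values
		attribute.String("model", "text-embedding-3-small"),
		attribute.Int("batch_size", len(texts)),
	)

	// Vector DB span (nested under the embedder span in this sketch);
	// db.system separates Postgres from Qdrant in trace queries.
	_, dbSpan := tracer.Start(ctx, "vectordb.upsert")
	defer dbSpan.End()
	dbSpan.SetAttributes(attribute.String("db.system", "postgres")) // or "qdrant"
	// ... call the embedder and perform the upsert here ...
}
```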
## Dashboards & Alerts

- **Ingestion reliability** – Chart success rate using `knowledge_ingest_failures_total` and `knowledge_ingest_duration_seconds`.
- **Query latency** – Track p50/p95 from `knowledge_query_latency_seconds` (a query example appears after this list). Investigate spikes for provider rate limiting.
- **Chunk growth** – Monitor `knowledge_chunks_total` to ensure vector DB storage scales gradually and to spot runaway ingestion loops.
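To check the same numbers without a dashboard, you can query the Prometheus HTTP API directly. The sketch below assumes a Prometheus server on `localhost:9090` that already scrapes the Compozy metrics endpoint; the embedded expression computes p95 retrieval latency over the last five minutes and can be reused as-is in a Grafana panel.

```go
// Diagnostic sketch: assumes Prometheus at localhost:9090 scraping the Compozy
// /metrics endpoint. Prints the raw JSON response for the p95 query latency.
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	expr := `histogram_quantile(0.95, sum(rate(knowledge_query_latency_seconds_bucket[5m])) by (le))`

	resp, err := http.Get("http://localhost:9090/api/v1/query?query=" + url.QueryEscape(expr))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON payload; chart the same expression in Grafana
}
```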
## CLI & API Diagnostics
```bash
# Stream logs for ingestion jobs
compozy knowledge ingest --id quickstart_docs --verbose

# Fetch metrics locally (requires Prometheus endpoint)
curl http://localhost:5001/metrics | grep knowledge_
```

## Incident Response Checklist
- Verify provider credentials and rate limits.
- Check vector DB health (indexes, disk utilization, pool size).
- Re-run ingestion with smaller batches if providers throttle requests.
- Use tracing to pinpoint slow spans (e.g., remote PDF download vs. embedding latency).