# Retrieval & Prompt Injection
Tune dense retrieval parameters and understand how knowledge context reaches your LLM prompts.
## Retrieval Pipeline
- **Embed Query** – The selected embedder projects the caller's query or task input into the same vector space as stored chunks.
- **Similarity Search** – The vector database runs a dense similarity query with optional tag filters and returns scored matches.
- **Post Processing** – Matches below `min_score` are discarded. Remaining passages are truncated to honor the binding's `max_tokens`.
- **Prompt Injection** – The orchestrator appends the formatted passages to the LLM prompt before tools, memory, and attachments.
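For example, with `top_k: 5` and `min_score: 0.25` (illustrative values), a search might return matches scoring 0.81, 0.47, 0.31, 0.22, and 0.18; post-processing drops the last two and truncates the surviving three passages to fit the binding's `max_tokens` before injection.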
## Key Parameters
| Field | Scope | Default Source | Purpose |
|---|---|---|---|
| `top_k` | Knowledge base / binding | `retrieval.top_k` or `config.knowledge.retrieval_top_k` | Number of passages returned |
| `min_score` | Knowledge base / binding | `retrieval.min_score` or `config.knowledge.retrieval_min_score` | Similarity threshold (−1.0 to 1.0); use ≥ 0 when embeddings are L2-normalized |
| `max_tokens` | Binding | Inline override | Upper bound on tokens injected into the prompt |
| `filters` | Knowledge base / binding | Inline map | Exact-match metadata filters (e.g., `{ tag: "policy" }`) |
| `reranker` | Planned | N/A | Currently unavailable in the MVP; leave unset until the feature ships |
Set conservative defaults at the knowledge base level, then tighten or loosen them per workflow or task.
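As a concrete sketch of that layering, using the field names from the table above (the surrounding YAML shape is an assumption for illustration, not a documented schema):

```yaml
# Knowledge base definition: conservative defaults for every consumer.
id: quickstart_docs
retrieval:
  top_k: 8        # maps to retrieval.top_k
  min_score: 0.2  # maps to retrieval.min_score

---
# Workflow binding: tightened inline overrides for one focused task.
knowledge:
  - id: quickstart_docs
    top_k: 3          # overrides the knowledge base default
    min_score: 0.35
    max_tokens: 1200  # binding-only cap on injected tokens
```

The inline binding value takes precedence over the defaults it overrides, so loosening works the same way.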
## Formatting Retrieved Passages
Compozy injects matches in descending score order using the following template:
```
<!-- knowledge:quickstart_docs -->
### {title or source}
{chunk_text}
```

- `source` metadata surfaces when available; otherwise, the source path or URL is displayed.
- Passages are separated by blank lines to keep prompts readable.
- Token limits are enforced before the prompt reaches the LLM to prevent truncation in provider SDKs.
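For instance, two matches might be injected like this (passage text is invented for illustration, and the comment marker is assumed to appear once per knowledge base); the second heading shows the source-path fallback when no title metadata is present:

```
<!-- knowledge:quickstart_docs -->
### Support SLA Overview
Standard support requests receive a first response within one business day.

### docs/escalation-policy.md
Sev-1 incidents page the on-call engineer immediately.
```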
## Coordinating With Other Context
1. **System instructions** – Base prompt authored in agent or workflow YAML.
2. **Knowledge context** – Injected here. Retrieval happens before tool results and memory so the LLM sees authoritative guidance first.
3. **Tool outputs** – Any tool executions triggered during the same turn append after knowledge context.
4. **Memory** – Long-term memory results append last to avoid polluting retrieval scoring with conversational noise.
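Schematically, the assembled prompt therefore layers as follows (a mental model of the ordering above, not a literal wire format):

```
[system instructions]
[knowledge context]   <!-- knowledge:... -->
[tool outputs]
[memory]
```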
## Guardrails & Safety
- Configure `min_score` to avoid low-signal passages drowning out relevant content.
- Use tag filters to restrict context to the correct audience (e.g., `{ audience: "support" }`).
- Keep `max_tokens` under your provider's prompt budget. Monitor the `prompt_tokens` metric from your LLM provider dashboards.
- If retrieval returns no matches, the orchestrator logs a warning and skips prompt injection rather than returning stale context.
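Combining those guardrails, an audience-scoped binding might look like this sketch (field names come from the Key Parameters table; the id and nesting are assumptions for illustration):

```yaml
knowledge:
  - id: support_kb            # hypothetical knowledge base id
    min_score: 0.3            # keep low-signal passages out of the prompt
    max_tokens: 2000          # stay under the provider's prompt budget
    filters:
      audience: "support"     # exact-match metadata filter
```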
## Testing Retrieval Locally
```bash
compozy knowledge query --id quickstart_docs \
  --text "What is our support SLA?" \
  --top_k 3 \
  --min_score 0.25 \
  --output json
```
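The response schema isn't documented here, so treat this JSON as an illustrative guess at the fields worth checking:

```json
[
  {
    "score": 0.82,
    "text": "Standard support requests receive a first response within one business day.",
    "source": "docs/support-sla.md",
    "metadata": { "tag": "policy", "audience": "support" },
    "etag": "a1b2c3"
  }
]
```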
Inspect responses to ensure metadata and ETags align with expectations before binding knowledge to production workflows.