Knowledge

Retrieval & Prompt Injection

Tune dense retrieval parameters and understand how knowledge context reaches your LLM prompts.

Retrieval Pipeline

  1. Embed Query – The selected embedder projects the caller's query or task input into the same vector space as stored chunks.
  2. Similarity Search – The vector database runs a dense similarity query with optional tag filters and returns scored matches.
  3. Post Processing – Matches below min_score are discarded. Remaining passages are truncated to honor the binding's max_tokens.
  4. Prompt Injection – The orchestrator appends the formatted passages to the LLM prompt before tools, memory, and attachments.

Key Parameters

| Field | Scope | Default Source | Purpose |
| --- | --- | --- | --- |
| top_k | Knowledge base/binding | retrieval.top_k or config.knowledge.retrieval_top_k | Number of passages returned |
| min_score | Knowledge base/binding | retrieval.min_score or config.knowledge.retrieval_min_score | Similarity threshold (−1.0 to 1.0); use ≥ 0 when embeddings are L2-normalized |
| max_tokens | Binding | Inline override | Upper bound on tokens injected into the prompt |
| filters | Knowledge base/binding | Inline map | Exact-match metadata filters (e.g., { tag: "policy" }) |
| reranker | Planned | N/A | Currently unavailable in the MVP; leave unset until the feature ships |

Set conservative defaults at the knowledge base level, then tighten or loosen them per workflow or task.
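A knowledge base definition with a per-binding override might look like the following. The exact key layout is illustrative only and may differ from your Compozy version; the field names mirror the table above.

```yaml
# Illustrative sketch: verify key paths against your Compozy version.
knowledge:
  id: quickstart_docs
  retrieval:
    top_k: 5          # conservative default at the knowledge base level
    min_score: 0.25   # embeddings are L2-normalized, so scores fall in [0, 1]
    filters:
      tag: policy

# A workflow binding can tighten these defaults:
bindings:
  - knowledge: quickstart_docs
    top_k: 3
    max_tokens: 1200  # inline override; there is no knowledge-base default
```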

Formatting Retrieved Passages

Compozy injects matches in descending score order using the following template:

<!-- knowledge:quickstart_docs -->
### {title or source}

{chunk_text}
  • [source] metadata surfaces when available; otherwise, the source path or URL is displayed.
  • Passages are separated by blank lines to keep prompts readable.
  • Token limits are enforced before the prompt reaches the LLM to prevent truncation in provider SDKs.
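A minimal sketch of rendering that template, assuming matches arrive as plain dicts (the function name and data shape are hypothetical, not Compozy internals):

```python
def format_passages(kb_id: str, matches: list[dict]) -> str:
    """Render matches with the comment header and per-passage template."""
    blocks = [f"<!-- knowledge:{kb_id} -->"]
    for m in sorted(matches, key=lambda m: m["score"], reverse=True):
        # Prefer explicit title metadata, falling back to the source path or URL.
        title = m.get("title") or m.get("source", "unknown")
        blocks.append(f"### {title}\n\n{m['text']}")
    # Blank lines between passages keep the prompt readable.
    return "\n\n".join(blocks)
```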

Coordinating With Other Context

  1. System instructions – Base prompt authored in agent or workflow YAML.
  2. Knowledge context – Injected here. Retrieval happens before tool results and memory so the LLM sees authoritative guidance first.
  3. Tool outputs – Any tool executions triggered during the same turn append after knowledge context.
  4. Memory – Long-term memory results append last to avoid polluting retrieval scoring with conversational noise.
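The ordering above amounts to a fixed concatenation. A trivial sketch, with hypothetical section names:

```python
def assemble_prompt(system: str, knowledge: str,
                    tool_outputs: list[str], memory: list[str]) -> str:
    """Concatenate context sections in the fixed order described above."""
    sections = [system, knowledge, *tool_outputs, *memory]
    # Drop empty sections so skipped knowledge injection leaves no gap.
    return "\n\n".join(s for s in sections if s)
```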

Guardrails & Safety

  • Configure min_score to avoid low-signal passages drowning out relevant content.
  • Use tag filters to restrict context to the correct audience (e.g., { audience: "support" }).
  • Keep max_tokens under your provider's prompt budget. Monitor the prompt_tokens metric from your LLM provider dashboards.
  • If retrieval returns no matches, the orchestrator logs a warning and skips prompt injection rather than returning stale context.
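The empty-retrieval guardrail can be sketched as follows; the function and logger names are hypothetical, not the orchestrator's real internals:

```python
import logging

logger = logging.getLogger("orchestrator")

def inject_or_skip(prompt: str, passages: list[str]) -> str:
    # On empty retrieval, warn and leave the prompt untouched rather
    # than injecting stale or irrelevant context.
    if not passages:
        logger.warning("knowledge retrieval returned no matches; skipping injection")
        return prompt
    return prompt + "\n\n" + "\n\n".join(passages)
```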

Testing Retrieval Locally

compozy knowledge query --id quickstart_docs \
  --text "What is our support SLA?" \
  --top_k 3 \
  --min_score 0.25 \
  --output json

Inspect responses to ensure metadata and ETags align with expectations before binding knowledge to production workflows.