Knowledge

Knowledge Overview

Conceptual guide to embedders, vector databases, knowledge bases, and runtime retrieval in Compozy.

Why Knowledge Matters

Knowledge bases pair reusable embedders with pluggable vector databases so teams can index project documentation, policies, and reference material once and make it available to workflows, tasks, and agents. The runtime automatically resolves bindings, fetches relevant context, and injects it into prompts before any LLM call.

Reusable Infrastructure

Declare embedders, vector stores, and knowledge bases once and share them across projects.

Deterministic Retrieval

Dense similarity search with configurable top_k, score thresholds, and tag filters.

Safe Prompt Injection

Resolved passages are bounded by token budgets and appended to prompts in a predictable order.

Core Concepts

1. Embedder

Wraps a provider-specific embedding model (OpenAI, Vertex, self-hosted) via LangChain Go. Configured once and referenced by ID.
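As a sketch, an embedder could be declared once and referenced by ID elsewhere. The field names and template syntax below are illustrative assumptions, not a confirmed Compozy schema:

```yaml
# Illustrative embedder declaration; field names are assumptions.
embedders:
  - id: docs-embedder             # referenced by knowledge bases via this ID
    provider: openai              # provider-specific model behind LangChain Go
    model: text-embedding-3-small
    api_key: "<OPENAI_API_KEY>"   # placeholder; load from secrets in practice
```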

2. Vector Database

Persists embeddings and metadata. The MVP ships with in_memory, pgvector, and qdrant adapters; additional stores can be added without touching workflows.
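A minimal sketch of declaring the shipped adapters side by side; the adapter names (in_memory, pgvector, qdrant) come from this page, while the surrounding keys are assumptions:

```yaml
# Illustrative vector database declarations; only the adapter names
# (in_memory, pgvector, qdrant) are documented, other fields are assumptions.
vector_databases:
  - id: dev-store
    type: in_memory               # non-persistent; good for local runs
  - id: prod-store
    type: pgvector
    dsn: "<PGVECTOR_DSN>"         # placeholder connection string
```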

3. Knowledge Base

Couples an embedder and a vector database with one or more sources, a chunking policy, and retrieval defaults.
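Putting the pieces together, a knowledge base might look like the sketch below. The concepts (sources, chunking, retrieval defaults such as top_k and score thresholds) are from this page; the exact keys are assumptions:

```yaml
# Illustrative knowledge base; key names are assumptions.
knowledge_bases:
  - id: product-docs
    embedder: docs-embedder       # reuse the shared embedder by ID
    vector_database: prod-store
    sources:
      - type: markdown_glob       # e.g. the Quickstart Markdown Glob example
        path: "docs/**/*.md"
    chunking:
      size: 512                   # tokens per chunk (assumed unit)
      overlap: 64
    retrieval:
      top_k: 5
      min_score: 0.7
```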

4. Knowledge Binding

A workflow-, task-, or agent-level reference to a knowledge base. Bindings can override retrieval parameters per execution.
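For example, an agent-level binding could reference a shared knowledge base while tightening its retrieval parameters. The binding shape below is an illustrative assumption:

```yaml
# Illustrative agent-level binding; field names are assumptions.
agents:
  - id: support-agent
    knowledge:
      - id: product-docs          # reference the shared knowledge base
        top_k: 3                  # override the base default
        filters:
          tags: ["billing"]       # tag filter applied at retrieval time
```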

Runtime Flow

  1. Resolution – When a workflow starts, bindings are merged in precedence order (workflow → project → inline). Conflicts resolve deterministically so agents always see the same configuration.
  2. Retrieval – The knowledge service loads the selected embedder/vector store, issues a dense similarity search, and applies any filters.
  3. Prompt Assembly – Retrieved passages are truncated to stay within the configured token budget and appended before tool or memory context.
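The resolution step can be pictured as the same binding declared at several levels, which the runtime merges deterministically in the documented precedence order (workflow → project → inline). All field names below are assumptions:

```yaml
# Illustrative bindings for one knowledge base at three levels; the runtime
# merges them per the precedence order above. Field names are assumptions.

# Project level
knowledge:
  - id: product-docs
    top_k: 5

# Workflow level
workflows:
  - id: billing-support
    knowledge:
      - id: product-docs
        min_score: 0.8

# Inline (per task or agent)
tasks:
  - id: answer-question
    knowledge:
      - id: product-docs
        filters:
          tags: ["billing"]
```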

Relationship to Other Systems

  • Memory – Memory persists user conversations or application state. Knowledge focuses on curated references that change less frequently. You can use both: memory handles per-user context, knowledge supplies canonical documents.
  • Tools – Tools still execute dynamic actions (HTTP calls, SQL). Use knowledge when you need contextual grounding before issuing tool calls.
  • MCP Servers – Model Context Protocol servers expose dynamic tooling. Knowledge is built-in and does not require MCP, but bindings can reference MCP-powered agents for richer orchestration.

Personas & Scenarios

  • Platform engineers provision the embedder/vector DB infrastructure and enforce conventions.
  • Workflow authors bind knowledge bases and tune retrieval parameters without touching provider credentials.
  • Operators trigger ingestion via CLI or API and monitor metrics for failures and latency.
  • LLM engineers reason about token budgets and combine knowledge context with tools or memory.

Explore runnable projects in examples/knowledge/* for end-to-end walkthroughs, starting with the Quickstart Markdown Glob example.