
Knowledge Configuration

Define embedders, vector databases, knowledge bases, and bindings in project and workflow YAML.

Where Configuration Lives

Compozy loads knowledge resources from the same declarative YAML used across the platform:

  1. Project YAML – Declares embedders, vector databases, and knowledge bases under top-level arrays.
  2. Workflow YAML – Defines workflow-scoped knowledge bases or binds to project resources.
  3. Task & Agent YAML – Override retrieval parameters or select a different knowledge base per execution context.

All runtime code accesses configuration through config.FromContext(ctx); never rely on global singletons.

Declaring Embedders

compozy.yaml
embedders:
  - id: openai_default
    provider: openai
    model: text-embedding-3-small
    api_key: "{{ .env.OPENAI_API_KEY }}"
    config:
      batch_size: 64 # optional, overrides config.knowledge.embedder_batch_size
      timeout: 30s

  • provider must name an embedding client supported by LangChain Go.
  • Secrets are always supplied through env interpolation; never commit keys to source control.
  • When omitted, batch size, chunk size, and retrieval settings fall back to the config.knowledge.* defaults.
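The {{ .env.OPENAI_API_KEY }} placeholder uses Go text/template syntax. The following is a minimal sketch of how such a value could be rendered. It assumes environment variables are exposed to the template as a map under the key env. renderValue is a hypothetical helper, not Compozy's loader.

```go
package main

import (
	"strings"
	"text/template"
)

// renderValue expands {{ .env.NAME }} placeholders in a single YAML value.
// Hypothetical helper: it assumes env vars are passed in as a map under "env".
func renderValue(raw string, env map[string]string) (string, error) {
	// missingkey=error makes an unset variable fail loudly instead of
	// rendering the literal "<no value>" into the config.
	tmpl, err := template.New("value").Option("missingkey=error").Parse(raw)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	data := map[string]any{"env": env}
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}
```

A loader built this way would populate the map from os.Environ(). Failing on a missing key surfaces a misconfigured secret at load time rather than at the first embedding call.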

Declaring Vector Databases

compozy.yaml
vector_dbs:
  - id: filesystem_faststart
    type: filesystem
  - id: pgvector_local
    type: pgvector
    config:
      dsn: "{{ .env.PGVECTOR_DSN }}"
      ensure_index: true
      table: knowledge_chunks

  • Use filesystem for quick starts and tests; production workloads should use pgvector or qdrant.
  • Set ensure_index: true to create the pgvector IVFFlat index automatically; set it to false if your migrations manage the index.
  • Keep DSNs in environment variables to avoid leaking credentials.
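For pgvector, ensure_index amounts to issuing an idempotent CREATE INDEX against the configured table. The sketch below only builds that statement. Several details are assumptions, not Compozy's actual schema: the column name embedding, the cosine operator class, the index naming scheme, and the lists value. ivfflatIndexDDL is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// identPattern accepts plain lowercase SQL identifiers only, so the table
// and column names can be interpolated without quoting.
var identPattern = regexp.MustCompile(`^[a-z_][a-z0-9_]*$`)

// ivfflatIndexDDL returns an idempotent pgvector IVFFlat index statement.
// Hypothetical: the column name, opclass, and naming scheme are assumptions.
func ivfflatIndexDDL(table, column string, lists int) (string, error) {
	if !identPattern.MatchString(table) || !identPattern.MatchString(column) {
		return "", fmt.Errorf("unsafe identifier: table=%q column=%q", table, column)
	}
	if lists <= 0 {
		return "", fmt.Errorf("lists must be positive, got %d", lists)
	}
	return fmt.Sprintf(
		"CREATE INDEX IF NOT EXISTS %s_%s_ivfflat ON %s USING ivfflat (%s vector_cosine_ops) WITH (lists = %d)",
		table, column, table, column, lists,
	), nil
}
```

pgvector recommends building an IVFFlat index after the table already holds data, because the list centroids are computed when the index is built. This is one reason you might prefer to disable ensure_index and create the index from a migration.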

Defining Knowledge Bases

compozy.yaml
knowledge_bases:
  - id: quickstart_docs
    embedder: openai_default
    vector_db: filesystem_faststart
    sources:
      - kind: markdown_glob
        glob: "docs/**/*.md"
    chunking:
      strategy: token
      size: 512   # falls back to config.knowledge.chunk_size when omitted
      overlap: 64 # falls back to config.knowledge.chunk_overlap when omitted
    retrieval:
      top_k: 5        # number of chunks returned per query
      min_score: 0.2  # discard low-signal matches
      filters:
        tag: "policy" # optional exact-match tag filters
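The token strategy with size and overlap describes a sliding window. Each chunk holds up to size tokens and starts size - overlap tokens after the previous one. The sketch below assumes whitespace-separated words stand in for model tokens; a real implementation would use the embedding model's tokenizer. chunkTokens and resolve are hypothetical helpers, and resolve mirrors the fallback to config.knowledge.chunk_size and chunk_overlap noted in the YAML comments.

```go
package main

import (
	"errors"
	"strings"
)

// resolve returns the per-knowledge-base override when set, otherwise the
// global default (e.g. config.knowledge.chunk_size).
func resolve(override *int, fallback int) int {
	if override != nil {
		return *override
	}
	return fallback
}

// chunkTokens splits text into overlapping windows of at most size tokens.
// Whitespace splitting is a stand-in for a real tokenizer.
func chunkTokens(text string, size, overlap int) ([]string, error) {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil, errors.New("chunking: require size > 0 and 0 <= overlap < size")
	}
	tokens := strings.Fields(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(tokens); start += step {
		end := start + size
		if end > len(tokens) {
			end = len(tokens)
		}
		chunks = append(chunks, strings.Join(tokens[start:end], " "))
		if end == len(tokens) {
			break // the last window already reaches the end of the text
		}
	}
	return chunks, nil
}
```

Requiring overlap < size keeps the window advancing. With the example values (size 512, overlap 64), each chunk repeats the last 64 tokens of the previous one.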

Each knowledge base references exactly one embedder and exactly one vector database. Sources supported in the MVP:

  • markdown_glob – Glob pattern relative to the project root.
  • url – HTTPS URL to a small, publicly reachable document (PDF, HTML, Markdown, etc.).
  • cloud_storage and media_transcript – Planned for a future release; available only in deployments that enable the corresponding feature flag.
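The retrieval settings in a knowledge base definition act on the scored matches returned by the vector store. The sketch below assumes a particular order: exact-match tag filters, then dropping matches below min_score, then a descending sort truncated to top_k. The Match type and applyRetrieval function are hypothetical, and a production vector store would usually push filtering and top_k into the query itself.

```go
package main

import "sort"

// Match is a hypothetical scored chunk returned by a similarity search.
type Match struct {
	ID    string
	Score float64
	Tags  map[string]string
}

// applyRetrieval keeps matches whose tags equal every filter value and whose
// score is at least minScore, then returns the topK highest-scoring ones.
// A topK of zero or less means "no limit".
func applyRetrieval(matches []Match, topK int, minScore float64, filters map[string]string) []Match {
	kept := make([]Match, 0, len(matches))
	for _, m := range matches {
		if m.Score < minScore {
			continue // low-signal match
		}
		ok := true
		for key, want := range filters {
			if m.Tags[key] != want {
				ok = false
				break
			}
		}
		if ok {
			kept = append(kept, m)
		}
	}
	// SliceStable keeps the input order for equal scores.
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if topK > 0 && len(kept) > topK {
		kept = kept[:topK]
	}
	return kept
}
```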

Binding Knowledge in Workflows, Tasks, and Agents

workflows/qa.yaml
id: qa
knowledge:
  id: quickstart_docs
  retrieval:
    top_k: 3
    min_score: 0.15
tasks:
  - id: answer
    type: basic
    agent: qa_agent
    knowledge:
      id: quickstart_docs
      retrieval:
        top_k: 4
agents:
  - id: qa_agent
    model: groq/llama-3.1-8b-instruct
    instructions: ./prompts/qa.md
    knowledge:
      id: quickstart_docs

Precedence Rules

  1. Workflow binding – Defines the default knowledge base and retrieval parameters for all tasks and agents in the workflow.
  2. Task override – Tasks can override the knowledge base ID or retrieval parameters for fine-grained control.
  3. Agent override – Agent-level overrides apply when the agent executes outside task context (e.g., the agent CLI).

Bindings inherit missing fields from their parent scope. The MVP supports a single knowledge binding per scope; multi-binding fanout is on the roadmap.
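The inheritance rule can be sketched as a field-by-field merge: a child binding's set fields win, and unset fields inherit from the parent. This sketch makes several assumptions. The Binding and Retrieval types and the overlay and effectiveBinding functions are hypothetical. Optional fields are modeled as pointers, and the task-versus-agent choice follows the rule that agent overrides apply only outside task context.

```go
package main

// Retrieval mirrors the retrieval block; nil means "not set at this scope".
type Retrieval struct {
	TopK     *int
	MinScore *float64
}

// Binding is a hypothetical knowledge binding at one scope.
type Binding struct {
	ID        string // empty means "inherit from parent"
	Retrieval Retrieval
}

// overlay returns parent with every field set in child taking precedence.
func overlay(parent, child Binding) Binding {
	out := parent
	if child.ID != "" {
		out.ID = child.ID
	}
	if child.Retrieval.TopK != nil {
		out.Retrieval.TopK = child.Retrieval.TopK
	}
	if child.Retrieval.MinScore != nil {
		out.Retrieval.MinScore = child.Retrieval.MinScore
	}
	return out
}

// effectiveBinding applies the workflow binding first, then the task binding
// when running inside a task, or the agent binding otherwise.
func effectiveBinding(workflow Binding, task, agent *Binding) Binding {
	out := workflow
	switch {
	case task != nil:
		out = overlay(out, *task)
	case agent != nil:
		out = overlay(out, *agent)
	}
	return out
}
```

Applied to workflows/qa.yaml, the answer task resolves to quickstart_docs with top_k 4 (task override) and min_score 0.15 (inherited from the workflow binding).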

Autoload, Import, and Export

Knowledge resources participate in the standard resource workflow:

  • compozy knowledge apply writes definitions to the runtime store.
  • compozy knowledge export and compozy knowledge get --output yaml round-trip configuration for review.
  • Autoloaders treat knowledge_bases, embedders, and vector_dbs like other resource groups—store them alongside existing YAML.