Collection Tasks
Collection tasks provide iteration patterns for processing arrays and collections in Compozy workflows. They transform array data into parallel or sequential task executions, enabling efficient batch processing with filtering, error handling, and result aggregation.
Overview
Collection tasks automatically expand an input array into one task execution per item, providing orchestration patterns for batch operations of any size.
Key Capabilities
- Intelligent processing modes
- Advanced filtering
- Batch optimization
- Rich context access
- Failure resilience
- Result aggregation
Task Structure
Basic Collection Task
```yaml
id: process-users
mode: parallel
strategy: best_effort

# Source collection
items: "{{ .workflow.input.users }}"

# Optional filtering
filter: "{{ ne .item.status 'inactive' }}"

# Task template applied to each item
task:
  id: "process-user-{{ .index }}"
  $use: agent(local::agents.#(id=="user-processor"))
  action: process_user
  with:
    user_id: "{{ .item.id }}"
    user_data: "{{ .item }}"
    processing_index: "{{ .index }}"

outputs:
  processed_users: "{{ .output }}"
  total_processed: "{{ len .output }}"
```
Configuration Options
Control how collection items are processed:
```yaml
id: sequential-collection
mode: sequential
items: "{{ .workflow.input.documents }}"
task:
  id: "process-doc-{{ .index }}"
  $use: tool(local::tools.#(id=="document-processor"))
  with:
    document: "{{ .item }}"
    sequence_number: "{{ .index }}"
```

```yaml
id: parallel-collection
mode: parallel
strategy: wait_all
max_workers: 8
items: "{{ .workflow.input.images }}"
task:
  id: "process-image-{{ .index }}"
  $use: tool(local::tools.#(id=="image-processor"))
  with:
    image: "{{ .item }}"
    parallel_index: "{{ .index }}"
```

```yaml
id: batched-collection
mode: parallel
batch_size: 5
items: "{{ .workflow.input.records }}"
task:
  id: "process-batch-{{ .batch_index }}"
  $use: tool(local::tools.#(id=="batch-processor"))
  with:
    records: "{{ .batch }}"
    batch_number: "{{ .batch_index }}"
```
Processing Patterns
Sequential Processing
Process items one after another when order matters or when each task depends on previous results:
When to Use Sequential Processing:
- Order-dependent operations (document processing, data transformations)
- Resource-constrained environments with limited parallel capacity
- Operations that build on previous results
- Rate-limited external APIs that require sequential calls
Parallel Processing
Process items concurrently when order doesn't matter and you need maximum throughput:
When to Use Parallel Processing:
- Independent operations that don't depend on each other
- CPU or I/O intensive tasks that benefit from concurrency
- Large datasets where processing time is a constraint
- Multiple API calls that can be made simultaneously
Performance Considerations:
- Use `max_workers` to control resource usage and prevent overwhelming external services
- Consider memory usage when processing large items in parallel
- Monitor external API rate limits and adjust concurrency accordingly
Batch Processing
Process large datasets in manageable chunks to optimize memory usage and provide better control over resource consumption:
When to Use Batch Processing:
- Processing thousands or millions of items
- Memory-constrained environments
- External APIs with bulk operation support
- Database operations that benefit from batch inserts/updates
Batch Size Guidelines:
- Small batches (1-10): Real-time processing, low latency requirements
- Medium batches (10-100): Balanced performance and resource usage
- Large batches (100-1000+): Maximum throughput for bulk operations
Advanced Features
Conditional Processing
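The `filter` expression shown in the basic example can combine multiple conditions to skip items before any task execution is created. A minimal sketch, assuming the Go-template style functions used elsewhere in these examples (`ne`, plus standard `and`/`eq`); the `priority` field on each item and the `order-processor` tool id are illustrative:

```yaml
id: process-priority-orders
mode: parallel
items: "{{ .workflow.input.orders }}"
# Only items for which the filter evaluates truthy produce a task;
# "priority" is a hypothetical field on each order item
filter: "{{ and (ne .item.status 'cancelled') (eq .item.priority 'high') }}"
task:
  id: "process-order-{{ .index }}"
  $use: tool(local::tools.#(id=="order-processor"))
  with:
    order: "{{ .item }}"
```

Filtering this way keeps skipped items out of the execution entirely, rather than starting tasks that immediately return.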
Best Practices
- Choose the right processing mode
- Implement early filtering
- Handle partial failures gracefully
- Optimize batch configurations
- Monitor performance metrics
- Set realistic timeouts
Performance Guidelines
Memory Management
- Use batch processing for datasets larger than 1000 items
- Monitor memory usage in production environments
- Consider streaming patterns for extremely large datasets
Concurrency Control
- Set `max_workers` based on external service limits
- Use sequential mode for rate-limited APIs
- Implement backoff strategies for failed requests
Error Recovery
- Always use `best_effort` for non-critical operations
- Implement proper logging for failed items
- Consider retry mechanisms for transient failures
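Putting these guidelines together, a non-critical batch job might pair `best_effort` with bounded concurrency. A sketch reusing only fields shown in the earlier examples; the `notifier` tool id and input field names are illustrative, and the comment assumes `best_effort` lets the collection complete despite individual failures, per the guidance above:

```yaml
id: notify-subscribers
mode: parallel
strategy: best_effort   # individual failures do not abort the collection
max_workers: 4          # keep concurrency below the notification API's limits
items: "{{ .workflow.input.subscribers }}"
task:
  id: "notify-{{ .index }}"
  $use: tool(local::tools.#(id=="notifier"))
  with:
    subscriber: "{{ .item }}"
outputs:
  results: "{{ .output }}"
```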
Next Steps
Collection tasks are the backbone of batch processing in Compozy, transforming arrays into sophisticated workflows with enterprise-grade reliability, performance, and flexibility. Master these patterns to build scalable data processing solutions that grow with your needs.
Aggregate Tasks
Aggregate tasks collect and combine results from multiple predecessor tasks, enabling sophisticated data transformation and result synthesis. They serve as collection points in workflows where outputs from different task branches need to be merged, calculated, or restructured.
Composite Tasks
Master sequential workflow orchestration with composite tasks that group related operations into logical, reusable units with strategy control and error handling.