Collection Tasks
Collection tasks provide iteration patterns for processing arrays and collections in Compozy workflows. They transform array data into parallel or sequential task executions, enabling efficient batch processing with filtering, error handling, and result aggregation.
Overview
Collection tasks automatically expand an input array into one task execution per item, providing orchestration patterns for batch operations of any size.
Key Capabilities
- Intelligent processing modes
- Advanced filtering
- Batch optimization
- Rich context access
- Failure resilience
- Result aggregation
Task Structure
Basic Collection Task
```yaml
id: process-users
mode: parallel
strategy: best_effort

# Source collection
items: "{{ .workflow.input.users }}"

# Optional filtering
filter: "{{ ne .item.status 'inactive' }}"

# Task template applied to each item
task:
  id: "process-user-{{ .index }}"
  $use: agent(local::agents.#(id=="user-processor"))
  action: process_user
  with:
    user_id: "{{ .item.id }}"
    user_data: "{{ .item }}"
    processing_index: "{{ .index }}"

outputs:
  processed_users: "{{ .output }}"
  total_processed: "{{ len .output }}"
```
Configuration Options
Control how collection items are processed:
```yaml
id: sequential-collection
mode: sequential
items: "{{ .workflow.input.documents }}"
task:
  id: "process-doc-{{ .index }}"
  $use: tool(local::tools.#(id=="document-processor"))
  with:
    document: "{{ .item }}"
    sequence_number: "{{ .index }}"
```

```yaml
id: parallel-collection
mode: parallel
strategy: wait_all
max_workers: 8
items: "{{ .workflow.input.images }}"
task:
  id: "process-image-{{ .index }}"
  $use: tool(local::tools.#(id=="image-processor"))
  with:
    image: "{{ .item }}"
    parallel_index: "{{ .index }}"
```

```yaml
id: batched-collection
mode: parallel
batch_size: 5
items: "{{ .workflow.input.records }}"
task:
  id: "process-batch-{{ .batch_index }}"
  $use: tool(local::tools.#(id=="batch-processor"))
  with:
    records: "{{ .batch }}"
    batch_number: "{{ .batch_index }}"
```
Processing Patterns
Sequential Processing
Process items one after another when order matters or when each task depends on previous results:
When to Use Sequential Processing:
- Order-dependent operations (document processing, data transformations)
- Resource-constrained environments with limited parallel capacity
- Operations that build on previous results
- Rate-limited external APIs that require sequential calls
Parallel Processing
Process items concurrently when order doesn't matter and you need maximum throughput:
When to Use Parallel Processing:
- Independent operations that don't depend on each other
- CPU or I/O intensive tasks that benefit from concurrency
- Large datasets where processing time is a constraint
- Multiple API calls that can be made simultaneously
Performance Considerations:
- Use `max_workers` to control resource usage and prevent overwhelming external services
- Consider memory usage when processing large items in parallel
- Monitor external API rate limits and adjust concurrency accordingly
Batch Processing
Process large datasets in manageable chunks to optimize memory usage and provide better control over resource consumption:
When to Use Batch Processing:
- Processing thousands or millions of items
- Memory-constrained environments
- External APIs with bulk operation support
- Database operations that benefit from batch inserts/updates
Batch Size Guidelines:
- Small batches (1-10): Real-time processing, low latency requirements
- Medium batches (10-100): Balanced performance and resource usage
- Large batches (100-1000+): Maximum throughput for bulk operations
Advanced Features
Conditional Processing
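The `filter` expression shown in the basic example can combine multiple conditions to skip items before any task execution is created. A minimal sketch, assuming the Go-template style functions used elsewhere in these examples (`ne`, plus standard `and`/`eq`); the `priority` field on each item and the `order-processor` tool id are illustrative:

```yaml
id: process-priority-orders
mode: parallel
items: "{{ .workflow.input.orders }}"
# Only items for which the filter evaluates truthy produce a task;
# "priority" is a hypothetical field on each order item
filter: "{{ and (ne .item.status 'cancelled') (eq .item.priority 'high') }}"
task:
  id: "process-order-{{ .index }}"
  $use: tool(local::tools.#(id=="order-processor"))
  with:
    order: "{{ .item }}"
```

Filtering this way keeps skipped items out of the execution entirely, rather than starting tasks that immediately return.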
Best Practices
- Choose the right processing mode
- Implement early filtering
- Handle partial failures gracefully
- Optimize batch configurations
- Monitor performance metrics
- Set realistic timeouts
Performance Guidelines
Memory Management
- Use batch processing for datasets larger than 1000 items
- Monitor memory usage in production environments
- Consider streaming patterns for extremely large datasets
Concurrency Control
- Set `max_workers` based on external service limits
- Use sequential mode for rate-limited APIs
- Implement backoff strategies for failed requests
Error Recovery
- Always use `best_effort` for non-critical operations
- Implement proper logging for failed items
- Consider retry mechanisms for transient failures
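Putting these guidelines together, a non-critical batch job might pair `best_effort` with bounded concurrency. A sketch reusing only fields shown in the earlier examples; the `notifier` tool id and input field names are illustrative, and the comment assumes `best_effort` lets the collection complete despite individual failures, per the guidance above:

```yaml
id: notify-subscribers
mode: parallel
strategy: best_effort   # individual failures do not abort the collection
max_workers: 4          # keep concurrency below the notification API's limits
items: "{{ .workflow.input.subscribers }}"
task:
  id: "notify-{{ .index }}"
  $use: tool(local::tools.#(id=="notifier"))
  with:
    subscriber: "{{ .item }}"
outputs:
  results: "{{ .output }}"
```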
Next Steps
Collection tasks are the backbone of batch processing in Compozy, transforming arrays into sophisticated workflows with enterprise-grade reliability, performance, and flexibility. Master these patterns to build scalable data processing solutions that grow with your needs.
Aggregate Tasks
Aggregate tasks collect and combine results from multiple predecessor tasks, enabling sophisticated data transformation and result synthesis. They serve as collection points in workflows where outputs from different task branches need to be merged, calculated, or restructured.
Composite Tasks
Master sequential workflow orchestration with composite tasks that group related operations into logical, reusable units with strategy control and error handling.