Batch List Arguments and Dynamic Batch Expansion
This guide explains how to use batch_list_arg to create dynamic batch processing
pipelines, where the number of batch items is determined at runtime based on outputs
from previous steps.
Overview
Traditional batch processing requires pre-defined batch items (e.g., from a batch file). Batch list arguments enable a more dynamic pattern where:
An upstream step produces a variable number of outputs (e.g., splitting a file into N parts)
A batch step processes each output in parallel
A downstream step merges all results
This is the classic split-process-merge (or scatter-gather) pattern.
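To make the pattern concrete outside of pype, here is a minimal plain-Python sketch of split-process-merge using the standard library's concurrent.futures. The functions split_text, process_part and merge_parts are hypothetical and have nothing to do with pype's snippets. The point is only that the number of parallel tasks is decided by the split step at runtime.

```python
from concurrent.futures import ThreadPoolExecutor


def split_text(text: str, n: int) -> list[str]:
    """Hypothetical split step: at most n chunks of roughly equal line counts."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // n))  # ceiling division so no line is dropped
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def process_part(part: str) -> str:
    """Hypothetical per-item work: upper-case every line of the chunk."""
    return part.upper()


def merge_parts(parts: list[str]) -> str:
    """Gather step: concatenate processed chunks in their original order."""
    return "\n".join(parts)


def split_process_merge(text: str, n: int) -> str:
    parts = split_text(text, n)  # scatter: the item count is only known now
    with ThreadPoolExecutor() as pool:
        processed = list(pool.map(process_part, parts))  # map keeps input order
    return merge_parts(processed)


if __name__ == "__main__":
    print(split_process_merge("a\nb\nc\nd\ne", 3))
```

In pype the three functions correspond to the three pipeline steps, and the scheduler (for example the slurm queue) plays the role of the thread pool.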
The batch_list_arg Type
A batch_list_arg defines the batch items for a batch_snippet or batch_pipeline.
It supports three formats:
List of dicts (legacy): Explicitly list each batch item
Dict of lists (new): Lists are zipped together to create batch items
Dict with composite arguments (advanced): Dynamically generate batch items from snippet results
Format 1: List of Dicts (Legacy)
Explicitly define each batch item:
--batch_input:
  value:
    - { "--input": "file1.txt", "--output": "out1.txt" }
    - { "--input": "file2.txt", "--output": "out2.txt" }
    - { "--input": "file3.txt", "--output": "out3.txt" }
  type: batch_list_arg
This creates 3 batch items with fixed values.
Format 2: Dict of Lists (New)
Provide lists that are automatically zipped together:
--batch_input:
  value:
    --input: ["file1.txt", "file2.txt", "file3.txt"]
    --output: ["out1.txt", "out2.txt", "out3.txt"]
  type: batch_list_arg
This is equivalent to Format 1 but more concise. Lists must have the same length.
Mixed scalar and list values:
--batch_input:
  value:
    --input: ["file1.txt", "file2.txt", "file3.txt"]
    --output_dir: "/results"  # Scalar: shared across all batch items
  type: batch_list_arg
Results in:
[
{"--input": "file1.txt", "--output_dir": "/results"},
{"--input": "file2.txt", "--output_dir": "/results"},
{"--input": "file3.txt", "--output_dir": "/results"}
]
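The zip-and-broadcast rule above can be sketched in a few lines of Python. This is an illustrative reimplementation, not pype's BatchListArgument code: the function name expand_batch_items is hypothetical, and the sketch assumes that every list value is a per-item column while any other value is a scalar shared by all items.

```python
def expand_batch_items(value):
    """Expand a batch_list_arg value into one dict per batch item.

    A list of dicts (Format 1) is copied item by item. In a dict (Format 2),
    list values are zipped position by position and scalar values are copied
    into every item.
    """
    if isinstance(value, list):
        # Format 1: already one dict per batch item.
        return [dict(item) for item in value]

    list_keys = {key for key, val in value.items() if isinstance(val, list)}
    lengths = {len(value[key]) for key in list_keys}
    if len(lengths) > 1:
        raise ValueError(f"batch lists must have the same length, got {sorted(lengths)}")
    # Assumption for this sketch: a dict with no lists yields a single item.
    n_items = lengths.pop() if lengths else 1

    return [
        {key: (val[i] if key in list_keys else val) for key, val in value.items()}
        for i in range(n_items)
    ]


if __name__ == "__main__":
    batch = {
        "--input": ["file1.txt", "file2.txt", "file3.txt"],
        "--output_dir": "/results",
    }
    for item in expand_batch_items(batch):
        print(item)
```

Running it prints the three dicts listed above; passing lists of different lengths raises a ValueError instead of silently truncating, as a plain zip() would.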
Format 3: Dict with Composite Arguments (Advanced)
Dynamically generate batch items from upstream snippet results:
--batch_input:
  value:
    --input:
      value:
        snippet_name: split_file
        result_key: file_split
        result_arguments:
          --input: "%(input_file)s"
          --output: "%(output_folder)s"
          -n: "%(n_split)i"
      type: composite_arg
    --output_dir: "%(output_folder)s"
  type: batch_list_arg
Here, --input is a composite_arg that retrieves the list of split files
from the split_file snippet’s results. This list is then zipped with the
scalar --output_dir to create batch items dynamically.
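To see what this composite_arg resolves to, the sketch below mimics the two stages in plain Python: the %(name)s and %(name)i placeholders in result_arguments are filled from the pipeline arguments with Python's %-formatting, and the filled-in arguments are handed to a results function whose file_split list becomes the per-item --input values. The split_file_results function and the small registry dict are hypothetical stand-ins that reproduce the split_file results code shown under Required Snippet Results; how pype actually locates and calls a snippet's results is not modeled.

```python
import os

# Pipeline-level arguments, as they might be supplied on the command line.
pipeline_args = {
    "input_file": "data.txt",
    "output_folder": "/tmp/split_output",
    "n_split": 3,  # an int, so the %(n_split)i placeholder can be formatted
}

# The composite_arg from the --batch_input example above.
composite = {
    "snippet_name": "split_file",
    "result_key": "file_split",
    "result_arguments": {
        "--input": "%(input_file)s",
        "--output": "%(output_folder)s",
        "-n": "%(n_split)i",
    },
}


def split_file_results(args):
    """Hypothetical stand-in for the split_file results section."""
    n = int(args["-n"])
    return {
        "file_split": [
            os.path.join(args["--output"], f"file_part_{i + 1}.txt") for i in range(n)
        ]
    }


def resolve_composite(spec, params):
    # Fill %(name)s / %(name)i placeholders from the pipeline arguments.
    filled = {key: val % params for key, val in spec["result_arguments"].items()}
    results_fn = {"split_file": split_file_results}[spec["snippet_name"]]
    return results_fn(filled)[spec["result_key"]]


inputs = resolve_composite(composite, pipeline_args)
# Zip the resolved list with the scalar --output_dir, one item per split file.
batch_items = [
    {"--input": path, "--output_dir": "%(output_folder)s" % pipeline_args}
    for path in inputs
]
for item in batch_items:
    print(item)
```

Note that "%(n_split)i" only formats if n_split is a number; with a string value Python raises a TypeError.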
Complete Example: Split-Process-Merge Pipeline
This example demonstrates a complete pipeline that:
Splits an input file into N parts
Processes each part in parallel (batch)
Merges all processed results
Pipeline Definition:
info:
  description: Split-Process-Merge Pipeline Example
  date: 2025-01-01
  api: 2.1.0
  arguments:
    input_file: Input file to process
    output_folder: Directory for intermediate files
    n_split: Number of parts to split the file into
    output_merged: Final merged output file
steps:
  # Step 1: Split the input file into N parts
  step_split_file:
    name: split_file
    type: snippet
    depends_on: []
    arguments:
      --input: "%(input_file)s"
      --output: "%(output_folder)s"
      -n: "%(n_split)s"
  # Step 2: Process each split file in parallel (batch)
  step_process_files:
    name: process_file
    type: batch_snippet
    depends_on:
      - step_split_file
    arguments:
      --output: "%(output_folder)s"
      --batch_input:
        value:
          # Dynamic batch: --input comes from split_file results
          --input:
            value:
              snippet_name: split_file
              result_key: file_split
              result_arguments:
                --input: "%(input_file)s"
                --output: "%(output_folder)s"
                -n: "%(n_split)i"
            type: composite_arg
        type: batch_list_arg
  # Step 3: Merge all processed files
  step_merge_file:
    name: merge_files
    type: snippet
    depends_on:
      - step_process_files
    arguments:
      --input:
        value:
          snippet_name: process_file
          result_key: output
          result_arguments:
            # Must match the batch arguments from step_process_files
            --input:
              value:
                snippet_name: split_file
                result_key: file_split
                result_arguments:
                  --input: "%(input_file)s"
                  --output: "%(output_folder)s"
                  -n: "%(n_split)i"
              type: composite_arg
            --output: "%(output_folder)s"
            type: batch_list_arg
        type: composite_arg
      --output: "%(output_merged)s"
How It Works:
1. step_split_file runs the split_file snippet, which returns a list of file paths in its file_split result key (e.g., ["/tmp/split_output/file_part_1.txt", "/tmp/split_output/file_part_2.txt", "/tmp/split_output/file_part_3.txt"]).
2. step_process_files uses batch_list_arg with an embedded composite_arg:
   - The composite_arg fetches the list from split_file.results().
   - The list is automatically expanded into batch items.
   - Each batch item processes one split file.
3. step_merge_file collects all processed outputs:
   - It uses a composite_arg referencing process_file.results().
   - The result_arguments must match exactly how process_file was called.
   - The inner composite_arg returns a list, which triggers automatic expansion.
   - process_file.results() is called for each expanded item, and all outputs are collected.
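The three steps can be traced with a small plain-Python simulation of the values that flow between them. The two results functions are hypothetical stand-ins that reproduce the split_file and process_file results code from Required Snippet Results; job submission, dependency handling and pype's CompositeArgument class are not modeled.

```python
import os

OUTPUT_FOLDER = "/tmp/split_output"
N_SPLIT = 3


def split_file_results(output, n):
    """Stand-in for split_file's results: one path per part."""
    return {"file_split": [os.path.join(output, f"file_part_{i + 1}.txt") for i in range(n)]}


def process_file_results(input_path, output):
    """Stand-in for process_file's results: one processed path per input."""
    base = os.path.basename(input_path)
    return {"output": os.path.join(output, f"{base}_processed.tsv")}


# Step 1: split_file reports the parts it creates.
parts = split_file_results(OUTPUT_FOLDER, N_SPLIT)["file_split"]

# Step 2: the list is zipped with the shared --output into batch items.
batch_items = [{"--input": p, "--output": OUTPUT_FOLDER} for p in parts]

# Step 3: the same batch arguments are replayed through process_file's results,
# once per item, and the outputs are collected into the list given to merge_files.
merge_inputs = [
    process_file_results(item["--input"], item["--output"])["output"]
    for item in batch_items
]

for path in merge_inputs:
    print(path)
```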
Note
In result_arguments, the type: batch_list_arg serves as documentation to indicate
the arguments form a batch pattern. The actual expansion happens because the inner
composite_arg returns a list value, which CompositeArgument automatically expands
using the same zip logic as BatchListArgument.
Running the Pipeline:
pype pipelines --queue slurm test_batch_list \
--input_file data.txt \
--output_folder /tmp/split_output \
--n_split 3 \
--output_merged /results/merged.txt
Required Snippet Results
For this pattern to work, snippets must define appropriate results sections:
split_file snippet results:
## results
```python
@/usr/bin/env python3, json
import json
import os
output_dir = '%(output)s'
n = %(split)i
# One path per part: file_part_1.txt ... file_part_N.txt
split_files = [
    os.path.join(output_dir, f"file_part_{i + 1}.txt") for i in range(n)
]
res = {
    'file_split': split_files  # Returns a LIST of file paths
}
print(json.dumps(res))
```
process_file snippet results:
## results
```python
@/usr/bin/env python3, json
import json
import os
input_base = os.path.basename('%(input)s')
output_dir = '%(output)s'
res = {
    'output': os.path.join(output_dir, f"{input_base}_processed.tsv")
}
print(json.dumps(res))
```
Key Concepts
Automatic List Expansion:
When a composite_arg inside a batch_list_arg returns a list, each list item
becomes a separate batch execution. This is the core of dynamic batch expansion.
Result Arguments Matching:
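When referencing batch results in downstream steps, the result_arguments must
reproduce the arguments the batch step was called with, including its batch_list_arg
structure. The inner composite_arg then resolves to the same list as in the batch step,
so results are collected from all batch items, not just one.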
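Why the arguments must match can be illustrated with the process_file results logic: a results section computes its paths from its arguments alone, so only an exact replay points at the files the batch step wrote. The sketch below assumes results are a pure function of the arguments, as in the process_file results code above; the helper name is hypothetical.

```python
import os


def process_file_results(input_path, output):
    """Hypothetical stand-in for process_file's results section."""
    base = os.path.basename(input_path)
    return os.path.join(output, f"{base}_processed.tsv")


parts = [f"/tmp/split_output/file_part_{i}.txt" for i in (1, 2, 3)]

# What the batch step actually wrote: each part with --output /tmp/split_output.
written = {process_file_results(p, "/tmp/split_output") for p in parts}

# Replaying with the same batch arguments finds every output.
matching = [process_file_results(p, "/tmp/split_output") for p in parts]
# A different --output points at files the batch step never wrote.
wrong_dir = [process_file_results(p, "/tmp/elsewhere") for p in parts]
# A single --input instead of the full list recovers only one output.
single = [process_file_results(parts[0], "/tmp/split_output")]

print(all(path in written for path in matching))   # True
print(any(path in written for path in wrong_dir))  # False
print(len(single), "of", len(written))             # 1 of 3
```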
Internal Conversion:
Internally, all formats are converted to a list of dicts before processing:
# Input (dict-of-lists):
{"--input": ["a", "b"], "--output": "dir"}
# Converted to (list-of-dicts):
[{"--input": "a", "--output": "dir"},
{"--input": "b", "--output": "dir"}]
This unified format enables consistent batch processing regardless of input syntax.
Comparison with batch_file_arg
| Feature | batch_file_arg | batch_list_arg |
|---|---|---|
| Source | External TSV file | Inline YAML or snippet results |
| Batch item count | Fixed (one item per row in the file) | Fixed inline, or dynamic via composite_arg |
| Use case | Pre-defined sample sheets | Split-process-merge patterns |
Use batch_file_arg when batch items come from an external file (e.g., sample sheet).
Use batch_list_arg when batch items are defined inline or generated dynamically.
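For contrast, the items of a batch_file_arg come from the rows of a file, so their number is fixed before the pipeline starts. The sketch below assumes a tab-separated sample sheet whose header row names the arguments; that layout is an assumption for illustration, so check the batch_file_arg documentation for the exact format pype expects. It shows that such a file maps onto the same list-of-dicts shape that batch_list_arg produces.

```python
import csv
import io

# Hypothetical sample sheet; in practice this would be read from a .tsv file.
sample_sheet = "--input\t--output\nfile1.txt\tout1.txt\nfile2.txt\tout2.txt\n"

# Each data row becomes one batch item; the count is fixed by the file contents.
items = list(csv.DictReader(io.StringIO(sample_sheet), delimiter="\t"))
for item in items:
    print(item)
```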
See Also
Composite Arguments and Snippet Results Pattern for details on composite arguments
Pipelines for general pipeline documentation
Snippets for snippet results definitions