Pipelines
Pipelines combine multiple snippets into reproducible workflows. They are defined using YAML files that specify execution order, dependencies, and resource requirements.
Pipeline Structure
A pipeline YAML file has two main sections:
- info: Contains metadata and documentation
- items: Defines the workflow structure
Basic Example
info:
  description: Simple FASTA processing pipeline
  date: 2023-12-01
  api: 2.0.0
  arguments:
    input_fa: Input FASTA file
    output_fa: Output FASTA file
items:
  - name: process_fasta
    type: snippet
    arguments:
      - prefix: -i
        pipeline_arg: "%(input_fa)s"
      - prefix: -o
        pipeline_arg: "%(output_fa)s"
Info Section
Required Fields
description: Brief explanation of pipeline purpose
api: API version (must match PIPELINES_API)
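A check for the two required fields could be sketched as follows. This is only an illustration operating on an already-parsed info mapping; the helper name `missing_info_fields` is hypothetical and is not part of pype.

```python
# Required keys of the info section, as listed above.
REQUIRED_INFO_FIELDS = ("description", "api")


def missing_info_fields(info):
    """Return the required info fields that are absent from a parsed info mapping."""
    return [field for field in REQUIRED_INFO_FIELDS if field not in info]


# Example info mapping that forgets the api field.
info = {"description": "Simple FASTA processing pipeline", "date": "2023-12-01"}
print(missing_info_fields(info))
```

An empty list means both required fields are present.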
Optional Fields
arguments: Documentation for pipeline arguments
defaults: Default values for arguments
batches: Batch processing configurations
Example with all fields:
info:
  description: Process multiple FASTA files
  date: 2023-12-01
  api: 2.0.0
  arguments:
    input_dir: Directory containing FASTA files
    output_dir: Output directory for results
  defaults:
    threads: 4
    quality: "high"
  batches:
    sample_sheet:
      required: ["sample_id", "fasta_file"]
      optional: ["quality"]
      snippet: process_fasta
Pipeline Items
Item Types
snippet: Single task execution
pipeline: Nested pipeline execution
batch_snippet: Parallel snippet execution
batch_pipeline: Parallel pipeline execution
Arguments
Arguments connect inputs/outputs between steps:
arguments:
  - prefix: "-i"                    # Command-line flag
    pipeline_arg: "%(input_file)s"  # Reference to pipeline argument
    type: argv_arg                  # Argument type (default)
Available argument types:
- argv_arg: Command-line argument
- batch_file_arg: Arguments from batch file
- batch_list_arg: Arguments from list
- composite_arg: Arguments from snippet results
- constant_arg: Fixed value
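The `%(name)s` placeholders use the same syntax as Python's mapping-based `%` string formatting. The sketch below shows how a list of argv_arg entries might be turned into command-line tokens. It assumes that each placeholder is filled by name from the values given to the pipeline; the function `build_argv` and the example values are hypothetical and do not reflect pype's internal code.

```python
def build_argv(arguments, values):
    """Turn a list of argument mappings into a flat list of command-line tokens."""
    argv = []
    for arg in arguments:
        argv.append(arg["prefix"])
        # %(name)s placeholders are valid Python mapping-style format strings.
        argv.append(arg["pipeline_arg"] % values)
    return argv


arguments = [
    {"prefix": "-i", "pipeline_arg": "%(input_file)s"},
    {"prefix": "-o", "pipeline_arg": "%(output_file)s"},
]
# Hypothetical values supplied by the user when running the pipeline.
values = {"input_file": "in.fa", "output_file": "out.fa"}
print(build_argv(arguments, values))
```

With this formatting style, a placeholder whose name is missing from the values raises a KeyError, which makes unresolved arguments easy to spot.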
Dependencies
Nest items under dependencies to control execution order and data flow. A nested item runs before the item that depends on it:
items:
  - name: step2
    type: snippet
    arguments:
      - prefix: -i
        pipeline_arg: "%(intermediate)s"
      - prefix: -o
        pipeline_arg: "%(output)s"
    dependencies:
      items:
        - name: step1
          type: snippet
          arguments:
            - prefix: -i
              pipeline_arg: "%(input)s"
            - prefix: -o
              pipeline_arg: "%(intermediate)s"
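The order implied by nested dependencies can be sketched as a depth-first walk over the parsed items, visiting dependencies before the item that declares them. This is an illustrative assumption about the ordering, not pype's scheduler; `execution_order` is a hypothetical helper.

```python
def execution_order(items):
    """Return item names so that every dependency precedes its dependent."""
    order = []
    for item in items:
        nested = item.get("dependencies", {}).get("items", [])
        order.extend(execution_order(nested))  # dependencies run first
        order.append(item["name"])
    return order


# Parsed form of the step1/step2 pipeline (arguments omitted for brevity).
pipeline_items = [
    {
        "name": "step2",
        "type": "snippet",
        "dependencies": {"items": [{"name": "step1", "type": "snippet"}]},
    },
]
print(execution_order(pipeline_items))
```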
Resource Management
Set requirements on an item to override the resource requirements declared by its snippet:
items:
  - name: intensive_step
    type: snippet
    requirements:
      cpu: 8
      mem: "16GB"
      walltime: "12:00:00"
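One way to picture the override is a key-by-key merge in which values from the pipeline item win over the snippet's own values. Whether pype merges per key or replaces the whole block is not stated here, so treat the merge below as an assumption; the snippet defaults are made up for illustration.

```python
def effective_requirements(snippet_requirements, item_requirements):
    """Merge requirements, letting the pipeline item override the snippet."""
    merged = dict(snippet_requirements)
    merged.update(item_requirements)
    return merged


# Hypothetical requirements declared by the snippet itself.
snippet_defaults = {"cpu": 1, "mem": "2GB", "walltime": "01:00:00"}
# Requirements set on the pipeline item, as in the example above.
overrides = {"cpu": 8, "mem": "16GB", "walltime": "12:00:00"}
print(effective_requirements(snippet_defaults, overrides))
```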
Running Pipelines
Basic execution:
pype pipelines my_pipeline --input input.fa --output output.fa
With specific queue:
pype pipelines --queue slurm my_pipeline --input input.fa --output output.fa
Batch processing:
pype pipelines my_pipeline --sample_sheet samples.tsv
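Before launching a batch run, it can help to confirm that the sample sheet has the columns listed under required in the batches configuration. The sketch below assumes a tab-separated file with a header row, which this page does not confirm; `read_sample_sheet` is a hypothetical helper and the sheet contents are invented.

```python
import csv
import io

# Required columns, taken from the batches example in the Info Section.
REQUIRED_COLUMNS = ["sample_id", "fasta_file"]


def read_sample_sheet(handle, required):
    """Read a tab-separated sample sheet and verify its required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [column for column in required if column not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    return list(reader)


# In-memory stand-in for samples.tsv.
sheet = io.StringIO(
    "sample_id\tfasta_file\tquality\n"
    "s1\ta.fa\thigh\n"
    "s2\tb.fa\tlow\n"
)
rows = read_sample_sheet(sheet, REQUIRED_COLUMNS)
print(len(rows), rows[0]["fasta_file"])
```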
Complete Example
Here’s a tested example combining multiple features:
info:
  description: Reverse Complement Lower case a fasta
  date: 01/10/2020
  api: 2.0.0
items:
  - name: lower_fa
    type: snippet
    arguments:
      - prefix: -i
        pipeline_arg: '%(complement_fa)s'
      - prefix: -o
        pipeline_arg: '%(output)s'
    dependencies:
      items:
        - name: complement_fa
          type: snippet
          arguments:
            - prefix: -i
              pipeline_arg: '%(reverse_fa)s'
            - prefix: -o
              pipeline_arg: '%(complement_fa)s'
          dependencies:
            items:
              - name: reverse_fa
                type: snippet
                arguments:
                  - prefix: -i
                    pipeline_arg: '%(input_fa)s'
                  - prefix: -o
                    pipeline_arg: '%(reverse_fa)s'
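This pipeline uses nested dependencies to run three snippets in order:
1. reverse_fa takes the input FASTA file (input_fa) and reverses the sequences
2. complement_fa creates the complement sequences
3. lower_fa converts the result to lowercase and writes it to output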
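The effect of the three steps on a single sequence can be illustrated in plain Python. The snippets themselves are not shown on this page, so the functions below are stand-ins based only on the step names, not the snippets' actual implementation.

```python
# Translation table mapping each base to its complement, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    """Reverse the order of the bases."""
    return seq[::-1]


def complement(seq):
    """Replace each base with its complement."""
    return seq.translate(COMPLEMENT)


def lower(seq):
    """Convert the sequence to lowercase."""
    return seq.lower()


# Apply the steps in dependency order: reverse, complement, lowercase.
result = lower(complement(reverse("AAGC")))
print(result)  # gctt
```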