Pipelines

Pipelines combine multiple snippets into reproducible workflows. They are defined using YAML files that specify execution order, dependencies, and resource requirements.

Pipeline Structure

A pipeline YAML file has two main sections:

  • info: Contains metadata and documentation

  • items: Defines the workflow structure

Basic Example

info:
  description: Simple FASTA processing pipeline
  date: 2023-12-01
  api: 2.0.0
  arguments:
    input_fa: Input FASTA file
    output_fa: Output FASTA file

items:
  - name: process_fasta
    type: snippet
    arguments:
      - prefix: -i
        pipeline_arg: "%(input_fa)s"
      - prefix: -o
        pipeline_arg: "%(output_fa)s"

Info Section

Required Fields

  • description: Brief explanation of pipeline purpose

  • api: API version (must match PIPELINES_API)

Optional Fields

  • arguments: Documentation for pipeline arguments

  • defaults: Default values for arguments

  • batches: Batch processing configurations

Example with all fields:

info:
  description: Process multiple FASTA files
  date: 2023-12-01
  api: 2.0.0
  arguments:
    input_dir: Directory containing FASTA files
    output_dir: Output directory for results
  defaults:
    threads: 4
    quality: "high"
  batches:
    sample_sheet:
      required: ["sample_id", "fasta_file"]
      optional: ["quality"]
      snippet: process_fasta

Pipeline Items

Item Types

  1. snippet: Single task execution

  2. pipeline: Nested pipeline execution

  3. batch_snippet: Parallel snippet execution

  4. batch_pipeline: Parallel pipeline execution
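To illustrate the batch types, a hypothetical batch_snippet item might look like the sketch below. The item name, the fasta_file column, and the use of batch_file_arg to pull a value from each batch-file row are assumptions pieced together from the types documented on this page, not a verbatim example:

```yaml
# Hypothetical batch item (names are placeholders): runs process_fasta
# once per row of the batch file declared under info.batches, taking
# the -i value from the fasta_file column of each row.
items:
  - name: process_fasta
    type: batch_snippet
    arguments:
      - prefix: -i
        pipeline_arg: "%(fasta_file)s"
        type: batch_file_arg
```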

Arguments

Arguments connect inputs/outputs between steps:

arguments:
  - prefix: "-i"  # Command-line flag
    pipeline_arg: "%(input_file)s"  # Reference to pipeline argument
    type: argv_arg  # Argument type (default)

Available argument types:

  • argv_arg: Command-line argument

  • batch_file_arg: Arguments from batch file

  • batch_list_arg: Arguments from list

  • composite_arg: Arguments from snippet results

  • constant_arg: Fixed value
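As a sketch, a single step could mix types to take one value from the pipeline's command line while pinning another to a fixed value. Whether constant_arg reads its fixed value from the pipeline_arg field is an assumption here:

```yaml
# Illustrative argument list (values are placeholders):
arguments:
  - prefix: "-i"
    pipeline_arg: "%(input_file)s"
    type: argv_arg      # supplied on the pipeline's command line
  - prefix: "-t"
    pipeline_arg: "4"   # assumed: constant_arg takes its fixed value here
    type: constant_arg  # not overridable by the user
```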

Dependencies

Control execution order and data flow:

items:
  - name: step2
    type: snippet
    arguments:
      - prefix: -i
        pipeline_arg: "%(intermediate)s"
      - prefix: -o
        pipeline_arg: "%(output)s"
    dependencies:
      items:
        - name: step1
          type: snippet
          arguments:
            - prefix: -i
              pipeline_arg: "%(input)s"
            - prefix: -o
              pipeline_arg: "%(intermediate)s"

Resource Management

Override snippet requirements:

items:
  - name: intensive_step
    type: snippet
    requirements:
      cpu: 8
      mem: "16GB"
      walltime: "12:00:00"

Running Pipelines

Basic execution:

pype pipelines my_pipeline --input input.fa --output output.fa

With specific queue:

pype pipelines --queue slurm my_pipeline --input input.fa --output output.fa

Batch processing:

pype pipelines my_pipeline --sample_sheet samples.tsv
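The batch file passed via --sample_sheet is presumably a tab-separated table whose header matches the required and optional columns declared under info.batches; a minimal example (contents invented for illustration) might be:

```
sample_id	fasta_file	quality
sampleA	data/sampleA.fa	high
sampleB	data/sampleB.fa	low
```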

Complete Example

Here’s a tested example combining multiple features:

info:
  description: Reverse complement and lowercase a FASTA file
  date: 01/10/2020
  api: 2.0.0
items:
  - name: lower_fa
    type: snippet
    arguments:
      - prefix: -i
        pipeline_arg: '%(complement_fa)s'
      - prefix: -o
        pipeline_arg: '%(output)s'
    dependencies:
      items:
        - name: complement_fa
          type: snippet
          arguments:
            - prefix: -i
              pipeline_arg: '%(reverse_fa)s'
            - prefix: -o
              pipeline_arg: '%(complement_fa)s'
          dependencies:
            items:
              - name: reverse_fa
                type: snippet
                arguments:
                  - prefix: -i
                    pipeline_arg: '%(input_fa)s'
                  - prefix: -o
                    pipeline_arg: '%(reverse_fa)s'

This pipeline:

  1. Takes a FASTA file as input

  2. Reverses the sequences

  3. Creates complement sequences

  4. Converts to lowercase

  5. Demonstrates dependency management
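Assuming the file above is saved as a pipeline named revcomp_lower, and assuming that the intermediate file names (reverse_fa, complement_fa) must also be supplied as pipeline arguments, an invocation could look like the following (pipeline and file names are illustrative):

```shell
# Hypothetical invocation of the complete example above;
# revcomp_lower and the file names are placeholders.
pype pipelines revcomp_lower \
  --input_fa input.fa \
  --reverse_fa reversed.fa \
  --complement_fa complemented.fa \
  --output output.fa
```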