Pipelines

Writing new Pipelines:

Pipelines are YAML files located in a Python module (mind the __init__.py in the folder containing the pipelines). The location of this module can be controlled with the environment variable PYPE_PIPELINES.
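
For example, the pipelines module could be laid out as follows (file names are illustrative):

pipelines/
    __init__.py
    rev_compl_low_fa.yaml
    bwa_mem.yaml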

The YAML file structure requires a header field called info and a main field called items with the execution information.

The info field expects the following keys:

  1. description: a brief description of the pipeline
  2. date: a string for tracking the most recent modification of the pipeline
  3. arguments: not a required key, but it can be used to document the arguments required by the pipeline
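
A minimal sketch of an info header (the description, date and argument texts below are placeholders):

info:
  description: Reverse, complement and lowercase a fasta file
  date: 2018-01-01
  arguments:
    input_fa: Path of the input fasta file
    output: Path of the resulting fasta file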

The items key is constructed by combining multiple blocks in a hierarchical manner. These blocks are called PipelineItems.

  • A PipelineItem is a block of information linked to the execution of one or more tasks.
  • A PipelineItem can wrap a snippet, another pipeline, or the batch execution of a snippet or pipeline.

The input of a batch snippet/pipeline is defined by the argument --input_batch. It consists of either a list of predefined arguments or a tab-separated file in which each line defines one run of the snippet/pipeline and each column refers to an argument of the snippet/pipeline (the column names identify the specific arguments).
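
For example, a tab-separated file passed via --input_batch, driving two runs of a snippet that takes the arguments input_fa and output, could look like this (paths and column names are illustrative; columns are separated by tabs):

input_fa               output
test/data/files/a.fa   test/data/tmp/a_rev.fa
test/data/files/b.fa   test/data/tmp/b_rev.fa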

Possible attributes in a PipelineItem:

  • Required:
    • type: possible values: snippets, pipeline, batch_snippets or batch_pipeline
    • name: name of the item to look for in the specified type category.
    • arguments: specific arguments for the selected type/name function
  • Optional:
    • mute: if True, the PipelineItem will not return any results, and hence will not be listed as a dependency of parent processes
    • dependencies: a list of other PipelineItems
    • requirements: a dictionary overriding the default in-snippet requirements (applies only to snippets and batch_snippets)

An example of a pipeline using the Environment modules and the profile object:
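
A minimal, hypothetical sketch of what such a pipeline file could look like: the argument names and help texts match the interface shown below, while the items layout, the argument wiring and the module requirements are assumptions for illustration only:

info:
  description: Align paired fastq files with bwa mem and collect QC metrics
  date: 2018-01-01
  arguments:
    qc_out: Path to store the QC output
    tmp_dir: Temporary directory
    bam_out: Resulting bam file
    fq1: First mate of fastq pairs
    fq2: Second mate of fastq pairs
    header: '@RG group header, comma separated'
items:
  type: snippets
  name: fastqc_bam              # hypothetical QC snippet
  arguments:
    --bam: bam_out
    --qc_out: qc_out
  dependencies:
    - type: snippets
      name: bwa_mem
      arguments:
        --fq1: fq1
        --fq2: fq2
        --bam_out: bam_out
        --tmp_dir: tmp_dir
        --header: header
      requirements:             # hypothetical override of the snippet defaults
        modules: [bwa, samtools]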

This results in the following command line interface:

pype pipelines bwa_mem

usage: pype pipelines bwa_mem --qc_out QC_OUT --tmp_dir TMP_DIR --bam_out
                              BAM_OUT --fq1 FQ1 --fq2 FQ2 --header HEADER
error: argument --qc_out is required

optional arguments:
  --qc_out QC_OUT    Path to store the QC output, type: str
  --tmp_dir TMP_DIR  Temporary directory, type: str
  --bam_out BAM_OUT  Resulting bam file, type: str
  --fq1 FQ1          First mate of fastq pairs, type: str
  --fq2 FQ2          Second mate of fastq pairs, type: str
  --header HEADER    @RG group header, comma separated, type: str

Programmatic pipeline access:

It is also possible to launch a pipeline programmatically, from a Python console or within other Python programs.

As with the snippets, this interface is not yet well documented.

import os
import argparse

# Point pype at the snippets and pipelines modules
# (set before importing pype, so they can be discovered).
os.environ['PYPE_SNIPPETS'] = 'test/data/snippets'
os.environ['PYPE_PIPELINES'] = 'test/data/pipelines'

from pype.modules import pipelines

# Reproduce the parser/subparsers structure used by the pype executable.
parser = argparse.ArgumentParser(prog='pype', description='Test')
subparsers = parser.add_subparsers(dest='modules')

# Input and output files of the rev_compl_low_fa pipeline.
input_fa = 'test/data/files/input.fa'
rev_fa = 'test/data/tmp/rev.fa'
compl_fa = 'test/data/tmp/rev_comp.fa'
out_fa = 'test/data/tmp/rev_comp_low.fa'

# Launch the rev_compl_low_fa pipeline with a command-line style
# argument list, using the 'default' profile.
pipelines.pipelines(None, subparsers, None, [
          '--queue', 'none', '--log', 'test/data/tmp',
          'rev_compl_low_fa', '--input_fa', input_fa,
          '--reverse_fa', rev_fa, '--complement_fa',
          compl_fa, '--output', out_fa], 'default')