Pipelines
Writing new pipelines
Pipelines are YAML files located in a Python module, so the folder containing them must include an __init__.py file. The location of this module is controlled by the environment variable PYPE_PIPELINES.
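A minimal sketch of preparing such a folder from Python. The folder name my_pipelines and the file my_pipeline.yaml are hypothetical, and the YAML content is a placeholder rather than a working pipeline. The sketch only creates the files and sets the environment variable; it does not run pype.

```python
import os
import tempfile
from pathlib import Path

# Throwaway folder acting as the pipelines module (hypothetical names).
root = Path(tempfile.mkdtemp())
pipelines_dir = root / "my_pipelines"
pipelines_dir.mkdir()

# The folder must be a Python module, so it needs an __init__.py.
(pipelines_dir / "__init__.py").touch()

# One YAML file per pipeline; placeholder content, not a working pipeline.
(pipelines_dir / "my_pipeline.yaml").write_text(
    "info:\n"
    "  description: Placeholder pipeline\n"
    '  date: "2017-10-10"\n'
    "items: []\n"
)

# Point pype at the module (in-process, before importing pype.modules).
os.environ["PYPE_PIPELINES"] = str(pipelines_dir)

print(sorted(p.name for p in pipelines_dir.iterdir()))
```

From a shell, the equivalent is exporting PYPE_PIPELINES before running pype.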
The YAML file structure requires a header field called info and a main field called items, which holds the execution information.
The info field expects the following keys:
- description: a brief description of the pipeline
- date: a string for tracking the most recent modification of the pipeline
- arguments: not required, but it can be used to document the arguments the pipeline requires
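A minimal sketch of a check on the info header, operating on the dictionary a YAML loader would return for it. The function check_info is hypothetical and pype's own validation may behave differently; the layout shown for the optional arguments mapping is illustrative only.

```python
from typing import List

# Required keys of the info header as described above; arguments is optional.
REQUIRED_INFO_KEYS = ("description", "date")


def check_info(info: dict) -> List[str]:
    """Return the problems found in a parsed info header (empty if none)."""
    problems = []
    for key in REQUIRED_INFO_KEYS:
        if key not in info:
            problems.append(f"missing required key: {key}")
    # Loaders such as PyYAML turn an unquoted 2017-10-10 into a date object,
    # so the value should be quoted in the YAML file to stay a string.
    if "date" in info and not isinstance(info["date"], str):
        problems.append("date should be a string")
    return problems


# Hypothetical header, as a YAML loader would return it.
info = {
    "description": "Align paired-end reads with bwa mem",
    "date": "2017-10-10",
    "arguments": {"fq1": "First mate of fastq pairs"},  # optional, illustrative
}
print(check_info(info))           # []
print(check_info({"date": "x"}))  # ['missing required key: description']
```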
The items field is built by combining multiple blocks hierarchically. Each block is called a PipelineItem.
- A PipelineItem is a block of information linked to the execution of one or more tasks.
- A PipelineItem can wrap a snippet, another pipeline, or the batch execution of a snippet or pipeline.
The input of a batch snippet/pipeline is defined by the argument --input_batch. It consists of either a list of predefined arguments or a tab-separated file in which each line defines one run of the snippet/pipeline and each column refers to one of its arguments (the column names identify the specific argument).
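To illustrate the tab-separated form of the batch input, the sketch below reads such a file with Python's csv module and expands each line into the arguments of one run. The column names, file names and the helpers read_batch and to_cli_args are hypothetical; pype's own batch handling may pass the values to each run differently.

```python
import csv
import io
from typing import Dict, List

# Hypothetical batch file: the header row names the arguments,
# and every following line describes one run.
BATCH_TSV = (
    "fq1\tfq2\tbam_out\n"
    "s1_R1.fq.gz\ts1_R2.fq.gz\ts1.bam\n"
    "s2_R1.fq.gz\ts2_R2.fq.gz\ts2.bam\n"
)


def read_batch(handle) -> List[Dict[str, str]]:
    """Turn each data line into an {argument: value} dict for one run."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row) for row in reader]


def to_cli_args(run: Dict[str, str]) -> List[str]:
    """Render one run as command-line style --name value pairs."""
    args: List[str] = []
    for name, value in run.items():
        args.extend([f"--{name}", value])
    return args


runs = read_batch(io.StringIO(BATCH_TSV))
for run in runs:
    print(to_cli_args(run))
```

With a real file, the same read_batch call would receive a handle from open(path, newline="").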
Possible attributes in a PipelineItem:
- Required:
  - type: possible values are snippets, pipeline, batch_snippets or batch_pipeline
  - name: name of the item to look for in the specified type category
  - arguments: arguments specific to the selected type/name
- Optional:
  - mute: if True, the PipelineItem does not return any results, so it is not listed as a dependency of its parent processes
  - dependencies: a list of other PipelineItems
  - requirements: a dictionary that overrides the default requirements defined in the snippet (applies only to snippets and batch_snippets)
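To make the hierarchy concrete, the sketch below models PipelineItems as a small Python dataclass. It assumes that dependencies run before the item that lists them and that a muted dependency's results are simply not handed to its parent. The class Item, the helper functions, the item names and the requirement key ncpu are hypothetical and do not reflect pype's implementation; the type values are copied from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Type values copied from the attribute list above.
VALID_TYPES = {"snippets", "pipeline", "batch_snippets", "batch_pipeline"}
TYPES_WITH_REQUIREMENTS = {"snippets", "batch_snippets"}


@dataclass
class Item:
    """Hypothetical model of a PipelineItem; not pype's own class."""

    type: str
    name: str
    arguments: Dict[str, Any]
    mute: bool = False
    dependencies: List["Item"] = field(default_factory=list)
    requirements: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        if self.type not in VALID_TYPES:
            raise ValueError(f"unknown type: {self.type}")
        if self.requirements is not None and self.type not in TYPES_WITH_REQUIREMENTS:
            raise ValueError("requirements apply only to snippets and batch_snippets")


def execution_order(item: Item) -> List[str]:
    """Depth-first walk listing every dependency before the item using it."""
    order: List[str] = []
    for dep in item.dependencies:
        order.extend(execution_order(dep))
    order.append(item.name)
    return order


def visible_dependencies(item: Item) -> List[str]:
    """Direct dependencies whose results reach the parent; muted ones are skipped."""
    return [dep.name for dep in item.dependencies if not dep.mute]


# Hypothetical tree: a muted index step, an alignment, then a QC pipeline.
index = Item("snippets", "bwa_index", {"genome": "ref.fa"}, mute=True)
align = Item("snippets", "bwa_mem", {"fq1": "r1.fq", "fq2": "r2.fq"},
             dependencies=[index], requirements={"ncpu": 4})
qc = Item("pipeline", "qc", {"bam": "out.bam"}, dependencies=[align])

print(execution_order(qc))          # ['bwa_index', 'bwa_mem', 'qc']
print(visible_dependencies(align))  # []
```

A dependency shared by two items would be listed twice by execution_order; a real scheduler would need to deduplicate it.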
An example of a pipeline using Environment modules and the profile object:
Running this pipeline without arguments shows the resulting command-line interface:
```
pype pipelines bwa_mem
error: argument --qc_out is required
usage: pype pipelines bwa_mem --qc_out QC_OUT --tmp_dir TMP_DIR --bam_out
                              BAM_OUT --fq1 FQ1 --fq2 FQ2 --header HEADER

optional arguments:
  --qc_out QC_OUT    Path to store the QC output, type: str
  --tmp_dir TMP_DIR  Temporary directory, type: str
  --bam_out BAM_OUT  Resulting bam file, type: str
  --fq1 FQ1          First mate of fastq pairs, type: str
  --fq2 FQ2          Second mate of fastq pairs, type: str
  --header HEADER    @RG group header, comma separated, type: str
```
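The help text above has the shape of Python argparse output with every option marked as required. The sketch below rebuilds a comparable parser from a hypothetical table of argument descriptions, with names and help strings copied from that output. It only illustrates the idea; ARGUMENTS and build_parser are not part of pype, and pype may generate its interface differently.

```python
import argparse
from typing import Dict, Tuple

# Hypothetical table mirroring the help text above: name -> (description, type).
ARGUMENTS: Dict[str, Tuple[str, type]] = {
    "qc_out": ("Path to store the QC output", str),
    "tmp_dir": ("Temporary directory", str),
    "bam_out": ("Resulting bam file", str),
    "fq1": ("First mate of fastq pairs", str),
    "fq2": ("Second mate of fastq pairs", str),
    "header": ("@RG group header, comma separated", str),
}


def build_parser(prog: str) -> argparse.ArgumentParser:
    """Create one required --name option per table entry."""
    parser = argparse.ArgumentParser(prog=prog)
    for name, (description, arg_type) in ARGUMENTS.items():
        parser.add_argument(
            f"--{name}",
            type=arg_type,
            required=True,  # every option is mandatory, as in the output above
            help=f"{description}, type: {arg_type.__name__}",
        )
    return parser


parser = build_parser("pype pipelines bwa_mem")
args = parser.parse_args([
    "--qc_out", "qc", "--tmp_dir", "/tmp", "--bam_out", "out.bam",
    "--fq1", "r1.fq", "--fq2", "r2.fq", "--header", "ID:s1,SM:s1",
])
print(args.bam_out)  # out.bam
```

Omitting any option makes parse_args print an "is required" error and exit with status 2, which matches the behaviour shown above.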
Programmatic pipeline access
Pipelines can also be launched programmatically, from a Python console or from other Python programs.
As with snippets, this interface is not yet well documented.
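```python
import os
import argparse

# Point pype at the test snippets and pipelines before importing pype.modules
os.environ['PYPE_SNIPPETS'] = 'test/data/snippets'
os.environ['PYPE_PIPELINES'] = 'test/data/pipelines'

from pype.modules import pipelines

# Top-level parser with a subparsers group, mirroring the pype command
parser = argparse.ArgumentParser(prog='pype', description='Test')
subparsers = parser.add_subparsers(dest='modules')

# Input and output files of the rev_compl_low_fa test pipeline
input_fa = 'test/data/files/input.fa'
rev_fa = 'test/data/tmp/rev.fa'
compl_fa = 'test/data/tmp/rev_comp.fa'
out_fa = 'test/data/tmp/rev_comp_low.fa'

# The list holds the command-line arguments (--queue and --log, then the
# pipeline name and its own arguments); the last argument is the profile name
pipelines.pipelines(None, subparsers, None, [
    '--queue', 'none', '--log', 'test/data/tmp',
    'rev_compl_low_fa', '--input_fa', input_fa,
    '--reverse_fa', rev_fa, '--complement_fa',
    compl_fa, '--output', out_fa], 'default')
```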
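Writing the flat argument list by hand becomes error-prone as the number of pipeline arguments grows. A small helper such as pipeline_argv (hypothetical, not part of pype) can assemble it from a dictionary, following the ordering used in the example above: the options that precede the pipeline name, then the name, then the pipeline's own arguments.

```python
from typing import Dict, List


def pipeline_argv(pipeline: str, arguments: Dict[str, str],
                  queue: str = "none", log: str = "test/data/tmp") -> List[str]:
    """Build the argument list in the order used in the example above.

    Hypothetical helper: --queue and --log come first, then the pipeline
    name, then one --name value pair per pipeline argument.
    """
    argv = ["--queue", queue, "--log", log, pipeline]
    for name, value in arguments.items():
        argv.extend([f"--{name}", value])
    return argv


argv = pipeline_argv("rev_compl_low_fa", {
    "input_fa": "test/data/files/input.fa",
    "reverse_fa": "test/data/tmp/rev.fa",
    "complement_fa": "test/data/tmp/rev_comp.fa",
    "output": "test/data/tmp/rev_comp_low.fa",
})
print(argv)
```

The resulting list is identical to the one passed to pipelines.pipelines in the example, so it can be used in its place.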