BIO Pype¶
A lightweight python framework to organize bioinformatics scripts and analyses.
About¶
Much like other pipeline/workflow managers, bio_pype offers improvements in scalability, reproducibility while also simplifying the daily use of bioinformatic analyses.
Bio_pype provides a modular framework for building analysis pipelines. Three major components define the behavior of the system: Snippets, Pipelines and Profiles.
Snippets
Snippets represent the basic unit of the pipeline. Snippets essentially act as wrappers that are responsible for packaging command-line arguments, performing any required preprocessing, and executing the target tool. Most importantly, they provide a uniform interface that is used by bio_pype to concatenate analysis steps into a pipeline.
Pipelines
A pipeline is a sequential series of steps required to execute a multi-stage analysis. In bio_pype, a pipeline is defined using a YAML file that outlines the order and combination of snippets required.
Profiles
Profiles provide a list of dependencies available to the snippets and pipelines. Information such as the location of genomes, annotation databases, and applications are contained within the profile file. Profiles provide a mechanism for switching between versions of a pipeline that utilise different dependencies while still maintaining the structure of the pipeline.
Get started¶
Requirements¶
Bio-pype is a python package available from the python package index. So Python is required to use the software. Only Python3 (Python >= 3.4) is supported.
Note
Earlier version of bio-pype supported python2.7, however after the official sunsetting of python2 in 2020 and the increasing divergence of the legacy python2 from python3, bio-pype will support only python3 from version 2.0.0
Installation¶
Note
It is strongly advised to use virtualenv to install the module locally.
From pip:¶
Tip
The installation from pip may not include the latest fixes/features but it is generally a good choice for a production environment.
pip install bio_pype
From git:¶
git clone https://bitbucket.org/ffavero/bio_pype
cd bio_pype
python -m unittest discover
python setup.py install
Running tasks in a Snippet¶
You need to configure bio_pype to match your system setting (see Configuration).
Additionally, in most cases, the profiles files section needs to be adjusted to match your system file structure. See Profiles for details.
In short, something like the following highlighted section shows a minimal setup to use pype in the example code for this documentation:
Minimal pype setup
In this documentation we set the root location for the pype modules to the repository test data:
$ mkdir ~/.bio_pype
$ echo "PYPE_MODULES=`dirname $PWD`/test/data" > ~/.bio_pype/config
This results in the config file:
$ cat ~/.bio_pype/config
PYPE_MODULES=/home/docs/checkouts/readthedocs.org/user_builds/bio-pype/checkouts/latest/test/data
Additionally, we also need to replace the dummy_file value in the profile files to match the path of test/data/files/dummy_file.txt in the repository test data:
$ sed -i "s,/just/a/dummy/file/for/testing.txt,`dirname $PWD`/test/data/files/dummy_file.txt," ../test/data/profiles/test_path.yaml
$ sed -i "s,/just/a/dummy/file/for/testing.txt,`dirname $PWD`/test/data/files/dummy_file.txt," ../test/data/profiles/test_docker.yaml
After the configuration of pype with the desired modules folders, copy the following lines as a file named test_base.md (or any other name. The final snippet name corresponds to the file name -minus the .md extension-) in the snippet folder.
# Example test Snippet
The snippet in pype is given by the file name
(minus the `.md` extension)
## description
Test snippet example
## requirements
```yaml
ncpu: 1
time: '00:01:00'
mem: 1gb
```
## results
```bash
@/bin/sh, yaml
printf 'file_out: %(output)s'
```
## arguments
1. input/i
- help: input(s) text file
- type: str
- required: true
- nargs: *
2. output/o
- help: output file
- type: str
- default: output.txt
## snippet
> _input_: input profile_dummy_file*
```bash
@/bin/sh, chk1, stdout=chk2, namespace=alpine_3
files_input='%(input)s'
dummy_file='%(profile_dummy_file)s'
cat $files_input $dummy_file | awk '{ print toupper($0) }'
```
> _output_: results_file_out
```bash
@/bin/sh, chk2, namespace=alpine_3
awk '{ print tolower($0) }' > '%(output)s'
```
For more detailed information on how to write snippets and on their structure see Snippets
The snippets are run via the pype command line:
$ pype
usage: pype [-p PROFILE] {pipelines,profiles,repos,snippets} ...
A python pipeliens manager oriented for bioinformatics
positional arguments:
pipelines Workflows built combining pipelines and snippets
profiles Reference paths and softwares to use in snippets
repos Manage pype modules
snippets Execute tasks
optional arguments:
-p PROFILE, --profile PROFILE
Choose the pype profile from the available options (see
pype profiles). Default: test_docker
This is version 1.9.99 - Francesco Favero - 12 December 2020
Using the snippets sub-command
$ pype snippets
usage: pype snippets [--log LOG]
{complement_fa,hello,lower_fa,merge_fa,reverse_fa,test_adv,test_base}
...
positional arguments:
{complement_fa,hello,lower_fa,merge_fa,reverse_fa,test_adv,test_base}
complement_fa lower case a fasta sequence
hello hello world with flag
lower_fa lower case a fasta sequence
merge_fa Concatenate a series of files into a single one
reverse_fa reverse a fasta sequence
test_adv Test snippet example -in python-
test_base Test snippet example
optional arguments:
--log LOG Path used for the snippet logs. Default:
/home/docs/.bio_pype/logs
The selected snippet will prompt the command line interface according to the arguments section in the snippet:
$ pype snippets test_base
INFO : 2021-02-02 12:28:40,906 : Writing logs to folder /home/docs/.bio_pype/logs/test_base
INFO : 2021-02-02 12:28:40,907 : Using profile test_docker
INFO : 2021-02-02 12:28:40,907 : Prepare snippet test_base
INFO : 2021-02-02 12:28:40,907 : Attempt to execute snippet test_base
error: the following arguments are required: --input/-i
usage: pype snippets test_base --input [INPUT [INPUT ...]] [--output OUTPUT]
optional arguments:
--input [INPUT [INPUT ...]], -i [INPUT [INPUT ...]]
input(s) text file
--output OUTPUT, -o OUTPUT
output file
Now that we know how to use it, we may run the snippets using any text file available at hand:
$ pype -p test_path snippets test_base -i ../test/data/files/input.fa
INFO : 2021-02-02 12:28:41,139 : Writing logs to folder /home/docs/.bio_pype/logs/test_base
INFO : 2021-02-02 12:28:41,139 : Using profile test_path
INFO : 2021-02-02 12:28:41,139 : Prepare snippet test_base
INFO : 2021-02-02 12:28:41,139 : Attempt to execute snippet test_base
INFO : 2021-02-02 12:28:41,144 : Write chunk chk1 code into /home/docs/.bio_pype/logs/test_base/210202122841.139951_SD2E_test_base_chk1
INFO : 2021-02-02 12:28:41,145 : Set namespace to path
INFO : 2021-02-02 12:28:41,145 : Write chunk chk2 code into /home/docs/.bio_pype/logs/test_base/210202122841.140033_7QT8_test_base_chk2
INFO : 2021-02-02 12:28:41,145 : Set namespace to path
INFO : 2021-02-02 12:28:41,145 : Pipe in chk1 in chk2 command
INFO : 2021-02-02 12:28:41,146 : Prepare chk1 command line
INFO : 2021-02-02 12:28:41,146 : /home/docs/.bio_pype/logs/test_base/210202122841.139951_SD2E_test_base_chk1
INFO : 2021-02-02 12:28:41,146 : Execute chk1 with python subprocess.Popen
INFO : 2021-02-02 12:28:41,147 : Prepare chk2 command line
INFO : 2021-02-02 12:28:41,147 : /home/docs/.bio_pype/logs/test_base/210202122841.140033_7QT8_test_base_chk2
INFO : 2021-02-02 12:28:41,147 : Execute chk2 with python subprocess.Popen
INFO : 2021-02-02 12:28:41,149 : Close chk1 stdout stream
INFO : 2021-02-02 12:28:41,152 : Terminate chk2
INFO : 2021-02-02 12:28:41,152 : Snippet test_base executed
inspecting the output:
$ tail -n 2 output.txt
cggacacccagaagtctacatcctaattctc
this is a dummy file!
how changed from the original file:
$ tail -n 2 ../test/data/files/input.fa
TCAACACCACCTTCTTTGACCCAGCAGGAGGAGGAGACCCAGTACTATACCAGCACCTATTCTGATTCTT
CGGACACCCAGAAGTCTACATCCTAATTCTC
This was an amazingly useless task, but the snippet showcased the use of the profiles, enabling to reuse the same code in different environments (eg using -p test_docker instructs to use docker to run the code in the snippet chunks), abstracting technical amenities.
Now that you can run a simple task using profiles and snippets, you can chain tasks to form complex dependencies using Pipelines.
Configuration¶
There are various aspect of pype that can be configured. Notably the installation paths of the modules are the most common variables that needs to be set.
By default the pype modules (snippets, pipelines, profiles and queues) are installed in the python library installation, within the package site-packages folder.
Retrieving the modules in the installation folder can made unnecessarily cumbersome to add/edit/change modules.
Configuring different paths for the modules, it also makes possible to easily switch from a set of modules to another (eg from a “stable” to a “development” set of modules).
The set of configuration variables are listed in the section “Available variables”
Local configuration file¶
When a configuration file in ~/.bio_pype/config exists, the program will read the configuration to set the variables.
The configuration file looks like this:
PYPE_TMP=/tmp
PYPE_LOGDIR=/tmp/logs
Environment variable¶
Variables can also be set as environment variables. Setting a variable in the environment will override the corresponding variable if also set in the configuration file.
Available variables¶
Variable | Description |
---|---|
PYPE_MODULES | Sets the path of pipelines, profiles, queues and snippets at a prefix with the specified folder. It also overrides the path variable for each separate module. |
PYPE_SNIPPETS | Sets the path for the snippets module to the specified folder |
PYPE_PROFILES | Seta the path for the profiles module to the specified folder |
PYPE_PIPELINES | Sets the path for the pipelines module to the specified folder |
PYPE_QUEUES | Sets the path for the queues module to the specified folder |
PYPE_REPOS | Sets the path for the repos.yaml file to read |
PYPE_NCPU | It can be used to set the number of maximum CPUs to use when launching jobs in parallel without using an external scheduler. |
PYPE_MEM | It can be used to set the maximum amount of memory to use, when launching jobs in parallel without using an external scheduler |
PYPE_TMP | Sets a temporary folder that can be used in snippets using the ‘%(pype_tmp)s’ tag (or by loading __config__.PYPE_CONFIG in python snippets) |
PYPE_LOGDIR | Sets the path for the log files. By default is set to`~/.bio_pype/logs` |
PYPE_DOCKER | Sets the binary to launch docker or other container-based solution. The default value is docker. I also accepts full path of the binary. It support also udocker and singularity |
PYPE_SINGULARITY_CACHE | Sets the path for the singularity sif images. There is no default path for this variable |
Available modules:¶
Browse and install available modules¶
There are available set of snippets and pipelines. You can access the list of repository and manage the installed modules with the repos sub command
$ pype repos
usage: pype repos [-r REPO_LIST] {list,install,init,clean,info} ...
positional arguments:
{list,install,init,clean,info}
list List the available repositories
install Install modules from selected repository
init Initiate an empty repository
clean Cleanup all module folders
info Print location of the modules currently in use
optional arguments:
-r REPO_LIST, --repo REPO_LIST
Repository list. Default:
/home/docs/checkouts/readthedocs.org/user_builds/bio-
pype/envs/latest/lib/python3.7/site-packages/bio_pype-
1.9.99-py3.7.egg/pype/pype_modules/repos.yaml
The currently available sets are:
$ pype repos list
- gatk4
Pype modules following the gatk4 best practice
homepage: https://bitbucket.org/ffavero/pype_modules
source: https://bitbucket.org/ffavero/pype_modules/get/master.tar.gz
- sequenza
Pype modules to run sequenza
homepage: https://bitbucket.org/sequenzatools/sequenza_pype_modules
source: https://bitbucket.org/sequenzatools/sequenza_pype_modules/get/master.tar.gz
- weischenfeldt
Pype modules for the Weischenfeldt group tools in computerome
homepage: https://bitbucket.org/weischenfeldt/pype_weischenfeldt_computerome
source: https://bitbucket.org/weischenfeldt/pype_weischenfeldt_computerome/get/master.tar.gz
It’s possible to install the modules from a repository in the list, by invoking the pype repos install with the selected repository.
Snippets¶
A snippet is the executor of the tasks. It can be written as a markdown file, using code chunks to run arbitrary code, or it can be written as a python modules (see Advanced Snippets in Python)
Basic Snippet Structure¶
A full snippet has been shown already in the Running tasks in a Snippet section.
The structure of a snippet in composed by the following sections header:
- requirements: which include a code chunk returning a dictionary which specifies the necessary resource to run the snippet (eg. used to allocate resource in a queuing systems)
- results: which include a code chunk returning a dictionary listing all the files produced by the execution of the snippet
- arguments: a numbered list, which is interpreted by
argparse
to produce the command line interface of the snippet- snippet: containing the code chunks with the instruction to perform the desired task
- name: an optional section containing a chunk returning a string with a “friendly name”. This name overrides in certain aspects the default snippet name. This can be used to identify more easily log folders and job ids running on the system.
The input and output arguments are passed to the various chunk via variable substitutions by name, a method used in python strings formatting.
In practice it means that a string %(hello)s present in a chunk, would be replace by the value of the variable hello
There are few ways of setting variables:
- The arguments section
- The profiles.files (See Profiles)
- The keys from the results object
The arguments variables are named after the argument name, and the value is the value passed to the the command line.
The variables from the profile and from the results section are prefixed with profile_ and results_ respectively. This means that in order to pass a key, eg. genome_fa, present in the profile.file, in the snippet chunk it corresponds to %(profile_genome_fa)s.
More detail on the argument passing in the following section
Reference arguments results and files¶
Using Namespaces¶
<<This section may go to Profiles>> The namespace are set in the profile file. Ideally the snippet should be agnostic on the final runtime execution, and it may be possible to run it as-is in different environment by only change the namespace in the profile.
More broadly the namespace is a mechanism to set the environment to where execute the chunk.
Supported namespace are:
- Path: assumes that the commands in the chunks are present in the environment $PATH
- Environment Modules: loads a set of specified modules before running the chunk
- Docker: run the chunk within a container image. This namespace supports also uDocker and singularity
Path¶
Environment Modules¶
Docker/Singularity/uDocker¶
Advanced Snippets in Python¶
The snippets are located in a python module (mind the __init__.py in the folder containing the snippets). In order to function, each snippet need to have 4 specific function:
- requirements: a function returning a dictionary with the necessary resource to run the snippet (used to allocate resource in queuing systems)
- results: a function accepting a dictionary with the snippet arguments and returning a dictionary listing all the files produced by the execution of the snippet
- add_parser: a function that implement the
argparse
module and defines the command line arguments accepted by the snippet- a function named as the snippet file name (without the .py extension), containing the code for the execution of the tool
from pype.process import Command
def requirements():
return({
'ncpu': 1,
'time': '00:01:00',
'mem': '1gb'})
def results(argv):
output = None
try:
output = argv['-o']
except KeyError:
try:
output = argv['--output']
except KeyError as e:
raise e
return({'file_out': output})
def add_parser(subparsers, module_name):
return subparsers.add_parser(
module_name, help='Test snippet example -in python-',
add_help=False)
def test_adv_args(parser, argv):
parser.add_argument(
'-i', '--input', dest='input', nargs='*',
help='input(s) text file', type=str, required=True)
parser.add_argument(
'-o', '--output', dest='output', type=str,
default='output.txt', help='output file')
return parser.parse_args(argv)
def test_adv(subparsers, module_name, argv, profile, log):
args = test_adv_args(
add_parser(subparsers, module_name), argv)
dummy_file = profile.files['dummy_file']
cmd1 = 'cat %s %s' % (
' '.join(args.input), dummy_file)
cmd2 = 'awk \'{ print toupper($0) }\''
cmd3 = 'awk \'{ print tolower($0) }\''
cat = Command(
cmd1, log, profile, 'cat')
to_up = Command(
cmd2, log, profile, 'to_up')
to_low = Command(
cmd3, log, profile, 'too_low')
for input_file in args.input:
cat.add_input(input_file)
cat.add_input(dummy_file)
to_low.add_output(args.output)
cat.add_namespace(profile.programs['alpine_3'])
to_up.add_namespace(profile.programs['alpine_3'])
to_low.add_namespace(profile.programs['alpine_3'])
to_up.pipe_in(cat)
to_low.pipe_in(to_up)
with open(args.output, 'wt') as output:
to_low.run()
for line in to_low.stdout:
output.write(line.decode('utf-8'))
to_low.close()
Pipelines¶
Structure of Pipelines files:¶
The pipelines are YAML files located in the pipelines folder (see Configuration).
A pipeline YAML file is structured in two sections defined by the keys info and items.
Info header¶
The info section contains the following sub-keys
keys | values |
---|---|
description | A quick description of the pipeline |
date | The date of writing/editing |
arguments | An optional key containing information used to customize the description of the pipeline arguments |
Pipeline items¶
The items section contains a hierarchical structure, constructed by combining multiple items.
Each element is composed by the following keys:
key | value |
---|---|
name | The name of the pipeline or the snippet which the item represent |
type | Indicate if the item is a snippet or a pipeline. The value can be any between the choices: snippet, pipeline, batch_snippet or batch_pipeline |
arguments | The list of arguments of the pipeline/snippets with the template variables used. See Pipeline Item Arguments section |
dependencies | Optional, Indicate the start of a child items structure constructing a hierarchical dependency between snippets |
requirements | Optional, a dictionary used to alter the default`requirements` defined in the snippet (it applies only for snippets and batch_snippets) |
mute | Optional, if set and its value is true, the item will not pass any dependency on the parent items |
Pipeline Item Arguments¶
Profiles¶
Extending profiles:¶
Similarly to the pipelines, the profiles are YAML files in a python module (it is requires a __init__.py file in the folder containing the files). In order to be parsed correctly, the profile YAML file needs to have a specific structure:
info:
description: Test Profile
date: 23/11/2020
genome_build: hg38
files:
genome_fa: /abs/path/to/fasta.fa
genome_fa_gz: /abs/path/to/fasta.fa.gz
dummy_file: /home/docs/checkouts/readthedocs.org/user_builds/bio-pype/checkouts/latest/test/data/files/dummy_file.txt
programs:
samtools_0:
namespace: path@samtools
version: 0.1.19
samtools_1:
namespace: path@samtools
version: 1.2
bwa:
namespace: path@bwa
version: 0.7.10
star:
namespace: path@star
version: 2.5.1b
alpine_3:
namespace: path@alpine
version: latest
info:
description: Test Profile with Docker Namespace
date: 23/11/2020
genome_build: hg38
files:
genome_fa: /abs/path/to/fasta.fa
genome_fa_gz: /abs/path/to/fasta.fa.gz
dummy_file: /home/docs/checkouts/readthedocs.org/user_builds/bio-pype/checkouts/latest/test/data/files/dummy_file.txt
programs:
samtools_0:
namespace: path@samtools
version: 0.1.19
samtools_1:
namespace: docker@biocontainers/samtools
version: v1.7.0_cv4
bwa:
namespace: docker@biocontainers/bwa
version: v0.7.15_cv4
star:
namespace: path@star
version: 2.5.1b
alpine_3:
namespace: docker@alpine
version: latest
API¶
Utils¶
Pipelines¶
-
class
pype.utils.arguments.
BatchFileArgument
(argument)[source]¶ BatchFileArgument read the arguments from a file and return the list of arguments. It is required for the execution of a batch snippet or batch pipeline.
-
class
pype.utils.arguments.
BatchListArgument
(argument)[source]¶ BatchArgument read the arguments from a file and return the list of arguments. It is required for the execution of a batch snippet or batch pipeline.
-
class
pype.utils.arguments.
CompositeArgument
(argument)[source]¶ A CompositeArgument retrieve the results from the results method of the specified snippet. It will not appear listed in the arguments help message so it’s value is None. In itself it contains a
PipelineItemArguments
object, defining the argument to pass to the results method of the snippets
-
class
pype.utils.arguments.
PipelineItemArguments
[source]¶ An object to gather the
Argument
of aPipelineItem
.This is meant to collect the structure and the type of the arguments defined in a pipeline yaml file.
-
add_argument
(argument, argument_type='argv_arg')[source]¶ Add the appropriate
Argument
class to thePipelineItemArguments
argument listParameters: - argument (dict) – An item from the list of arguments from the pipeline
yaml file. It should contain the keys prefix an pipeline_arg.
The key prefix indicate the flag usd in the snippet/pipeline to
which the
PipelineItem
is configured to execute. The key pipeline_arg indicate the keyword or object that the pipeline engine need to interpret to convert into arguments and also to construct the command line interface and. - argument_type (str) – The type of argument, this parameter will select which argument class would be used to parse the argument. possible choices are composite_arg, batch_list_arg and argv_arg. Default argv_arg.
- argument (dict) – An item from the list of arguments from the pipeline
yaml file. It should contain the keys prefix an pipeline_arg.
The key prefix indicate the flag usd in the snippet/pipeline to
which the
-
to_dict
(args_dict=None)[source]¶ Converts the argument in the
PipelineItemArguments
into dictionaries simlar to argparseExample
-
Queues¶
-
class
pype.utils.queues.
SnippetRuntime
(command, log, profile)[source]¶ A class to help building queue modules implementation for bio_pype.
An helper class that generalize various tasks to build queues modules and in the meantime creates a yaml file that records running jobs and job dependencies, agnostic of the underlying queueing system used.
Parameters: A Usage example of this class is the following implementation of the pbs (torque) queue system:
import os import datetime from pype.utils.queues import SnippetRuntime def submit(command, snippet_name, requirements, dependencies, log, profile): runtime = SnippetRuntime(command, log, profile) runtime.get_runtime(requirements, dependencies) queue_dependencies = runtime.queue_depends() stdout = os.path.join(log.__path__, 'stdout') stderr = os.path.join(log.__path__, 'stderr') stdout_pbs = os.path.join(log.__path__, 'stdout.pbs') stderr_pbs = os.path.join(log.__path__, 'stderr.pbs') now = datetime.datetime.now() now_plus_10 = now + datetime.timedelta(minutes=10) startime_str = now_plus_10.strftime("%H%M.%S") log.log.info('Execution qsub into working directory %s' % os.getcwd()) log.log.info('Redirect stdin/stderr to folder %s' % log.__path__) command = '''#!/bin/bash exec 1>%s exec 2>%s exec %s''' % (stdout, stderr, runtime.command) log.log.info('Retrive custom group environment variable') largs = [] if len(queue_dependencies) > 0: cmd_dependencies = [ 'afterok:%s' % dep for dep in queue_dependencies] depend = ['-W', 'depend=%s' % ','.join(cmd_dependencies)] largs += depend if 'time' in requirements.keys(): time = ['-l', 'walltime=%s' % requirements['time']] largs += time if 'mem' in requirements.keys(): mem = ['-l', 'mem=%s' % requirements['mem']] largs += mem if 'type' in requirements.keys(): if requirements['type'] == 'exclusive': exclusive = ['-l', 'naccesspolicy=singlejob'] largs += exclusive if 'ncpu' in requirements.keys(): try: nodes = int(requirements['nodes']) except KeyError: nodes = 1 cpus = ['-l', 'nodes=%i:ppn=%i' % (nodes, int(requirements['ncpu']))] largs += cpus qsub_group = os.environ.get('PYPE_QUEUE_GROUP') if qsub_group: log.log.info('Custom qsub group set to %s' % qsub_group) largs += ['-W', 'group_list=%s' % qsub_group, '-A', qsub_group] else: log.log.info('Custom qsub group not set') echo = 'echo \'%s\'' % command qsub = [ 'qsub', '-V', '-o', stdout_pbs, '-e', stderr_pbs, '-d', os.getcwd(), '-a', startime_str, '-N', snippet_name] + largs runtime.add_queue_commands( [echo, ' '.join(qsub)]) runtime.submit_queue(5) runtime.commit_runtime() return(runtime.run_id) def post_run(log): log.log.info('Done')
-
add_queue_commands
(commands)[source]¶ Add the list of commands to launch the job in the queue system.
The commands will be run in a pipe, so the output of the first item in the command list will be stdin of the second item, and so on.
Parameters: commands (list) – List of string with the commands
-
add_queue_id
(queue_id)[source]¶ Add a job ID for the snippet.
This is useful when the queue command is not submitted using
SnippetRuntime.submit_queue()
, so the job id is not automatically registered in the runtime object.Parameters: queue_id (str) – Job id string
-
change_sleep
(sleep_sec)[source]¶ Change the number of seconds to wait after submitting a job in the queue system.
It is used in
SnippetRuntime.submit_queue()
. It alters the attributeSnippetRuntime.sleep
Parameters: sleep_sec (int) – Number of seconds
-
commit_runtime
()[source]¶ Save the runtime dictionary in the pipeline_runtime.yaml file
The path of pipeline_runtime.yaml is the parent directory of the snippet log.
-
get_runtime
(requirements, dependencies)[source]¶ Load the runtime object, if does not exists initiate a new runtime dictionary.
Parameters:
-
queue_depends
()[source]¶ Returns the list of queue ids to which this command depends
The list in the runtime dictionary, in the key dependencies consinst on unique ids of the runtime object, this methods simply converts the runtime ids into queue ids.
Returns: Queue id dependency list Return type: list
-
submit_queue
(retry=1)[source]¶ Execute the queue commands, and add the resulting job id in the runtime dictionary.
The method accepts a number of retry attempts, which will enable to reiterate the specified number of time in case of failure, before failing the pipeline
Parameters: retry (int, optional) – Number of attempts before failing, defaults to 1
-
Snippets and Profiles¶
Process¶
-
class
pype.process.
Command
(cmd, log, profile, name='')[source]¶ High level class to use
subprocess.Popen
combined withVolume
andNamespace
classes.The
Command
class is a wrapper aroundsubprocess.Popen
that results in a more succinct code, increasing the readability of the command lines that are going to be executed rather then thesubprocess.Popen
boilerplate.The class initialization requires the command line string, a
Profile
class and a log object (eg the snippet log object).Parameters: -
add_input
(in_file, match='exact')[source]¶ The match argument can be either exact or recursive. - exact will match only the specified file - recursive will match all the file with the same prefix
of the specified file[summary]
[extended_summary]
Parameters:
-
add_output
(out_file)[source]¶ [summary]
[extended_summary]
Parameters: out_file ([type]) – [description]
-
-
class
pype.process.
Namespace
(program_dict, log, profile)[source]¶ A mechanism to load different environments
Define a basic abstraction layer to load programs and environments to the
Command
class[summary]
[extended_summary]
Parameters: - program_dict (dict) – A dictionary with the following keys namespace, version, dependencies. namespace is a string composed by the the namespace type and the namespace item, separated by the @ character. The supported namespace types are docker, env_modules and path. the namespace item is a string relevant to the namespace type (eg. the docker container repository url). the version is a string defining the tag/version of the docker container or the version of the program to load (again, depending on the namespace type selected). dependencies is a key only used for the env_modules namespace and is used to load other environment modules to satisfy the loading dependencies.
- log (
pype.logger.PypeLogger
) – Log object to append logging in the snippet log file - profile (
pype.utils.profiles.Profile
) – Profile object
Raises: - SnippetNamespaceError – Wrong Namespace format if the namespace does have more then @ characters.
- SnippetNamespaceError – Not supported namespace if the namespace type is not docker, env_modules or path.
- SnippetNamespaceError – All dependencies must be type env_module if some of the dependencies defined in the dependencies key is not a namespace of the env_modules type.
-
class
pype.process.
Volume
(path, output=False, bind_prefix='/var/lib/pype')[source]¶ Volume class to abstract and parametrize the binding of files while running commands in containerized environments.
The class contains also method to adjust the bind volume argument to implementation such as udocker and singularity.
Init the class defining the path in the host environment, the prefix in the container environment and flagging if the path is a input or an output target
Parameters: -
replace_bind_dirname
(bind_path)[source]¶ Replaces the bind volume in the container environment with the dirname of the specified bind path.
This is useful to give the same binding point to multiple paths (defined in multiple Volume classes) that are in the same folder in the host system.
Parameters: bind_path (str) – Binding point to replace instead of the current one randomly generated by the class.
-
replace_bind_volume
(bind_path)[source]¶ Replaces the bind volume in the container environment with the specified bind path.
This is useful to manage binding point to multiple paths (defined in multiple Volume classes) that are subfolders of another bind volume in the host system.
Parameters: bind_path (str) – Binding point to replace instead of the current one randomly generated by the class.
-
Misc¶
-
class
pype.misc.
DefaultHelpParser
(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True)[source]¶
-
class
pype.misc.
SubcommandHelpFormatter
(prog, indent_increment=2, max_help_position=24, width=None)[source]¶
-
pype.misc.
xopen
(filename, mode='r')[source]¶ Wrap around open/gzip.open and stdin/out.
Replacement for the “open” function that can also open files that have been compressed with gzip. If the filename ends with .gz, the file is opened with gzip.open(). If it doesn’t, the regular open() is used. If the filename is ‘-‘, standard output (mode ‘w’) or input (mode ‘r’) is returned.