Snippets¶
A snippet is the basic execution unit of Bio_pype. It can be written either as a Markdown file that uses code chunks to run arbitrary code, or as a Python module (see Advanced Snippets in Python).
Basic Snippet Structure¶
A complete snippet example is shown in the simple_snippet section.
A snippet consists of the following section headers:
requirements: Contains a code chunk returning a dictionary that specifies the necessary resources to run the snippet (e.g., used to allocate resources in queuing systems)
results: Contains a code chunk returning a dictionary listing all files produced by the snippet’s execution
arguments: A numbered list interpreted by argparse to produce the snippet’s command line interface
snippet: Contains the code chunks with instructions for performing the desired task
name: An optional section containing a chunk that returns a “friendly name”. This name overrides the default snippet name and helps identify log folders and job IDs more easily.
The input and output arguments are passed to the chunks through named variable substitution, the same mechanism used by Python string formatting.
In practice, a string %(hello)s present in a chunk is replaced by the value of the variable hello.
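The substitution described above is plain Python named formatting with the % operator. The following minimal sketch shows the mechanism on its own; the chunk text and the variable output are made up for illustration and are not taken from a real snippet.

```python
# Named substitution with the % operator: each %(name)s placeholder
# is replaced by the value stored under that key in the mapping.
# The chunk text and the "output" variable are illustrative assumptions.
chunk = "echo %(hello)s > %(output)s"
variables = {"hello": "Hello world", "output": "greeting.txt"}

rendered = chunk % variables
print(rendered)
```

A placeholder whose name is missing from the mapping raises a KeyError, so every variable used in a chunk has to be defined by one of the sources listed next.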
There are a few ways of setting variables:
The arguments section
The files defined in the profile (see Profiles)
The keys from the results object
The argument variables are named after the argument name, and their value is the value passed on the command line.
The variables from the profile and from the results section are prefixed with profile_ and results_ respectively. For example, a key genome_fa defined in the profile files is referenced in a snippet chunk as %(profile_genome_fa)s.
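The naming rules above can be sketched as a small merge of the three variable sources. This is only an illustration of the prefixing convention, not Bio_pype's own code: the helper build_variables, the file paths and the command named tool are all hypothetical.

```python
def build_variables(arguments, profile_files, results):
    """Merge the three variable sources into one substitution mapping.

    Hypothetical helper: it only mirrors the naming convention
    (plain argument names, profile_ prefix, results_ prefix).
    """
    variables = dict(arguments)
    variables.update(
        {"profile_%s" % key: value for key, value in profile_files.items()})
    variables.update(
        {"results_%s" % key: value for key, value in results.items()})
    return variables


# Illustrative values only; none of these paths come from a real profile.
variables = build_variables(
    arguments={"input": "sample.bam"},
    profile_files={"genome_fa": "/data/genome.fa"},
    results={"out": "sample.txt"},
)

chunk = "tool --ref %(profile_genome_fa)s --in %(input)s --out %(results_out)s"
command = chunk % variables
print(command)
```

The prefixes keep the three sources from colliding: an argument called out and a results key called out end up as the distinct variables out and results_out.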
More details on argument passing are given in the following section.
Reference arguments results and files¶
Using Namespaces¶
Namespaces are set in the profile file. Ideally a snippet is agnostic of its final runtime environment, so the same snippet can run as-is in different environments by changing only the namespace in the profile.
More broadly, a namespace is a mechanism for setting up the environment in which a chunk is executed.
Supported namespaces are:
Path: assumes that the commands in the chunks are present in the environment $PATH
Environment Modules: loads a set of specified modules before running the chunk
Docker: runs the chunk within a container image. This namespace also supports uDocker and Singularity
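To make the idea concrete, the sketch below shows how one and the same chunk command could be wrapped differently for each kind of namespace. It is an assumption-laden illustration, not how Bio_pype implements namespaces: the function wrap_command, the namespace labels path, env_modules and docker, the module name and the image name are all hypothetical.

```python
import shlex


def wrap_command(command, namespace, target=None):
    """Return the shell line that would run `command` in a namespace.

    Hypothetical helper; the namespace labels are illustrative only.
    """
    if namespace == "path":
        # The command is expected to be found on $PATH as-is.
        return command
    if namespace == "env_modules":
        # Load the module first, then run the command in the same shell.
        return "module load %s && %s" % (shlex.quote(target), command)
    if namespace == "docker":
        # Run the command inside a throwaway container of the given image.
        return "docker run --rm %s sh -c %s" % (
            shlex.quote(target), shlex.quote(command))
    raise ValueError("unknown namespace: %s" % namespace)


chunk = "samtools --version"
print(wrap_command(chunk, "path"))
print(wrap_command(chunk, "env_modules", "samtools/1.17"))
print(wrap_command(chunk, "docker", "example/samtools:1.17"))
```

The chunk text never changes; only the wrapper selected through the profile does, which is what lets a snippet stay independent of where it finally runs.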
Path¶
Environment Modules¶
Docker/Singularity/uDocker¶
Advanced Snippets in Python¶
The snippets are located in a Python module (mind the __init__.py in the folder containing the snippets). To function, each snippet needs to define four specific functions:
requirements: a function returning a dictionary with the resources necessary to run the snippet (used to allocate resources in queuing systems)
results: a function accepting a dictionary with the snippet arguments and returning a dictionary listing all the files produced by the execution of the snippet
add_parser: a function that uses the argparse module to define the command line arguments accepted by the snippet
a function named after the snippet file (without the .py extension), containing the code for the execution of the tool
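The four-function contract listed above can be checked mechanically. The sketch below is a hypothetical helper, not part of Bio_pype: missing_functions and the in-memory demo module are assumptions used only to show which names a snippet module is expected to expose.

```python
import types

# Function names every Python snippet is expected to define, in addition
# to the one named after the snippet file itself.
REQUIRED = ("requirements", "results", "add_parser")


def missing_functions(module, snippet_name):
    """Return the expected function names that `module` does not define."""
    expected = REQUIRED + (snippet_name,)
    return [
        name for name in expected
        if not callable(getattr(module, name, None))
    ]


# Build an incomplete snippet module in memory for the demonstration.
demo = types.ModuleType("test_adv")
demo.requirements = lambda: {"ncpu": 1, "time": "00:01:00", "mem": "1gb"}
demo.results = lambda argv: {"file_out": argv["--output"]}
demo.add_parser = lambda subparsers, module_name: None

# The function named after the snippet (test_adv) is still missing.
print(missing_functions(demo, "test_adv"))
```

A check of this kind can be useful while writing a new snippet, because a missing function is reported by name before the snippet is ever run.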
from pype.process import Command


def requirements():
    # Resources requested for the job (used by queuing systems)
    return({
        'ncpu': 1,
        'time': '00:01:00',
        'mem': '1gb'})


def results(argv):
    # The output path may be given as -o or as --output;
    # a KeyError is raised if neither key is present
    output = None
    try:
        output = argv['-o']
    except KeyError:
        try:
            output = argv['--output']
        except KeyError as e:
            raise e
    return({'file_out': output})


def add_parser(subparsers, module_name):
    # Register the snippet as a sub-command
    return subparsers.add_parser(
        module_name, help='Test snippet example -in python-',
        add_help=False)


def test_adv_args(parser, argv):
    # Define and parse the snippet's command line arguments
    parser.add_argument(
        '-i', '--input', dest='input', nargs='*',
        help='input(s) text file', type=str, required=True)
    parser.add_argument(
        '-o', '--output', dest='output', type=str,
        default='output.txt', help='output file')
    return parser.parse_args(argv)


def test_adv(subparsers, module_name, argv, profile, log):
    args = test_adv_args(
        add_parser(subparsers, module_name), argv)
    dummy_file = profile.files['dummy_file']

    # Command lines of the three steps
    cmd1 = 'cat %s %s' % (
        ' '.join(args.input), dummy_file)
    cmd2 = 'awk \'{ print toupper($0) }\''
    cmd3 = 'awk \'{ print tolower($0) }\''
    cat = Command(
        cmd1, log, profile, 'cat')
    to_up = Command(
        cmd2, log, profile, 'to_up')
    to_low = Command(
        cmd3, log, profile, 'too_low')

    # Declare the input and output files
    for input_file in args.input:
        cat.add_input(input_file)
    cat.add_input(dummy_file)
    to_low.add_output(args.output)

    # Use the alpine_3 namespace from the profile for every command
    cat.add_namespace(profile.programs['alpine_3'])
    to_up.add_namespace(profile.programs['alpine_3'])
    to_low.add_namespace(profile.programs['alpine_3'])

    # Chain the commands: cat | to_up | to_low
    to_up.pipe_in(cat)
    to_low.pipe_in(to_up)

    # Run the last command and write its decoded output to the output file
    with open(args.output, 'wt') as output:
        to_low.run()
        for line in to_low.stdout:
            output.write(line.decode('utf-8'))
    to_low.close()
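The argument handling in the example above can be tried without Bio_pype, since add_parser and the argument function rely only on the standard argparse module. In the sketch below the top-level parser, its prog name demo and the file names are assumptions made for the demonstration; adv_args mirrors test_adv_args from the example under a different name.

```python
import argparse


def add_parser(subparsers, module_name):
    # Same registration as in the example: one sub-command per snippet.
    return subparsers.add_parser(
        module_name, help='Test snippet example -in python-',
        add_help=False)


def adv_args(parser, argv):
    # Mirrors test_adv_args from the example above.
    parser.add_argument(
        '-i', '--input', dest='input', nargs='*',
        help='input(s) text file', type=str, required=True)
    parser.add_argument(
        '-o', '--output', dest='output', type=str,
        default='output.txt', help='output file')
    return parser.parse_args(argv)


# Stand-in for the parser that the framework would normally provide.
top_parser = argparse.ArgumentParser(prog='demo')
subparsers = top_parser.add_subparsers(dest='snippet')

args = adv_args(
    add_parser(subparsers, 'test_adv'), ['-i', 'a.txt', 'b.txt'])
print(args.input, args.output)
```

Because -i uses nargs='*', the input values arrive as a list, and the output falls back to its default of output.txt when -o is not given.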