Snippets

A snippet is the basic execution unit of Bio_pype. It can be written either as a Markdown file that uses code chunks to run arbitrary code, or as a Python module (see Advanced Snippets in Python).

Basic Snippet Structure

A complete snippet example is shown in the simple_snippet section.

A snippet consists of the following section headers:

  1. requirements: Contains a code chunk returning a dictionary that specifies the necessary resources to run the snippet (e.g., used to allocate resources in queuing systems)

  2. results: Contains a code chunk returning a dictionary listing all files produced by the snippet’s execution

  3. arguments: A numbered list interpreted by argparse to produce the snippet’s command line interface

  4. snippet: Contains the code chunks with instructions for performing the desired task

  5. name: An optional section containing a chunk that returns a “friendly name”. This name overrides the default snippet name and helps identify log folders and job IDs more easily.

The input and output arguments are passed to the various chunks via variable substitution by name, the same method used in Python string formatting.

In practice, this means that a string %(hello)s present in a chunk is replaced by the value of the variable hello.
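As a minimal sketch of the substitution mechanism, the following uses plain Python %-formatting with a dictionary. The chunk text and the variable names (hello, output) are illustrative assumptions, not taken from a real Bio_pype snippet.

```python
# A chunk is treated here as a plain string containing %(name)s placeholders.
chunk = 'echo "Hello %(hello)s" > %(output)s'

# Hypothetical variable values, as they might be collected before execution.
variables = {'hello': 'world', 'output': 'greeting.txt'}

# Named substitution: each %(name)s is replaced by variables[name].
command = chunk % variables
print(command)
```

With standard Python formatting, a placeholder whose name is missing from the dictionary raises a KeyError.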

There are a few ways to set variables:

  1. The arguments section

  2. The profile.files entries (see Profiles)

  3. The keys from the results object

The argument variables are named after the argument name, and their value is the value passed on the command line.

The variables from the profile and from the results section are prefixed with profile_ and results_ respectively. For example, a key genome_fa present in profile.files is referenced in a snippet chunk as %(profile_genome_fa)s.
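The naming rule for the three variable sources can be sketched as follows. The helper build_variables, the command name caller, and all file names are hypothetical and only illustrate the prefixing convention. This is not how Bio_pype necessarily assembles the variables internally.

```python
def build_variables(arguments, profile_files, results):
    """Hypothetical helper: merge the three variable sources into one mapping."""
    # Argument variables keep the argument name unchanged.
    variables = dict(arguments)
    # Profile file keys are exposed with the profile_ prefix.
    variables.update(
        ('profile_%s' % key, value) for key, value in profile_files.items())
    # Keys of the results dictionary are exposed with the results_ prefix.
    variables.update(
        ('results_%s' % key, value) for key, value in results.items())
    return variables


variables = build_variables(
    arguments={'input': 'sample.bam'},
    profile_files={'genome_fa': '/data/genome.fa'},
    results={'vcf': 'sample.vcf'})

# 'caller' is a made-up command name used only for illustration.
chunk = 'caller -r %(profile_genome_fa)s -i %(input)s -o %(results_vcf)s'
command = chunk % variables
print(command)
```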

More detail on argument passing is given in the following section.

Reference arguments results and files

Using Namespaces

<<This section may go to Profiles>> Namespaces are set in the profile file. Ideally, a snippet should be agnostic about its final runtime environment, so that it can run as-is in different environments by changing only the namespace in the profile.

More broadly, a namespace is a mechanism for setting up the environment in which a chunk is executed.

Supported namespaces are:

  1. Path: assumes that the commands in the chunks are present in the environment $PATH

  2. Environment Modules: loads a set of specified modules before running the chunk

  3. Docker: runs the chunk within a container image. This namespace also supports uDocker and Singularity
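To make the idea concrete, the sketch below shows one way the same chunk command could be wrapped for each kind of namespace. It is an illustration only: the function wrap_command, the namespace labels ('path', 'env_module', 'docker') and the example module and image names are assumptions, and Bio_pype's actual implementation may differ.

```python
import shlex


def wrap_command(command, namespace_type, namespace_item=None):
    """Hypothetical helper: adapt a chunk command to a namespace."""
    if namespace_type == 'path':
        # The program is expected to be found in $PATH: run unchanged.
        return command
    if namespace_type == 'env_module':
        # Load the requested Environment Module, then run the command.
        return 'module load %s && %s' % (
            shlex.quote(namespace_item), command)
    if namespace_type == 'docker':
        # Run the command through a shell inside the container image.
        return 'docker run --rm %s sh -c %s' % (
            shlex.quote(namespace_item), shlex.quote(command))
    raise ValueError('unsupported namespace: %s' % namespace_type)


chunk_command = 'cat a.txt'
print(wrap_command(chunk_command, 'path'))
print(wrap_command(chunk_command, 'env_module', 'coreutils/8.30'))
print(wrap_command(chunk_command, 'docker', 'alpine:3'))
```

The chunk command itself stays identical in all three cases. Only the wrapper changes, which is what allows a snippet to remain agnostic about where it runs.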

Path

Environment Modules

Docker/Singularity/uDocker

Advanced Snippets in Python

Python snippets are located in a Python module (mind the __init__.py in the folder containing the snippets). To function, each snippet needs four specific functions:

  1. requirements: a function returning a dictionary with the resources necessary to run the snippet (used to allocate resources in queuing systems)

  2. results: a function accepting a dictionary with the snippet arguments and returning a dictionary listing all the files produced by the execution of the snippet

  3. add_parser: a function that uses the argparse module to define the command line arguments accepted by the snippet

  4. a function named as the snippet file name (without the .py extension), containing the code for the execution of the tool

from pype.process import Command


def requirements():
    return({
        'ncpu': 1,
        'time': '00:01:00',
        'mem': '1gb'})


def results(argv):
    # The output path may be given as either -o or --output;
    # a KeyError propagates if neither key is present.
    try:
        output = argv['-o']
    except KeyError:
        output = argv['--output']
    return {'file_out': output}


def add_parser(subparsers, module_name):
    return subparsers.add_parser(
        module_name, help='Test snippet example -in python-',
        add_help=False)


def test_adv_args(parser, argv):
    parser.add_argument(
        '-i', '--input', dest='input', nargs='*',
        help='input(s) text file', type=str, required=True)
    parser.add_argument(
        '-o', '--output', dest='output', type=str,
        default='output.txt', help='output file')
    return parser.parse_args(argv)


def test_adv(subparsers, module_name, argv, profile, log):
    args = test_adv_args(
        add_parser(subparsers, module_name), argv)

    dummy_file = profile.files['dummy_file']

    cmd1 = 'cat %s %s' % (
        ' '.join(args.input), dummy_file)
    cmd2 = 'awk \'{ print toupper($0) }\''
    cmd3 = 'awk \'{ print tolower($0) }\''

    cat = Command(
        cmd1, log, profile, 'cat')
    to_up = Command(
        cmd2, log, profile, 'to_up')
    to_low = Command(
        cmd3, log, profile, 'too_low')
    for input_file in args.input:
        cat.add_input(input_file)
    cat.add_input(dummy_file)
    to_low.add_output(args.output)
    cat.add_namespace(profile.programs['alpine_3'])
    to_up.add_namespace(profile.programs['alpine_3'])
    to_low.add_namespace(profile.programs['alpine_3'])
    to_up.pipe_in(cat)
    to_low.pipe_in(to_up)

    with open(args.output, 'wt') as output:
        to_low.run()
        for line in to_low.stdout:
            output.write(line.decode('utf-8'))
        to_low.close()
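The argument handling of the example above can be tried in isolation, without Bio_pype installed. The sketch below copies add_parser and the argument definitions (renamed here to demo_args) and feeds them a hand-built argparse parser. The top-level parser, the prog name pype_demo and the file names are assumptions standing in for what Bio_pype would normally supply.

```python
import argparse


def add_parser(subparsers, module_name):
    return subparsers.add_parser(
        module_name, help='Test snippet example -in python-',
        add_help=False)


def demo_args(parser, argv):
    # Same argument definitions as test_adv_args in the snippet above.
    parser.add_argument(
        '-i', '--input', dest='input', nargs='*',
        help='input(s) text file', type=str, required=True)
    parser.add_argument(
        '-o', '--output', dest='output', type=str,
        default='output.txt', help='output file')
    return parser.parse_args(argv)


def results(argv):
    # Look up the output path under either flag name.
    try:
        output = argv['-o']
    except KeyError:
        output = argv['--output']
    return {'file_out': output}


# Hypothetical stand-in for the parser objects provided by Bio_pype.
top_parser = argparse.ArgumentParser(prog='pype_demo')
subparsers = top_parser.add_subparsers(dest='snippet')

args = demo_args(
    add_parser(subparsers, 'test_adv'), ['-i', 'a.txt', 'b.txt'])
print(args.input, args.output)

# The results function maps the argument dictionary to the produced files.
produced = results({'--output': 'upper_lower.txt'})
print(produced)
```

Because -o is not given in this call, args.output falls back to the default defined in the parser.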