Snippets

A snippet is the basic execution unit of Bio_pype. Snippets define reusable computational tasks and can be written in two formats:

  1. Markdown format (recommended): Structured markdown file with embedded code chunks

  2. Python module format (advanced): Python file with specific required functions

Both formats provide the same functionality; they differ in how much control they give you and in how portable the resulting snippets are.



Namespaces

Namespaces define the execution environment for code chunks. They are configured in profile files and referenced in snippet chunk headers using namespace=program_name.

```bash
@/bin/sh, chunk1, namespace=samtools

samtools view -h alignment.bam
```

The namespace=samtools references a program defined in the active profile. Bio_pype supports three namespace types:

  • path: Uses programs from system PATH

  • env_module@name: Loads Environment Modules before execution

  • docker@image: Runs inside a container (Docker/Singularity/uDocker)
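To make the three namespace forms concrete, here is a minimal sketch of a parser for the namespace strings. The `parse_namespace` helper and `Namespace` tuple are hypothetical illustrations of the syntax listed above, not Bio_pype's own code.

```python
from typing import NamedTuple


class Namespace(NamedTuple):
    kind: str    # 'path', 'env_module' or 'docker'
    target: str  # module name or container image; empty for 'path'


def parse_namespace(spec: str) -> Namespace:
    """Split a namespace string such as 'docker@image' (hypothetical helper)."""
    if spec == "path":
        return Namespace("path", "")
    # Split on the first '@' only, so an image digest like 'img@sha256:...' survives.
    kind, sep, target = spec.partition("@")
    if not sep or kind not in ("env_module", "docker") or not target:
        raise ValueError(f"unrecognised namespace: {spec!r}")
    return Namespace(kind, target)


print(parse_namespace("docker@ubuntu:22.04"))
# Namespace(kind='docker', target='ubuntu:22.04')
```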

See Profiles for detailed namespace configuration.


Python Snippets (Advanced)

Python snippets provide more control and are useful for complex logic or when direct Python execution is needed.

File Structure

Python snippets must live in a Python package, that is, a directory containing an __init__.py file:

```
my_snippets/
├── __init__.py          # Required for the package
├── align_reads.py       # Snippet file
└── process_variants.py  # Another snippet
```

The snippet name is the filename without the .py extension.
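Because the snippet name is just the module filename, the snippets available in a package can be listed with the standard library. This is a sketch; `list_snippets` is a hypothetical helper and Bio_pype's own discovery logic may differ.

```python
import pkgutil
from pathlib import Path


def list_snippets(package_dir):
    """Return snippet names (module filenames without .py) found in package_dir.

    Hypothetical helper: pkgutil.iter_modules skips __init__ for us.
    """
    return sorted(
        info.name
        for info in pkgutil.iter_modules([str(Path(package_dir))])
        if not info.ispkg  # ignore nested sub-packages
    )
```

For the layout above, `list_snippets("my_snippets")` would return `['align_reads', 'process_variants']`.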

Required Functions

Every Python snippet must implement these four functions:

1. requirements()

Returns a dictionary of resource requirements.

```python
def requirements():
    return {
        'ncpu': 4,
        'time': '02:00:00',
        'mem': '8gb'
    }
```
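A quick sanity check on the dictionary returned by requirements() can catch typos before a job is submitted. This is a sketch: `check_requirements` is hypothetical, and the accepted formats ('HH:MM:SS' for time, a number plus a unit such as 'gb' for mem) are assumptions inferred from the examples on this page.

```python
import re

# Assumed formats, inferred from the examples: 'HH:MM:SS' and e.g. '8gb'.
TIME_RE = re.compile(r"^\d+:[0-5]\d:[0-5]\d$")
MEM_RE = re.compile(r"^\d+(kb|mb|gb|tb)$", re.IGNORECASE)


def check_requirements(req):
    """Return a list of problems found in a requirements() dictionary."""
    problems = []
    for key in ("ncpu", "time", "mem"):
        if key not in req:
            problems.append(f"missing required field: {key}")
    if "ncpu" in req and not (isinstance(req["ncpu"], int) and req["ncpu"] > 0):
        problems.append("ncpu must be a positive integer")
    if "time" in req and not TIME_RE.match(str(req["time"])):
        problems.append("time should look like HH:MM:SS")
    if "mem" in req and not MEM_RE.match(str(req["mem"])):
        problems.append("mem should look like '8gb'")
    return problems
```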

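2. results(argv)

Receives the parsed arguments and returns a dictionary of output files. argv is a dictionary keyed by the option string, so a value may be stored under its long form ('--output') or its short form ('-o').

```python
def results(argv):
    """Define output files based on arguments"""
    try:
        output_file = argv['--output']
    except KeyError:
        output_file = argv['-o']

    return {
        'output_fasta': output_file,
        'output_log': output_file + '.log'
    }
```

Note: Look up each argument under both its long and short form, because the caller may have supplied either.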
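When several arguments need the long/short fallback, the try/except pattern can be folded into one small function. `get_arg` is a hypothetical convenience helper, not part of Bio_pype.

```python
def get_arg(argv, long_opt, short_opt):
    """Look up an option in the argv dict, long form first (hypothetical helper)."""
    if long_opt in argv:
        return argv[long_opt]
    if short_opt in argv:
        return argv[short_opt]
    raise KeyError(f"neither {long_opt} nor {short_opt} was supplied")


def results(argv):
    output_file = get_arg(argv, "--output", "-o")
    return {"output_fasta": output_file, "output_log": output_file + ".log"}


print(results({"-o": "reversed.fa"}))
# {'output_fasta': 'reversed.fa', 'output_log': 'reversed.fa.log'}
```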

3. add_parser(subparsers, module_name)

Creates the argument parser without adding any arguments; the main execution function adds them.

```python
def add_parser(subparsers, module_name):
    """Create the argument parser"""
    return subparsers.add_parser(
        module_name,
        help='Brief description of snippet',
        add_help=False
    )
```
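The sketch below shows how add_parser fits into standard argparse sub-commands: a top-level parser owns the subparsers object, add_parser registers the snippet, and arguments are added afterwards. The top-level parser (and its `pype-demo` prog name) is a hypothetical stand-in for whatever Bio_pype uses internally. With add_help=False, -h/--help is not registered on the snippet parser.

```python
import argparse


def add_parser(subparsers, module_name):
    return subparsers.add_parser(module_name, help="Reverse a fasta sequence",
                                 add_help=False)


# Hypothetical stand-in for the dispatcher: a top-level parser with subparsers.
top = argparse.ArgumentParser(prog="pype-demo")
subparsers = top.add_subparsers(dest="snippet")

parser = add_parser(subparsers, "reverse_fa")
parser.add_argument("-i", "--input", required=True)
parser.add_argument("-o", "--output", required=True)

# The snippet parser parses only the snippet's own argument list.
args = parser.parse_args(["-i", "in.fa", "-o", "out.fa"])
print(args.input, args.output)  # in.fa out.fa
```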

4. <snippet_name>(subparsers, module_name, argv, profile, log)

Main execution function. Its name must match the filename without the .py extension (reverse_fa.py defines reverse_fa).

```python
def reverse_fa(subparsers, module_name, argv, profile, log):
    """Main execution function"""
    # Parse arguments
    parser = add_parser(subparsers, module_name)
    parser.add_argument('-i', '--input', required=True,
                        help='Input fasta file')
    parser.add_argument('-o', '--output', required=True,
                        help='Output fasta file')
    args = parser.parse_args(argv)

    # Your implementation here
    with open(args.input, 'rt') as infile, \
         open(args.output, 'wt') as outfile:
        # Process data
        pass
```

Parameters:

  • subparsers: argparse subparsers object

  • module_name: name of the snippet

  • argv: list of command-line arguments

  • profile: profile configuration dictionary

  • log: logger object
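To see the five parameters in action, the toy snippet below can be driven by hand with a fresh subparsers object, a plain dict as the profile, and a standard logging.Logger. The `hello` snippet, its return value, and the driver code are hypothetical; Bio_pype's real dispatcher and profile object may differ.

```python
import argparse
import logging


def add_parser(subparsers, module_name):
    return subparsers.add_parser(module_name, add_help=False)


def hello(subparsers, module_name, argv, profile, log):
    """Toy snippet using the five-parameter signature (hypothetical)."""
    parser = add_parser(subparsers, module_name)
    parser.add_argument("-n", "--name", default="world")
    args = parser.parse_args(argv)  # argv is a list here, not a dict
    # Assumes profile behaves like a plain dict, as the parameter list states.
    message = f"hello {args.name} (profile: {profile.get('name', 'unknown')})"
    log.info(message)
    return message  # returned only so the example is easy to check


logging.basicConfig(level=logging.INFO)
subparsers = argparse.ArgumentParser().add_subparsers()
print(hello(subparsers, "hello", ["--name", "reads"], {"name": "default"},
            logging.getLogger("hello")))
# hello reads (profile: default)
```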

Optional: friendly_name(argv)

Overrides the default snippet name used in logs and job IDs.

```python
import os


def friendly_name(argv):
    """Generate custom name for this execution"""
    try:
        input_file = argv['--input']
    except KeyError:
        input_file = argv['-i']

    # Clean up filename
    base_name = os.path.basename(input_file)
    base_name = base_name.replace('.gz', '').replace('.txt', '')

    return f'reverse_fa_{base_name}'
```
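Note that str.replace removes '.gz' and '.txt' anywhere in the name, so 'my.gzip_reads.fa.gz' becomes 'myip_reads.fa'. A variant that strips the suffixes only from the end of the name uses str.removesuffix (Python 3.9+); this is an alternative suggestion, not the original example.

```python
import os


def friendly_name(argv):
    """Name the job after the input file, stripping trailing suffixes only."""
    try:
        input_file = argv['--input']
    except KeyError:
        input_file = argv['-i']

    base_name = os.path.basename(input_file)
    # Outermost suffix first: 'x.txt.gz' -> 'x.txt' -> 'x'
    for suffix in ('.gz', '.txt'):
        base_name = base_name.removesuffix(suffix)
    return f'reverse_fa_{base_name}'


print(friendly_name({'--input': '/data/my.gzip_reads.fa.gz'}))
# reverse_fa_my.gzip_reads.fa
```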

Complete Python Example

```python
import os


def requirements():
    """Define computational resources"""
    return {
        'ncpu': 1,
        'time': '00:01:00',
        'mem': '1gb'
    }


def results(argv):
    """Define output files"""
    try:
        file = argv['--output']
    except KeyError:
        file = argv['-o']
    return {'out': file}


def friendly_name(argv):
    """Generate friendly name for job tracking"""
    try:
        input_file = argv['--input']
    except KeyError:
        input_file = argv['-i']

    input_file = input_file.replace('.gz', '').replace('.txt', '')
    return f'reverse_fa_{os.path.basename(input_file)}'


def add_parser(subparsers, module_name):
    """Create argument parser"""
    return subparsers.add_parser(
        module_name,
        help='Reverse a fasta sequence',
        add_help=False
    )


def reverse_fa(subparsers, module_name, argv, profile, log):
    """Main execution: reverse FASTA sequences"""
    # Setup parser
    parser = add_parser(subparsers, module_name)
    parser.add_argument('-i', '--input', dest='input',
                        help='Input fasta file', required=True)
    parser.add_argument('-o', '--output', dest='output',
                        help='Output fasta file', required=True)
    args = parser.parse_args(argv)

    # Process FASTA file
    with open(args.input, 'rt') as input_file, \
         open(args.output, 'wt') as output:

        fasta_parser = parse_fasta(input_file)
        for header, sequence in fasta_parser:
            output.write(f'>{header} reverse\n')

            # Write reversed sequence in 60-char lines
            rev_seq = sequence[::-1]
            for i in range(0, len(rev_seq), 60):
                output.write(rev_seq[i:i+60] + '\n')


def parse_fasta(file):
    """Parse FASTA format file"""
    header, sequence = '', ''
    for line in file:
        if line.startswith('>'):
            if sequence:
                yield (header, sequence)
            header = line[1:].strip()
            sequence = ''
        else:
            sequence += line.strip()
    if sequence:
        yield (header, sequence)
```
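Because parse_fasta only yields when `sequence` is non-empty, a record whose header has no sequence lines is silently dropped. The variant below tracks whether a header has been seen instead, and collects lines in a list joined once per record. It is a suggested alternative, exercised here on an in-memory io.StringIO rather than a file.

```python
import io


def parse_fasta(handle):
    """Yield (header, sequence) pairs, including records with no sequence lines.

    Sequence lines that appear before the first header are ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)


demo = io.StringIO('>seq1\nACGT\nTT\n>empty\n>seq2\nGGA\n')
print(list(parse_fasta(demo)))
# [('seq1', 'ACGTTT'), ('empty', ''), ('seq2', 'GGA')]
```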

Choosing Between Markdown and Python

Use Markdown when you:

  • Wrap existing command-line tools

  • Run bash/shell scripts

  • Need portability across different execution environments

  • Want simpler, more declarative syntax

  • Work with bioinformatics pipelines

Use Python when you:

  • Need complex control flow or logic

  • Require direct access to Python libraries

  • Have intricate data processing needs

  • Want better IDE support and debugging

  • Are building reusable helper functions


Best Practices

  1. Use descriptive names: Snippet filenames should clearly indicate their purpose

  2. Document thoroughly: Include helpful descriptions and argument help text

  3. Handle errors gracefully: Validate inputs and provide informative error messages

  4. Make snippets modular: Each snippet should do one thing well

  5. Use namespaces: Make snippets portable by leveraging namespace configuration

  6. Test with different arguments: Ensure default values work and required arguments are validated

  7. Version control profiles: Keep execution environments reproducible via profiles
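For practice 3, argparse already offers a clean way to fail early: parser.error() prints the usage line and the message to stderr, then exits with status 2. The checks below (input exists, output differs from input) are illustrative choices for the reverse_fa example, and `parse_checked_args` is a hypothetical helper.

```python
import argparse
import os


def add_parser(subparsers, module_name):
    return subparsers.add_parser(module_name, add_help=False)


def parse_checked_args(subparsers, module_name, argv):
    """Parse arguments and stop with a clear message if the input is unusable."""
    parser = add_parser(subparsers, module_name)
    parser.add_argument('-i', '--input', required=True, help='Input fasta file')
    parser.add_argument('-o', '--output', required=True, help='Output fasta file')
    args = parser.parse_args(argv)
    if not os.path.isfile(args.input):
        # Prints usage plus this message, then raises SystemExit(2).
        parser.error(f'input file not found: {args.input}')
    if os.path.abspath(args.input) == os.path.abspath(args.output):
        parser.error('--output must differ from --input')
    return args
```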


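Common Patterns

Chaining chunks with pipes

Setting stdout=<chunk_name> in a chunk header sends that chunk's standard output to the standard input of the named chunk: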

```bash
@/bin/sh, step1, stdout=step2

cat input.txt | awk '{print $1}'
```

```bash
@/bin/sh, step2, stdout=step3

sort -u
```

```bash
@/bin/sh, step3

grep "pattern" > output.txt
```
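The data flow of the three chained chunks is the same as composing three stream filters, each consuming the previous one's output. The pure-Python analogue below only illustrates that flow; it is not how Bio_pype executes chunks, and it approximates grep with a plain substring match and sort with codepoint ordering (like LC_ALL=C).

```python
def step1(lines):
    # awk '{print $1}': first whitespace-separated field of each line
    for line in lines:
        fields = line.split()
        yield fields[0] if fields else ''


def step2(items):
    # sort -u: sorted, de-duplicated lines
    yield from sorted(set(items))


def step3(items, pattern):
    # grep "pattern": keep lines containing the pattern (substring, not regex)
    return [item for item in items if pattern in item]


text = ['chr1 100', 'chr2 200', 'chr1 300', 'scaffold9 5']
print(step3(step2(step1(text)), 'chr'))  # ['chr1', 'chr2']
```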

Using multiple inputs

## arguments

1. forward_reads/1
    - help: Forward reads
    - type: str
    - required: true

2. reverse_reads/2
    - help: Reverse reads
    - type: str
    - required: true

## snippet

> _input_: forward_reads reverse_reads

```bash
@/bin/sh, align

bwa mem reference.fa %(forward_reads)s %(reverse_reads)s > aligned.sam
```
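The %(name)s placeholders match Python's printf-style formatting with a mapping. Assuming Bio_pype fills chunks with the % operator (the syntax suggests so, but this is not confirmed here), substitution and escaping behave as below; the file names are hypothetical. Under that assumption, a literal % in a chunk, such as in a date format, would have to be written %%.

```python
# Hypothetical values, as they might come from the parsed arguments.
arguments = {'forward_reads': 'sample_R1.fq.gz', 'reverse_reads': 'sample_R2.fq.gz'}

template = 'bwa mem reference.fa %(forward_reads)s %(reverse_reads)s > aligned.sam'
command = template % arguments
print(command)
# bwa mem reference.fa sample_R1.fq.gz sample_R2.fq.gz > aligned.sam

# A doubled %% survives substitution as a single literal %.
literal = 'date +%%Y-%%m-%%d' % {}
print(literal)  # date +%Y-%m-%d
```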

Accessing profile variables

> _input_: profile_reference_genome profile_dbsnp*

```bash
@/bin/sh, variant_call

gatk HaplotypeCaller \
  -R %(profile_reference_genome)s \
  --dbsnp %(profile_dbsnp)s \
  -I input.bam -O output.vcf
```


Quick Reference

Markdown Sections

Required: description, requirements, results, arguments, snippet

Optional: name
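A rough pre-flight check for a markdown snippet can compare its level-2 headings (like the ## arguments and ## snippet headings shown in the examples) against these lists. `check_sections` is a hypothetical sketch; it does not skip ## lines inside code fences, so a bash comment starting with '## ' would be miscounted.

```python
import re

REQUIRED = {'description', 'requirements', 'results', 'arguments', 'snippet'}
OPTIONAL = {'name'}


def check_sections(markdown_text):
    """Return (missing, unknown) section names based on '## ' headings."""
    found = {
        m.group(1).strip().lower()
        for m in re.finditer(r'^##\s+(.+?)\s*$', markdown_text, re.MULTILINE)
    }
    return sorted(REQUIRED - found), sorted(found - REQUIRED - OPTIONAL)
```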

Requirements Fields

Required: ncpu, time, mem

Argument Options

help, type, required, default, nargs, action, choices

Valid types: str, int, float

Valid actions: store_true, store_false
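These option names match keyword arguments of argparse's add_argument, so one plausible translation is sketched below. How Bio_pype performs this mapping is an assumption; `add_markdown_argument` is hypothetical and expects option values already converted to Python objects (for example, required: true as True).

```python
import argparse

# Only the documented types are accepted.
TYPES = {'str': str, 'int': int, 'float': float}


def add_markdown_argument(parser, long_name, short_name, options):
    """Translate one '## arguments' entry into parser.add_argument (hypothetical)."""
    kwargs = {k: options[k]
              for k in ('help', 'required', 'default', 'nargs', 'action', 'choices')
              if k in options}
    if 'type' in options:
        kwargs['type'] = TYPES[options['type']]  # KeyError for unsupported types
    flags = [f'-{short_name}'] if short_name else []
    flags.append(f'--{long_name}')  # dest comes from the long name
    parser.add_argument(*flags, **kwargs)


parser = argparse.ArgumentParser()
add_markdown_argument(parser, 'threads', 't', {'help': 'Threads', 'type': 'int', 'default': 1})
add_markdown_argument(parser, 'paired', None, {'action': 'store_true'})
args = parser.parse_args(['-t', '4', '--paired'])
print(args.threads, args.paired)  # 4 True
```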

Variable Prefixes

  • Arguments: %(arg_name)s (long name only, e.g., %(input)s)

  • Profile files: %(profile_<key>)s (e.g., %(profile_genome_fa)s)

  • Results: %(results_<key>)s (e.g., %(results_output_bam)s)

  • Requirements: %(requirements_<key>)s (e.g., %(requirements_ncpu)s)
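One way to picture these prefixes is as a single mapping built from four sources and then applied to a chunk with the % operator. The merge function, the % substitution, and all paths and values below are assumptions for illustration, not Bio_pype's implementation.

```python
def build_substitutions(arguments, profile_files, results, requirements):
    """Merge the four sources into one mapping using the documented prefixes."""
    mapping = dict(arguments)  # arguments keep their long name, unprefixed
    for prefix, source in (('profile_', profile_files),
                           ('results_', results),
                           ('requirements_', requirements)):
        for key, value in source.items():
            mapping[prefix + key] = value
    return mapping


subs = build_substitutions(
    arguments={'input': 'sample.bam'},
    profile_files={'genome_fa': '/ref/genome.fa'},
    results={'output_bam': 'sample.sorted.bam'},
    requirements={'ncpu': 4},
)
cmd = 'samtools sort -@ %(requirements_ncpu)s -o %(results_output_bam)s %(input)s' % subs
print(cmd)  # samtools sort -@ 4 -o sample.sorted.bam sample.bam
```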

Results Chunk Header

@interpreter, parser_format, where parser_format is either yaml or json
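Assuming the results chunk prints its output dictionary to stdout in the declared format, turning that text into a dict is short. `parse_results_output` is a hypothetical helper; the yaml branch relies on the third-party PyYAML package (yaml.safe_load), imported only when that branch runs.

```python
import json


def parse_results_output(text, parser_format):
    """Turn the stdout of a results chunk into a dict (hypothetical helper)."""
    if parser_format == 'json':
        data = json.loads(text)
    elif parser_format == 'yaml':
        import yaml  # PyYAML, third-party: pip install pyyaml
        data = yaml.safe_load(text)
    else:
        raise ValueError(f'unsupported parser_format: {parser_format!r}')
    if not isinstance(data, dict):
        raise ValueError('results chunk must produce a mapping')
    return data
```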

Snippet Chunk Header

@interpreter, chunk_name [, namespace=program] [, stdout=next_chunk]
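The header grammar is simple enough to parse with string splitting. The sketch below is a hypothetical parser for the form above, not Bio_pype's own; it treats every field after the chunk name as a key=value option.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkHeader:
    interpreter: str
    name: str
    options: dict = field(default_factory=dict)


def parse_chunk_header(line):
    """Parse '@interpreter, chunk_name [, key=value ...]' (hypothetical helper)."""
    if not line.startswith('@'):
        raise ValueError("chunk header must start with '@'")
    parts = [p.strip() for p in line[1:].split(',')]
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError('expected at least an interpreter and a chunk name')
    options = {}
    for part in parts[2:]:
        key, sep, value = part.partition('=')
        if not sep:
            raise ValueError(f'expected key=value, got {part!r}')
        options[key.strip()] = value.strip()
    return ChunkHeader(parts[0], parts[1], options)


print(parse_chunk_header('@/bin/sh, step1, stdout=step2'))
# ChunkHeader(interpreter='/bin/sh', name='step1', options={'stdout': 'step2'})
```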

Python Required Functions

  • requirements() - Return resource dict with ncpu, time, mem

  • results(argv) - Return output files dict

  • add_parser(subparsers, module_name) - Create parser

  • <snippet_name>(...) - Main execution function

  • friendly_name(argv) - Optional custom name
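A small check can confirm that an imported snippet module defines these functions, using the module's last dotted name component as the snippet name. `missing_functions` is a hypothetical helper; it does not check the optional friendly_name.

```python
REQUIRED_FUNCTIONS = ('requirements', 'results', 'add_parser')


def missing_functions(module):
    """List required callables that a Python snippet module lacks."""
    snippet_name = module.__name__.rsplit('.', 1)[-1]  # e.g. 'reverse_fa'
    names = REQUIRED_FUNCTIONS + (snippet_name,)
    return [n for n in names if not callable(getattr(module, n, None))]
```

For example, `missing_functions(importlib.import_module('my_snippets.reverse_fa'))` should return an empty list for the complete example above.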