Snippets#

A snippet is the basic execution unit of Bio_pype. Snippets define reusable computational tasks and can be written in two formats:

Markdown format (recommended): Structured markdown file with embedded code chunks
Python module format (advanced): Python file with specific required functions

Both formats provide the same functionality but differ in the level of control and portability they offer.

Markdown Snippets (Recommended)#

Section Reference#

Markdown snippets use ## headers to define sections:

Required sections:

## description - Brief explanation of the snippet’s purpose
## requirements - Resource requirements (YAML with ncpu, time, mem)
## results - Output file definitions (code chunk returning YAML/JSON dict)
## arguments - Command-line arguments (numbered list format)
## snippet - Execution code chunks

Optional sections:

## name - Custom friendly name for job tracking

Complete Example#

# Example Test Snippet

## description

Converts text files to uppercase, then to lowercase

## requirements

​```yaml
ncpu: 1
time: '00:01:00'
mem: 1gb
​```

## results

​```bash
@/bin/sh, yaml

printf 'file_out: %(output)s'
​```

## arguments

1. input/i
    - help: input(s) text file
    - type: str
    - required: true
    - nargs: *

2. output/o
    - help: output file
    - type: str
    - default: output.txt

## snippet

> _input_: input profile_dummy_file*

​```bash
@/bin/sh, chk1, stdout=chk2, namespace=alpine_3

files_input='%(input)s'
dummy_file='%(profile_dummy_file)s'

cat $files_input $dummy_file | awk '{ print toupper($0) }'
​```

> _output_: results_file_out

​```bash
@/bin/sh, chk2, namespace=alpine_3

awk '{ print tolower($0) }' > '%(output)s'
​```

Section Breakdown#

1. Title (Required)#

# Snippet Title

The snippet name is determined by the filename (without the .md extension), not by the title. The title is for documentation only.

2. Description (Required)#

## description

Brief explanation of the snippet's purpose and functionality

3. Requirements (Required)#

Specifies computational resources for job schedulers. All three fields are required.

## requirements

​```yaml
ncpu: 4          # Number of CPU cores (required)
time: '02:00:00' # Max runtime HH:MM:SS (required)
mem: 8gb         # Memory allocation (required)
​```

Required fields: ncpu, time, mem

These values can be referenced in code chunks using %(requirements_ncpu)s, %(requirements_time)s, %(requirements_mem)s.

4. Results (Required)#

Defines output files as a dictionary. The code chunk must execute and print key-value pairs that map result names to file paths.

## results

​```bash
@/bin/sh, yaml

printf 'output_bam: %(output_dir)s/alignment.bam\n'
printf 'output_index: %(output_dir)s/alignment.bam.bai'
​```

Header format: @interpreter, parser_format

interpreter: Command to execute the chunk (e.g., /bin/sh, python)
parser_format: Must be yaml or json

Key points:

The chunk must print valid YAML or JSON dictionary output
Use %(variable)s syntax to reference arguments
Output keys become available as %(results_keyname)s in snippet chunks
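
As a rough illustration of the flow described above, the sketch below substitutes arguments into the text a results chunk would print, parses it as JSON, and exposes each key under the results_ prefix. The helper name load_results and the sample values are hypothetical, and the step of actually running the chunk through its interpreter is skipped; Bio_pype's own loader may work differently, and a yaml chunk would need a YAML parser in place of json.

```python
import json


def load_results(printed_template, arguments):
    """Turn the text printed by a results chunk into prefixed variables.

    `printed_template` stands in for the chunk's output before argument
    substitution (hypothetical helper, not part of Bio_pype).
    """
    printed = printed_template % arguments      # %(name)s substitution
    parsed = json.loads(printed)                # parser_format = json
    if not isinstance(parsed, dict):
        raise ValueError('a results chunk must print a dictionary')
    # Each key becomes available to snippet chunks as results_<key>
    return {'results_' + key: value for key, value in parsed.items()}


template = '{"output_bam": "%(output_dir)s/alignment.bam"}'
variables = load_results(template, {'output_dir': '/work/results'})
print(variables['results_output_bam'])
```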

5. Arguments (Required)#

Defines command-line interface using numbered list format.

## arguments

1. input/i
    - help: Input file description
    - type: str
    - required: true
    - nargs: *

2. output/o
    - help: Output file path
    - type: str
    - default: output.txt

3. threads/t
    - help: Number of threads
    - type: int
    - default: 4

4. verbose/v
    - help: Enable verbose output
    - action: store_true

Argument format: argument_name/short_flag (e.g., input/i creates --input and -i)

Valid argument options:

Option	Description
`help`	Description text for the argument
`type`	Data type: `str`, `int`, or `float` (use `action` for booleans)
`required`	`true` or `false` - whether argument is mandatory
`default`	Default value if argument not provided
`nargs`	Number of values: `*` (zero or more), `+` (one or more), `?` (zero or one), or integer
`action`	Special action: `store_true` or `store_false`
`choices`	Comma- or space-separated list of valid values
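
The options in this table have the same names as keyword arguments of Python's argparse. Assuming the numbered list is translated into add_argument calls roughly one-to-one (an assumption about Bio_pype internals, not something this page states), the four example arguments above would correspond to the following parser. The program name example_snippet is made up for the illustration.

```python
import argparse

parser = argparse.ArgumentParser(prog='example_snippet')

# 1. input/i: long flag --input, short flag -i, zero or more values
parser.add_argument('--input', '-i', help='Input file description',
                    type=str, required=True, nargs='*')
# 2. output/o: falls back to the default when not given
parser.add_argument('--output', '-o', help='Output file path',
                    type=str, default='output.txt')
# 3. threads/t: converted to int by argparse
parser.add_argument('--threads', '-t', help='Number of threads',
                    type=int, default=4)
# 4. verbose/v: boolean switch expressed through action, not type
parser.add_argument('--verbose', '-v', help='Enable verbose output',
                    action='store_true')

args = parser.parse_args(['-i', 'a.txt', 'b.txt', '-t', '8', '-v'])
print(args.input, args.output, args.threads, args.verbose)
```

Because nargs is `*`, args.input is always a list, even when a single file is passed.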

6. Name (Optional)#

Override the default snippet name with a custom friendly name.

## name

​```python
@python

print('analysis_%(sample_id)s_%(timestamp)s')
​```

7. Snippet (Required)#

Contains the execution code, organized as code chunks with optional input/output declarations.

## snippet

> _input_: input_arg1 profile_config_file

​```bash
@/bin/sh, chunk1, stdout=chunk2, namespace=docker_image

# Your code here
# Variables available: %(input_arg1)s, %(profile_config_file)s
​```

> _output_: results_output_file

​```bash
@/bin/sh, chunk2

# Process and write to %(results_output_file)s
​```

Code Chunk Syntax#

Code chunks use the following header format:

@interpreter, chunk_name, [options]

Components:

@interpreter: Execution environment (e.g., /bin/sh, python, Rscript)
chunk_name: Unique identifier for the chunk
stdout=next_chunk: Pipe output to another chunk
stderr=file: Redirect stderr
namespace=env: Execution namespace (see Namespaces section)
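
To make the header layout concrete, here is a minimal sketch of how such a line could be split into its parts. The function parse_chunk_header is hypothetical and only handles snippet chunk headers of the form shown above; it is not Bio_pype's actual parser.

```python
def parse_chunk_header(header):
    """Split '@interpreter, chunk_name, key=value, ...' into its parts."""
    if not header.startswith('@'):
        raise ValueError('a chunk header must start with @')
    fields = [field.strip() for field in header[1:].split(',')]
    parsed = {'interpreter': fields[0], 'name': None, 'options': {}}
    for field in fields[1:]:
        if '=' in field:
            # key=value options such as stdout=..., stderr=..., namespace=...
            key, value = field.split('=', 1)
            parsed['options'][key.strip()] = value.strip()
        elif parsed['name'] is None:
            # the first bare field is taken as the chunk name
            parsed['name'] = field
    return parsed


header = parse_chunk_header('@/bin/sh, chk1, stdout=chk2, namespace=alpine_3')
print(header['interpreter'], header['name'], header['options'])
```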

Variable Substitution#

Variables are substituted using Python string formatting: %(variable_name)s

Variable sources:

Arguments: Use the long argument name directly
- --input → %(input)s
- Note: Only the long name works (e.g., %(input)s not %(i)s)
Profile files: Prefixed with profile_
- Profile key genome_fa → %(profile_genome_fa)s
Results: Prefixed with results_
- Results key output_bam → %(results_output_bam)s
Requirements: Prefixed with requirements_
- %(requirements_ncpu)s, %(requirements_time)s, %(requirements_mem)s
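
The prefixes can be pictured as one flat dictionary that is applied to each chunk with Python's % operator. The sketch below is an illustration under that assumption: build_variables and the sample values are hypothetical, and only the %(name)s syntax and the prefixes come from this page.

```python
def build_variables(arguments, profile_files, results, requirements):
    """Merge the four variable sources into one flat substitution dict."""
    variables = dict(arguments)                 # arguments keep their long name
    groups = (('profile_', profile_files),
              ('results_', results),
              ('requirements_', requirements))
    for prefix, group in groups:
        for key, value in group.items():
            variables[prefix + key] = value
    return variables


variables = build_variables(
    arguments={'input': 'reads.fq'},
    profile_files={'genome_fa': '/data/genome.fa'},
    results={'output_bam': '/work/aligned.bam'},
    requirements={'ncpu': 4, 'time': '02:00:00', 'mem': '8gb'},
)

chunk = ('bwa mem -t %(requirements_ncpu)s %(profile_genome_fa)s '
         '%(input)s > %(results_output_bam)s')
print(chunk % variables)
```

With plain Python % formatting, a literal percent sign in the template has to be written as %%. Whether Bio_pype requires the same escaping inside chunks is not stated here.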

Input/Output Declarations#

Use blockquotes to declare dependencies for each code chunk:

> _input_: input_file profile_genome_fa*

​```bash
# Code chunk
​```

> _output_: results_aligned_bam

Input declaration (`_input_`):

Specifies which variables the chunk reads. This tells Docker/Singularity which files and directories need to be mounted into the container as read-only (ro).

Variable names must match defined arguments or profile/results variables
Supports wildcard suffixes to control which related files are bound
All input files are mounted read-only for safety

Output declaration (`_output_`):

Specifies which results keys this chunk produces. Docker/Singularity mounts the parent directory of each output file as read-write (rw).

Parent directory is automatically bound (no wildcard suffix needed)
Output files must be written to the mounted directory

Wildcard Suffixes (Input Only):

Wildcards are only used in _input_ declarations to control how Docker/Singularity binds files into containers. They instruct the system which related files should be included alongside the specified path.

Wildcard	Meaning	Use Case
`*`	Recursive all matches	Bind all files with matching prefix (e.g., if the value of the variable is `genome.fa` the bind will be applied to `genome.fa.*`)
`~`	Directory containing file	Bind the entire directory (useful for complex data structures)
`..`	Related file extensions	Bind all files with same basename but different extensions (e.g., if the value of the variable is `genome.fa` the bind will be applied to `genome.*`)
none	Exact match only	Bind only the specified file

Examples:

> _input_: genome_file* config_dir~ bam_file..
> _output_: results_output_bam results_output_log

Input mounting (read-only):

Given these argument values:

--genome_file=/data/genome.fa
--config_dir=/etc/config/settings.conf
--bam_file=/results/alignment.bam

The system binds:

genome_file*: /data/genome.fa, /data/genome.fa.fai, /data/genome.fa.gz, etc. (all matching files)
config_dir~: Entire /etc/config/ directory
bam_file..: /results/alignment.bam, /results/alignment.bai, /results/alignment.bam.bai, etc. (every file matching /results/alignment.*)
Exact match (no suffix): Only that specific file

All input mounts are read-only.
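
One way to read the wildcard table is as a mapping from a path and a suffix to a glob-style pattern or directory. The sketch below encodes that reading; bind_pattern is a hypothetical helper, and the exact patterns Bio_pype generates may differ (for instance, `*` is modeled here as a plain prefix match).

```python
import posixpath


def bind_pattern(path, suffix=''):
    """Return the pattern or directory implied by an _input_ wildcard suffix."""
    if suffix == '*':
        # the file itself plus anything that shares its full name as a prefix
        return path + '*'
    if suffix == '~':
        # the whole directory that contains the file
        return posixpath.dirname(path)
    if suffix == '..':
        # same basename, any extension
        root, _extension = posixpath.splitext(path)
        return root + '.*'
    if suffix == '':
        return path
    raise ValueError('unknown wildcard suffix: %r' % suffix)


print(bind_pattern('/data/genome.fa', '*'))
print(bind_pattern('/etc/config/settings.conf', '~'))
print(bind_pattern('/results/alignment.bam', '..'))
```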

Output mounting (read-write):

Given these results definitions:

output_bam: /work/results/aligned.bam
output_log: /work/results/aligned.log

The system binds:

Parent directory /work/results/ as read-write
Both output files are written to this mounted directory
No wildcard patterns needed for outputs
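
The output rule reduces to collecting the parent directory of every results path. A minimal sketch follows, formatting the directories as Docker -v flags; output_volume_flags is hypothetical, and mounting each directory at the same path inside the container is an assumption rather than documented Bio_pype behavior.

```python
import posixpath


def output_volume_flags(results):
    """Build read-write volume flags for the parent directory of each output."""
    parents = sorted({posixpath.dirname(path) for path in results.values()})
    flags = []
    for parent in parents:
        # Assumption: host and container use the same path for the mount
        flags += ['-v', '%s:%s:rw' % (parent, parent)]
    return flags


flags = output_volume_flags({
    'output_bam': '/work/results/aligned.bam',
    'output_log': '/work/results/aligned.log',
})
print(flags)
```

Both outputs share a parent directory, so a single mount covers them.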

Namespaces#

Namespaces define the execution environment for code chunks. They are configured in profile files and referenced in snippet chunk headers using namespace=program_name.

​```bash
@/bin/sh, chunk1, namespace=samtools

samtools view -h alignment.bam
​```

The namespace=samtools references a program defined in the active profile. Bio_pype supports three namespace types:

path: Uses programs from system PATH
env_module@name: Loads Environment Modules before execution
docker@image: Runs inside a container (Docker/Singularity/uDocker)

See Profiles for detailed namespace configuration.

Python Snippets (Advanced)#

Python snippets provide more control and are useful for complex logic or when direct Python execution is needed.

File Structure#

Python snippets must live inside a Python package, that is, a directory containing an __init__.py file:

my_snippets/
├── __init__.py          # Required: marks the directory as a package
├── align_reads.py       # Snippet file
└── process_variants.py  # Another snippet

The snippet name is the filename without the .py extension.

Required Functions#

Every Python snippet must implement these four functions:

1. `requirements()`#

Returns resource requirements dictionary.

def requirements():
    return {
        'ncpu': 4,
        'time': '02:00:00',
        'mem': '8gb'
    }

2. `results(argv)`#

Returns dictionary of output files. Receives parsed arguments.

def results(argv):
    """Define output files based on arguments"""
    try:
        output_file = argv['--output']
    except KeyError:
        output_file = argv['-o']

    return {
        'output_fasta': output_file,
        'output_log': output_file + '.log'
    }

Note: Look up each argument under both its long and its short flag, since argv may contain either form.
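
The try/except lookup can be factored into a small helper when several arguments are needed. This is only a sketch: first_present is a hypothetical name, and it assumes argv behaves like the dictionary used in the example above.

```python
def first_present(argv, *flags):
    """Return the value of the first flag found in argv."""
    for flag in flags:
        if flag in argv:
            return argv[flag]
    raise KeyError('none of %s found in argv' % ', '.join(flags))


def results(argv):
    """Define output files based on arguments"""
    output_file = first_present(argv, '--output', '-o')
    return {
        'output_fasta': output_file,
        'output_log': output_file + '.log'
    }


print(results({'-o': 'out.fa'}))
```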

3. `add_parser(subparsers, module_name)`#

Creates argument parser (without adding arguments).

def add_parser(subparsers, module_name):
    """Create the argument parser"""
    return subparsers.add_parser(
        module_name,
        help='Brief description of snippet',
        add_help=False
    )

4. `<snippet_name>(subparsers, module_name, argv, profile, log)`#

Main execution function. Function name must match the filename (without .py).

def reverse_fa(subparsers, module_name, argv, profile, log):
    """Main execution function"""
    # Parse arguments
    parser = add_parser(subparsers, module_name)
    parser.add_argument('-i', '--input', required=True,
                       help='Input fasta file')
    parser.add_argument('-o', '--output', required=True,
                       help='Output fasta file')
    args = parser.parse_args(argv)

    # Your implementation here
    with open(args.input, 'rt') as infile, \
         open(args.output, 'wt') as outfile:
        # Process data
        pass

Parameters:

subparsers: argparse subparsers object
module_name: Name of the snippet
argv: Command-line arguments list
profile: Profile configuration dictionary
log: Logger object
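
To see how these parameters fit together, the sketch below calls a main function by hand. Everything around the call is a stand-in for the framework: the snippet copy_upper, the demo program name, the empty profile, and the temporary files are all hypothetical, and Bio_pype's real driver is not shown on this page.

```python
import argparse
import logging
import os
import tempfile


def add_parser(subparsers, module_name):
    return subparsers.add_parser(module_name, help='Copy a file in upper case',
                                 add_help=False)


def copy_upper(subparsers, module_name, argv, profile, log):
    """Hypothetical snippet body that follows the signature above."""
    parser = add_parser(subparsers, module_name)
    parser.add_argument('-i', '--input', required=True)
    parser.add_argument('-o', '--output', required=True)
    args = parser.parse_args(argv)
    log.info('writing %s', args.output)
    with open(args.input, 'rt') as infile, open(args.output, 'wt') as outfile:
        for line in infile:
            outfile.write(line.upper())


# Stand-in for the framework: build the subparsers object and the argv list
subparsers = argparse.ArgumentParser(prog='demo').add_subparsers(dest='snippet')
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'in.txt')
target = os.path.join(workdir, 'out.txt')
with open(source, 'wt') as handle:
    handle.write('acgt\n')

copy_upper(subparsers, 'copy_upper', ['-i', source, '-o', target],
           profile={}, log=logging.getLogger('demo'))

with open(target, 'rt') as handle:
    print(handle.read().strip())
```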

Optional: `friendly_name(argv)`#

Override default snippet name for logs and job IDs.

import os


def friendly_name(argv):
    """Generate custom name for this execution"""
    try:
        input_file = argv['--input']
    except KeyError:
        input_file = argv['-i']

    # Clean up filename
    base_name = os.path.basename(input_file)
    base_name = base_name.replace('.gz', '').replace('.txt', '')

    return f'reverse_fa_{base_name}'

Complete Python Example#

import os


def requirements():
    """Define computational resources"""
    return {
        'ncpu': 1,
        'time': '00:01:00',
        'mem': '1gb'
    }


def results(argv):
    """Define output files"""
    try:
        file = argv['--output']
    except KeyError:
        file = argv['-o']
    return {'out': file}


def friendly_name(argv):
    """Generate friendly name for job tracking"""
    try:
        input_file = argv['--input']
    except KeyError:
        input_file = argv['-i']

    input_file = input_file.replace('.gz', '').replace('.txt', '')
    return f'reverse_fa_{os.path.basename(input_file)}'


def add_parser(subparsers, module_name):
    """Create argument parser"""
    return subparsers.add_parser(
        module_name,
        help='Reverse a fasta sequence',
        add_help=False
    )


def reverse_fa(subparsers, module_name, argv, profile, log):
    """Main execution: reverse FASTA sequences"""
    # Setup parser
    parser = add_parser(subparsers, module_name)
    parser.add_argument('-i', '--input', dest='input',
                       help='Input fasta file', required=True)
    parser.add_argument('-o', '--output', dest='output',
                       help='Output fasta file', required=True)
    args = parser.parse_args(argv)

    # Process FASTA file
    with open(args.input, 'rt') as input_file, \
         open(args.output, 'wt') as output:

        fasta_parser = parse_fasta(input_file)
        for header, sequence in fasta_parser:
            output.write(f'>{header} reverse\n')

            # Write reversed sequence in 60-char lines
            rev_seq = sequence[::-1]
            for i in range(0, len(rev_seq), 60):
                output.write(rev_seq[i:i+60] + '\n')


def parse_fasta(file):
    """Parse FASTA format file"""
    header, sequence = '', ''
    for line in file:
        if line.startswith('>'):
            if sequence:
                yield (header, sequence)
            header = line[1:].strip()
            sequence = ''
        else:
            sequence += line.strip()
    if sequence:
        yield (header, sequence)
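
A module such as the one above can be checked against the required functions before it is used. The sketch below is one way to do that; missing_functions is a hypothetical helper, and the demo module is built in memory so the example stays self-contained.

```python
import types

REQUIRED = ('requirements', 'results', 'add_parser')


def missing_functions(module):
    """List the required functions a snippet module does not define."""
    # The main function must carry the module's own (unqualified) name
    snippet_name = module.__name__.rsplit('.', 1)[-1]
    expected = REQUIRED + (snippet_name,)
    return [name for name in expected
            if not callable(getattr(module, name, None))]


# In-memory stand-in for my_snippets/reverse_fa.py
demo = types.ModuleType('my_snippets.reverse_fa')
demo.requirements = lambda: {'ncpu': 1, 'time': '00:01:00', 'mem': '1gb'}
demo.results = lambda argv: {}
demo.add_parser = lambda subparsers, module_name: None

before = missing_functions(demo)
demo.reverse_fa = lambda subparsers, module_name, argv, profile, log: None
after = missing_functions(demo)
print(before, after)
```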

Choosing Between Markdown and Python#

Use Markdown when:#

Wrapping existing command-line tools
Running bash/shell scripts
Need portability across different execution environments
Want simpler, more declarative syntax
Working with bioinformatics pipelines

Use Python when:#

Need complex control flow or logic
Require direct Python library access
Have intricate data processing needs
Want better IDE support and debugging
Building reusable helper functions

Best Practices#

Use descriptive names: Snippet filenames should clearly indicate their purpose
Document thoroughly: Include helpful descriptions and argument help text
Handle errors gracefully: Validate inputs and provide informative error messages
Make snippets modular: Each snippet should do one thing well
Use namespaces: Make snippets portable by leveraging namespace configuration
Test with different arguments: Ensure default values work and required arguments are validated
Version control profiles: Keep execution environments reproducible via profiles

Common Patterns#

Chaining chunks with pipes#

```bash
@/bin/sh, step1, stdout=step2

cat input.txt | awk '{print $1}'
```

```bash
@/bin/sh, step2, stdout=step3

sort -u
```

```bash
@/bin/sh, step3

grep "pattern" > output.txt
```
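
The stdout=next_chunk option behaves like a shell pipe between chunks. The sketch below reproduces that data flow with Python's subprocess module, using two small Python one-liners as stand-ins for the first two steps. How Bio_pype actually connects chunks is not described here, so run_chain is purely illustrative, and it is only suitable for small inputs because all data is written before any output is read.

```python
import subprocess
import sys


def run_chain(commands, data):
    """Run commands so that each one's stdout feeds the next one's stdin."""
    procs = []
    upstream = None
    for index, command in enumerate(commands):
        proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE if index == 0 else upstream,
            stdout=subprocess.PIPE,
        )
        if upstream is not None:
            upstream.close()    # the child process holds its own copy
        upstream = proc.stdout
        procs.append(proc)
    procs[0].stdin.write(data)
    procs[0].stdin.close()
    output = procs[-1].stdout.read()
    for proc in procs:
        proc.wait()
    return output


# Stand-in for step1: print the first column of every line
first_column = [sys.executable, '-c',
                'import sys\nfor line in sys.stdin:\n    print(line.split()[0])']
# Stand-in for step2: sort and de-duplicate, like sort -u
unique_sorted = [sys.executable, '-c',
                 'import sys\nprint("\\n".join(sorted(set(sys.stdin.read().split()))))']

result = run_chain([first_column, unique_sorted], b'b 1\na 2\nb 3\n')
print(result.split())
```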

Using multiple inputs#

## arguments

1. forward_reads/1
    - help: Forward reads
    - type: str
    - required: true

2. reverse_reads/2
    - help: Reverse reads
    - type: str
    - required: true

## snippet

> _input_: forward_reads reverse_reads

```bash
@/bin/sh, align

bwa mem reference.fa %(forward_reads)s %(reverse_reads)s > aligned.sam
```

Accessing profile variables#

> _input_: profile_reference_genome profile_dbsnp*

```bash
@/bin/sh, variant_call

gatk HaplotypeCaller \
  -R %(profile_reference_genome)s \
  --dbsnp %(profile_dbsnp)s \
  -I input.bam -O output.vcf
```

Additional Resources#

Profile configuration: See Bio_pype Profiles documentation
Variable substitution: Python string formatting
Environment Modules: Environment Modules Project

Quick Reference#

Markdown Sections#

Required: description, requirements, results, arguments, snippet

Optional: name

Requirements Fields#

Required: ncpu, time, mem

Argument Options#

help, type, required, default, nargs, action, choices

Valid types: str, int, float

Valid actions: store_true, store_false

Variable Prefixes#

Arguments: %(arg_name)s (long name only, e.g., %(input)s)
Profile files: %(profile_<key>)s (e.g., %(profile_genome_fa)s)
Results: %(results_<key>)s (e.g., %(results_output_bam)s)
Requirements: %(requirements_<key>)s (e.g., %(requirements_ncpu)s)

Results Chunk Header#

@interpreter, parser_format where parser_format is yaml or json

Snippet Chunk Header#

@interpreter, chunk_name [, namespace=program] [, stdout=next_chunk]

Python Required Functions#

requirements() - Return resource dict with ncpu, time, mem
results(argv) - Return output files dict
add_parser(subparsers, module_name) - Create parser
<snippet_name>(...) - Main execution function
friendly_name(argv) - Optional custom name