Snippets#
A snippet is the basic execution unit of Bio_pype. Snippets define reusable computational tasks and can be written in two formats:
Markdown format (recommended): Structured markdown file with embedded code chunks
Python module format (advanced): Python file with specific required functions
Both formats produce the same functionality but offer different levels of control and portability.
Markdown Snippets (Recommended)#
Section Reference#
Markdown snippets use ## headers to define sections:
Required sections:
## description - Brief explanation of the snippet’s purpose
## requirements - Resource requirements (YAML with ncpu, time, mem)
## results - Output file definitions (code chunk returning YAML/JSON dict)
## arguments - Command-line arguments (numbered list format)
## snippet - Execution code chunks
Optional sections:
## name - Custom friendly name for job tracking
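The section list above lends itself to a quick pre-flight check before a snippet is run. The following is a minimal sketch and not part of Bio_pype: `missing_sections` is a hypothetical helper, and it assumes that section headers sit at the start of a line, outside fenced code chunks.

```python
import re

# Section names taken from the list above; the helper itself is hypothetical.
REQUIRED_SECTIONS = ("description", "requirements", "results", "arguments", "snippet")


def missing_sections(markdown_text):
    """Return the required section names that have no '## name' header."""
    found = set()
    in_fence = False
    for line in markdown_text.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence  # ignore anything inside a code chunk
            continue
        match = re.match(r"^##\s+(\w+)\s*$", line)
        if match and not in_fence:
            found.add(match.group(1))
    return [name for name in REQUIRED_SECTIONS if name not in found]


example = "# Title\n## description\nDemo\n## requirements\n## arguments\n## snippet\n"
print(missing_sections(example))
```

Here the example text lacks a results section, so that is the only name reported.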
Complete Example#
# Example Test Snippet
## description
Converts text files to uppercase, then to lowercase
## requirements
```yaml
ncpu: 1
time: '00:01:00'
mem: 1gb
```
## results
```bash
@/bin/sh, yaml
printf 'file_out: %(output)s'
```
## arguments
1. input/i
- help: input(s) text file
- type: str
- required: true
- nargs: *
2. output/o
- help: output file
- type: str
- default: output.txt
## snippet
> _input_: input profile_dummy_file*
```bash
@/bin/sh, chk1, stdout=chk2, namespace=alpine_3
files_input='%(input)s'
dummy_file='%(profile_dummy_file)s'
cat $files_input $dummy_file | awk '{ print toupper($0) }'
```
> _output_: results_file_out
```bash
@/bin/sh, chk2, namespace=alpine_3
awk '{ print tolower($0) }' > '%(output)s'
```
Section Breakdown#
1. Title (Required)#
# Snippet Title
The snippet name is determined by the filename (without .md
extension), not the title. The title is for documentation only.
2. Description (Required)#
## description
Brief explanation of the snippet's purpose and functionality
3. Requirements (Required)#
Specifies computational resources for job schedulers. All three fields are required.
## requirements
```yaml
ncpu: 4 # Number of CPU cores (required)
time: '02:00:00' # Max runtime HH:MM:SS (required)
mem: 8gb # Memory allocation (required)
```
Required fields: ncpu, time, mem
These values can be referenced in code chunks using %(requirements_ncpu)s,
%(requirements_time)s, %(requirements_mem)s.
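Because all three fields are mandatory, a requirements block can be sanity-checked before submission. This is a sketch under stated assumptions: `check_requirements` is a hypothetical helper, and the strict two-digit HH:MM:SS pattern is an assumption based on the examples rather than a documented Bio_pype rule.

```python
import re

# Field names come from this section; the validation rules are assumptions.
REQUIRED_FIELDS = ("ncpu", "time", "mem")


def check_requirements(req):
    """Return a list of human-readable problems found in a requirements mapping."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in req]
    if "time" in req and not re.fullmatch(r"\d{2}:\d{2}:\d{2}", str(req["time"])):
        problems.append(f"time is not HH:MM:SS: {req['time']}")
    return problems


print(check_requirements({"ncpu": 4, "time": "02:00:00", "mem": "8gb"}))
print(check_requirements({"ncpu": 1, "time": "1h"}))
```

The first call reports no problems; the second reports the missing mem field and the malformed time value.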
4. Results (Required)#
Defines output files as a dictionary. The code chunk must execute and print key-value pairs that map result names to file paths.
## results
```bash
@/bin/sh, yaml
printf 'output_bam: %(output_dir)s/alignment.bam\n'
printf 'output_index: %(output_dir)s/alignment.bam.bai'
```
Header format: @interpreter, parser_format
interpreter: Command to execute the chunk (e.g., /bin/sh, python)
parser_format: Must be yaml or json
Key points:
The chunk must print valid YAML or JSON dictionary output
Use %(variable)s syntax to reference arguments
Output keys become available as %(results_keyname)s in snippet chunks
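The path from printed dictionary to `results_` variables can be pictured with a few lines of Python. This is only an illustration of the idea, not Bio_pype's internal code: `results_variables` is a hypothetical name, and JSON is used because the Python standard library ships no YAML parser.

```python
import json


def results_variables(chunk_stdout):
    """Parse JSON printed by a results chunk and prefix each key with 'results_'."""
    parsed = json.loads(chunk_stdout)
    if not isinstance(parsed, dict):
        raise ValueError("a results chunk must print a dictionary")
    return {f"results_{key}": value for key, value in parsed.items()}


# Simulated output of a results chunk declared with the 'json' parser format.
stdout = '{"output_bam": "/work/out/alignment.bam", "output_index": "/work/out/alignment.bam.bai"}'
variables = results_variables(stdout)
print("samtools index %(results_output_bam)s" % variables)
```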
5. Arguments (Required)#
Defines command-line interface using numbered list format.
## arguments
1. input/i
- help: Input file description
- type: str
- required: true
- nargs: *
2. output/o
- help: Output file path
- type: str
- default: output.txt
3. threads/t
- help: Number of threads
- type: int
- default: 4
4. verbose/v
- help: Enable verbose output
- action: store_true
Argument format: argument_name/short_flag (e.g., input/i
creates --input and -i)
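Valid argument options:
| Option | Description |
|---|---|
| help | Description text for the argument |
| type | Data type: str, int, float |
| required | Whether the argument is mandatory (e.g., true) |
| default | Default value if argument not provided |
| nargs | Number of values (e.g., *) |
| action | Special action: store_true, store_false |
| choices | Comma or space separated list of valid values |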
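The options read much like keyword arguments to Python's `argparse`, so one way to picture the mapping is the sketch below. It is an assumption about how such a spec could be translated, not Bio_pype's actual parser: `add_snippet_argument` is a hypothetical helper, and `choices` is left out for brevity.

```python
import argparse

# Type names accepted in the arguments section, mapped to Python callables.
TYPES = {"str": str, "int": int, "float": float}


def add_snippet_argument(parser, spec, options):
    """Register one 'name/flag' spec (e.g. 'input/i') on an argparse parser."""
    long_name, short_flag = spec.split("/")
    kwargs = {key: options[key] for key in ("help", "default", "nargs", "action") if key in options}
    if "type" in options:
        kwargs["type"] = TYPES[options["type"]]
    if "required" in options:
        kwargs["required"] = bool(options["required"])
    parser.add_argument(f"-{short_flag}", f"--{long_name}", **kwargs)


parser = argparse.ArgumentParser(prog="example_snippet")
add_snippet_argument(parser, "input/i", {"help": "Input file description", "type": "str", "required": True, "nargs": "*"})
add_snippet_argument(parser, "threads/t", {"help": "Number of threads", "type": "int", "default": 4})
add_snippet_argument(parser, "verbose/v", {"help": "Enable verbose output", "action": "store_true"})

args = parser.parse_args(["--input", "a.txt", "b.txt", "-v"])
print(args.input, args.threads, args.verbose)
```

The attribute names on `args` are the long names, which matches the rule that only the long name is available for variable substitution.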
6. Name (Optional)#
Override the default snippet name with a custom friendly name.
## name
```python
@python
print('analysis_%(sample_id)s_%(timestamp)s')
```
7. Snippet (Required)#
Contains the execution code, organized as code chunks with optional input/output declarations.
## snippet
> _input_: input_arg1 profile_config_file
```bash
@/bin/sh, chunk1, stdout=chunk2, namespace=docker_image
# Your code here
# Variables available: %(input_arg1)s, %(profile_config_file)s
```
> _output_: results_output_file
```bash
@/bin/sh, chunk2
# Process and write to %(results_output_file)s
```
Code Chunk Syntax#
Code chunks use the following header format:
@interpreter, chunk_name, [options]
Components:
@interpreter: Execution environment (e.g., /bin/sh, python, Rscript)
chunk_name: Unique identifier for the chunk
stdout=next_chunk: Pipe output to another chunk
stderr=file: Redirect stderr
namespace=env: Execution namespace (see Namespaces section)
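To make the header layout concrete, the sketch below splits a snippet chunk header into its parts. `parse_chunk_header` is a hypothetical helper written for illustration; it covers snippet chunk headers only (a results header such as `@/bin/sh, yaml` carries a parser format in second position instead of a chunk name).

```python
def parse_chunk_header(header):
    """Split '@interpreter, chunk_name, key=value, ...' into its parts."""
    if not header.startswith("@"):
        raise ValueError("a chunk header starts with '@'")
    fields = [field.strip() for field in header[1:].split(",")]
    parsed = {"interpreter": fields[0], "name": None, "options": {}}
    for field in fields[1:]:
        if "=" in field:
            # key=value options such as stdout=..., stderr=..., namespace=...
            key, value = field.split("=", 1)
            parsed["options"][key.strip()] = value.strip()
        elif parsed["name"] is None:
            parsed["name"] = field  # first bare field is the chunk name
    return parsed


header = parse_chunk_header("@/bin/sh, chk1, stdout=chk2, namespace=alpine_3")
print(header["interpreter"], header["name"], header["options"])
```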
Variable Substitution#
Variables are substituted using Python string formatting:
%(variable_name)s
Variable sources:
Arguments: Use the long argument name directly
--input → %(input)s
Note: Only the long name works (e.g., %(input)s not %(i)s)
Profile files: Prefixed with profile_
Profile key genome_fa → %(profile_genome_fa)s
Results: Prefixed with results_
Results key output_bam → %(results_output_bam)s
Requirements: Prefixed with requirements_
%(requirements_ncpu)s, %(requirements_time)s, %(requirements_mem)s
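The four sources and their prefixes can be pictured as one merged mapping fed to Python's `%` operator. The sketch below shows that idea; `build_variables` and the sample values are hypothetical, and how Bio_pype assembles the mapping internally is not described here.

```python
def build_variables(arguments, profile_files, results, requirements):
    """Merge the four variable sources into one mapping, applying the prefixes."""
    variables = dict(arguments)  # arguments keep their long name, no prefix
    for prefix, source in (
        ("profile_", profile_files),
        ("results_", results),
        ("requirements_", requirements),
    ):
        variables.update({prefix + key: value for key, value in source.items()})
    return variables


variables = build_variables(
    arguments={"input": "reads.fq"},
    profile_files={"genome_fa": "/ref/genome.fa"},
    results={"output_bam": "/out/aligned.bam"},
    requirements={"ncpu": 4, "time": "02:00:00", "mem": "8gb"},
)
template = "bwa mem -t %(requirements_ncpu)s %(profile_genome_fa)s %(input)s > %(results_output_bam)s"
print(template % variables)
```

With Python's `%` operator a literal percent sign has to be written as `%%`; whether the same escaping applies to chunk text in Bio_pype is not stated in this document.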
Input/Output Declarations#
Use blockquotes to declare dependencies for each code chunk:
> _input_: input_file profile_genome_fa*
```bash
# Code chunk
```
> _output_: results_aligned_bam
Input declaration (_input_):
Specifies which variables the chunk reads. This tells Docker/Singularity which files and directories need to be mounted into the container as read-only (ro).
Variable names must match defined arguments or profile/results variables
Supports wildcard suffixes to control which related files are bound
All input files are mounted read-only for safety
Output declaration (_output_):
Specifies which results keys this chunk produces. Docker/Singularity mounts the parent directory of each output file as read-write (rw).
Parent directory is automatically bound (no wildcard pattern needed)
Output files must be written to the mounted directory
Wildcard Suffixes (Input Only):
Wildcards are only used in _input_ declarations to control how Docker/Singularity
binds files into containers. They instruct the system which related files should
be included alongside the specified path.
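| Wildcard | Meaning | Use Case |
|---|---|---|
| * | Recursive all matches | e.g., genome.fa together with genome.fa.fai and genome.fa.gz |
| ~ | Directory containing file | Bind the entire directory (useful for complex data structures) |
| .. | Related file extensions | e.g., alignment.bam together with alignment.bam.bai and alignment.bam.md5 |
| none | Exact match only | Bind only the specified file |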
Examples:
> _input_: genome_file* config_dir~ bam_file..
> _output_: results_output_bam results_output_log
Input mounting (read-only):
Given these argument values:
--genome_file=/data/genome.fa
--config_dir=/etc/config/settings.conf
--bam_file=/results/alignment.bam
The system binds:
genome_file*: /data/genome.fa, /data/genome.fa.fai, /data/genome.fa.gz, etc. (all matching files)
config_dir~: Entire /etc/config/ directory
bam_file..: /results/alignment.bam, /results/alignment.bam.bai, /results/alignment.bam.md5, etc.
Exact match (no suffix): Only that specific file
All input mounts are read-only.
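Reading an `_input_` line amounts to separating each variable name from its wildcard suffix. The sketch below does only that, plus the directory lookup for `~`; both helpers are hypothetical, and the exact file-matching rules behind `*` and `..` are deliberately left to Bio_pype because they are not specified in detail here.

```python
import posixpath


def split_declaration(token):
    """Return (variable_name, suffix) for one token of an _input_ declaration."""
    for suffix in ("..", "*", "~"):
        if token.endswith(suffix):
            return token[: -len(suffix)], suffix
    return token, ""


def describe_inputs(declaration, values):
    """Return (variable, suffix, base_path) for each declared input."""
    described = []
    for token in declaration.split():
        name, suffix = split_declaration(token)
        path = values[name]
        # '~' refers to the directory containing the file; for the other
        # suffixes the base path is returned unchanged (matching is not modeled).
        base = posixpath.dirname(path) if suffix == "~" else path
        described.append((name, suffix, base))
    return described


values = {
    "genome_file": "/data/genome.fa",
    "config_dir": "/etc/config/settings.conf",
    "bam_file": "/results/alignment.bam",
}
for row in describe_inputs("genome_file* config_dir~ bam_file..", values):
    print(row)
```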
Output mounting (read-write):
Given these results definitions:
output_bam: /work/results/aligned.bam
output_log: /work/results/aligned.log
The system binds:
Parent directory /work/results/ as read-write
Both output files are written to this mounted directory
No wildcard patterns needed for outputs
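The output rule reduces to collecting the distinct parent directories of the result files. A minimal sketch, assuming POSIX-style paths; `output_mounts` is a hypothetical name and the returned mapping is only an illustration, not the form Bio_pype passes to the container runtime.

```python
import posixpath


def output_mounts(results):
    """Map each distinct parent directory of the result files to 'rw'."""
    return {posixpath.dirname(path): "rw" for path in results.values()}


results = {
    "output_bam": "/work/results/aligned.bam",
    "output_log": "/work/results/aligned.log",
}
print(output_mounts(results))
```

Both files share one parent directory, so a single read-write mount is enough.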
Namespaces#
Namespaces define the execution environment for code chunks. They are
configured in profile files and referenced in snippet chunk headers using
namespace=program_name.
```bash
@/bin/sh, chunk1, namespace=samtools
samtools view -h alignment.bam
```
The namespace=samtools references a program defined in the active profile.
Bio_pype supports three namespace types:
path: Uses programs from system PATH
env_module@name: Loads Environment Modules before execution
docker@image: Runs inside a container (Docker/Singularity/uDocker)
See Profiles for detailed namespace configuration.
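The three type labels share a `kind@target` shape, which the sketch below splits apart. This is an assumption based only on how the types are written in the list above; the real profile file layout is documented under Profiles, and `parse_namespace` is a hypothetical helper.

```python
# Kinds taken from the list above; everything else here is illustrative.
KNOWN_KINDS = ("path", "env_module", "docker")


def parse_namespace(value):
    """Split a namespace label into (kind, target); target is None for 'path'."""
    kind, _, target = value.partition("@")  # split on the first '@' only
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown namespace type: {kind}")
    return kind, target or None


for label in ("path", "env_module@samtools/1.9", "docker@alpine:3"):
    print(parse_namespace(label))
```

The module and image names in the usage lines are placeholders, not values from a real profile.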
Python Snippets (Advanced)#
Python snippets provide more control and are useful for complex logic or when direct Python execution is needed.
File Structure#
Python snippets must be in a proper Python module:
my_snippets/
├── __init__.py # Required for module
├── align_reads.py # Snippet file
└── process_variants.py # Another snippet
The snippet name is the filename without the .py extension.
Required Functions#
Every Python snippet must implement these four functions:
1. requirements()#
Returns resource requirements dictionary.
def requirements():
    return {
        'ncpu': 4,
        'time': '02:00:00',
        'mem': '8gb'
    }
2. results(argv)#
Returns dictionary of output files. Receives parsed arguments.
def results(argv):
    """Define output files based on arguments"""
    try:
        output_file = argv['--output']
    except KeyError:
        output_file = argv['-o']
    return {
        'output_fasta': output_file,
        'output_log': output_file + '.log'
    }
Note: Access arguments using both long and short forms for robustness.
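The try/except pattern repeats for every argument, so it can be folded into a small lookup helper. A sketch, assuming `argv` behaves like a dictionary keyed by flag as in the example above; `get_arg` is a hypothetical helper, not a Bio_pype function.

```python
def get_arg(argv, *names):
    """Return the value stored under the first of `names` present in argv."""
    for name in names:
        if name in argv:
            return argv[name]
    raise KeyError(f"none of {names} found in argv")


def results(argv):
    """Define output files, accepting either the long or the short flag."""
    output_file = get_arg(argv, '--output', '-o')
    return {
        'output_fasta': output_file,
        'output_log': output_file + '.log'
    }


print(results({'-o': 'reversed.fa'}))
```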
3. add_parser(subparsers, module_name)#
Creates argument parser (without adding arguments).
def add_parser(subparsers, module_name):
    """Create the argument parser"""
    return subparsers.add_parser(
        module_name,
        help='Brief description of snippet',
        add_help=False
    )
4. <snippet_name>(subparsers, module_name, argv, profile, log)#
Main execution function. Function name must match the filename (without
.py).
def reverse_fa(subparsers, module_name, argv, profile, log):
    """Main execution function"""
    # Parse arguments
    parser = add_parser(subparsers, module_name)
    parser.add_argument('-i', '--input', required=True,
                        help='Input fasta file')
    parser.add_argument('-o', '--output', required=True,
                        help='Output fasta file')
    args = parser.parse_args(argv)
    # Your implementation here
    with open(args.input, 'rt') as infile, \
            open(args.output, 'wt') as outfile:
        # Process data
        pass
Parameters:
subparsers: argparse subparsers object
module_name: Name of the snippet
argv: Command-line arguments list
profile: Profile configuration dictionary
log: Logger object
Optional: friendly_name(argv)#
Override default snippet name for logs and job IDs.
import os  # needed for os.path.basename below

def friendly_name(argv):
    """Generate custom name for this execution"""
    try:
        input_file = argv['--input']
    except KeyError:
        input_file = argv['-i']
    # Clean up filename
    base_name = os.path.basename(input_file)
    base_name = base_name.replace('.gz', '').replace('.txt', '')
    return f'reverse_fa_{base_name}'
Complete Python Example#
import os


def requirements():
    """Define computational resources"""
    return {
        'ncpu': 1,
        'time': '00:01:00',
        'mem': '1gb'
    }


def results(argv):
    """Define output files"""
    try:
        file = argv['--output']
    except KeyError:
        file = argv['-o']
    return {'out': file}


def friendly_name(argv):
    """Generate friendly name for job tracking"""
    try:
        input_file = argv['--input']
    except KeyError:
        input_file = argv['-i']
    input_file = input_file.replace('.gz', '').replace('.txt', '')
    return f'reverse_fa_{os.path.basename(input_file)}'


def add_parser(subparsers, module_name):
    """Create argument parser"""
    return subparsers.add_parser(
        module_name,
        help='Reverse a fasta sequence',
        add_help=False
    )


def reverse_fa(subparsers, module_name, argv, profile, log):
    """Main execution: reverse FASTA sequences"""
    # Setup parser
    parser = add_parser(subparsers, module_name)
    parser.add_argument('-i', '--input', dest='input',
                        help='Input fasta file', required=True)
    parser.add_argument('-o', '--output', dest='output',
                        help='Output fasta file', required=True)
    args = parser.parse_args(argv)
    # Process FASTA file
    with open(args.input, 'rt') as input_file, \
            open(args.output, 'wt') as output:
        fasta_parser = parse_fasta(input_file)
        for header, sequence in fasta_parser:
            output.write(f'>{header} reverse\n')
            # Write reversed sequence in 60-char lines
            rev_seq = sequence[::-1]
            for i in range(0, len(rev_seq), 60):
                output.write(rev_seq[i:i+60] + '\n')


def parse_fasta(file):
    """Parse FASTA format file"""
    header, sequence = '', ''
    for line in file:
        if line.startswith('>'):
            if sequence:
                yield (header, sequence)
            header = line[1:].strip()
            sequence = ''
        else:
            sequence += line.strip()
    if sequence:
        yield (header, sequence)
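The parsing and line-wrapping logic of this example can be exercised on in-memory text, without Bio_pype or any files. The sketch below copies `parse_fasta` from the example and wraps the write loop in `reverse_records`, a hypothetical helper added here only for testing; the `width` parameter is likewise an addition.

```python
import io


def parse_fasta(file):
    """Same generator as in the example above."""
    header, sequence = '', ''
    for line in file:
        if line.startswith('>'):
            if sequence:
                yield (header, sequence)
            header = line[1:].strip()
            sequence = ''
        else:
            sequence += line.strip()
    if sequence:
        yield (header, sequence)


def reverse_records(handle, width=60):
    """Hypothetical helper: return the reversed FASTA text as one string."""
    out = io.StringIO()
    for header, sequence in parse_fasta(handle):
        out.write(f'>{header} reverse\n')
        rev_seq = sequence[::-1]
        for i in range(0, len(rev_seq), width):
            out.write(rev_seq[i:i + width] + '\n')
    return out.getvalue()


fasta = '>seq1\nACGT\nTT\n>seq2\nGGCC\n'
print(reverse_records(io.StringIO(fasta), width=4))
```

Note that `parse_fasta` only yields a record once it has collected sequence text, so a header with no sequence lines is skipped.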
Choosing Between Markdown and Python#
Use Markdown when:#
Wrapping existing command-line tools
Running bash/shell scripts
Need portability across different execution environments
Want simpler, more declarative syntax
Working with bioinformatics pipelines
Use Python when:#
Need complex control flow or logic
Require direct Python library access
Have intricate data processing needs
Want better IDE support and debugging
Building reusable helper functions
Best Practices#
Use descriptive names: Snippet filenames should clearly indicate their purpose
Document thoroughly: Include helpful descriptions and argument help text
Handle errors gracefully: Validate inputs and provide informative error messages
Make snippets modular: Each snippet should do one thing well
Use namespaces: Make snippets portable by leveraging namespace configuration
Test with different arguments: Ensure default values work and required arguments are validated
Version control profiles: Keep execution environments reproducible via profiles
Common Patterns#
Chaining chunks with pipes#
```bash
@/bin/sh, step1, stdout=step2
cat input.txt | awk '{print $1}'
```
```bash
@/bin/sh, step2, stdout=step3
sort -u
```
```bash
@/bin/sh, step3
grep "pattern" > output.txt
```
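The `stdout=` links in the three chunks above form a chain, and the execution order follows from them. The sketch below recovers that order from a list of chunk descriptions; `chain_order` and the dictionary layout are hypothetical and say nothing about how Bio_pype schedules chunks internally.

```python
def chain_order(chunks):
    """Order chunk names by following stdout links from the chunk nobody pipes into."""
    by_name = {chunk["name"]: chunk for chunk in chunks}
    targets = {chunk["stdout"] for chunk in chunks if chunk.get("stdout")}
    starts = [name for name in by_name if name not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one chunk that receives no piped input")
    order, current = [], starts[0]
    while current is not None:
        if current in order:
            raise ValueError(f"cycle detected at chunk {current}")
        if current not in by_name:
            raise ValueError(f"stdout points to unknown chunk {current}")
        order.append(current)
        current = by_name[current].get("stdout")
    return order


chunks = [
    {"name": "step3"},
    {"name": "step1", "stdout": "step2"},
    {"name": "step2", "stdout": "step3"},
]
print(chain_order(chunks))
```

The input list is deliberately out of order to show that only the links, not the listing order, determine the result in this sketch.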
Using multiple inputs#
## arguments
1. forward_reads/1
- help: Forward reads
- type: str
- required: true
2. reverse_reads/2
- help: Reverse reads
- type: str
- required: true
## snippet
> _input_: forward_reads reverse_reads
```bash
@/bin/sh, align
bwa mem reference.fa %(forward_reads)s %(reverse_reads)s > aligned.sam
```
Accessing profile variables#
> _input_: profile_reference_genome profile_dbsnp*
```bash
@/bin/sh, variant_call
gatk HaplotypeCaller \
-R %(profile_reference_genome)s \
--dbsnp %(profile_dbsnp)s \
-I input.bam -O output.vcf
```
Additional Resources#
Profile configuration: See Bio_pype Profiles documentation
Variable substitution: Python string formatting
Environment Modules: Environment Modules Project
Quick Reference#
Markdown Sections#
Required: description, requirements, results, arguments, snippet
Optional: name
Requirements Fields#
Required: ncpu, time, mem
Argument Options#
help, type, required, default, nargs, action, choices
Valid types: str, int, float
Valid actions: store_true, store_false
Variable Prefixes#
Arguments: %(arg_name)s (long name only, e.g., %(input)s)
Profile files: %(profile_<key>)s (e.g., %(profile_genome_fa)s)
Results: %(results_<key>)s (e.g., %(results_output_bam)s)
Requirements: %(requirements_<key>)s (e.g., %(requirements_ncpu)s)
Results Chunk Header#
@interpreter, parser_format where parser_format is yaml or json
Snippet Chunk Header#
@interpreter, chunk_name [, namespace=program] [, stdout=next_chunk]
Python Required Functions#
requirements() - Return resource dict with ncpu, time, mem
results(argv) - Return output files dict
add_parser(subparsers, module_name) - Create parser
<snippet_name>(...) - Main execution function
friendly_name(argv) - Optional custom name