Tags: python, continuous-integration, pytest, snakemake

Writing tests for input functions defined in `common.smk` (Snakemake)


I am looking for advice on testing input functions in Snakemake. Input functions are commonly defined in the common.smk file, e.g.:

import pandas as pd
from snakemake.io import glob_wildcards

def find_fastq_files(wildcards) -> list[str]:
    """
    Return a list of fastq files given sample name and processing stage.
    """

    # Find the fastq file names for the sample
    sample_dir = f"data/{wildcards.sample}/fastq"
    file_names = glob_wildcards(f"{sample_dir}/{{file_name}}.fastq.gz").file_name

    # return the file paths
    return [f"{sample_dir}/{f}.fastq.gz" for f in file_names]

My current approach is to parse the output of a Snakemake dry run and then make assertions on it with pytest. It would be cleaner to skip the dry run and test the input function directly, but I have not found an easy way to import a function that is defined within common.smk.

What would be the recommended way to test such an input function? Thanks!


Solution

  • You should be able to move this function to a file with a .py extension, e.g. funcs.py, and then in common.smk write `from funcs import find_fastq_files`.

    You can then import the function into your unit tests as you would any regular Python code. In the example above your common.smk happens to be valid Python, but as things get more complex it is often good to separate the Snakemake-specific parts from the plain Python functions.

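    Failing this, it's possible to get Snakemake to parse the entire workflow file and give you the functions without doing a dry run. This goes through Snakemake's Workflow class directly, so the exact calls may differ between Snakemake versions. Something like: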

    from snakemake import Workflow
    wf = Workflow(snakefile="workflow/Snakefile")
    wf.config.update(dict()) # If you have mandatory config
    wf.include(wf.main_snakefile)
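    # Extract the functions (myfunc1 is a placeholder for your own function name)
    myfunc1 = wf.globals['myfunc1']

    # Now you can test myfunc1() (the argument and expected value are placeholders)
    assert myfunc1(123) == 321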
    

    Hope that helps!
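
To illustrate the first suggestion (moving the function into funcs.py), here is a minimal sketch of what a direct pytest test could look like. It makes a few assumptions that are not in the original post: the `find_fastq_files` below is a stand-in that uses pathlib instead of `glob_wildcards`, so the sketch runs without Snakemake installed, and `run_in_fake_project` is a hypothetical helper name. In a real test you would import the function from funcs.py instead. The main idea is that an input function only reads attributes from `wildcards`, so a `types.SimpleNamespace` is usually enough to mimic it.

```python
import os
import tempfile
from pathlib import Path
from types import SimpleNamespace


def find_fastq_files(wildcards) -> list[str]:
    """Stand-in for the function in funcs.py (pathlib instead of glob_wildcards)."""
    sample_dir = Path("data") / wildcards.sample / "fastq"
    # Sort so the test does not depend on filesystem ordering.
    return sorted(p.as_posix() for p in sample_dir.glob("*.fastq.gz"))


def run_in_fake_project(sample: str, file_names: list[str]) -> list[str]:
    """Hypothetical helper: build a throwaway data/ tree and call the function in it."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        fastq_dir = Path(tmp) / "data" / sample / "fastq"
        fastq_dir.mkdir(parents=True)
        for name in file_names:
            (fastq_dir / f"{name}.fastq.gz").touch()
        # The function uses relative paths, so run it from the temporary root.
        os.chdir(tmp)
        try:
            # A namespace with a .sample attribute mimics the wildcards object.
            return find_fastq_files(SimpleNamespace(sample=sample))
        finally:
            os.chdir(old_cwd)


def test_find_fastq_files():
    result = run_in_fake_project("sampleA", ["read_1", "read_2"])
    assert result == [
        "data/sampleA/fastq/read_1.fastq.gz",
        "data/sampleA/fastq/read_2.fastq.gz",
    ]
```

With pytest, the `tmp_path` and `monkeypatch.chdir` fixtures could replace the manual temporary directory handling. Note that `glob_wildcards` can also match file names in subdirectories, so the stand-in is not an exact replacement for the original function.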