automation cluster-computing snakemake hpc

Snakemake remote rules re-read config file?


I'm using Snakemake (v7.22.0) to execute multiple jobs on a cluster. I have several versions of the workflow that I sometimes execute in parallel, which involves changing the config file (and other files). My problem, if I understand the behavior correctly, is that if I edit the config file before some of the jobs start running on the cluster, those jobs read the config file after the edits, and so use different values from the ones in effect when the workflow that created them was started. In particular, this changes the workdir, so relative paths become incorrect.

I can see this in each job's stdout, which echoes the config, e.g.

{'workdir': 'workdir1', 'seeds': ...}

but I get different values for 'workdir' for different jobs originating from the same snakemake session.

Is this indeed how Snakemake behaves for remote rules, or am I mistaken? If so, what can I do to change the config and still get consistent results from all the remote jobs? Using multiple config files would lead to more human error in my scenario.

Thanks in advance!


Solution

  • If I remember correctly, and if things have not changed, Snakemake re-executes the Snakefile (and so re-reads the config file) for every job it runs on the cluster. So yes, if you change the config file halfway through the pipeline execution, the jobs that start afterwards will use the updated config.

    If different workflows use different configs, I would perhaps split the config file into a "common" file and a "workflow-specific" file. Each version of the workflow would then read the common file plus its own workflow-specific file.
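
    The split can be sketched in plain Python (not Snakefile syntax). The helper `merge_config` and the dictionary contents are hypothetical stand-ins for the two parsed files. Inside a Snakefile, listing two `configfile:` directives should have a similar effect, but that is an assumption worth checking against your Snakemake version.

```python
from copy import deepcopy


def merge_config(base, override):
    """Return a new dict with `override` recursively merged on top of `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the workflow-specific value wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parsed contents of the "common" and "workflow-specific" files.
common = {"common_option1": "foo", "common_option2": "bar"}
workflow_a = {"workdir": "/some/path"}

config_a = merge_config(common, workflow_a)
print(config_a)
```

    Because version "A" only ever reads its own workflow-specific file, editing the file that belongs to version "B" leaves the values seen by A's jobs unchanged.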

    Alternatively, have entries in your config file that instruct Snakemake on what to use. E.g. your config file:

    common_option1: foo
    common_option2: bar
    workflow_A:
        workdir: /some/path
    workflow_B:
        workdir: /other/path
    

    Then, at the top of the workflow file for version "A":

    WORKFLOW = 'workflow_A'
    
    workdir = config[WORKFLOW]['workdir']
    

    And at the top of the workflow file for version "B":

    WORKFLOW = 'workflow_B'
    
    workdir = config[WORKFLOW]['workdir']
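
The per-workflow lookup can be tried outside Snakemake with a plain Python sketch. The `config` dictionary mirrors the YAML above. The helper `workflow_settings` is hypothetical and only adds a clearer error when a section or its `workdir` entry is missing.

```python
def workflow_settings(config, workflow):
    """Return the config section for `workflow`, failing loudly if it is incomplete."""
    if workflow not in config:
        raise KeyError(f"no section {workflow!r} in config")
    section = config[workflow]
    if "workdir" not in section:
        raise KeyError(f"section {workflow!r} has no 'workdir' entry")
    return section


# Same structure as the example config file above.
config = {
    "common_option1": "foo",
    "common_option2": "bar",
    "workflow_A": {"workdir": "/some/path"},
    "workflow_B": {"workdir": "/other/path"},
}

workdir_a = workflow_settings(config, "workflow_A")["workdir"]
workdir_b = workflow_settings(config, "workflow_B")["workdir"]
print(workdir_a, workdir_b)
```

With this layout, switching between workflow versions no longer requires editing a shared `workdir` value, so a job that re-reads the config later still finds the entry that belongs to its own version.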