Tags: snakemake, pypsa

How to log time duration (for each job and total) for a snakemake workflow?


If I run a snakemake workflow, the start time for each job is logged to the console, for example

[Fri Jul 12 10:39:15 2024]

How can I adapt the logging to also show the duration for each job and the total duration in ms?

a) Is there some logging config file, where I could adapt a format string etc.?

b) Is there some profiling tool for snakemake, that helps to monitor and optimize the performance of snakemake workflows?

My use case is to compare different solvers for pypsa-eur and find out more about their limitations. I could manually subtract the start times of the jobs, but that does not seem practical to me.

Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
Provided cores: 8
Rules claiming more threads will be scaled down.
Job stats:
job         count
--------  -------
run_main        1
total           1

Select jobs to execute...
Execute 1 jobs...

[Fri Jul 12 10:39:15 2024]
localrule run_main:
    input: input/input.txt
    output: output/output.txt
    jobid: 0
    reason: Code has changed since last execution
    resources: tmpdir=C:\Users\eis\AppData\Local\Temp\PyCharmPortableTemp

Running snake_script.py at C:\python_env\workspace\resilient\snakemake_demo
Processed data written to output/output.txt
[Fri Jul 12 10:39:15 2024]
Finished job 0.
1 of 1 steps (100%) done

The documentation of snakemake contains a section about logging:

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

However, that section only describes how to redirect the logs for jobs to some file instead of printing them all to the console.


Solution

  • A. Benchmark Rules

    Snakemake supports a benchmark keyword in its rules:

    https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#benchmark-rules

    If a file path for a benchmark is specified, Snakemake creates a corresponding text file containing the benchmark information.

    PyPSA-Eur applies this feature to create benchmarking information under /results/benchmarks.
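    For example, the rule from the question could be benchmarked like this (the benchmark file path is an arbitrary choice, not prescribed by Snakemake):

```
rule run_main:
    input:
        "input/input.txt"
    output:
        "output/output.txt"
    benchmark:
        "benchmarks/run_main.benchmark.txt"
    script:
        "snake_script.py"
```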

    Example content:

    s   h:m:s   max_rss max_vms max_uss max_pss io_in   io_out  mean_load   cpu_time
    27121.7106  7:32:01 4083.46 6278.57 3738.98 3873.53 0.12    153.27  100.29  27196.06
    

    Explanation of parameters from https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#benchmark-rules

    s: Wall clock time (in seconds),
    
    h:m:s: Wall clock time (in hour:minutes:seconds),
    
    max_rss: Max RSS memory usage (in megabytes),
    
    max_vms: Max VMS memory usage (in megabytes),
    
    max_uss: Max USS memory usage (in megabytes),
    
    max_pss: Max PSS memory usage (in megabytes),
    
    io_in: I/O read (in bytes),
    
    io_out: I/O written (in bytes),
    
    mean_load: CPU load = CPU time (cpu_usage) divided by wall clock time (s),
    
    cpu_time: CPU time user+system (seconds),
    

    (Since version 8.11.0 of snakemake, it is possible to have extra benchmark metrics, see snakemake documentation.)

    The benchmark feature is based on a custom implementation within Snakemake:

    https://snakemake.readthedocs.io/en/v7.1.0/_modules/snakemake/benchmark.html

    B. HTML report feature

    Snakemake has a report feature:

    snakemake --report report.html
    

    https://snakemake.readthedocs.io/en/stable/snakefiles/reporting.html#rendering-reports

    and is able to create plots about the job runtimes.

    (screenshot: runtime statistics plot from an example report)

    Example report:

    https://snakemake.github.io/resources/report.html# => Statistics

    (I find the Snakemake documentation about the report feature a little confusing, because it goes into detail first, while the most relevant section on how to actually create the report is located at the end of the page.

    You can use the above command without adapting the Snakefile. Using report directives inside the Snakefile is optional and helpful if you want to include additional information in the report.)

    C. Python logging module

    Snakemake uses the python logging module under the hood and you could adapt the logging format in your Snakefile to log additional information to the console:

    a) use predefined logging parameters

    import logging

    # doc on logging format:
    # https://docs.python.org/3/howto/logging.html#changing-the-format-of-displayed-messages
    # https://docs.python.org/3/library/logging.html#logrecord-attributes
    logging.basicConfig(
        format="[%(relativeCreated)d] %(message)s",
    )
    

    The parameter relativeCreated represents a relative time:

    Time in milliseconds when the LogRecord was created, relative to the time the logging module was loaded.

    https://docs.python.org/3/library/logging.html#logrecord-attributes
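    A quick standalone demonstration of that format outside of Snakemake (force=True resets any previously configured handlers and requires Python >= 3.8):

```python
import logging
import time

logging.basicConfig(
    format="[%(relativeCreated)d] %(message)s",
    level=logging.INFO,
    force=True,  # replace handlers configured earlier (Python >= 3.8)
)
logging.info("start")
time.sleep(0.2)
logging.info("done")  # prefix shows milliseconds since the logging module was loaded
```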

    b) use custom log handler

    In order to log the time difference to the last log message, a custom log handler could be used:

    class DeltaTimeHandler(logging.StreamHandler):
        def __init__(self):
            super().__init__()
            self.last_time = None
    
        def emit(self, record):
            current_time = record.relativeCreated
            if self.last_time is not None:
                delta = current_time - self.last_time
                record.delta_time = delta
            else:
                record.delta_time = 0
            self.last_time = current_time
            super().emit(record)
    
    
    logging.basicConfig(
        format="[%(relativeCreated)d] [%(delta_time)d ms] %(message)s",
        handlers=[DeltaTimeHandler()],
    )
    

    Here is another variant that only considers specific messages for the time difference:

    class DeltaTimeHandler(logging.StreamHandler):
        def __init__(self):
            super().__init__()
            self.last_time = None
    
        def emit(self, record):
            message = str(record.msg)
            # only consider Snakemake's timestamp messages like
            # "[Fri Jul 12 10:39:15 2024]"; note that all other
            # messages are suppressed by this handler
            if message.startswith('['):
                current_time = record.relativeCreated
                if self.last_time is not None:
                    delta = int(current_time - self.last_time)
                    if delta > 0:
                        record.msg = f"[duration: {delta} ms]"
                        super().emit(record)
    
                self.last_time = current_time
    
    logging.basicConfig(
        format="[%(relativeCreated)d] %(message)s",
        handlers=[DeltaTimeHandler()],
    )
    

    c) Adapt snakemake logging code

    If the above solutions are not sufficient, you could adapt the logging part of snakemake:

    https://github.com/snakemake/snakemake/blob/main/snakemake/logging.py

    D. Python profiling tools

    For general Python tools for benchmarking and profiling, see

    https://realpython.com/python-profiling/

    https://docs.python.org/3/library/profile.html

    https://github.com/sumerc/yappi

    PyCharm Professional Edition and Visual Studio provide GUI profiling features, while PyCharm Community Edition and VS Code don't.

    Also see

    Is there a visual profiler for Python?

    https://jiffyclub.github.io/snakeviz/
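    As a minimal example, the built-in cProfile module can profile a single Python function (the work function here is just a stand-in for an expensive step such as a solver call):

```python
import cProfile
import io
import pstats

def work():
    # stand-in for an expensive step, e.g. a solver call
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# print the five most expensive entries by cumulative time
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
print(buffer.getvalue())
```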

    E. Memory profiling in PyPSA-Eur

    PyPSA-Eur includes a memory_logger function to log memory usage while solving:

    https://github.com/PyPSA/pypsa-eur/blob/master/scripts/_benchmark.py

    that is based on memory_profiler

    Example usage:

    with memory_logger(
        filename=getattr(snakemake.log, "memory", None), interval=30.0
    ) as mem:
        n = solve_network(
            n,
            config=snakemake.config,
            solving=snakemake.params.solving,
            log_fn=snakemake.log.solver,
        )
    
    logger.info(f"Maximum memory usage: {mem.mem_usage}")
    

    Source: https://github.com/PyPSA/pypsa-eur/blob/master/scripts/solve_network.py

    Example output:

    elec_s_37_lvopt___2050_memory.log

    MEM 563.062500 1721309193.8355
    MEM 563.562500 1721309193.8607
    MEM 2816.871094 1721309223.9120
    MEM 3454.433594 1721309253.9648
    MEM 3897.300781 1721309284.0169
    

    The numbers represent the memory usage (in MiB) and a timestamp (seconds since the Unix epoch).

    In one case I was wondering why the numbers shown by the GNOME System Monitor were significantly higher than the logged numbers: I still had some zombie Python processes from canceled Snakemake workflows running that were consuming memory.