
How to debug scripts for PyPSA-EUR?


I would like to set Visual Studio Code or PyCharm breakpoints in the script

https://github.com/PyPSA/pypsa-eur/blob/master/scripts/prepare_sector_network.py

and then run and debug it to better understand how it works.

The scripts of PyPSA-EUR are usually run as part of a Snakemake workflow, so I currently see a few strategies:

a) Create a run configuration (launch.json) that runs Snakemake as a Python module or executable

b) Run the script itself with the Python executable and, if "snakemake" is not in globals(), mock the snakemake workflow object in a corresponding main section of the script

c) Use a Snakemake-specific plugin such as

https://github.com/JetBrains-Research/snakecharm

https://open-vsx.org/extension/snakemake/snakemake-lang

(neither of which supports debugging yet)

=> What is the recommended way to do it and where can I find instructions?

Actually, the script file does include a main section at the end of the file:

if __name__ == "__main__":
  #...
  #snakemake = mock_snakemake(...)

However, that functionality seems to be outdated, or perhaps it serves a different purpose? I created a corresponding bug ticket here:

https://github.com/PyPSA/pypsa-eur/issues/1118

If a) is the recommended way instead, could someone please provide an example VS Code launch.json and PyCharm run configuration settings?

I tried the following:

a1) Use snakemake as a module in a PyCharm run configuration


a2) Create a dummy start script snake.py:

import sys
from snakemake.cli import main


if __name__ == "__main__":
    arguments = sys.argv[1:]
    main(arguments)

Setting a breakpoint in this script works, and the Snakemake workflow can be run with

python snake.py -call all

However, breakpoints inside the script that the Snakefile references (via the script directive) are not hit.
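As far as I can tell, this is because the script directive does not run the file inside the Snakemake process: Snakemake writes a temporary copy of the script, prefixed with code that defines the `snakemake` object, and executes that copy with a separate Python interpreter, so the IDE never maps your breakpoints to it. The sketch below only illustrates the in-process alternative with the standard library's `runpy.run_path`, which runs a file as `__main__` in the current (debugged) interpreter and can predefine globals via `init_globals`. The injected object is a hypothetical `SimpleNamespace` stand-in, not Snakemake's real object, and `TOY_SCRIPT`/`run_with_fake_snakemake` are illustrative names.

```python
import runpy
import tempfile
from pathlib import Path
from types import SimpleNamespace

# Toy stand-in for a workflow script: like a script-directive script,
# it only reads from a `snakemake` global that it does not define itself.
TOY_SCRIPT = '''
out_path = snakemake.output[0]
clusters = snakemake.wildcards.clusters
with open(out_path, "w") as f:
    f.write(f"clusters={clusters}\\n")
'''


def run_with_fake_snakemake(script_path, fake):
    """Execute script_path as __main__ in this process, with `snakemake` predefined.

    Because the code runs in the current interpreter (not a subprocess),
    breakpoints set in script_path can be hit by the attached debugger.
    Returns the globals of the executed script.
    """
    return runpy.run_path(
        str(script_path),
        init_globals={"snakemake": fake},
        run_name="__main__",
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "toy_script.py"
        script.write_text(TOY_SCRIPT)
        out = Path(tmp) / "out.txt"
        fake = SimpleNamespace(
            output=[str(out)],
            wildcards=SimpleNamespace(clusters="5"),
        )
        run_with_fake_snakemake(script, fake)
        print(out.read_text().strip())  # clusters=5
```

For a real PyPSA-EUR script, building such a fake object by hand is tedious, which is why the mock_snakemake route is the practical one there.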

Related:

https://github.com/snakemake/snakemake/issues/2932

https://github.com/PyPSA/pypsa-eur/pull/107

How to debug snakemake snakefile in visual studio code?

https://github.com/JetBrains-Research/snakecharm/issues/142

https://github.com/JetBrains-Research/snakecharm/issues/25

https://github.com/snakemake/snakemake/issues/247

https://github.com/snakemake/snakemake/issues/1607


Solution

  • A. Debugging would work "out of the box" if you imported your script in the Snakefile and called it from a run directive:


    # import your script in the Snakefile 
    from src.foo_script import main as foo_main  
    
    rule run_main:
        input:
            input_file="input/input.txt"
        output:
            output_file="output/output.txt"
    
        # Do not use shell or script here, but a direct run directive:
        # its code executes inside the Snakemake process, so the debugger stays attached.
        run:
            foo_main(input, output)
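For completeness, a minimal sketch of what the imported `src/foo_script.py` could look like. The module path, `main`, `input_file` and `output_file` are the hypothetical names from the rule above, and the upper-casing body is just a placeholder. Inside a `run:` block, Snakemake's `input` and `output` objects allow attribute access by the names declared in the rule; the function relies only on that, so it can also be called directly with simple stand-ins.

```python
# src/foo_script.py (hypothetical module imported by the Snakefile)
import tempfile
from pathlib import Path
from types import SimpleNamespace


def main(inputs, outputs):
    """Write an upper-cased copy of the input file to the output file.

    `inputs` and `outputs` only need attributes named like the files
    declared in the rule (input_file / output_file).
    """
    text = Path(inputs.input_file).read_text()
    out = Path(outputs.output_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text.upper())


if __name__ == "__main__":
    # Stand-alone debugging without Snakemake: mimic the rule's objects.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "input", "input.txt")
        src.parent.mkdir()
        src.write_text("hello\n")
        dst = Path(tmp, "output", "output.txt")
        main(SimpleNamespace(input_file=str(src)), SimpleNamespace(output_file=str(dst)))
        print(dst.read_text(), end="")  # HELLO
```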
    

    B. However, as long as PyPSA-EUR uses the Snakemake script directive to include its Python scripts, you are stuck with the mock_snakemake workaround, which mocks the global snakemake object in a custom main section of your script. Also see

    Example:

    Let's say you want to debug the script "scripts/prepare_sector_network.py" while following the tutorial

    https://pypsa-eur.readthedocs.io/en/latest/tutorial_sector.html

    and running the command

    snakemake -call all --configfile config/test/config.overnight.yaml
    

    You can debug the individual Python script using the following steps:

    1. Find out which output file is created by the script you would like to debug. To do so, search the console output for the name of the rule and look at the corresponding output lines:

    [Mon Jul  8 13:59:56 2024]
    localrule prepare_sector_network:
        input: resources/test/profile_offwind-ac.nc, resources/test/profile_offwind-dc.nc, resources/test/profile_offwind-float.nc, resources/test/gas_network_elec_s_5.csv, resources/test/gas_input_locations_s_5.geojson, resources/test/gas_input_locations_s_5_simplified.csv, resources/test/snapshot_weightings_elec_s_5_ec_lv1.5_.csv, resources/test/networks/elec_s_5_ec_lv1.5_.nc, data/eurostat/Balances-April2023, resources/test/pop_weighted_energy_totals_s_5.csv, resources/test/pop_weighted_heat_totals_s_5.csv, resources/test/shipping_demand_s_5.csv, resources/test/transport_demand_s_5.csv, resources/test/transport_data_s_5.csv, resources/test/avail_profile_s_5.csv, resources/test/dsm_profile_s_5.csv, resources/test/co2_totals.csv, data/bundle/eea/UNFCCC_v23.csv, resources/test/biomass_potentials_s_5_2030.csv, resources/test/costs_2030.csv, resources/test/salt_cavern_potentials_s_5.csv, resources/test/busmap_elec_s.csv, resources/test/busmap_elec_s_5.csv, resources/test/pop_layout_elec_s_5.csv, resources/test/pop_layout_elec_s.csv, resources/test/industrial_energy_demand_elec_s_5_2030.csv, resources/test/hourly_heat_demand_total_elec_s_5.nc, resources/test/district_heat_share_elec_s_5_2030.csv, resources/test/temp_soil_total_elec_s_5.nc, resources/test/temp_soil_rural_elec_s_5.nc, resources/test/temp_soil_urban_elec_s_5.nc, resources/test/temp_air_total_elec_s_5.nc, resources/test/temp_air_rural_elec_s_5.nc, resources/test/temp_air_urban_elec_s_5.nc, resources/test/cop_soil_total_elec_s_5.nc, resources/test/cop_soil_rural_elec_s_5.nc, resources/test/cop_soil_urban_elec_s_5.nc, resources/test/cop_air_total_elec_s_5.nc, resources/test/cop_air_rural_elec_s_5.nc, resources/test/cop_air_urban_elec_s_5.nc, resources/test/solar_thermal_total_elec_s_5.nc, resources/test/solar_thermal_urban_elec_s_5.nc, resources/test/solar_thermal_rural_elec_s_5.nc
        output: results/test-sector-overnight/prenetworks/elec_s_5_lv1.5___2030.nc
        log: results/test-sector-overnight/logs/prepare_sector_network_elec_s_5_lv1.5___2030.log
        jobid: 4
        benchmark: results/test-sector-overnight/benchmarks/prepare_sector_network/elec_s_5_lv1.5___2030
        reason: Missing output files: results/test-sector-overnight/prenetworks/elec_s_5_lv1.5___2030.nc
        wildcards: simpl=, clusters=5, ll=v1.5, opts=, sector_opts=, planning_horizons=2030
        resources: tmpdir=/tmp, mem_mb=2000, mem_mib=1908
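The `wildcards:` line in this block already lists the values that the later steps need. A minimal stdlib sketch that turns it into keyword arguments for the mock_snakemake call; `parse_wildcards_line` and `as_mock_snakemake_kwargs` are hypothetical helpers (not part of PyPSA-EUR or Snakemake), and they assume the `name=value, name=value` layout shown above with no commas inside values. You still add the `configfiles` argument yourself.

```python
def parse_wildcards_line(line):
    """Turn 'wildcards: a=1, b=, c=x' into {'a': '1', 'b': '', 'c': 'x'}."""
    _, _, rest = line.strip().partition("wildcards:")
    wildcards = {}
    for item in rest.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        wildcards[name.strip()] = value.strip()
    return wildcards


def as_mock_snakemake_kwargs(wildcards):
    """Render wildcards as keyword lines to paste into a mock_snakemake(...) call."""
    return "\n".join(f'{name}="{value}",' for name, value in wildcards.items())


if __name__ == "__main__":
    log_line = ("    wildcards: simpl=, clusters=5, ll=v1.5, opts=, "
                "sector_opts=, planning_horizons=2030")
    wc = parse_wildcards_line(log_line)
    print(wc)
    print(as_mock_snakemake_kwargs(wc))
```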
    

    If the output path is hard to find, you could also adapt the script to print some extra output:

    output_path = snakemake.output[0]
    print('######## output_path ########')
    print(output_path)
    

    Example output_path:

    results/test-sector-overnight/prenetworks/elec_s_5_lv1.5___2030.nc
    

    Known Issues:

    or use the --forceall (-F) option, which reruns all rules, regenerating all intermediate files and re-downloading sources:

    snakemake -call all --configfile config/test/config.overnight.yaml -F
    

    Also see https://snakemake.readthedocs.io/en/stable/executing/cli.html#execution

    You might get

    MissingRuleException:
    No rule to produce results/test-sector-overnight/prenetworks/elec_s_5_lv1.5___2030.nc (if you use input functions make sure that they don't raise unexpected exceptions).
    

    2. Look for the pattern that is used by the corresponding snakemake rule for the output file.

    For the script "scripts/prepare_sector_network.py" the corresponding rule is "prepare_sector_network" in file "rules/build_sector.smk" and it defines its output as:

    output:
        RESULTS
        + "prenetworks/elec_s{simpl}_{clusters}_l{ll}_{opts}_{sector_opts}_{planning_horizons}.nc",
    

    3. Determine wildcards by comparing the pattern with the actual output file path:

    simpl: ''
    clusters: '5'
    ll: 'v1.5'
    opts: ''
    sector_opts: ''
    planning_horizons: '2030' 
    
    # RESULTS: 'results/test-sector-overnight/'
    # RESULTS = "results/" + RDIR
    # RDIR = 'test-sector-overnight/'
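Step 3 can also be checked mechanically: turn the output pattern into a regular expression with one named group per wildcard, match it against the actual file name, and format the pattern back with the recovered values. The constraint regexes below are assumptions chosen to fit this example, not copied from PyPSA-EUR (its own `wildcard_constraints` may differ). Snakemake's default for an unconstrained wildcard is `.+`, which would not allow the empty values seen here. `pattern_to_regex` and `extract_wildcards` are illustrative names.

```python
import re

PATTERN = "elec_s{simpl}_{clusters}_l{ll}_{opts}_{sector_opts}_{planning_horizons}.nc"

# Assumed constraints for this example only (not taken from PyPSA-EUR).
CONSTRAINTS = {
    "simpl": r"[a-zA-Z0-9]*",
    "clusters": r"[0-9]+",
    "ll": r"[vc](?:[0-9.]+|opt)",
    "opts": r"[-+a-zA-Z0-9.]*",
    "sector_opts": r"[-+a-zA-Z0-9.]*",
    "planning_horizons": r"[0-9]{4}",
}


def pattern_to_regex(pattern, constraints):
    """Compile a '{name}'-style pattern into a regex with one named group per wildcard."""
    # With a capturing group, re.split alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            regex += f"(?P<{part}>{constraints.get(part, '.+')})"
    return re.compile(regex)


def extract_wildcards(pattern, filename, constraints):
    """Return the wildcard values for filename, or raise ValueError if it does not match."""
    match = pattern_to_regex(pattern, constraints).fullmatch(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match {pattern!r}")
    return match.groupdict()


if __name__ == "__main__":
    path = "results/test-sector-overnight/prenetworks/elec_s_5_lv1.5___2030.nc"
    filename = path.rsplit("/", 1)[-1]
    wildcards = extract_wildcards(PATTERN, filename, CONSTRAINTS)
    print(wildcards)
    # Round trip: formatting the pattern with the wildcards rebuilds the file name.
    assert PATTERN.format(**wildcards) == filename
```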
    

    4. Find the main section of the Python script (scroll down to the end of "scripts/prepare_sector_network.py" or search for "mock_snakemake") and adapt the call to mock_snakemake so that it reproduces those wildcards. Also make sure to use the same config file as in your original command. (The RDIR and RESULTS paths are determined automatically from the configfiles argument.)

    snakemake = mock_snakemake(
        "prepare_sector_network",
        configfiles="config/test/config.overnight.yaml",
        simpl="",
        clusters="5",
        ll="v1.5",
        opts="",
        sector_opts="", 
        planning_horizons="2030",
    )
    

    5. Set some breakpoints and run the script in the debug mode of your IDE, which corresponds to running:

    python prepare_sector_network.py
    

    Because the script is started directly rather than by Snakemake, the condition "snakemake" not in globals() under if __name__ == "__main__" is True, so mock_snakemake is called and prepares the snakemake context for your debugging session.
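Stripped down, the guard works like the sketch below. When Snakemake runs the file through the script directive, `snakemake` already exists as a global and the fallback is skipped; when you start the file directly, the fallback creates a stand-in. `build_fake_snakemake` and its `SimpleNamespace` object are hypothetical placeholders for PyPSA-EUR's mock_snakemake, which builds a far more complete object from the real rule and config; `describe` is a toy script body.

```python
from types import SimpleNamespace


def build_fake_snakemake():
    """Hypothetical stand-in for mock_snakemake: only the fields used below."""
    return SimpleNamespace(
        rule="prepare_sector_network",
        wildcards=SimpleNamespace(clusters="5", planning_horizons="2030"),
        output=["results/test-sector-overnight/prenetworks/elec_s_5_lv1.5___2030.nc"],
    )


def describe(snakemake):
    """Toy 'script body' that only reads from the snakemake object."""
    wc = snakemake.wildcards
    return f"{snakemake.rule}: clusters={wc.clusters}, year={wc.planning_horizons}"


if __name__ == "__main__":
    if "snakemake" not in globals():
        # Started directly (e.g. from the IDE debugger): nothing was injected.
        snakemake = build_fake_snakemake()
    # A breakpoint here sees a snakemake object in both cases.
    print(describe(snakemake))  # prepare_sector_network: clusters=5, year=2030
```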

    6. Known issues

    If you run into errors about non-existent files:

    a) Ensure that you are using the right config file and that the configfiles line is neither commented out nor missing the 'config' folder in its path:

    # configfiles="config/test/config.overnight.yaml"
    

    If you use the wrong config file, the "test" part might be missing from some file paths, e.g.

    resources/networks/elec_s_5_ec_lv1.5_.nc
    

    instead of

    resources/test/networks/elec_s_5_ec_lv1.5_.nc
    

    and you get a FileNotFoundError.

    b) Rerun your original command from the console to complete a full Snakemake workflow run first, so that all input files exist for the script you want to debug.
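Before a long debugging session, it can help to list which inputs are actually missing. A minimal sketch, assuming you pass it `snakemake.input` from the mocked object and that it behaves like a list of path strings; `report_missing_inputs` is a hypothetical helper, not part of PyPSA-EUR. `Path.exists` is true for directories as well, which matters for inputs such as `data/eurostat/Balances-April2023`.

```python
from pathlib import Path


def report_missing_inputs(paths):
    """Print and return the input paths that do not exist (files or directories)."""
    missing = [str(p) for p in paths if not Path(p).exists()]
    for p in missing:
        print(f"missing input: {p}")
    return missing


if __name__ == "__main__":
    # In the script's main section you might call, e.g.:
    #     report_missing_inputs(snakemake.input)
    # Here a hard-coded list stands in for snakemake.input.
    example_inputs = [
        "resources/test/networks/elec_s_5_ec_lv1.5_.nc",
        "resources/networks/elec_s_5_ec_lv1.5_.nc",
    ]
    if report_missing_inputs(example_inputs):
        print("Run the full workflow first, or check the configfiles argument.")
```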

    C. If you want to inspect the execution of jobs in Snakemake in detail, you can set breakpoints at the following entry points:

    https://github.com/snakemake/snakemake-interface-executor-plugins/blob/main/snakemake_interface_executor_plugins/executors/base.py => run_jobs

    https://github.com/snakemake/snakemake/blob/main/snakemake/executors/local.py => run_wrapper