I am trying to use Snakemake to output some files from a specific job. Basically I have different channels of a process that span different mass ranges. Depending on the {channel, mass} pair, I then have to run jobs for different values of "norms". I wanted to do this using:
import numpy as np
import pickle as pkl

particle_masses = {
    5: 4.18,    # Bottom quark mass ~4.18 GeV; however, there is an issue with the charon spectra below 10 GeV
    8: 80.3,    # W boson mass ~80.379 GeV
    11: 1.777,  # Tau lepton mass ~1.777 GeV
    12: 0,      # Electron neutrino mass ~0 (neutrino masses are very small)
    14: 0,      # Muon neutrino mass ~0
    16: 0       # Tau neutrino mass ~0
}

mass_values = np.logspace(0, np.log10(499), num=25).tolist()

# Masked mass dictionary with rounded values
masked_mass_dict = {}
for channel, min_mass in particle_masses.items():
    # Apply mask to exclude values below the particle mass threshold
    masked_mass = [round(m, 2) for m in mass_values if m >= max(min_mass, 3)]
    masked_mass_dict[channel] = masked_mass
channels = list(masked_mass_dict.keys())

rule all:
    input:
        expand(
            f"{DATA_LOC}/signal/channel_{{channel}}/trial_distributions/{{mass}}_trial_distrib_{{norm}}.h5",
            channel=channels,
            mass=lambda wildcards: masked_mass_dict[wildcards.channel],
            norm=lambda wildcards: get_norms({"mass": wildcards.mass, "channel": wildcards.channel}),
        )

rule compute_trial_distribution:
    input:
        signal_file=f"{DATA_LOC}/signal/channel_{{channel}}/mc_distrib/{{mass}}_mc_distrib.h5",
    output:
        norm_file=f"{DATA_LOC}/signal/channel_{{channel}}/trial_distributions/{{mass}}_trial_distrib_{{norm}}.h5"
    shell:
        """
        ...
        """
def get_norms(dict):
    """
    Get the norm values (sensitivity) for the given channel and mass
    from the pre-loaded sensitivity dictionary.
    """
    channel = int(dict["channel"])
    mass = float(dict["mass"])

    # Load sensitivity dictionary
    dictionary_file = "path/"
    with open(dictionary_file, "rb") as f:
        sensitivities_dict = pkl.load(f)

    # Get the sensitivity data for the specified channel
    if channel not in sensitivities_dict:
        raise ValueError(f"Channel {channel} not found in sensitivity dictionary.")
    sensitivity_mass_array = sensitivities_dict[channel]

    mass_index = np.where(sensitivity_mass_array[0] == mass)[0]
    if len(mass_index) == 0:
        raise ValueError(f"Mass {mass} not found for channel {channel}")

    # Calculate norms
    azimov_norm = sensitivity_mass_array[1][mass_index[0]]
    norms = np.linspace(0.1 * azimov_norm, 10 * azimov_norm, 50)
    norms = np.insert(norms, 0, 0)  # Include a norm value of 0 for background-only distribution
    norms = np.array([0])  # (currently overridden to the background-only norm)

    # Convert to list for use in Snakemake params or shell
    return norms.tolist()
However, this does not seem to work, and I am not sure how to implement it correctly... This is the error I get:
InputFunctionException in rule all in file /home/egenton/upgrade_solar_WIMP/scripts/Snakefile, line 45:
Error:
AttributeError: 'Wildcards' object has no attribute 'channel'
Wildcards:
If anybody knows how this can be solved, that would be very useful!
I tried passing the {channel, mass} pairs to expand the norms from the `get_norms` function, but that did not work, as the `get_norms` function does not recognize the wildcards.
Rule `all` cannot have wildcards (i.e. its `Wildcards` object doesn't have attributes representing wildcard values). This comes from the fact that it doesn't have an output "consumed" by downstream rules.
Wildcards are determined by matching the patterns in the output sections of rules against files required in the input of downstream rules. In your case, this will happen when executing instances of rule `compute_trial_distribution`, matching its output file pattern with one of the concrete files found in the input of `all`. The wildcard values will exist at the level of `compute_trial_distribution`, and can be used to compute its input (or in a `params`, `shell`, or `run` directive...).
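To illustrate that matching step, here is a rough sketch using a regular expression with named groups (Snakemake's actual implementation differs, and the file path below is made up):

```python
import re

# Output pattern of compute_trial_distribution, with each wildcard
# turned into a named capture group (roughly what Snakemake does).
pattern = re.compile(
    r"data/signal/channel_(?P<channel>.+)/trial_distributions/"
    r"(?P<mass>.+)_trial_distrib_(?P<norm>.+)\.h5"
)

# A concrete file requested in the input of a downstream rule:
match = pattern.fullmatch(
    "data/signal/channel_5/trial_distributions/10.77_trial_distrib_0.h5"
)
wildcards = match.groupdict()
print(wildcards)  # {'channel': '5', 'mass': '10.77', 'norm': '0'}
```

Note that the extracted values are strings, which is why your `get_norms` has to convert them back with `int()` and `float()`.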
For the input of `all`, you should define an explicit list of all the final files you want. This can often be done using `expand`, but providing lists of values for the parameters instead of lambda functions.
However, for what you want, I think it is simpler to just compute it with plain Python:
# Define get_norms before using it
def get_norms(...):
    # ...

distribs = []
for channel in channels:
    for mass in masked_mass_dict[channel]:
        for norm in get_norms({"mass": mass, "channel": channel}):
            distribs.append(
                f"{DATA_LOC}/signal/channel_{channel}/trial_distributions/{mass}_trial_distrib_{norm}.h5"
            )

rule all:
    input:
        distribs
In my opinion, Snakemake would be much less confusing for beginners if `expand` were replaced with plain Python constructs in the tutorials and examples. This would be an opportunity to learn some Python basics, which can turn out to be very useful for writing not-completely-straightforward workflows.
Side note: couldn't you define `get_norms` as a function with two parameters (`channel` and `mass`) instead of a function taking a dictionary?
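A sketch of that two-parameter signature, keeping your lookup logic but passing the sensitivity dictionary in directly rather than loading it from a pickle (the toy dictionary below is made up for illustration):

```python
import numpy as np

def get_norms(channel: int, mass: float, sensitivities_dict: dict) -> list:
    """Norm values (sensitivity) for a given channel and mass."""
    if channel not in sensitivities_dict:
        raise ValueError(f"Channel {channel} not found in sensitivity dictionary.")
    # Row 0: mass grid, row 1: sensitivities, as in the original code
    masses, sensitivities = sensitivities_dict[channel]
    mass_index = np.where(masses == mass)[0]
    if len(mass_index) == 0:
        raise ValueError(f"Mass {mass} not found for channel {channel}")
    azimov_norm = sensitivities[mass_index[0]]
    norms = np.linspace(0.1 * azimov_norm, 10 * azimov_norm, 50)
    norms = np.insert(norms, 0, 0)  # norm of 0 for the background-only distribution
    return norms.tolist()

# Toy example: channel 5 with masses [10.0, 20.0] and sensitivities [2.0, 4.0]
toy = {5: np.array([[10.0, 20.0], [2.0, 4.0]])}
norms = get_norms(5, 10.0, toy)
# norms[0] == 0, norms[1] == 0.1 * 2.0, norms[-1] == 10 * 2.0
```

Besides the cleaner call sites, this avoids shadowing the built-in `dict` and avoids re-reading the pickle on every call.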