I need to evaluate the two-way interaction between two variables after fitting a regression model. I used PartialDependenceDisplay.from_estimator to plot it, but the contour lines inside the plot all show a value of 0, and I'm not sure what causes this. I checked the data and the model, and both load without problems. Other two-variable combinations show the same issue.
import matplotlib.pyplot as plt
from sklearn.inspection import partial_dependence, PartialDependenceDisplay

model = load_model(model_path)        # load_model is our own helper returning the fitted regressor
model_features = model.feature_name_  # feature names stored on the model

fig, ax = plt.subplots(figsize=(10, 5))
X = training_data[model_features]
PartialDependenceDisplay.from_estimator(
    model, X,
    features=[('temperature', 'speed')],
    ax=ax, n_jobs=-1, grid_resolution=20,
)
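One way to check whether the values are genuinely near zero is to inspect the raw grid directly with the partial_dependence import above (a sketch, assuming a recent scikit-learn version where it returns a Bunch with an 'average' key):

# Sanity check (sketch): are the raw partial dependence values really ~0,
# or are they merely displayed as 0?
pd_result = partial_dependence(model, X, features=[('temperature', 'speed')],
                               grid_resolution=20)
print(pd_result['average'].min(), pd_result['average'].max())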
Most probably your contour values are all below 0.005. Contour labels are formatted as "%2.2f" (hard-coded in scikit-learn's plotting code), so every level smaller than 0.005 is rendered as 0.00.
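A minimal illustration of why every label collapses to 0.00 (plain string formatting, nothing model-specific):

# With the '%2.2f' format, any level below 0.005 prints as 0.00.
for level in (0.0012, 0.0034, 0.0049):
    print('%2.2f' % level)  # all three print '0.00'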
There appears to be no documented way of changing this format. The only workaround I could think of is to retrieve the labels and their values and replace the label texts:
import matplotlib.pyplot as plt
from matplotlib.text import Text
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay
X, y = make_friedman1()
clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)
pdd = PartialDependenceDisplay.from_estimator(clf, X, [0, (0, 1)])
# Relabel the two-way PDP (second subplot): match each label's text back
# to the nearest contour level and re-format it with full precision.
for c in pdd.axes_[0][1].get_children():
    if isinstance(c, Text):
        try:
            label_value = float(c.get_text())
        except ValueError:
            continue  # not a contour label
        idx = np.argmin(abs(pdd.contours_[0][1].levels - label_value))
        c.set_text(f'{pdd.contours_[0][1].levels[idx]:g}')
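Note that this only rewrites the existing Text artists, so run it before plt.show(); in an interactive session you can force a redraw afterwards with pdd.figure_.canvas.draw_idle() (figure_ is the display's Figure attribute).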
Update 1
The above method doesn't work if all existing labels are identical. A somewhat unreliable, quick-and-dirty workaround is to rely on the fact that the label texts are added to the Axes in ascending order of their levels; the first and last levels are not labelled. This leads to the following example:
import matplotlib.pyplot as plt
from matplotlib.text import Text
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay
X, y = make_friedman1(random_state=42)
clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)
pdd = PartialDependenceDisplay.from_estimator(clf, X, [0, (0, 1)])
# The first and last levels are not labelled, so start at index 1 and walk
# the labels in the order matplotlib added them (ascending by level).
i = 1
for c in pdd.axes_[0][1].get_children():
    if isinstance(c, Text) and c.get_text():
        c.set_text(f'{pdd.contours_[0][1].levels[i]:g}')
        i += 1
Update 2
Another (reliable but still hacky) possibility is to overwrite the clabel function used by scikit-learn with your own version that uses an appropriate format specification. To get hold of this function, you have to provide your own Axes instances to PartialDependenceDisplay.from_estimator:
import matplotlib.pyplot as plt
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay
fig, axes = plt.subplots(ncols=2)
original_clabel = axes[1].clabel

def new_clabel(CS, **kwargs):
    # Swap scikit-learn's hard-coded '%2.2f' for a higher-precision format
    # (pop avoids a KeyError in case fmt is ever not passed).
    kwargs.pop('fmt', None)
    return original_clabel(CS, fmt='%2.5f', **kwargs)

axes[1].clabel = new_clabel
X, y = make_friedman1(random_state=42)
clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)
pdd = PartialDependenceDisplay.from_estimator(clf, X, [0, (0, 1)], ax=axes)
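If the Axes get reused afterwards, the monkeypatch can be undone by restoring the saved bound method (a small follow-up sketch):

axes[1].clabel = original_clabel  # put the original method back
plt.show()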