Problem
I am running a LightGBMModel via Darts with some (future) covariates. I want to understand the relevance of the different (lagged) features.
In particular, I would like to retrieve the feature importance for the lagged target variable as well as for the covariates, using the original column names from the Darts TimeSeries object. In the LightGBM model object after fitting I can only see generic column names ("Column_0", "Column_1"). How can I connect these to meaningful names (e.g., target_lag_1, target_lag_2, name_of_covariate_lag_1, ...)?
I want to include several future covariates (e.g., several datetime attributes like day of week with different encodings). It does not matter where the datetime attributes are created (e.g., using pandas, using Darts itself).
Minimal reproducible example
I adapted the example from the documentation.
This is the code from the documentation, just setting up the data and fitting the model:
from darts.datasets import WeatherDataset
from darts.models import LightGBMModel
series = WeatherDataset().load()
# predicting atmospheric pressure
target = series['p (mbar)'][:100]
# optionally, use past observed rainfall (pretending to be unknown beyond index 100)
past_cov = series['rain (mm)'][:100]
# optionally, use future temperatures (pretending this component is a forecast)
future_cov = series['T (degC)'][:106]
# predict 6 pressure values using the 12 past values of pressure and rainfall, as well as the 6 temperature
# values corresponding to the forecasted period
model = LightGBMModel(
    lags=12,
    lags_past_covariates=12,
    lags_future_covariates=[0, 1, 2, 3, 4, 5],
    output_chunk_length=6,
    verbose=-1
)
model.fit(target, past_covariates=past_cov, future_covariates=future_cov)
Having fitted the model, I now want to analyze the importance of the features.
for i, estimator in enumerate(model.model.estimators_):
    print(f"Target {i} Importance (Gain):")
    # Access LightGBM booster
    booster = estimator.booster_
    # Get feature names
    feature_names = booster.feature_name()
    # Get gain-based importance
    importance = booster.feature_importance(importance_type='gain')
    # Create mapping
    named_importance = dict(zip(feature_names, importance))
    print(named_importance)
This returns the feature importance for every column in each estimator. However, the feature names are generic names generated by LightGBM ('Column_0', 'Column_1', ...). I do not know how to link them back to the original column names in the Darts TimeSeries object (e.g., 'rain (mm)', 'T (degC)'), together with the lag that each feature importance refers to.
The features that go into the models are available in model.lagged_feature_names.
One of the authors addressed feature importances in Issue #1826, doing mostly what you've done. They also referenced that approach, along with a note about the feature names, in Issue #2125.
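As a rough sketch of how the two pieces could be combined: the loop from the question stays the same, but the importances are zipped with model.lagged_feature_names instead of booster.feature_name(). This assumes the order of model.lagged_feature_names matches the column order LightGBM saw during fitting, and that the model was fitted with one estimator per forecast step (so that model.model.estimators_ exists, as in the question). The helper names name_importances and importances_per_step are made up for illustration, and the demo values at the bottom are stand-ins, not real output.

```python
from typing import Dict, List, Sequence


def name_importances(
    feature_names: Sequence[str], importances: Sequence[float]
) -> Dict[str, float]:
    """Pair each importance with its feature name, most important first."""
    if len(feature_names) != len(importances):
        raise ValueError(
            f"{len(feature_names)} names but {len(importances)} importances"
        )
    pairs = zip(feature_names, (float(v) for v in importances))
    return dict(sorted(pairs, key=lambda kv: kv[1], reverse=True))


def importances_per_step(model, importance_type: str = "gain") -> List[Dict[str, float]]:
    """One name -> importance mapping per forecast step.

    Assumption: `model` is a fitted Darts LightGBMModel whose
    `lagged_feature_names` are in the same order as LightGBM's columns.
    """
    names = model.lagged_feature_names
    return [
        name_importances(
            names,
            estimator.booster_.feature_importance(importance_type=importance_type),
        )
        for estimator in model.model.estimators_
    ]


# Stand-in data only: the names merely imitate lagged feature names,
# and the numbers are invented to show the sorting.
demo = name_importances(
    ["p (mbar)_target_lag-1", "rain (mm)_pastcov_lag-1", "T (degC)_futcov_lag0"],
    [3.0, 0.5, 7.5],
)
print(demo)
```

With the fitted model from the question, importances_per_step(model) would be called in place of the original loop. The length check is there because a mismatch between the number of names and the number of columns would otherwise silently mislabel features.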