I have this Boruta code, and I want to get the results as a pandas DataFrame with the column names included.
import numpy as np
from boruta import BorutaPy
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
# let's initialize Boruta
feat_selector = BorutaPy(
    verbose=2,
    estimator=model,
    n_estimators='auto',
    max_iter=10,  # number of iterations to perform
    random_state=42,
)
# train Boruta
# N.B.: X and y must be numpy arrays
feat_selector.fit(np.array(X), np.array(y))
# print support and ranking for each feature
print("\n------Support and Ranking for each feature------\n")
for i in range(len(feat_selector.support_)):
    if feat_selector.support_[i]:
        print("Passes the test: ", X.columns[i],
              " - Ranking: ", feat_selector.ranking_[i], "✔️")
    else:
        print("Doesn't pass the test: ",
              X.columns[i], " - Ranking: ", feat_selector.ranking_[i], "❌")
# features selected by Boruta
X_filtered = feat_selector.transform(np.array(X))
My selected result is this:
X.columns[feat_selector.support_]
Index(['J80', 'J100', 'J160', 'J200', 'J250'], dtype='object')
X_filtered
array([[12.73363 , 8.518314 , 5.2625847 , ..., 0.06733382]])
How do I get the result as a pandas DataFrame with the column headers? I have up to 25 columns.
Since support_ is a boolean mask, you can use it to index the columns and create a new DataFrame.
X_filtered = pd.DataFrame(
    feat_selector.transform(X.values),
    columns=X.columns[feat_selector.support_]
)
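If you also want to keep the original row index, one option is to index the DataFrame directly with the mask and skip the round trip through NumPy. This is only a sketch: the small frame and the `support` array below are made-up stand-ins for `X` and `feat_selector.support_`, so the snippet runs without Boruta installed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a small frame and a boolean mask shaped like support_
X = pd.DataFrame(
    {"J80": [1.0, 2.0], "J90": [3.0, 4.0], "J100": [5.0, 6.0]},
    index=["a", "b"],
)
support = np.array([True, False, True])  # plays the role of feat_selector.support_

# Select the supported columns directly; column names and row index are preserved
X_filtered = X.loc[:, support]
print(X_filtered)
```

With the real objects this would be `X.loc[:, feat_selector.support_]`, assuming `X` is still the DataFrame that was passed to `fit`.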
Alternatively, with the latest master version, you can pass a DataFrame to transform() and set return_df=True. That would look like:
X_filtered = feat_selector.transform(X, return_df=True)
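The support and ranking that the question prints in a loop can be collected into a DataFrame in the same way. The sketch below uses made-up arrays in place of `X.columns`, `feat_selector.support_` and `feat_selector.ranking_`; the column names `feature`, `support` and `ranking` are my own choice, not something Boruta produces.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for X.columns, feat_selector.support_ and feat_selector.ranking_
columns = pd.Index(["J80", "J90", "J100"])
support = np.array([True, False, True])
ranking = np.array([1, 3, 1])

# One row per feature, best-ranked first; a stable sort keeps the original order within ties
summary = (
    pd.DataFrame({"feature": columns, "support": support, "ranking": ranking})
    .sort_values("ranking", kind="stable")
    .reset_index(drop=True)
)
print(summary)
```

With the fitted selector you would pass `X.columns`, `feat_selector.support_` and `feat_selector.ranking_` instead of the stand-in arrays.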