I am building an XGBoost model. I did my train-test split on a dataframe with 91 columns. I want to use the model on a new dataframe whose columns differ from those of my training set. I have removed the extra columns and added the ones that were present in the training dataset but missing from the new one.
However, I cannot use the model because the new set does not have the same number of columns as the training set, yet when I compute the list of column differences, the list is empty.
Do you have any idea how I could fix this problem?
Thanks in advance for your time!
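A set difference only works in one direction, so an empty result does not prove that both dataframes have the same columns. You can try something like this: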
import pandas as pd
X_PAU = pd.DataFrame({'test1': ['A', 'A'], 'test2': [0, 0]})
print(len(X_PAU.columns))
X = pd.DataFrame({'test1': ['A', 'A']})
print(len(X.columns))
# Your implementation: columns in X that are not in X_PAU
print(set(X.columns) - set(X_PAU.columns))  # empty set, even though the dataframes differ
# Reverse direction: columns in X_PAU that are not in X
print(X_PAU.columns.difference(X.columns).tolist())  # prints the missing column names
print(len(X_PAU.columns.difference(X.columns).tolist()))  # prints the number of missing columns
Output:
2
1
set()
['test2']
1
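If the difference is empty in both directions and the column counts still disagree, duplicate column names are one possible cause, since sets collapse duplicates. The sketch below is only an illustration under assumptions: `X_train` and `X_new` are hypothetical stand-ins for your dataframes, and filling missing columns with 0 is a placeholder choice that may not suit your features. It checks both directions, looks for duplicates, and then reorders the new dataframe to match the training columns.

```python
import pandas as pd

# Hypothetical stand-ins for the training features and the new data.
X_train = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
# The new frame has a duplicated "a", an extra "d", and no "c".
X_new = pd.DataFrame([[7, 8, 9, 10]], columns=["b", "a", "a", "d"])

# Compare column names in both directions.
only_in_train = sorted(set(X_train.columns) - set(X_new.columns))
only_in_new = sorted(set(X_new.columns) - set(X_train.columns))

# Duplicate names are invisible to set comparisons but still add to the width.
duplicated_in_new = X_new.columns[X_new.columns.duplicated()].tolist()

print(only_in_train, only_in_new, duplicated_in_new)

# Keep the first occurrence of each duplicated column, then reindex so the
# result has exactly the training columns, in the training order.
# fill_value=0 is an assumption; pick whatever default fits your data.
X_aligned = (
    X_new.loc[:, ~X_new.columns.duplicated()]
    .reindex(columns=X_train.columns, fill_value=0)
)

print(list(X_aligned.columns))
```

After this step `X_aligned` has the same column names, order, and count as `X_train`, which is what the model expects at prediction time.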