I am trying to convert a pandas DataFrame into fastai's tabular DataLoaders and use it to train a fastai tabular learner. I was able to convert the pd.DataFrame to fastai.tabular.TabularDataLoaders, but training fails with the error shown below.
from fastai.tabular.all import *

cat_names = ['location']
cont_names = []
# all the other columns except location and the target (emission) are numerical, i.e. float or int
for col in train_19.columns:
    if col != 'location' and col != 'emission':
        cont_names.append(col)

procs = [Categorify, FillMissing, Normalize]
data = TabularDataLoaders.from_df(train_19, path=path, cat_names=cat_names, cont_names=cont_names,
                                  y_names='emission', procs=procs, bs=64)

config = tabular_config(ps=[0.001, 0.01], embed_p=0.04)
learner = tabular_learner(data, layers=[300, 200, 100, 50], metrics=[rmse], config=config)
learner.lr_find(start_lr=1e-05, end_lr=1e+05, num_it=100)
Running the last line gives me the following error:
RuntimeError: The size of tensor a (6400) must match the size of tensor b (64) at non-singleton dimension 0
Here, train_19 is as shown in the image below:

After transforming the pandas DataFrame into fastai's DataLoaders, it looks like the image below:
I did a bit of searching and found issues related to the shape of the input, but I think my case is different. If anyone would like to inspect this further and needs the exact CSV file, I can create a link to the data and post it here.
I tried to localize the error by playing around with the various parameters in the code. I found that changing the number of neurons in the third layer changes the numeric values in the error message. In the case above, the third layer has 100 neurons and the error message is:

The size of tensor a (6400) must match the size of tensor b (64) at non-singleton dimension 0

where 64 is the batch size bs.
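The numbers line up with the model's output stopping at the third layer. A minimal sketch of the arithmetic, assuming the prediction tensor ends up with shape (bs, 100) while the target has shape (bs,):

bs, third_layer = 64, 100
pred_elements = bs * third_layer       # 6400 -- "tensor a" in the error
target_elements = bs                   # 64   -- "tensor b" in the error
print(pred_elements, target_elements)  # 6400 64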
After a while, I understood that this error is related to the dropout parameter ps used when defining the config of the tabular_learner in the third line from the bottom: ps=[0.001, 0.01] has only two values, while layers=[300, 200, 100, 50] has four entries, so the lengths do not match. Dropping the parameter ps (or giving it one value per layer) clears the error and removes the restrictions on the number of layers and the number of neurons in them. Finally, we can train the tabular learner.
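For reference, a minimal sketch of the fix, assuming ps needs one dropout probability per entry in layers (the values below are placeholders, not tuned):

# One dropout probability per hidden layer (assumption: len(ps) should equal len(layers))
config = tabular_config(ps=[0.001, 0.01, 0.01, 0.001], embed_p=0.04)
learner = tabular_learner(data, layers=[300, 200, 100, 50], metrics=[rmse], config=config)
learner.lr_find(start_lr=1e-05, end_lr=1e+05, num_it=100)

# Alternatively, drop ps entirely, as described above:
config = tabular_config(embed_p=0.04)
learner = tabular_learner(data, layers=[300, 200, 100, 50], metrics=[rmse], config=config)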