python pandas pytorch torchtext

How to load a PyTorch dataset into a pandas DataFrame


I have seen a lot of code showing how to convert pandas data to a PyTorch dataset. However, I haven't found, or been able to figure out, how to do the reverse, i.e. load a PyTorch dataset into a pandas DataFrame. I want to load AG News into pandas. Can you please help? Thanks.

from torchtext.datasets import AG_NEWS


Solution

  • You can use:

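    import pandas as pd
    from torchtext.datasets import AG_NEWS
    
    # Each split yields (label, text) pairs
    train, test = AG_NEWS()
    # pandas consumes each split and builds one row per pair
    df_train = pd.DataFrame(train, columns=['label', 'text'])
    df_test = pd.DataFrame(test, columns=['label', 'text'])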
    

    Output:

    >>> df_train.head()
       label                                               text
    0      3  Wall St. Bears Claw Back Into the Black (Reute...
    1      3  Carlyle Looks Toward Commercial Aerospace (Reu...
    2      3  Oil and Economy Cloud Stocks' Outlook (Reuters...
    3      3  Iraq Halts Oil Exports from Main Southern Pipe...
    4      3  Oil prices soar to all-time record, posing new...
    
    
    >>> df_test.head()
       label                                               text
    0      3  Fears for T N pension after talks Unions repre...
    1      4  The Race is On: Second Private Team Sets Launc...
    2      4  Ky. Company Wins Grant to Study Peptides (AP) ...
    3      4  Prediction Unit Helps Forecast Wildfires (AP) ...
    4      4  Calif. Aims to Limit Farm-Related Smog (AP) AP...
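
The same pattern should carry over to other datasets whose items are (label, text) pairs. Below is a minimal sketch of that idea that runs without torchtext. `fake_news_iter` and `dataset_to_frame` are hypothetical names made up for illustration, and the sample rows are invented stand-ins, not real AG News data.

```python
import pandas as pd


def fake_news_iter():
    """Hypothetical stand-in for a dataset split: yields (label, text) pairs."""
    yield 3, "Stocks rally as oil prices ease"
    yield 4, "New telescope spots distant galaxy"
    yield 2, "Home team wins the final"


def dataset_to_frame(dataset, columns=("label", "text")):
    """Collect an iterable of tuples into a DataFrame (illustrative helper)."""
    # Materialize the iterable first; this holds every row in memory at once.
    rows = list(dataset)
    return pd.DataFrame(rows, columns=list(columns))


df = dataset_to_frame(fake_news_iter())
print(df)
```

Because every row is collected into a list before the DataFrame is built, this approach assumes the whole split fits in memory.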