python dataframe test-data

How can I save a pandas.DataFrame to a file that is created as test_data?


I split my data into training data and test data for machine learning like this:

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

Now I want to save each set to a 'train_data' and a 'test_data' file, accessible as './train_data' and './test_data'.

But I don't know how. I found 'to_csv', but I don't think it is meant for this, because when I ran

test_df.to_csv('./test_data')

I got the error IsADirectoryError: [Errno 21] Is a directory: './test_data'. What should I do?


Solution

  • import pandas as pd
    from sklearn.model_selection import train_test_split
    
    #creating your dataframe
    df = ...
    
    train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)
    
    # pass a file path (not a directory); pandas opens and closes the file itself
    train_data.to_csv('train_data.csv')
    test_data.to_csv('test_data.csv')
    # you can optionally pass an encoding, e.g. encoding="utf-8"
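
The error message suggests that './test_data' already exists as a directory, and to_csv expects the path of a file. If the goal is to keep the CSVs inside './train_data' and './test_data', one possible sketch is to put a file name inside each directory. The file names 'train.csv' and 'test.csv' and the two small DataFrames below are made up for illustration; they stand in for the train_df and test_df from the question.

```python
from pathlib import Path

import pandas as pd

# Stand-ins for the train_df / test_df produced by train_test_split
# (made-up values, only here so the example runs on its own).
train_df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8],
                         "y": [0, 1, 0, 1, 0, 1, 0, 1]})
test_df = pd.DataFrame({"x": [9, 10], "y": [0, 1]})

# Treat './train_data' and './test_data' as directories; create them if missing.
train_dir = Path("./train_data")
test_dir = Path("./test_data")
train_dir.mkdir(parents=True, exist_ok=True)
test_dir.mkdir(parents=True, exist_ok=True)

# to_csv needs a file path, so append a (hypothetical) file name to each directory.
train_path = train_dir / "train.csv"
test_path = test_dir / "test.csv"

# index=False leaves the row index out of the file; drop it if you want the index saved.
train_df.to_csv(train_path, index=False, encoding="utf-8")
test_df.to_csv(test_path, index=False, encoding="utf-8")

# Read one file back to check that the round trip works.
reloaded = pd.read_csv(test_path)
```

If you would rather have plain files named 'train_data' and 'test_data', the existing directories with those names would have to be renamed or removed first, since a file and a directory cannot share the same path.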