How can I split a Dataset from a .csv file for Training and Testing?


I'm using Python and need to split data imported from a .csv file into two parts, a training set and a test set, e.g. 70% training and 30% test.

I keep getting various errors, such as 'list' object is not callable and so on.

Is there any easy way of doing this?

Thanks

EDIT:

The code is basic, I'm just looking to split the dataset.

from csv import reader
with open('C:/Dataset.csv', 'r') as f:
    data = list(reader(f)) #Imports the CSV
    data[0:1] ( data )

TypeError: 'list' object is not callable


Solution

  • You can use pandas:

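    import pandas as pd
    import numpy as np
    
    df = pd.read_csv('C:/Dataset.csv')
    
    # Boolean mask: each row is True with probability 0.7, so the split
    # is approximately (not exactly) 70/30 and differs on every run.
    msk = np.random.rand(len(df)) <= 0.7
    
    train = df[msk]   # rows where the mask is True
    test = df[~msk]   # all remaining rows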
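
  • If you would rather stay with the csv module from the question, a shuffle followed by plain list slicing gives an exact 70/30 cut. This is only a minimal sketch: load_rows and split_rows are hypothetical helper names, and it assumes the first row of the file is a header (pass has_header=False otherwise).

```python
import csv
import random


def load_rows(path):
    """Read every row of a CSV file into a list of lists."""
    with open(path, newline='') as f:
        return list(csv.reader(f))


def split_rows(rows, train_frac=0.7, seed=None, has_header=True):
    """Shuffle the data rows and cut them at train_frac (hypothetical helper)."""
    # Assumption: the first row is a header and is repeated in both parts.
    header, body = (rows[:1], rows[1:]) if has_header else ([], rows[:])
    random.Random(seed).shuffle(body)        # shuffles a copy, not the caller's list
    cut = round(len(body) * train_frac)      # index of the first test row
    # Slicing needs square brackets only. Writing data[0:1](data) tries to
    # call the sliced list, which raises "'list' object is not callable".
    return header + body[:cut], header + body[cut:]
```

    Usage would look like train, test = split_rows(load_rows('C:/Dataset.csv'), seed=42); fixing the seed makes the split reproducible between runs.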