pandas, dataframe, py-datatable

Is there a faster way to append CSV files with datatable than with a pandas DataFrame?


I know that reading a CSV file with datatable is much faster than with pandas.

In my case, however, I have several CSV files that I have to append one by one.

Currently I append the result of pd.read_csv(file) for each file to an initially empty DataFrame.

Would it be faster to read each CSV file with datatable, append it to an empty datatable Frame,

and then finally write the combined Frame to CSV?

In short, I want to know the fastest way to append CSV files without using a pandas DataFrame.


Solution

  • This is what I do when I have lots of CSV files.

    I use glob to collect the paths of all the CSV files:

    from glob import glob
    all_csvs = glob('path-to-folder-containing-csv-files/*.csv')
    

    Now read all of them with iread and append them with rbind.

    import datatable as dt
    all_csvs_appended = dt.rbind(dt.iread(all_csvs))
    

    If your CSV files do not all have the same columns, you may need to pass force=True to rbind.