
pandas read_csv and keep only certain rows (python)


I am aware of the skiprows parameter, which lets you pass a list of the indices of the rows to skip. However, what I have are the indices of the rows I want to keep.

Say that my CSV file looks like this, but with millions of rows:

  A B
0 1 2
1 3 4
2 5 6
3 7 8
4 9 0

The only rows I would like to load are those with indices 2 and 3, so

index_list = [2,3]

The input for the skiprows parameter would be [0,1,4]. However, I only have [2,3] available.

I am trying something like:

pd.read_csv(path, skiprows = ~index_list)

but no luck. Any suggestions?

Thanks, I appreciate all the help.


Solution

  • I think you would need to find the number of lines in the file first, like this:

    with open(path) as f:
        num_lines = sum(1 for _ in f)
    

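    Then you would need to build the list of line numbers to skip, i.e. every line that is not in index_list. Note that skiprows counts lines of the file, so if the first line is the header (as in your example), data row i sits on line i + 1 and line 0 has to be kept as well:

    keep = {0} | {i + 1 for i in index_list}  # header line plus the wanted rows
    to_exclude = [i for i in range(num_lines) if i not in keep]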
    

    and then load your data:

    pd.read_csv(path, skiprows=to_exclude)
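
  • As a possible alternative, reasonably recent pandas versions also accept a callable for skiprows, which avoids counting the lines first. The sketch below is only an illustration: csv_text is made-up sample data standing in for your file, keep_lines is a name I introduced, and it assumes the first line of the file is the header, so data row i is file line i + 1.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file; line 0 is the header.
csv_text = "A,B\n1,2\n3,4\n5,6\n7,8\n9,0\n"

index_list = [2, 3]  # data rows to keep (0-based, header not counted)

# File line numbers to keep: the header (line 0) plus each wanted row shifted by one.
keep_lines = {0} | {i + 1 for i in index_list}

# skiprows may be a callable: it receives each line number and
# returns True when that line should be skipped.
df = pd.read_csv(io.StringIO(csv_text), skiprows=lambda n: n not in keep_lines)

print(df)
```

    With a real file you would pass path instead of io.StringIO(csv_text). Using a set for keep_lines keeps each membership check cheap, which matters when the file has millions of rows.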