pandas read-csv

Read a CSV file with pandas, skipping rows but keeping the original line numbers


I want to read a CSV file with pandas, skip some rows, and still keep track of the original line numbers.

I tried the skiprows parameter of pd.read_csv(), but it does not preserve the original line numbers.

If I start reading at row 100, the first row of the resulting DataFrame gets index 0 instead of 100. I also want to preserve the original header.
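
A minimal reproduction of the problem, using a small made-up in-memory CSV in place of the real file (the data and the `skiprows=3` value are only illustrative):

```python
import io
import pandas as pd

# Made-up sample data standing in for the real file
csv_text = "A,B\n1,2\n3,4\n5,6\n7,8\n"

# An integer skiprows drops the first lines of the file, header included,
# so the next remaining line ("5,6") is taken as the header.
df = pd.read_csv(io.StringIO(csv_text), skiprows=3)

print(df.columns.tolist())  # ['5', '6'] -> the original header A,B is lost
print(df.index.tolist())    # [0]        -> numbering restarts at 0
```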


Solution

  • If the DataFrame has the default range index and you want to keep the header while skipping the first n data rows, pass a range to skiprows and then shift the index by n:

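    import io
    import pandas as pd

    file_path = io.StringIO('''A,B
    1,2
    3,4
    5,6
    7,8
    ''')

    n = 2  # number of data rows to skip after the header
    # File line 0 is the header, so skip lines 1..n and keep line 0
    df = pd.read_csv(file_path, header=0, skiprows=range(1, n + 1))
    # Shift the default 0-based index so it matches the original row positions
    df.index += n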
    

    Output:

       A  B
    2  5  6
    3  7  8
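
If the skipped rows are not one contiguous block at the top, shifting the index by a constant no longer works. One possible generalisation is sketched below. The helper name `read_csv_keep_row_numbers` and its `skip` argument are hypothetical, not part of pandas, and the sketch assumes a single header line, no blank lines in the file, and that every position in `skip` actually exists in the file.

```python
import io
import pandas as pd

def read_csv_keep_row_numbers(source, skip):
    """Read a CSV, dropping the 0-based data rows listed in `skip`,
    and index the result by the original data-row positions.

    Hypothetical helper; assumes one header line, no blank lines,
    and that all positions in `skip` are within the file.
    """
    skip = set(skip)
    # File line 0 is the header, so data row i sits on file line i + 1.
    df = pd.read_csv(source, header=0, skiprows=sorted(i + 1 for i in skip))
    # Rows kept plus rows skipped gives the original number of data rows.
    total = len(df) + len(skip)
    # Original positions of the rows that survived, in file order.
    df.index = [i for i in range(total) if i not in skip]
    return df

sample = io.StringIO("A,B\n1,2\n3,4\n5,6\n7,8\n")
result = read_csv_keep_row_numbers(sample, skip=[0, 2])
print(result)
```

Here data rows 0 and 2 are dropped, so the remaining rows keep the labels 1 and 3. The labels are 0-based positions among the data rows; add 1 to them if you prefer 0-based file line numbers that count the header.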