python csv

Find number of columns in csv file


My program needs to read CSV files which may have 1, 2 or 3 columns, and it needs to modify its behaviour accordingly. Is there a simple way to check the number of columns without "consuming" a row before the iterator runs? The following code is the most elegant I could manage, but I would prefer to run the check before the for loop starts:

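import csv

class CSVError(Exception):
    """Raised when the input file does not have the expected shape."""

filename = 'testfile.csv'
d = '\t'

# csv.reader needs an open file (or another iterable of lines), not a file name.
with open(filename, newline='') as f:
    reader = csv.reader(f, delimiter=d)
    for row in reader:
        # The first row fixes the expected number of fields.
        if reader.line_num == 1:
            fields = len(row)
        if len(row) != fields:
            raise CSVError("Number of fields should be %s: %s" % (fields, str(row)))
        if fields == 1:
            pass
        elif fields == 2:
            pass
        elif fields == 3:
            pass
        else:
            raise CSVError("Too many columns in input file.")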

Edit: I should have included more information about my data. If there is only one field, it must contain a name in scientific notation. If there are two fields, the first must contain a name, and the second a linking code. If there are three fields, the additional field contains a flag which specifies whether the name is currently valid. Therefore, whether the first row has 1, 2 or 3 columns, every other row must have the same number.


Solution

  • You can use itertools.tee

    itertools.tee(iterable[, n=2])
    Return n independent iterators from a single iterable.

    e.g.

    reader1, reader2 = itertools.tee(csv.reader(f, delimiter=d))
    columns = len(next(reader1))
    del reader1
    for row in reader2:
        ...
    

    Note that it is important to delete the reference to reader1 once you are finished with it. Otherwise tee has to keep every row in memory in case you ever call next(reader1) again.
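
For completeness, here is a minimal, self-contained sketch of the tee approach described above. The function name peek_column_count, the in-memory sample data and the choice of ValueError are illustrative assumptions, not part of the original question. With a real file you would pass the object returned by open(filename, newline='') instead of the StringIO.

```python
import csv
import io
import itertools


def peek_column_count(lines, delimiter="\t"):
    """Return (column_count, rows) without consuming the first row of `rows`.

    `lines` is any iterable of text lines, such as an open file.
    The name and error handling are hypothetical choices for this sketch.
    """
    peek, rows = itertools.tee(csv.reader(lines, delimiter=delimiter))
    first = next(peek, None)  # advances `peek` only; `rows` is still at the start
    del peek                  # drop the extra iterator so tee need not buffer rows
    if first is None:
        raise ValueError("input is empty")
    columns = len(first)
    if not 1 <= columns <= 3:
        raise ValueError("expected 1, 2 or 3 columns, got %d" % columns)
    return columns, rows


# Made-up sample data: two tab-separated columns (name, linking code).
sample = io.StringIO("Homo sapiens\tH1\nPan troglodytes\tP2\n")
columns, rows = peek_column_count(sample)

# The column count is known before the loop starts, and the loop
# still sees every row, including the first one.
all_rows = []
for row in rows:
    if len(row) != columns:
        raise ValueError("Number of fields should be %s: %s" % (columns, row))
    all_rows.append(row)
```

Because only one row is ever read ahead, the buffering cost of tee stays small as long as the peeking iterator is discarded straight away.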