pythoncsvpython-2.5

Read data from CSV file and transform from string to correct data-type, including a list-of-integer column


When I read data back in from a CSV file, every cell is interpreted as a string.

(I wrote a 2-dimensional list, where each column is of a different type (bool, str, int, list of integer), out to a CSV file.)

Sample data (in CSV file):

IsActive,Type,Price,States
True,Cellphone,34,"[1, 2]"
,FlatTv,3.5,[2]
False,Screen,100.23,"[5, 1]"
True,Notebook, 50,[1]

Solution

  • As the docs explain, the CSV reader doesn't perform automatic data conversion. You have the QUOTE_NONNUMERIC format option, but that would only convert all non-quoted fields into floats. This is a very similar behaviour to other csv readers.

    I don't believe Python's csv module would be of any help for this case at all. As others have already pointed out, literal_eval() is a far better choice.

    The following does work and converts:

    You may also use it for booleans and NoneType, although these have to be formatted accordingly for literal_eval() to pass. LibreOffice Calc displays booleans in capital letters, when in Python booleans are Capitalized. Also, you would have to replace empty strings with None (without quotes)

    I'm writing an importer for mongodb that does all this. The following is part of the code I've written so far.

    [NOTE: My csv uses tab as field delimiter. You may want to add some exception handling too]

    def getFieldnames(csvFile):
        """
        Read the first row and store values in a tuple
        """
        with open(csvFile) as csvfile:
            firstRow = csvfile.readlines(1)
            fieldnames = tuple(firstRow[0].strip('\n').split("\t"))
        return fieldnames
    
    def writeCursor(csvFile, fieldnames):
        """
        Convert csv rows into an array of dictionaries
        All data types are automatically checked and converted
        """
        cursor = []  # Placeholder for the dictionaries/documents
        with open(csvFile) as csvFile:
            for row in islice(csvFile, 1, None):
                values = list(row.strip('\n').split("\t"))
                for i, value in enumerate(values):
                    nValue = ast.literal_eval(value)
                    values[i] = nValue
                cursor.append(dict(zip(fieldnames, values)))
        return cursor