python pandas types

Convert pandas Series to dict with type conversion?


I have a file of miscellaneous parameters, stored as CSV. I load it into Python with pandas.read_csv, which conveniently returns a DataFrame with one column. I don't need any of the other pandas features, though, so I immediately convert the data to a dict.
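
For concreteness, here is a minimal sketch of that loading step. The file contents and the 'Parameter' column name are invented for illustration; only the 'Value' column name comes from my real code, and I'm assuming the first column holds the parameter names.

```python
import io

import pandas as pd

# Hypothetical file contents: the 'Parameter' column name and the rows are
# made up; 'Value' is the column name used in the real code.
csv_text = """Parameter,Value
n_samples,123
gain,123.4
label,foo
"""

# index_col=0 turns the first column into the index, which leaves a
# DataFrame with a single 'Value' column.
header = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Series.to_dict() maps each index label to its value.
params = header['Value'].to_dict()
print(params)
```

With a real file, the io.StringIO object would simply be replaced by the file path.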

My next problem is that the parameters all have different types: some are integers, some may be floats, and some are strings. pandas usually loads them with dtype=object, and they go into the dict as strings. (Except sometimes they're all numeric, so I get a numeric dtype right away.) I wrote a simple function that attempts to identify numeric types and convert them appropriately.

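# header starts out as the DataFrame returned by pandas.read_csv and is
# rebound to a plain dict here; typecast is defined below.
header = {key: typecast(value)
          for (key, value) in header['Value'].items()}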

def typecast(value):
    """Convert a string to a numeric type if possible

    >>> def cast_with_type(s):
    ...     return (result := typecast(s), type(result))
    >>> cast_with_type(123)
    (123, <class 'int'>)
    >>> cast_with_type('foo')
    ('foo', <class 'str'>)
    >>> cast_with_type('123')
    (123, <class 'int'>)
    >>> cast_with_type('123.4')
    (123.4, <class 'float'>)
    """
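    try:
        # A non-string value (e.g. an int) raises TypeError on the `in` test.
        if '.' in value:
            return float(value)
        else:
            return int(value)
    except (TypeError, ValueError):
        # Not a string, or not parseable as a number: return it unchanged.
        return value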

Is there a built-in feature that does this better than what I already have?


Solution

  • I've started using TOML for a different application, and I think it's a better fit here than CSV. The tomllib module joined the standard library in Python 3.11, and tomllib.load returns a dictionary in which each value already has an appropriate Python type, i.e., it does the typecasting for you.

    As I said in a comment earlier, I'm not sure it's worth porting a bunch of old CSV files to TOML, but if I had to implement this feature from scratch, I would use TOML.