I have a file containing miscellaneous parameters, stored as CSV. I'm loading it into Python using pandas.read_csv, which very conveniently returns a DataFrame with one column. But I don't need any of the fancy pandas features, so I immediately convert my data to a dict.
My next problem is that the parameters all have different types: some are integers, some may be floats, and some are strings. pandas usually loads them with dtype=object, and they go into the dict as strings. (Except sometimes they're all numeric, so I get a numeric dtype right away.) I wrote a simple function that attempts to identify numeric types and convert them appropriately.
header = {key: typecast(value)
          for (key, value) in dict(header['Value']).items()}
def typecast(value):
    """Convert a string to a numeric type if possible

    >>> def cast_with_type(s):
    ...     return (result := typecast(s), type(result))
    >>> cast_with_type(123)
    (123, <class 'int'>)
    >>> cast_with_type('foo')
    ('foo', <class 'str'>)
    >>> cast_with_type('123')
    (123, <class 'int'>)
    >>> cast_with_type('123.4')
    (123.4, <class 'float'>)
    """
    try:
        if '.' in value:
            return float(value)
        else:
            return int(value)
    except TypeError:
        return value
    except ValueError:
        return value
Is there a built-in feature that does this better than what I already have?
I've started using TOML for a different application, and I think it's a better fit here than CSV. Starting with Python 3.11, tomllib.load returns a dictionary with each entry in an appropriate type, i.e., it does the typecasting for you.
As I said in a comment earlier, I'm not sure it's worth porting a bunch of old CSV files to TOML, but if I had to implement this feature from scratch, I would use TOML.
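For the CSV files that stay as they are, one possible variant of the conversion function tries each numeric type in turn instead of looking for a '.' in the string. This is only a sketch: the name typecast_chain is hypothetical, and the behavior differs slightly from the original, since strings such as '1e5', 'nan', or 'inf' would also become floats.

```python
def typecast_chain(value):
    """Try int, then float; fall back to the value unchanged."""
    # Leave anything that is not a string alone, so an existing float
    # such as 123.4 is not truncated by int().
    if not isinstance(value, str):
        return value
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            continue
    return value


# Hypothetical string values, as they might come out of a dtype=object column.
raw = {"count": "123", "threshold": "123.4", "scale": "1e5", "label": "foo"}
header = {key: typecast_chain(value) for key, value in raw.items()}
```

Whether accepting exponent notation is desirable depends on the data, so the stricter '.' check may still be the better choice for some files.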