I have a .dat file that looks like this.
6.74E+01 "methane" "74-82-8" "L"
5.06E+01 "ethane" "74-84-0" "L"
7.16E+01 "propane" "74-98-6" "L"
9.59E+01 "butane" "106-97-8" "L"
1.20E+02 "2-methylpropane" "75-28-5" "L"
3.73E+02 "dimethylpropane" "463-82-1" "L"
1.25E+02 "pentane" "109-66-0" "L"
This .dat file appears to be haphazardly created. As far as I can tell, the columns are separated by varying numbers of spaces. Further down the file, some rows also have one extra column for comments. I need to read this into a Pandas dataframe. I have tried...
raw = pd.read_table(r'FILE PATH')
raw.columns = ['Value', 'Name', 'Numbers', 'Letter']
Which then throws an error saying "Exception has occurred: ValueError Length mismatch: Expected axis has 1 elements, new values have 4 elements"
I was expecting an error, but this makes it look like there is only 1 column. I am totally at a loss and I hope someone can help. Thanks
Edit: The extra columns have a single space of separation.
1.01E-02 "2,3-benzindene" "86-73-7" "M" ! fluorene
Assuming that columns are defined by runs of whitespace, you can use the delim_whitespace=True
argument of read_table
.
I assume that the file does not contain a header line. By specifying the column names through the names
argument, you avoid a) that the first line is interpreted as a header line and b) that the parser is confused by the "extra columns".
raw = pd.read_table(filename, delim_whitespace=True,
names=['Value', 'Name', 'Numbers', 'Letter'])
Result of print(raw)
:
Value Name Numbers Letter
0 67.4000 methane 74-82-8 L
1 50.6000 ethane 74-84-0 L
2 71.6000 propane 74-98-6 L
3 95.9000 butane 106-97-8 L
4 120.0000 2-methylpropane 75-28-5 L
5 373.0000 dimethylpropane 463-82-1 L
6 125.0000 pentane 109-66-0 L
7 0.0101 2,3-benzindene 86-73-7 M