When I tried to read the data below with:
loadtxt('RSTN')
I got an error.
Then I tried to fill in the missing data using:
genfromtxt('RSTN',delimiter=' ')
But I got this error:
Line #31112 (got 7 columns instead of 8)
I'd like to fill the missing data with nan, or something similar.
I have data like this in an ASCII file named RSTN:
20120127165126 19 42 54 91 113 147 188 284
20120127165127 19 42 54 91 113 147 188 284
20120127165128 19 42 54 90 113 147 188 284
20120127165129 19 42 54 90 113 147 188 284
20120127165130 19 42 54 88 107 131 155 235
20120127165131 19 42 54 72 79 79 92 154
20120127165132 19 42 54 45 43 42 50 97
20120127165133 19 42 54 24 21 21 25 65
20120127165134 19 42 54 11 8 9 12 46
20120127165135 19 42 54 5 2 3 7 35
20120127165136 18 42 54 2 0 1 4 29
20120127165137 19 42 54 0 0 2 25
20120127165138 19 42 53 0 0 1 22
20120127165139 19 42 54 0 0 1 19
20120127165140 19 42 54 0 0 0 17
20120127165141 19 42 54 0 0 0 14
20120127165142 19 42 54 0 0 0 14
20120127165143 19 42 54 0 0 0 14
20120127165144 19 42 54 0 0 13
20120127165145 19 42 54 0 0 14
20120127165146 19 42 54 0 0 0 14
20120127165147 19 42 54 0 0 1 15
20120127165148 19 42 54 0 0 1 15
20120127165149 19 42 54 0 0 1 15
20120127165150 20 42 53 0 1 15
20120127165151 20 42 53 0 1 17
20120127165152 20 42 53 0 1 17
20120127165153 19 42 53 0 0 1 17
20120127165154 20 42 53 0 1 17
20120127165155 20 42 53 0 1 17
20120127165156 20 42 53 0 0 1 17
20120127165157 19 42 54 0 0 1 17
20120127165158 19 42 55 0 0 1 17
20120127165159 19 42 55 0 0 1 17
20120127165200 20 42 56 0 0 1 17
20120127165201 21 42 56 0 0 1 17
When I did this:
from pandas import *
data=read_fwf('26JAN12.K7O', colspecs='infer', header=None)
I got this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 429, in read_fwf
return _read(filepath_or_buffer, kwds)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 198, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 479, in __init__
self._make_engine(self.engine)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 592, in _make_engine
self._engine = klass(self.f, **self.options)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1954, in __init__
PythonParser.__init__(self, f, **kwds)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1237, in __init__
self._make_reader(f)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1957, in _make_reader
self.data = FixedWidthReader(f, self.colspecs, self.delimiter)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1933, in __init__
raise AssertionError()
AssertionError
If you have pandas, you could parse it with pd.read_fwf:
import pandas as pd
df = pd.read_fwf('data', colspecs='infer', header=None, parse_dates=[[0]])
print(df)
yields
0 1 2 3 4 5 6 7 8
0 2012-01-27 16:51:26 19 42 54 91 113 147 188 284
1 2012-01-27 16:51:27 19 42 54 91 113 147 188 284
...
11 2012-01-27 16:51:37 19 42 54 0 NaN 0 2 25
12 2012-01-27 16:51:38 19 42 53 0 NaN 0 1 22
13 2012-01-27 16:51:39 19 42 54 0 NaN 0 1 19
[36 rows x 9 columns]
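If colspecs='infer' is not accepted by your pandas version (the AssertionError in the question may come from that), passing explicit column widths is one alternative. The following is only a minimal sketch: the in-memory sample and the widths [14, 4, 4, 4] are made up for illustration and are not the layout of the real file, and the timestamp is converted with pd.to_datetime after reading rather than with parse_dates.

```python
import io
import pandas as pd

# Hypothetical sample: every field is right-aligned in a fixed width.
# These widths are an assumption for the demo, not the real file layout.
widths = [14, 4, 4, 4]
rows = [
    ("20120127165136", "18", "42", "2"),
    ("20120127165137", "19", "", "0"),    # third field left blank
    ("20120127165138", "19", "42", ""),   # last field left blank
]
text = "\n".join(
    "".join(f"{value:>{width}}" for value, width in zip(row, widths))
    for row in rows
)

# Blank fields come back as NaN because each column is sliced by position.
df = pd.read_fwf(io.StringIO(text), widths=widths, header=None)

# Convert the YYYYMMDDHHMMSS integer in column 0 to a timestamp.
df[0] = pd.to_datetime(df[0].astype(str), format="%Y%m%d%H%M%S")
print(df)
```

For the real file you would measure the actual column widths and pass the filename instead of the StringIO object.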
Or, thanks to DSM, using np.genfromtxt you can parse fixed-width data by passing a list of widths to the delimiter parameter:
import numpy as np
np.set_printoptions(formatter={'float':'{:g}'.format})
arr = np.genfromtxt('data', delimiter=[18]+[7]*8)
print(arr)
yields
[[2.01201e+13 19 42 54 91 113 147 188 284]
[2.01201e+13 19 42 54 91 113 147 188 284]
[2.01201e+13 19 42 54 90 113 147 188 284]
...
[2.01201e+13 19 42 54 0 nan 0 2 25]
[2.01201e+13 19 42 53 0 nan 0 1 22]
[2.01201e+13 19 42 54 0 nan 0 1 19]
...]
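To see why the list-of-widths form copes with blank fields, here is a small self-contained sketch. The sample text and the widths [14, 4, 4, 4] are hypothetical and only stand in for the real file, which the answer above reads with [18]+[7]*8. Because each line is cut at fixed character positions, a blank field still occupies its slot and is converted to nan instead of shifting the later columns.

```python
import io
import numpy as np

# Hypothetical sample: every field is right-aligned in a fixed width.
# These widths are an assumption for the demo, not the real file layout.
widths = [14, 4, 4, 4]
rows = [
    ("20120127165136", "18", "42", "2"),
    ("20120127165137", "19", "", "0"),    # third field left blank
    ("20120127165138", "19", "42", ""),   # last field left blank
]
text = "\n".join(
    "".join(f"{value:>{width}}" for value, width in zip(row, widths))
    for row in rows
)

# A list of integers as delimiter makes genfromtxt slice each line by
# character position instead of splitting on a separator character.
arr = np.genfromtxt(io.StringIO(text), delimiter=widths)

print(arr.shape)
print(np.isnan(arr))  # True where a field was blank
```

By contrast, splitting on delimiter=' ' cannot tell which column a blank belongs to, which is consistent with the "got 7 columns instead of 8" error in the question.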