I am scraping data from a few websites and using pandas to modify it.
It worked well on the first few chunks of data, but later I get this error message:
Traceback (most recent call last):
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2326, in __setitem__
    self._setitem_array(key, value)
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2350, in _setitem_array
    raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key
My code is here:
df2 = pd.DataFrame(datatable, columns = cols)
df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)
My data looks like below:
STATUS
2 Landed 8:33 AM
3 Landed 9:37 AM
.. ... ...
316 Delayed 5:00 PM
341 Delayed 4:32 PM
.. ... ...
397 Delayed 5:23 PM
.. ... ...
[240 rows x 2 columns]
You need to modify the solution a bit, because the split sometimes returns two columns and sometimes only one:
df2 = pd.DataFrame({'STATUS':['Estimated 3:17 PM','Delayed 3:00 PM']})
df3 = df2['STATUS'].str.split(n=1, expand=True)
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]
print (df3)
  STATUS_ID1 STATUS_ID2
0  Estimated    3:17 PM
1    Delayed    3:00 PM
df2 = df2.join(df3)
print (df2)
              STATUS STATUS_ID1 STATUS_ID2
0  Estimated 3:17 PM  Estimated    3:17 PM
1    Delayed 3:00 PM    Delayed    3:00 PM
Another possible case: no value contains whitespace, and the solution still works:
df2 = pd.DataFrame({'STATUS':['Canceled','Canceled']})
and the solution returns:
print (df2)
     STATUS STATUS_ID1
0  Canceled   Canceled
1  Canceled   Canceled
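This kind of data is most likely what triggers the error in the question. A minimal sketch, assuming a reasonably recent pandas version and reusing the column names from the question: the split yields a single column, so assigning it to two keys fails. The names parts, n_parts and raised are only illustrative.

```python
import pandas as pd

# Sample data with no whitespace in any value.
df2 = pd.DataFrame({'STATUS': ['Canceled', 'Canceled']})

# With nothing to split on, expand=True produces a single column.
parts = df2['STATUS'].str.split(n=1, expand=True)
n_parts = parts.shape[1]
print(n_parts)

# Assigning a one-column frame to two column keys is rejected.
try:
    df2[['STATUS_ID_1', 'STATUS_ID_2']] = parts
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```
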
All together:
df3 = df2['STATUS'].str.split(n=1, expand=True)
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]
df2 = df2.join(df3)
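If later code expects both columns to exist for every chunk, one possible variation (not part of the solution above, just a sketch) is to pad the split result to two columns with reindex before joining. The helper name split_status is hypothetical, and the sample values are taken from the question and the examples above.

```python
import pandas as pd

def split_status(df):
    """Hypothetical helper: always add STATUS_ID1 and STATUS_ID2."""
    parts = df['STATUS'].str.split(n=1, expand=True)
    # Add any missing column (filled with NaN) so there are always two.
    parts = parts.reindex(columns=range(2))
    parts.columns = ['STATUS_ID{}'.format(x + 1) for x in parts.columns]
    return df.join(parts)

# Chunk where some values split into two parts.
mixed = split_status(pd.DataFrame({'STATUS': ['Landed 8:33 AM', 'Canceled']}))
print(mixed)

# Chunk where no value contains whitespace.
single = split_status(pd.DataFrame({'STATUS': ['Canceled', 'Canceled']}))
print(single)
```

With this variant the missing second part shows up as a missing value in STATUS_ID2 instead of the column being absent.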