I have the following list:
l = [<div class="date">8 December 2004</div>,
<div class="date">6 December 2004</div>,
<div class="date">18 October 2004</div>,
<div class="date">9 October 2004</div>,
<div class="date">8 August 2004</div>,
<div class="date">18 June 2004</div>,
<div class="date">23 December 2005</div>,
<div class="date">19 December 2005</div>,
<div class="date">19 December 2005</div>,
<div class="date">15 December 2005</div>]
I would like to convert it into a DataFrame with a Date column in datetime format.
I tried many solutions (see one below), but I couldn't get my head around it.
pd.to_datetime(pd.DataFrame({'Date':l}), format = '%d %B %Y')
Can anyone help me?
Thanks!
Extract the text inside the tags with BeautifulSoup, then convert it to datetimes:
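import pandas as pd
from bs4 import BeautifulSoup
# Parse each element and keep only the text inside the tag
df = pd.DataFrame({'Date': [BeautifulSoup(x, features="lxml").text for x in l]})
# Convert strings such as "8 December 2004" to datetime64 values
df['Date'] = pd.to_datetime(df['Date'], format='%d %B %Y')
print(df)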
        Date
0 2004-12-08
1 2004-12-06
2 2004-10-18
3 2004-10-09
4 2004-08-08
5 2004-06-18
6 2005-12-23
7 2005-12-19
8 2005-12-19
9 2005-12-15
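The question does not say whether l holds raw HTML strings or tags already parsed by BeautifulSoup. A minimal sketch that should cope with either case is shown below. The names raw, tag_text and dates are made up for illustration, the sample list is shortened, and html.parser is used only so that lxml does not need to be installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Shortened sample input as HTML strings (assumption: the real list
# may hold bs4 Tag objects instead of strings).
raw = [
    '<div class="date">8 December 2004</div>',
    '<div class="date">18 June 2004</div>',
    '<div class="date">23 December 2005</div>',
]


def tag_text(item):
    """Return the text of an HTML string or of a bs4 Tag (hypothetical helper)."""
    # str() turns a Tag back into markup, so both kinds of input take the same path.
    return BeautifulSoup(str(item), "html.parser").get_text(strip=True)


dates = pd.DataFrame({"Date": [tag_text(x) for x in raw]})
# %d = day of month, %B = full month name, %Y = four-digit year
dates["Date"] = pd.to_datetime(dates["Date"], format="%d %B %Y")
print(dates.dtypes)
```

If the elements are known to be Tag objects, calling x.get_text() on each one directly avoids parsing the markup a second time.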