python pandas string-to-datetime

Problems converting strings with pd.to_datetime in Python


I have the following list:

l = [<div class="date">8 December 2004</div>,
 <div class="date">6 December 2004</div>,
 <div class="date">18 October 2004</div>,
 <div class="date">9 October 2004</div>,
 <div class="date">8 August 2004</div>,
 <div class="date">18 June 2004</div>,
 <div class="date">23 December 2005</div>,
 <div class="date">19 December 2005</div>,
 <div class="date">19 December 2005</div>,
 <div class="date">15 December 2005</div>]

I would like to convert it into a DataFrame with a Date column of datetime dtype.

I tried many approaches (one is shown below), but I couldn't get any of them to work.


pd.to_datetime(pd.DataFrame({'Date': l}), format='%d %B %Y')

Can anyone help me?

Thanks!


Solution

  • Extract the text inside the tags with BeautifulSoup, then convert it to datetimes:

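    import pandas as pd
    from bs4 import BeautifulSoup

    # str(x) lets this work whether the items are HTML strings or bs4 Tag objects
    texts = [BeautifulSoup(str(x), features="lxml").text for x in l]
    df = pd.DataFrame({'Date': texts})
    # %d = day, %B = full month name, %Y = four-digit year
    df['Date'] = pd.to_datetime(df['Date'], format='%d %B %Y')
    print(df)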
            Date
    0 2004-12-08
    1 2004-12-06
    2 2004-10-18
    3 2004-10-09
    4 2004-08-08
    5 2004-06-18
    6 2005-12-23
    7 2005-12-19
    8 2005-12-19
    9 2005-12-15
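
The list in the question looks like the result of a BeautifulSoup search, so its items may already be Tag objects rather than strings. If that is the case, the text can be read directly with get_text() and no second parse is needed. The following is a minimal sketch under that assumption: the html string and the variable names are hypothetical stand-ins for the scraped page, and the built-in "html.parser" is used here instead of lxml only to avoid an extra dependency.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the scraped page.
html = """
<div class="date">8 December 2004</div>
<div class="date">18 October 2004</div>
<div class="date">23 December 2005</div>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("div", class_="date")  # Tag objects, like the list in the question

# get_text(strip=True) returns each tag's text without surrounding whitespace.
dates = [tag.get_text(strip=True) for tag in tags]

df = pd.DataFrame({"Date": dates})
# Convert the column (a Series), not the whole DataFrame.
df["Date"] = pd.to_datetime(df["Date"], format="%d %B %Y")
print(df.dtypes)
```

The attempt in the question most likely fails for two reasons: the column still holds tags rather than plain text, and a whole DataFrame is passed to pd.to_datetime. When given a DataFrame, pandas tries to assemble dates from columns named year, month and day, so the conversion has to be applied to the Date column itself.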