Tags: python, python-3.x, pandas, pandasql

How can one extract date features from a date in pandasql?


I need to extract date features (Day, Week, Month, Year) from a date column of a pandas data frame, using pandasql. I can't find out which SQL dialect pandasql uses, so I am not sure how to accomplish this. Has anyone else tried something similar?

Here is what I have so far:

#import the needed libraries
import numpy as np
import pandas as pd
import pandasql as psql

#establish dataset
doc = 'room_data.csv'
df = pd.read_csv(doc)
df.head()

df2 = psql.sqldf('''
SELECT
    Timestamp
    , EXTRACT (DAY FROM "Timestamp") AS Day --DOES NOT WORK IN THIS VERSION OF SQL
    , Temperature
    , Humidity
    
FROM df
''')
df2.head()

Data Frame Example: (screenshot not included)


Solution

  • As far as I know, SQLite (the engine pandasql uses by default) does not support the EXTRACT() function.

    You can use strftime('%d', Timestamp) instead:


    psql.sqldf('''SELECT
    
      Timestamp
    , strftime('%d', Timestamp) AS Day 
    , Temperature
    , Humidity
    
     FROM df
     ''')
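A quick way to confirm this without pandasql is to run both expressions against the SQLite library bundled with Python. This is only a sketch: the single-row table below is made up for illustration, and it assumes pandasql is using its default SQLite backend.

```python
import sqlite3

# Stand-alone check against the SQLite bundled with Python; no pandasql needed.
# The one-row table is hypothetical sample data, not the asker's CSV.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (Timestamp TEXT, Temperature INTEGER, Humidity INTEGER)")
conn.execute("INSERT INTO df VALUES ('2020-01-01 04:00:00', 83, 39)")

# Report which SQLite library version is in use.
sqlite_version = conn.execute("SELECT sqlite_version()").fetchone()[0]
print("SQLite version:", sqlite_version)

# EXTRACT is not part of SQLite's dialect, so the statement fails to parse.
try:
    conn.execute('SELECT EXTRACT (DAY FROM "Timestamp") AS Day FROM df')
    extract_supported = True
except sqlite3.OperationalError as exc:
    extract_supported = False
    print("EXTRACT failed:", exc)

# strftime is SQLite's way of pulling a field out of a date string.
day = conn.execute("SELECT strftime('%d', Timestamp) AS Day FROM df").fetchone()[0]
print("Day:", day)
conn.close()
```

Note that strftime returns the day as zero-padded text, not as a number.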
    

    The following example demonstrates the query above:

    Example dataframe:

    np.random.seed(123)
    dates = pd.date_range('01-01-2020','01-05-2020',freq='H')
    temp = np.random.randint(0,100,97)
    humidity = np.random.randint(20,100,97)
    df = pd.DataFrame({"Timestamp":dates,"Temperature":temp,"Humidity":humidity})
    print(df.head())
    
                Timestamp  Temperature  Humidity
    0 2020-01-01 00:00:00           66        29
    1 2020-01-01 01:00:00           92        43
    2 2020-01-01 02:00:00           98        34
    3 2020-01-01 03:00:00           17        58
    4 2020-01-01 04:00:00           83        39
    

    Working Query:

    import pandasql as ps
    query = '''SELECT
          Timestamp
        , strftime('%d', Timestamp) AS Day 
        , Temperature
        , Humidity
        FROM df'''
    print(ps.sqldf(query).head())
    
                        Timestamp Day  Temperature  Humidity
    0  2020-01-01 00:00:00.000000  01           66        29
    1  2020-01-01 01:00:00.000000  01           92        43
    2  2020-01-01 02:00:00.000000  01           98        34
    3  2020-01-01 03:00:00.000000  01           17        58
    4  2020-01-01 04:00:00.000000  01           83        39
    

    The SQLite documentation on date and time functions lists more strftime format specifiers. Common ones are shown below:


    import pandasql as ps
    query = '''SELECT
          Timestamp
        , strftime('%d', Timestamp) AS Day 
        , strftime('%m', Timestamp) AS Month 
        , strftime('%Y', Timestamp) AS Year 
        , strftime('%H', Timestamp) AS Hour 
        , Temperature
        , Humidity
        FROM df'''
    print(ps.sqldf(query).head())
    
                        Timestamp Day Month  Year Hour  Temperature  Humidity
    0  2020-01-01 00:00:00.000000  01    01  2020   00           66        29
    1  2020-01-01 01:00:00.000000  01    01  2020   01           92        43
    2  2020-01-01 02:00:00.000000  01    01  2020   02           98        34
    3  2020-01-01 03:00:00.000000  01    01  2020   03           17        58
    4  2020-01-01 04:00:00.000000  01    01  2020   04           83        39
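The question also asks for the week, and the strftime results above are text rather than numbers. The sketch below adds '%W' (week of year, where week 01 starts on the first Monday) and '%w' (day of week, 0 = Sunday), and wraps each field in CAST(... AS INTEGER). It runs directly on Python's built-in sqlite3 module so that it is self-contained, and the two timestamps are made-up sample rows.

```python
import sqlite3

# Hypothetical two-row table standing in for the DataFrame named df.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (Timestamp TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?)",
    [("2020-01-01 00:00:00",), ("2020-01-06 13:00:00",)],
)

# CAST turns the zero-padded text returned by strftime into integers.
query = """
SELECT
      Timestamp
    , CAST(strftime('%d', Timestamp) AS INTEGER) AS Day
    , CAST(strftime('%W', Timestamp) AS INTEGER) AS Week
    , CAST(strftime('%m', Timestamp) AS INTEGER) AS Month
    , CAST(strftime('%Y', Timestamp) AS INTEGER) AS Year
    , CAST(strftime('%w', Timestamp) AS INTEGER) AS Weekday
FROM df
ORDER BY Timestamp
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The same query string should be usable with ps.sqldf(query) when a DataFrame named df with a Timestamp column is in scope, although that has not been run here. Days before the first Monday of the year fall in week 0.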