Tags: python, dataframe, missing-data, python-polars

How do you fill missing dates in a Polars DataFrame (Python)?


I can't find an equivalent for this in the Polars library. Basically, I want to fill in the missing dates between two dates for a big dataframe. It has to be Polars because of the size of the data (more than 100 million rows).

Below is the code I use with pandas (it relies on pyjanitor's complete method). How can I do the same thing in Polars?

import janitor  # pyjanitor: registers DataFrame.complete()
import pandas as pd


def missing_date_filler(d):
    df = d.copy()

    time_back = 1  # Look back in days
    today = pd.Timestamp.today().normalize()
    max_date = today - pd.Timedelta(days=time_back)  # Today's date minus 1 day

    # Full daily range from the earliest date in the data up until yesterday
    df_date = dict(Date=pd.date_range(df.Date.min(), max_date, freq="1D"))

    # Fill the missing dates for every existing (Col_A, Col_B) combination
    df = df.complete(["Col_A", "Col_B"], df_date).sort_values("Date")

    return df

Solution

  • It sounds like you're looking for DataFrame.upsample().

    Note that you can use the group_by parameter to perform the operation on a per-group basis. The time column should already be sorted for the result to make sense.

    import polars as pl
    from datetime import datetime
    
    df = pl.DataFrame({
       "date": [datetime(2023, 1, 2), datetime(2023, 1, 5)], 
       "value": [1, 2]
    })
    
    >>> df
    shape: (2, 2)
    ┌─────────────────────┬───────┐
    │ date                ┆ value │
    │ ---                 ┆ ---   │
    │ datetime[μs]        ┆ i64   │
    ╞═════════════════════╪═══════╡
    │ 2023-01-02 00:00:00 ┆ 1     │
    │ 2023-01-05 00:00:00 ┆ 2     │
    └─────────────────────┴───────┘

    >>> df.upsample(time_column="date", every="1d")
    shape: (4, 2)
    ┌─────────────────────┬───────┐
    │ date                ┆ value │
    │ ---                 ┆ ---   │
    │ datetime[μs]        ┆ i64   │
    ╞═════════════════════╪═══════╡
    │ 2023-01-02 00:00:00 ┆ 1     │
    │ 2023-01-03 00:00:00 ┆ null  │
    │ 2023-01-04 00:00:00 ┆ null  │
    │ 2023-01-05 00:00:00 ┆ 2     │
    └─────────────────────┴───────┘