I'm working with podcast RSS feeds in Python. Are there any existing libraries or algorithms to detect and predict periodic release schedules, given a series of timestamps?
For example, if five items in an RSS feed had the following timestamps:
Fri, 20 Nov 2020 02:16:14 +0000
Fri, 13 Nov 2020 17:51:58 +0000
Fri, 6 Nov 2020 03:08:04 +0000
Fri, 30 Oct 2020 19:09:29 +0000
Fri, 23 Oct 2020 01:23:10 +0000
is there an algorithm to determine "Weekly on Fridays"? Or if they were:
Tue, 24 Nov 2020 10:00:00 -0000
Fri, 20 Nov 2020 09:00:00 -0000
Tue, 17 Nov 2020 10:00:00 -0000
Fri, 13 Nov 2020 10:00:00 -0000
Tue, 10 Nov 2020 10:00:00 -0000
to determine "Twice a week, next episode Friday the 27th"? I believe Pocket Casts has a feature like this, but it remains proprietary.
For simple cases, where the episodes are evenly spaced, you can use pd.infer_freq like this:
import pandas as pd
date_range = [
    "Fri, 20 Nov 2020",
    "Fri, 13 Nov 2020",
    "Fri, 6 Nov 2020",
    "Fri, 30 Oct 2020",
    "Fri, 23 Oct 2020",
]
date_range_2 = [
    "Tue, 24 Nov 2020",
    "Fri, 20 Nov 2020",
    "Tue, 17 Nov 2020",
    "Fri, 13 Nov 2020",
    "Tue, 10 Nov 2020",
]
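def get_frequency(date_range):
    # Parse the strings explicitly so infer_freq receives a DatetimeIndex.
    index = pd.to_datetime(date_range, format="%a, %d %b %Y")
    # Returns a frequency alias string, or None when no single regular frequency fits.
    return pd.infer_freq(index)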
print(f"First Time Series: {get_frequency(date_range)}")
print(f"Second Time Series: {get_frequency(date_range_2)}")
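This detects the first schedule as weekly on Fridays (the -1 prefix only reflects that the dates are listed newest first). For the second it returns None, because the gaps alternate between 3 and 4 days and no single fixed frequency fits: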
First Time Series: -1W-FRI
Second Time Series: None
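For the twice-a-week case that infer_freq cannot handle, one possible approach is to count which weekdays recur and then step forward from the latest episode to the next such weekday. The following is only a rough sketch using the standard library: the function name guess_weekly_pattern and the min_count threshold are made up for illustration, it assumes the schedule repeats on a weekly cycle, and it takes each weekday from the feed's own UTC offset.

```python
from collections import Counter
from datetime import timedelta
from email.utils import parsedate_to_datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def guess_weekly_pattern(pub_dates, min_count=2):
    """Return (recurring weekday numbers, predicted next release date).

    Hypothetical helper: a weekday counts as "recurring" if at least
    min_count episodes were published on it (Monday=0 ... Sunday=6).
    """
    # RSS pubDate values are RFC 2822 dates, which the stdlib can parse.
    days = sorted(parsedate_to_datetime(s).date() for s in pub_dates)
    counts = Counter(d.weekday() for d in days)
    regular = sorted(wd for wd, n in counts.items() if n >= min_count)
    if not regular:
        return [], None
    # Walk forward day by day from the latest episode to the next
    # date that falls on one of the recurring weekdays.
    nxt = days[-1] + timedelta(days=1)
    while nxt.weekday() not in regular:
        nxt += timedelta(days=1)
    return regular, nxt


feed = [
    "Tue, 24 Nov 2020 10:00:00 -0000",
    "Fri, 20 Nov 2020 09:00:00 -0000",
    "Tue, 17 Nov 2020 10:00:00 -0000",
    "Fri, 13 Nov 2020 10:00:00 -0000",
    "Tue, 10 Nov 2020 10:00:00 -0000",
]
regular, nxt = guess_weekly_pattern(feed)
print(", ".join(DAY_NAMES[wd] for wd in regular), "-> next:", nxt)
# Tuesday, Friday -> next: 2020-11-27
```

This will not cope with fortnightly or monthly schedules, skipped weeks, or one-off bonus episodes; for those you would probably need to look at the gaps between episodes as well as the weekdays.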