I'm trying to write a WHERE clause that will filter data differently for respective entities, based on time frames that are unique to each entity, like so:
import pandas as pd
product_dates = [
{'name': 'product_1', 'start_date': '2024-01-01', 'end_date': '2024-05-01'},
{'name': 'product_2', 'start_date': '2024-03-05', 'end_date': '2024-04-26'},
{'name': 'product_3', 'start_date': '2024-02-09', 'end_date': '2024-11-08'}
]
df_product = pd.DataFrame(product_dates)
What I'd like to see for the output is the number of sales from each product that are between the given dates, like so:
output_data = [
{'name': 'product_1', 'time_frame': '2024-01-01 to 2024-05-01', 'sales': 204},
{'name': 'product_2', 'time_frame': '2024-03-05 to 2024-04-26', 'sales': 1890},
{'name': 'product_3', 'time_frame': '2024-02-09 to 2024-11-08', 'sales': 766}
]
df_output = pd.DataFrame(output_data)
And to get that, there's a table of product sales structured like this:
sales_data = [
{'name': 'product_1', 'date': '2024-01-01', 'sales': 1},
{'name': 'product_2', 'date': '2024-01-01', 'sales': 3},
{'name': 'product_3', 'date': '2024-01-01', 'sales': 4},
...
]
df_sales = pd.DataFrame(sales_data)
Theoretically the SQL clause would be:
SELECT name, time_frame, SUM(quantity) AS sales
FROM sales_data
WHERE date BETWEEN start_date AND end_date
GROUP BY ALL
But I'm not sure how to have the WHERE clause reflect the fact that the dates are variable by product.
Any tips? Thank you!
As per Dale K.s comment, In SQL you can use a JOIN for the sales_data and product_dates, this should create a link that you can use to filter the sales to a start date and end date then use SUM to add up the amount sold in that date range.
It could look something like this:
SELECT
s.name,
CONCAT(p.start_date, ' to ', p.end_date) AS time_frame,
SUM(s.sales) AS sales
FROM sales_data s
JOIN product_dates p ON s.name = p.name
WHERE s.date BETWEEN p.start_date AND p.end_date
GROUP BY s.name, p.start_date, p.end_date;