I'm applying multiple Beta, Gamma and HalfNorm transforms to each column of my pandas DataFrame. The DataFrame contains marketing spend: each row is one week of spend and each column is a type of spend.
The Python functions and the code that applies the transforms are as follows:
import numpy as np
import scipy.stats as st


def geometric_adstock_tt(x, alpha=0, L=12, normalize=True):
    # L=12 is the maximum delay (lag) we expect to see, in rows of x (weeks for this data)
    """
    The term "geometric" refers to the way weights are assigned to past values,
    which follows a geometric progression.
    In a geometric progression,
    each term is found by multiplying the previous term by a fixed, constant ratio (commonly denoted as "r").
    In the case of the geometric adstock function, the "alpha" parameter serves as this constant ratio.
    """
    # vector of weights for lags 0..L-1, decaying geometrically at rate alpha
    w = np.array([alpha**i for i in range(L)])
    # row i of xx is x shifted right by i steps and zero-padded at the start
    xx = np.stack(
        [np.concatenate([np.zeros(i), x[: x.shape[0] - i]]) for i in range(L)]
    )
    if not normalize:
        y = np.dot(w, xx)
    else:
        # dot product with weights normalized to sum to 1,
        # giving the marketing channel's carry-over across the decay window
        y = np.dot(w / np.sum(w), xx)
    return y
### non-linear saturation function
def logistic_function(x_t, mu=0.1):
    # apply the logistic saturation function to the spend variable
    # (the denominator is 1 + exp(...), not 1 * exp(...))
    return (1 - np.exp(-mu * x_t)) / (1 + np.exp(-mu * x_t))
#################
response_mean = []
# Create Distributions
halfnorm_dist = st.halfnorm(loc=0, scale=5)
# Create a beta distribution
beta_dist = st.beta(a=3, b=3)
# Create a gamma distribution
gamma_dist = st.gamma(a=3)
delay_channels = [
    'TV', 'Referral', 'DirectMail', 'TradeShows', 'SocialMedia', 'DisplayAds_Standard', 'ContentMarketing',
    'GoogleAds', 'SEO', 'Email', 'AffiliateMarketing',
]
non_lin_channels = ["DisplayAds_Programmatic"]
################ ADSTOCK CHANNELS
for channel_name in delay_channels:
    xx = df_in[channel_name].values
    print(f"Adding Delayed Channels: {channel_name}")
    # apply beta transform
    y = beta_dist.pdf(xx)
    # apply geometric adstock transform
    geo_transform = geometric_adstock_tt(y)
    # apply gamma transform
    z = gamma_dist.pdf(geo_transform)
    # apply logistic function transform
    log_transform = logistic_function(z)
    # apply halfnorm transform
    output = halfnorm_dist.pdf(geo_transform)
    # append output
    response_mean.append(list(output))
################# SATURATION ONLY
for channel_name in non_lin_channels:
    xx = df_in[channel_name].values
    # apply gamma transform
    z = gamma_dist.pdf(xx)
    # apply logistic function transform
    log_transform = logistic_function(z)
    # apply halfnorm transform
    output = halfnorm_dist.pdf(log_transform)
    # append output
    response_mean.append(list(output))
I'm not quite understanding why all values are being transformed to the same value. I would be so appreciative of any insight! Thanks so much:)
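I believe what's happening is that the beta distribution you defined is only supported on the range 0 ≤ x ≤ 1 (see the notes in the SciPy documentation for the beta distribution), so anything outside of this range has a pdf value of 0. Your spend values are presumably much larger than 1, so every value becomes 0 after the beta transform. The adstock of a column of zeros is still all zeros, and the half-normal pdf evaluated at 0 is the same constant for every row, which is why all of your outputs are identical.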
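To see the effect concretely, here is a minimal sketch using the same three distributions as in the question. The `raw_spend` values are made up for illustration (they are not your data), and the sketch assumes spend is far above 1. As a side observation, the gamma density also comes out as 0 for inputs this large because of floating-point underflow, so the saturation-only channel would likely collapse in the same way.

```python
import numpy as np
import scipy.stats as st

beta_dist = st.beta(a=3, b=3)
gamma_dist = st.gamma(a=3)
halfnorm_dist = st.halfnorm(loc=0, scale=5)

# Hypothetical weekly spend, far outside the Beta support [0, 1]
raw_spend = np.array([1000.0, 1025.0, 1100.0])

# Beta(3, 3) density is exactly 0 outside [0, 1]
beta_pdf = beta_dist.pdf(raw_spend)

# Gamma(3) density at x ~ 1000 is proportional to exp(-1000), which underflows to 0
gamma_pdf = gamma_dist.pdf(raw_spend)

# Half-normal density evaluated at 0 is sqrt(2/pi) / scale for every entry
constant = halfnorm_dist.pdf(beta_pdf)

print(beta_pdf)   # all zeros
print(gamma_pdf)  # all zeros
print(constant)   # the same value repeated, about 0.1596
```

Any weighted sum of zeros is still zero, so the adstock step in between does not change this picture.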
So one possibility is to first min-max scale all of your columns to be in the range 0-1 using the following:
df_in = (df_in-df_in.min())/(df_in.max()-df_in.min())
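One caveat with the one-liner above: a column whose values are all identical has `max - min = 0`, so the division produces NaN. Below is a minimal sketch of a guarded version. The helper name `min_max_scale` and the choice to map constant columns to 0 are assumptions made for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd


def min_max_scale(df):
    """Scale each column to [0, 1]; constant columns are mapped to 0 (an assumption)."""
    col_min = df.min()
    col_range = df.max() - col_min
    # Swap a zero range for NaN so the division yields NaN rather than 0/0
    scaled = (df - col_min) / col_range.replace(0, np.nan)
    # Note: this also fills any NaN that was already present in the input
    return scaled.fillna(0.0)


# Hypothetical data: one varying column and one constant column
demo = pd.DataFrame(
    {"TV": [1000.0, 1050.0, 1100.0], "Email": [500.0, 500.0, 500.0]}
)
scaled = min_max_scale(demo)
print(scaled)
```

Keep in mind that after min-max scaling the smallest and largest value in each column land exactly on 0 and 1, where the Beta(3, 3) density is also 0, so those two rows will still map to 0 after the beta transform.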
Using some made-up data:
import numpy as np
import pandas as pd

delay_channels = [
    'TV', 'Referral', 'DirectMail', 'TradeShows', 'SocialMedia', 'DisplayAds_Standard', 'ContentMarketing',
    'GoogleAds', 'SEO', 'Email', 'AffiliateMarketing',
]
non_lin_channels = ["DisplayAds_Programmatic"]
# seed before drawing the random numbers so the sample data is reproducible
np.random.seed(42)
# 53 weekly dates, from 2023-01-01 through 2023-12-31
sample_dates = pd.date_range('2023-01-01', '2024-01-01', freq='7D')
sample_data_dict = {
    channel: 1000 + 100*np.random.rand(53) for channel in delay_channels+non_lin_channels
}
sample_data_dict['Date'] = sample_dates
df_in = pd.DataFrame(sample_data_dict)
df_in = df_in.set_index('Date')
# min-max scale every column to the range [0, 1]
df_in = (df_in-df_in.min())/(df_in.max()-df_in.min())
After applying your transformations, I get the following: