We ran an A/B test in Firebase, which gave the following results:
I was also building my own Bayesian AB-test suite and was wondering how they came to these conclusions.
What I was doing was querying the data of this test for the Control Group and Variant C:
I based my algorithm on this tool: https://vidogreg.shinyapps.io/bayes-arpu-test/. When I enter these inputs I get the following result:
This tool seems to be much more confident than Firebase that Variant C is better than the control group. The Firebase distributions for revenue per user also look skewed, while the Bayesian ARPU tool produces very symmetrical distributions.
The code for the Bayesian ARPU tool is available. They used conjugate priors to get to these conclusions based on this paper:
https://cdn2.hubspot.net/hubfs/310840/VWO_SmartStats_technical_whitepaper.pdf
Can anyone help me figure out which of these results is the best?
I have found out what my problem was.
The first problem is that the analysis has to be broken into two steps. As it is a freemium app, most users do not pay, which means these users add no information about the revenue distribution.
So, we first need to find the posterior distributions for the payer percentage. This can be done as explained in the paper I mentioned. In Python, a function that samples from this posterior distribution looks like this:
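import numpy as np

def binomial_rvar(successes, total, samples):
    # Posterior draws of the payer rate: Beta(1, 1) prior updated with the observed counts
    rvar = np.random.beta(1 + successes, 1 + (total - successes), samples)
    return rvar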
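As a rough sketch of how such draws could be used, the snippet below compares the payer-rate posteriors of two groups and estimates the probability that the variant has the higher payer rate. The counts and the helper names `payer_rate_posterior` and `prob_variant_better` are made up for illustration; only the Beta(1, 1) prior is taken from the function above.

```python
import numpy as np

def payer_rate_posterior(payers, total, samples, rng):
    # Beta(1, 1) prior updated with `payers` successes and (total - payers) failures
    return rng.beta(1 + payers, 1 + (total - payers), samples)

def prob_variant_better(control_draws, variant_draws):
    # Fraction of paired posterior draws in which the variant beats the control
    return float(np.mean(variant_draws > control_draws))

rng = np.random.default_rng(0)

# Hypothetical counts: 120 of 10,000 users paid in control, 150 of 10,000 in the variant
control = payer_rate_posterior(payers=120, total=10_000, samples=100_000, rng=rng)
variant = payer_rate_posterior(payers=150, total=10_000, samples=100_000, rng=rng)

p = prob_variant_better(control, variant)
print(f"P(variant payer rate > control payer rate) = {p:.3f}")
```

Comparing draws pairwise like this assumes the two groups are independent, which should hold for randomly assigned test groups.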
Secondly, among the payers, we want to model the revenue. The paper also describes how to handle revenue, but it assumes the revenue is exponentially distributed. This is not the case for our app: some users spend an insane amount of money on it. If such a user ends up in one of the groups, the method will immediately conclude that this group is the best.
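To make the effect of a single heavy spender concrete, here is a small sketch with invented numbers: 99 payers who each spend 5.0, plus one payer who spends 5000.0. Any model driven by the average revenue per payer will be dominated by that one user.

```python
import numpy as np

# Hypothetical revenue per payer: 99 ordinary payers
ordinary = np.full(99, 5.0)
# The same group with one very heavy spender added
with_whale = np.append(ordinary, 5000.0)

mean_without = ordinary.mean()
mean_with = with_whale.mean()

print(f"mean revenue without the heavy spender: {mean_without:.2f}")
print(f"mean revenue with the heavy spender:    {mean_with:.2f}")
print(f"ratio: {mean_with / mean_without:.1f}x")
```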
What we can do instead is take the log of the Pareto-distributed revenues, which transforms a Pareto distribution into an exponential distribution (strictly, it is log(x / x_min), with x_min the smallest possible purchase, that is exponentially distributed). We take the log of each paying user's revenue, sum these logs into a "logsum", and count how many users contributed to it. We can then use the same approach as the paper. In Python this would be something like this:
def get_exponential_rvars(total_sum, users, samples):
    # Gamma posterior for the exponential rate; inverting gives draws of the mean (1 / rate)
    r_var = 1. / np.random.gamma(users, 1 / (1 + total_sum), samples)
    return r_var
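As a sanity check on the log transform, the sketch below draws synthetic Pareto revenues and checks that log(x / x_min) behaves like an exponential variable with rate alpha, so its mean and standard deviation should both be close to 1 / alpha. The values of `alpha` and `x_min` are arbitrary assumptions, not estimates from the real data. It then builds the "logsum" and user count and feeds them into the same Gamma formula as the function above.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, x_min = 2.0, 1.0   # assumed Pareto shape and minimum purchase amount
n = 200_000

# numpy's pareto() samples the Lomax (Pareto II) distribution;
# adding 1 and scaling by x_min gives a classical Pareto with minimum x_min
revenue = (rng.pareto(alpha, n) + 1.0) * x_min

# Log-transform: log(x / x_min) should be Exponential(rate=alpha)
log_rev = np.log(revenue / x_min)
print(log_rev.mean(), log_rev.std())   # both should be near 1 / alpha

# Sufficient statistics used in the answer: sum of logs and number of payers
logsum = log_rev.sum()
users = log_rev.size

# Posterior draws for the mean of the log revenue (same formula as above)
post = 1.0 / rng.gamma(users, 1.0 / (1.0 + logsum), 10_000)
print(post.mean())
```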
We can now multiply the samples from these two posteriors element-wise, which gives the final distribution for the revenue per user.
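Putting the two steps together, a minimal end-to-end sketch might look like the following. All inputs (payer counts, group sizes, logsums) are invented, and the function names are hypothetical. Note that, as set up in this answer, the second factor is the posterior for the mean of the log revenue per payer, so the product is on that scale.

```python
import numpy as np

rng = np.random.default_rng(42)
SAMPLES = 100_000

def payer_rate_draws(payers, total):
    # Step 1: Beta posterior for the fraction of users who pay
    return rng.beta(1 + payers, 1 + (total - payers), SAMPLES)

def mean_log_revenue_draws(logsum, payers):
    # Step 2: Gamma posterior for the exponential rate of the log revenue,
    # inverted to give draws of the mean log revenue per payer
    return 1.0 / rng.gamma(payers, 1.0 / (1.0 + logsum), SAMPLES)

def revenue_per_user_draws(payers, total, logsum):
    # Element-wise product of the two independent posteriors
    return payer_rate_draws(payers, total) * mean_log_revenue_draws(logsum, payers)

# Hypothetical inputs for two groups
control = revenue_per_user_draws(payers=120, total=10_000, logsum=180.0)
variant = revenue_per_user_draws(payers=150, total=10_000, logsum=240.0)

prob = float(np.mean(variant > control))
print(f"P(variant > control) = {prob:.3f}")
```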