statistics p-value ab-testing

p-value borderline significant? Further study?


I am learning about AB testing and have run into some questions.

In the event of borderline significant p-values, say p = 0.049 versus p = 0.051, are the two really that different?

If I get a p-value of 0.051, what should I do? Gathering further data would be expensive, but I'm also hesitant to accept the null.

Also, suppose I do further analysis on subsets of the data split by one feature (i.e., I got p = 0.051 for the general study, then divided the data into sports/movies/books and found p_sports = 0.01, p_movies = 0.07, p_books = 0.055). Can I conclude that the sports category is statistically significant?

Thanks!


Solution

  • Keep in mind that every hypothesis test has a price: if you test multiple hypotheses, the type I error rate inflates (https://en.wikipedia.org/wiki/Multiple_comparisons_problem). What you are suggesting is not the best way to proceed; any test run after the results are known falls under the category of post-hoc analysis (https://en.wikipedia.org/wiki/Post_hoc_analysis). Such analyses should be planned before observing the data, otherwise it is just a blind chase.
  • In your specific situation, state your null hypothesis and the alpha level (the threshold below which you reject the null) up front. If the test gives p = 0.051 and you set alpha = 0.05, then you should not reject the null (software applies the same rule). Also make sure that this is your final answer (check for missing cases and so on).
  • After this, you still have the discussion to elaborate on why you got your results, present findings from your post-hoc analyses, and raise questions. If a post-hoc analysis gives a significant result, just state it and relate it to findings from previous research. If these results mean something, then a follow-up study with a proper design should answer that particular question.
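
To make the inflation of type I error concrete, here is a minimal Python sketch applied to the three subgroup p-values from the question. It is an illustration only, not part of the original answer: the function names are hypothetical, the family-wise error formula assumes the three tests are independent, and Bonferroni and Holm are just two common corrections chosen for the example. A correction controls the error rate, but it does not turn an unplanned subgroup split into a planned one, so a result that survives it is still exploratory.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject a null only if its p-value is at most alpha / m."""
    m = len(p_values)
    return {name: p <= alpha / m for name, p in p_values.items()}


def holm(p_values, alpha=0.05):
    """Holm step-down: compare sorted p-values to alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    decisions = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        if p > alpha / (m - rank):
            break  # stop at the first p-value that fails its threshold
        decisions[name] = True
    return decisions


# Subgroup p-values taken from the question.
subgroup_p = {"sports": 0.01, "movies": 0.07, "books": 0.055}

# With three tests at alpha = 0.05 the chance of a false positive is about 0.14.
print(familywise_error_rate(0.05, len(subgroup_p)))
print(bonferroni(subgroup_p))
print(holm(subgroup_p))
```

With these numbers, both corrections still flag only the sports subgroup, because 0.01 is below 0.05 / 3. That is a reason to mention the finding in the discussion and test it in a properly designed follow-up, not a confirmed effect.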