Can t-test be calculated on large samples with non-normal distribution?
For example, the number of users in group A is 100K, the number of users in group B is 100K. I want to test whether the average session duration of these two groups is statistically significant.
1st method) We calculated the average session duration of these users on the day after the AB test (DAY1) as
We know that users in groups A and B have a non-normal distribution of DAY1 session values. In such a case, would it be correct to use two samples t-test to test the DAY1 avg session durations of two groups? (We will accept n=100K) (Some sources say that calculating t-scores for large samples will give accurate results even with non-normal distribution.)
2nd method) Would it be a correct method to calculate the t-score over the daily average session duration during the day the AB test is open? E.g; In the scenario below, the average daily session duration of 100K users in groups A and B are calculated. We will accept the number of days here as the number of observations and get n=30. We will also calculate the two-sample t-test calculation over n=30.
Group | day0 avg duration | day1 avg duration | day2 avg duration | ... | day30 av gduration |
---|---|---|---|---|---|
A | 30.2 | 31.2 | 32.4 | ... | 33.2 |
B | 29.1 | 30.2 | 30.4 | ... | 30.1 |
Do these methods give correct results or is it necessary to apply another method in such scenarios? Would it make sense to calculate t-test on large samples in AB test?
The t-test assumes that the means of different samples taken from a population are normally distributed. It doesn't assume that the population itself is normally distributed.
For a population with finite variance, the central limit theorem suggests that the means of samples from the population are normally distributed. However, the sample size needed for the distribution of means to be approximately normal depends on the degree of non-normalness of the population. The t-test is invalid for small samples from non-normal population distributions, but is valid for large samples from non-normal distributions.
Method 1 works because of this reason (large sample size ~100K) and you are correct that calculating t-scores for large samples will give accurate results even with non-normal distribution. [You may also consider using a z-test for the sample sizes you're working with (100K). T-tests are more appropriate for smaller sample sizes, such as n < 30]
Method 2 works because the daily averages should be normally distributed given enough samples per the central limit theorem. Time-spent datasets may be skewed but generally work well.