I have a commercial performance dataset such as the following:
| Client type | Channel | Volume growth |
|---|---|---|
| Big Retail | A | 9% |
| Big Retail | B | 7% |
| Mid Retail | A | 11% |
| Mid Retail | B | 18% |
| Small Retail | A | 21% |
| Small Retail | B | 16% |
I am measuring volume growth for a group of clients over a period of time. The only difference between the groups being compared is the distribution channel (A or B) through which they reach the market. Each client belongs to exactly one group (a big retailer goes to market via either A or B and never switches), and clients are quite homogeneous within each cluster. The table above is just a summary; I have the full dataset with 2000+ clients and their individual growth rates, clusters, channels, etc.

My goal is to establish whether, for a given client type, growth rates differ significantly between channels, i.e., whether channel choice has a bearing on performance. For example, is 9% significantly different from 7% for big retailers?
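Since the per-client data is available, here is a minimal pandas sketch of how it could be summarised and split for the channel comparison. The column names (`client_type`, `channel`, `growth_vol`), the helper `split_by_channel` and all values are hypothetical, made up for illustration; growth is stored as a fraction (0.09 = 9%).

```python
import pandas as pd

# Hypothetical toy data: one row per client (made-up values, assumed column names).
clients = pd.DataFrame({
    "client_type": ["Big Retail"] * 4 + ["Mid Retail"] * 4,
    "channel":     ["A", "A", "B", "B"] * 2,
    "growth_vol":  [0.08, 0.10, 0.06, 0.08, 0.10, 0.12, 0.17, 0.19],
})

# Mean, standard deviation and size of each (client type, channel) cell:
# a summary like the table in the question, plus what the tests need.
summary = (
    clients.groupby(["client_type", "channel"])["growth_vol"]
    .agg(["mean", "std", "count"])
)
print(summary)

def split_by_channel(df, client_type):
    """Hypothetical helper: growth arrays for channel A and B within one client type."""
    sub = df[df["client_type"] == client_type]
    growth_a = sub.loc[sub["channel"] == "A", "growth_vol"].to_numpy()
    growth_b = sub.loc[sub["channel"] == "B", "growth_vol"].to_numpy()
    return growth_a, growth_b

big_a, big_b = split_by_channel(clients, "Big Retail")
```

Each client type then yields two independent samples (channel A and channel B), which is the setting a two-sample test assumes.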
My initial approach was a two-sample (independent samples) t-test, first checking whether the groups have equal variances and adjusting accordingly: if they do, the standard pooled-variance t-test; if not, Welch's t-test. As a side note, I'm using Python's statsmodels.
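A minimal sketch of that two-step procedure with statsmodels, assuming Levene's test (`scipy.stats.levene`) as the equal-variance check, since the question doesn't say which check is used. `channel_ttest` is a hypothetical helper name and the data are synthetic.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ttest_ind

def channel_ttest(growth_a, growth_b, alpha=0.05):
    """Compare mean growth of channel A vs B within one client type.

    Step 1: Levene's test for equal variances (an assumed choice of check).
    Step 2: pooled-variance t-test if equality is not rejected, else Welch's.
    """
    _, p_levene = stats.levene(growth_a, growth_b)
    usevar = "pooled" if p_levene >= alpha else "unequal"  # "unequal" = Welch
    tstat, pvalue, dof = ttest_ind(growth_a, growth_b, usevar=usevar)
    return {"usevar": usevar, "t": tstat, "p": pvalue, "df": dof}

# Synthetic growth rates (fractions), for illustration only.
rng = np.random.default_rng(0)
big_a = rng.normal(0.09, 0.03, size=120)
big_b = rng.normal(0.07, 0.03, size=150)
print(channel_ttest(big_a, big_b))
```

`statsmodels.stats.weightstats.ttest_ind` returns the t statistic, the two-sided p-value and the degrees of freedom; `usevar="unequal"` switches it to Welch's test with Satterthwaite degrees of freedom.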
I am currently unsure because I've always used the t-test for absolute attributes such as weight, size, or speed. The fact that I am now working with growth rates makes me a bit uneasy about whether it is appropriate.
Am I correct in using a t-test? Is there a better or more appropriate test?
Yes, that is what I would do. I would not check for equality of variances, though; that is overkill. I would use Welch's t-test for everything: when the variances happen to be equal, it gives nearly the same result as the standard t-test anyway.
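With the Welch-only approach, the per-client-type comparison reduces to always passing `usevar="unequal"`. A sketch, with a hypothetical helper `welch_by_client_type` and synthetic data standing in for the real groups:

```python
import numpy as np
from statsmodels.stats.weightstats import ttest_ind

def welch_by_client_type(groups):
    """groups: dict mapping client type -> (growth_channel_A, growth_channel_B)."""
    results = {}
    for client_type, (growth_a, growth_b) in groups.items():
        # Welch's t-test: no equal-variance assumption, Satterthwaite df.
        tstat, pvalue, dof = ttest_ind(growth_a, growth_b, usevar="unequal")
        results[client_type] = {"t": tstat, "p": pvalue, "df": dof}
    return results

# Synthetic growth rates (fractions), for illustration only.
rng = np.random.default_rng(1)
groups = {
    "Big Retail": (rng.normal(0.09, 0.03, 100), rng.normal(0.07, 0.05, 80)),
    "Mid Retail": (rng.normal(0.11, 0.04, 90), rng.normal(0.18, 0.04, 110)),
}
for client_type, r in welch_by_client_type(groups).items():
    print(f"{client_type}: t={r['t']:.2f}, p={r['p']:.3g}, df={r['df']:.1f}")
```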
First, though, I would look at the distribution of growth rates per factor level (per channel, in your case). If they look normal by eye, use Welch's t-test as above; otherwise, use the Mann–Whitney U test.
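A sketch of the Mann–Whitney U alternative with SciPy, on synthetic right-skewed data. Note that it works on ranks, so strictly it tests whether growth in one channel tends to be larger than in the other, rather than whether the mean growth rates differ.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic, right-skewed growth rates (lognormal), for illustration only.
rng = np.random.default_rng(2)
growth_a = rng.lognormal(mean=-2.3, sigma=0.5, size=60)
growth_b = rng.lognormal(mean=-2.6, sigma=0.5, size=70)

# Two-sided test: does either channel tend to show larger growth?
result = mannwhitneyu(growth_a, growth_b, alternative="two-sided")
print(f"U = {result.statistic:.1f}, p = {result.pvalue:.3g}")
```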
If you want to be really careful, test for normality formally in addition to checking by eye. There are plenty of normality tests (see Normality test); I usually apply the Shapiro–Wilk test, but YMMV.
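One way to combine the Shapiro–Wilk check with the choice between Welch's t-test and Mann–Whitney U, as a sketch: the helper `choose_test`, the 0.05 threshold and the synthetic data are assumptions. With groups of hundreds of clients, Shapiro–Wilk will flag even mild departures from normality, which is one reason to look at the plots as well.

```python
import numpy as np
from scipy import stats

def choose_test(growth_a, growth_b, alpha=0.05):
    """Welch's t-test if neither channel rejects normality, else Mann-Whitney U.

    Returns (test_name, p_value).
    """
    looks_normal = all(stats.shapiro(g).pvalue >= alpha for g in (growth_a, growth_b))
    if looks_normal:
        p = stats.ttest_ind(growth_a, growth_b, equal_var=False).pvalue  # Welch
        return "welch", p
    p = stats.mannwhitneyu(growth_a, growth_b, alternative="two-sided").pvalue
    return "mann-whitney", p

# Synthetic data, for illustration only.
rng = np.random.default_rng(3)
normal_a = rng.normal(0.09, 0.03, 80)
normal_b = rng.normal(0.07, 0.03, 80)
skewed_a = rng.exponential(0.1, 200)  # clearly non-normal
print(choose_test(normal_a, normal_b))
print(choose_test(skewed_a, normal_b))
```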