statistics probability-density cdf probability-distribution kolmogorov-smirnov

Are two PDFs (or CDFs) the same if their "x" values differ?


I've been given two data sets and I need to determine whether they're the "same". Each data set consists of a PDF and a CDF (but not the underlying samples). The PDF and CDF data are given as x, y values; they're digitized samples of continuous curves. I don't know what the underlying distribution is (beta, log-normal, etc.). My data looks something like:

Set1:

Set2:

Importantly:

Obviously, I thought of a Kolmogorov–Smirnov test, but my reading suggests the test depends on the CDF values 'lining up' on the x axis, which is not the case here. I thought of interpolating both data sets onto the same x-axis, but I'm concerned that will introduce fitting errors.
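For illustration, the interpolation idea could look roughly like the sketch below. It assumes each CDF is a polyline through its digitized points and that the x values are sorted in ascending order; the function name `cdf_sup_distance` is made up for this example. It only computes the K-S style distance (the largest vertical gap between the two curves), not a p-value, since the sample sizes behind the curves aren't known.

```python
import numpy as np

def cdf_sup_distance(x1, F1, x2, F2):
    """Largest vertical gap between two digitized CDFs (hypothetical helper).

    Assumes x1 and x2 are sorted ascending and that each CDF is linear
    between its digitized points.
    """
    x1, F1, x2, F2 = (np.asarray(a, dtype=float) for a in (x1, F1, x2, F2))
    # Evaluate both curves at every x value that appears in either set.
    grid = np.union1d(x1, x2)
    # np.interp holds the end values constant outside each curve's x range.
    g1 = np.interp(grid, x1, F1)
    g2 = np.interp(grid, x2, F2)
    return float(np.max(np.abs(g1 - g2)))

# Two digitizations with different x values
d = cdf_sup_distance([0, 1, 2], [0, 0.5, 1],
                     [0, 0.5, 1.5, 2], [0, 0.5, 0.9, 1.0])
print(d)
```

Whether the linear-between-points assumption is acceptable depends on how densely the curves were digitized.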

I looked at other non-parametric tests, but it seems to be the same problem everywhere because the x-values don't align.

Are there non-parametric tests that don't rely on CDF/PDF data sampled at the same interval?


Solution

  • I followed the hints from Robert Dodier and did not use the K-S test, which was the wrong approach here. Instead, I used a polygon representation of both data sets and calculated the area difference. If the difference was greater than a threshold, the test failed. Interpolating the data to the same "grid" introduced numerical artifacts that confused the results. Moral of the story: tests have limits.
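A minimal sketch of one way to read "area difference", not necessarily the exact computation used above: treat each curve as a polyline through its digitized points and measure the area enclosed between the two polylines over their common x range. The names `area_between` and `curves_match` and the threshold value are assumptions for this example. Each polyline is evaluated only at the vertices of the other one, so every original point is kept and nothing is resampled onto a new regular grid.

```python
import numpy as np

def area_between(x1, y1, x2, y2):
    """Area enclosed between two polylines over their shared x range.

    Hypothetical helper; assumes x1 and x2 are sorted ascending.
    """
    x1, y1, x2, y2 = (np.asarray(a, dtype=float) for a in (x1, y1, x2, y2))
    lo, hi = max(x1[0], x2[0]), min(x1[-1], x2[-1])
    # Union of both vertex sets, restricted to the overlapping x range.
    grid = np.union1d(x1, x2)
    grid = grid[(grid >= lo) & (grid <= hi)]
    # Vertical gap at each vertex; it is linear between consecutive vertices.
    d = np.interp(grid, x1, y1) - np.interp(grid, x2, y2)
    h = np.diff(grid)
    a, b = np.abs(d[:-1]), np.abs(d[1:])
    same_sign = d[:-1] * d[1:] >= 0
    denom = np.where(a + b > 0, a + b, 1.0)  # avoid 0/0 where curves coincide
    # Trapezoid where the curves do not cross, two triangles where they do.
    seg = np.where(same_sign, h * (a + b) / 2, h * (a**2 + b**2) / (2 * denom))
    return float(seg.sum())

def curves_match(x1, y1, x2, y2, threshold):
    """True if the enclosed area is within the (user-chosen) threshold."""
    return area_between(x1, y1, x2, y2) <= threshold

# Triangular PDF vs. flat PDF on [0, 2], digitized at different x values
print(area_between([0, 1, 2], [0, 1, 0], [0, 2], [0.5, 0.5]))
print(curves_match([0, 1, 2], [0, 1, 0],
                   [0, 0.5, 1, 1.5, 2], [0, 0.5, 1, 0.5, 0], threshold=0.01))
```

The threshold has to come from the application; nothing in the data itself says how much area counts as "different".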