Perform two-sample Kolmogorov-Smirnov test on two dictionaries

I have two dictionaries that contain two discrete distributions: A={1: 300, 2: 400, 4: 20,...} and B={2: 100, 3: 200, 4: 75,...}. I want to check how similar they are, and I thought of performing a two-sample Kolmogorov-Smirnov test.

I checked the scipy function, but it seems to work only on NumPy arrays. How can I perform it on my data?

Solution

You can transform your data into numpy.array easily:

import numpy as np

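my_keys = sorted(set([*A.keys(), *B.keys()]))  # union of the keys of both dictionaries

# Pass a list, not a generator: np.array(generator) gives a 0-d object array
A_array = np.array([A.get(key, 0) for key in my_keys])
B_array = np.array([B.get(key, 0) for key in my_keys])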

Note that A and B do not have the same keys (for example, B does not seem to contain the key 1), so you need to handle that. This is why I take the union of the keys and use a value of 0 when a key is missing from a dictionary (assuming that, in that case, there are no observations for that specific key).
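As a quick check of the alignment, here is the same idea applied only to the entries visible in the question (the elided "..." entries are left out, so this is illustrative data, not the real distributions). It uses the fact that dictionary key views support set operations, so `A.keys() | B.keys()` is an equivalent way to get the union of the keys.

```python
import numpy as np

# Illustrative data: only the entries shown in the question, "..." entries omitted
A = {1: 300, 2: 400, 4: 20}
B = {2: 100, 3: 200, 4: 75}

# dict key views support set operations, so | gives the union of the keys
my_keys = sorted(A.keys() | B.keys())

# A key missing from a dictionary gets a count of 0 (no observations of that value)
A_array = np.array([A.get(key, 0) for key in my_keys])
B_array = np.array([B.get(key, 0) for key in my_keys])

print(my_keys)            # [1, 2, 3, 4]
print(A_array.tolist())   # [300, 400, 0, 20]
print(B_array.tolist())   # [0, 100, 200, 75]
```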

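Now the two arrays are aligned on the same keys. Keep in mind, though, that scipy.stats.ks_2samp expects the raw observations of each sample, not per-key counts: passing A_array and B_array directly would compare the distributions of the counts themselves. To run the test, expand each dictionary back into a sample, for example with np.repeat(my_keys, A_array).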
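A minimal sketch of the full test, assuming the dictionary values are observation counts and again using only the entries shown in the question (illustrative data; the "..." entries are not included). It rebuilds each sample with `np.repeat`, so every key appears as many times as it was observed, and passes the samples to `scipy.stats.ks_2samp`. As a cross-check, it also computes the KS statistic directly from the aligned counts as the largest gap between the two empirical CDFs.

```python
import numpy as np
from scipy.stats import ks_2samp

# Illustrative data: only the entries shown in the question, "..." entries omitted
A = {1: 300, 2: 400, 4: 20}
B = {2: 100, 3: 200, 4: 75}

# Align both dictionaries on the union of their keys, using 0 for missing keys
my_keys = np.array(sorted(set(A) | set(B)))
A_counts = np.array([A.get(key, 0) for key in my_keys])
B_counts = np.array([B.get(key, 0) for key in my_keys])

# Rebuild the raw samples: each key is repeated as many times as it was observed
A_sample = np.repeat(my_keys, A_counts)
B_sample = np.repeat(my_keys, B_counts)

result = ks_2samp(A_sample, B_sample)
print(result.statistic, result.pvalue)

# Cross-check: the KS statistic is the largest gap between the two empirical CDFs,
# which for count data can be read off the normalized cumulative sums
A_cdf = np.cumsum(A_counts) / A_counts.sum()
B_cdf = np.cumsum(B_counts) / B_counts.sum()
D = np.max(np.abs(A_cdf - B_cdf))
print(D)
```

The KS test assumes continuous distributions; with discrete data (many tied values) its p-value is generally conservative, so treat it as approximate. If the total counts are very large, the cumulative-sum version avoids building the expanded arrays, although it only gives the statistic and not the p-value.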