I have two dictionaries that contain two discrete distributions: A={1: 300, 2: 400, 4: 20,...} and B={2: 100, 3: 200, 4: 75,...}. I want to check how similar they are, and I thought of performing a two-sample Kolmogorov-Smirnov test.
I checked the scipy function, but it seems to work only on numpy arrays. How can I perform it on my data?
You can easily convert your data into numpy arrays:
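import numpy as np
# Union of the keys of both dictionaries, sorted so both arrays use the same order
my_keys = sorted(set(A) | set(B))
# Use a list comprehension: np.array() on a bare generator gives a 0-d object array
A_array = np.array([A.get(key, 0) for key in my_keys])  # missing key -> count 0
B_array = np.array([B.get(key, 0) for key in my_keys])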
Note that A and B do not have the same keys (for example, B does not contain the key 1), so you need to handle that. This is why the code takes the union of the keys and uses a value of 0 when a key is missing from a dictionary (I assume that, in that case, you have no observations for that value).
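Now the two count arrays are aligned value by value. Keep in mind, though, that the two-sample KS test in scipy (scipy.stats.ks_2samp) compares two samples of raw observations, which may have different sizes, not two vectors of counts: passing the count arrays directly would treat the counts themselves as the observed values. To run the test, rebuild each sample by repeating every key as many times as its count.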
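A minimal sketch of that last step, assuming the dictionaries map each observed value to its number of occurrences. Only the entries visible in the question are used here (the real dictionaries are longer), np.repeat expands the aligned counts back into samples, and scipy.stats.ks_2samp runs the two-sample test on them:

```python
import numpy as np
from scipy.stats import ks_2samp

# Assumption: only the entries shown in the question; the real data has more keys.
A = {1: 300, 2: 400, 4: 20}
B = {2: 100, 3: 200, 4: 75}

# Align the counts on the union of the keys (missing key -> 0 observations).
my_keys = sorted(set(A) | set(B))
A_array = np.array([A.get(key, 0) for key in my_keys])
B_array = np.array([B.get(key, 0) for key in my_keys])

# Rebuild the raw samples: each key is repeated as many times as it was observed.
A_sample = np.repeat(my_keys, A_array)  # 720 observations
B_sample = np.repeat(my_keys, B_array)  # 375 observations

# Two-sided two-sample KS test; the samples do not need the same size.
result = ks_2samp(A_sample, B_sample)
print(result.statistic, result.pvalue)  # statistic ≈ 0.706 for these truncated dictionaries
```

The statistic is the largest gap between the two empirical CDFs (here at value 2: 700/720 versus 100/375). Since the data are discrete with many tied values, the p-value from the standard KS test tends to be conservative.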