I have a numpy array with 42000 rows and 110000 dimensions, and I am trying to create a pairwise distance matrix (42000 x 42000) on a machine with 32 GB of RAM and 8 cores.
I tried pairwise_distances_chunked, but it only gives me a 3120 x 42000 distance matrix. I also tried pairwise_distances, but it fails with an out-of-memory error.
Any suggestions on what can be done?
According to the documentation, pairwise_distances_chunked is a generator that yields one chunk of rows at a time. Based on the way you phrased your question, it seems like you did this:
D_chunk = next(pairwise_distances_chunked(X))
That code (which is the first example from the documentation) only gives you the first chunk.
What you want to do is iterate over the generator so that you see every chunk:
for chunk in pairwise_distances_chunked(X):
    do_something(chunk)
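If you do need the whole matrix in one place, one option is to copy each chunk into a preallocated array as it arrives. This is only a rough sketch: the helper name full_distance_matrix, the float32 output dtype, and the small random X_small are my own assumptions for illustration, not anything from your setup. The working_memory argument (in MiB) is what controls how many rows each chunk contains.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_chunked


def full_distance_matrix(X, working_memory=64, dtype=np.float32):
    """Assemble the full n x n distance matrix chunk by chunk.

    Hypothetical helper: the name and the float32 default are
    illustrative choices, not part of scikit-learn.
    """
    n = X.shape[0]
    # Preallocate the result once; each chunk fills a block of rows.
    D = np.empty((n, n), dtype=dtype)
    start = 0
    for chunk in pairwise_distances_chunked(X, working_memory=working_memory):
        stop = start + chunk.shape[0]
        D[start:stop] = chunk  # chunks arrive in row order
        start = stop
    return D


# Small stand-in for the real data (assumption: 200 rows, 16 features).
rng = np.random.default_rng(0)
X_small = rng.random((200, 16))

# A tiny working_memory forces several chunks even on this small input.
D_small = full_distance_matrix(X_small, working_memory=0.05)
print(D_small.shape)
```

For your sizes, the 42000 x 42000 result alone takes roughly 7 GB in float32 or 14 GB in float64, so it should fit in 32 GB as long as the input array and the per-chunk working memory leave enough room. If it does not, replacing np.empty with a disk-backed np.memmap of the same shape is one possible variation on the same loop.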