pandas, dataframe, parallel-processing, dask, dask-distributed

Multiplying Dask Dataframes results in NaN values


I am using dask.distributed, and I have two dask DataFrames, A and B. Both have the same number of partitions, and each partition is a 2D DataFrame with the same columns and rows, holding float64 values. When I multiply the dask DataFrames as A*B and compute the result, I get a DataFrame of the same size full of NaN values.

I tried computing a single partition of each dataframe individually as in:

A.partitions[1].compute()
B.partitions[1].compute()

Neither of them contains NaN values. I then multiplied the two:

A.partitions[1].compute()*B.partitions[1].compute()

and I still get a DataFrame of the same size that is full of NaN values. What could the problem be, and why am I not getting the actual float64 results? Note that other multiplication operations seem to work fine. Could it be related to the different graph layers?


Solution

  • The issue was solved by simply making the column labels of both dask DataFrames identical:

    A.columns = B.columns
    

    Even though, on inspection, the columns and rows appeared to have the same names, counts, and types, there was apparently an unnoticed discrepancy between the column labels.
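
The behaviour can be reproduced in plain pandas, since arithmetic between two DataFrames aligns on labels rather than on position. The following is only a minimal sketch with made-up data: the trailing space in B's column names is an assumed kind of discrepancy, not necessarily the one in the original problem. In this toy case the result also gains extra columns, which may differ from what was observed above.

```python
import pandas as pd

# Hypothetical data: B's labels carry a trailing space, which is easy to miss.
A = pd.DataFrame({"a": [1.0, 3.0], "b": [2.0, 4.0]})
B = pd.DataFrame({"a ": [10.0, 30.0], "b ": [20.0, 40.0]})

# Neither input contains NaN, but no column label matches, so every cell
# of the label-aligned product is NaN.
product = A * B
all_nan = bool(product.isna().all().all())

# Diagnostics: compare the labels directly instead of eyeballing them.
labels_match = A.columns.equals(B.columns)
only_in_A = list(A.columns.difference(B.columns))

# Fix: overwrite one set of labels with the other. The assignment is
# positional, so it is only safe when the column order already agrees.
B_fixed = B.copy()
B_fixed.columns = A.columns
fixed = A * B_fixed

print(all_nan, labels_match, only_in_A)
print(fixed)
```

Printing `repr(list(A.columns))` and `repr(list(B.columns))` is often enough to expose whitespace or int-versus-string differences. If the labels are known to be irrelevant, multiplying the underlying arrays (for example `A.values * B.values` in pandas) sidesteps alignment entirely, at the cost of losing the labels.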