Problem with computing the Pearson correlation coefficient in Python


Problem: Computing the Pearson correlation coefficient depending on the values of a third column.

To start with, I have a dataframe with 3 columns: A, B and C.

Columns A and B contain float64 values, whereas C contains objects (strings). I want to get the Pearson correlation coefficient for columns A and B.

print(df['A'].corr(df['B'],method='pearson')) --> This works fine for the whole columns.

The next step is where I struggle. Column C contains only 2 distinct values. Let's call them c1 and c2. I now want one coefficient for the rows where C is c1 and one for the rows where C is c2. I tried

print(df['A']&df['C']=='c1').corr((df['B']&df['C']=='c1'),method='pearson')

and the same for c2. The error raised is: TypeError: unsupported operand type(s) for &: 'float' and 'str'. How can I get both coefficients without splitting the dataframe?

Thanks in advance


Solution

  • This should achieve what you're looking for:

    print(df[df['C']=='c1']['A'].corr(df[df['C']=='c1']['B'],method='pearson'))
    

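    df[df['C']=='c1'] retrieves the subset of the dataframe where the value in column C is 'c1'; from that subset you select column A or B as usual. For c2, replace 'c1' with 'c2'. Your attempt failed because & binds more tightly than ==, so df['A']&df['C']=='c1' is evaluated as (df['A']&df['C'])=='c1', and & cannot combine a float with a str.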
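
    If you want both coefficients in one pass, grouping by column C is one option. This is a minimal sketch, not the only way to do it. The sample values are made up for illustration; only the column names A, B, C and the labels c1, c2 come from the question.

```python
import pandas as pd

# Hypothetical sample data; replace with your own dataframe.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0, 8.0, 6.0, 4.0, 2.0],
    "C": ["c1", "c1", "c1", "c1", "c2", "c2", "c2", "c2"],
})

# Single value of C: build the boolean mask once and reuse it for both columns.
mask = df["C"] == "c1"
r_c1 = df.loc[mask, "A"].corr(df.loc[mask, "B"], method="pearson")

# All values of C: iterate over the groups and compute one coefficient each.
per_group = {
    name: group["A"].corr(group["B"], method="pearson")
    for name, group in df.groupby("C")
}

print(r_c1)
print(per_group)
```

    With this sample data the coefficient for c1 comes out at roughly 1.0 and the one for c2 at roughly -1.0, because B was chosen to rise with A in the first group and fall with A in the second.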