Problem: Computing the Pearson correlation coefficient depending on the values of a third column.
To start with, I have a dataframe with 3 columns: A, B and C.
Columns A and B are of type float64, whereas column C has dtype object. I want to get the Pearson correlation coefficient for columns A and B.
print(df['A'].corr(df['B'],method='pearson')) --> This works fine for the whole columns.
The next step is where I struggle. Column C contains only two distinct values; let's call them c1 and c2. I now want a separate coefficient for the c1 rows and for the c2 rows. I tried:
print(df['A']&df['C']=='c1').corr((df['B']&df['C']=='c1'),method='pearson')
and the same way for c2. The reported error is:
TypeError: unsupported operand type(s) for &: 'float' and 'str'
How can I get both coefficients without splitting the dataframe?
Thanks in advance
This should achieve what you're looking for:
print(df[df['C']=='c1']['A'].corr(df[df['C']=='c1']['B'],method='pearson'))
df[df['C']=='c1']
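retrieves the subset of the dataframe where the value in column C is 'c1', and then you just select the column you want as usual. Your attempt fails because & binds more tightly than == in Python, so df['A']&df['C'] is evaluated first, which tries to combine floats with strings.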
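If you want both coefficients at once, a groupby over column C may be more convenient than repeating the filter for each value. The following is only a sketch: the sample data is made up for illustration, and the names mask, r_c1 and r_by_group are hypothetical, not from your code.

```python
import pandas as pd

# Made-up sample data; only the column names A, B and C come from the question.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0, 8.0, 6.0, 4.0, 2.0],
    "C": ["c1", "c1", "c1", "c1", "c2", "c2", "c2", "c2"],
})

# Build the boolean mask once and reuse it for both columns.
mask = df["C"] == "c1"
r_c1 = df.loc[mask, "A"].corr(df.loc[mask, "B"], method="pearson")

# Alternatively, compute one coefficient per distinct value of C.
r_by_group = {
    key: group["A"].corr(group["B"], method="pearson")
    for key, group in df.groupby("C")
}

print(r_c1)
print(r_by_group)
```

With this sample data, A and B rise together in the c1 rows and move in opposite directions in the c2 rows, so the two coefficients come out with opposite signs. The original dataframe is left untouched in both variants.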