I want to plot a heatmap centered on a value of 1 that shows the expression of genes across cell lines, using this code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm
import seaborn as sns
data = pd.read_csv(r"C:\Users\Hbeen\Desktop\aGPCR Pancreas.csv", header=1, index_col=0)
fig, ax = plt.subplots()
rdgn = sns.diverging_palette(h_neg=130, h_pos=10, s=99, l=55, sep=3, as_cmap=True)
divnorm = TwoSlopeNorm (vmin=data.min(), vcenter=1.00, vmax=data.max())
sns.heatmap(data, cmap='coolwarm', norm=divnorm, fmt='.0%',
            linewidths=0.5, linecolor='black', cbar=True, ax=ax)
plt.title("Expression von AGPCRs in Pankreas Zelllinien")
plt.xlabel("Gene")
plt.ylabel("Zelllinien")
plt.xticks(rotation=90, fontsize=6)
plt.yticks(fontsize=6)
plt.show()
Every time I run it, it throws a ValueError,
which I don't know how to fix as I am completely new to coding. Since I don't know why this error occurs, I have no idea how to even begin fixing it. I looked for answers on this site, but none of them involved pandas, so they weren't much help to me.
runfile('C:/Users/Hbeen/Desktop/unbenannt0.py', wdir='C:/Users/Hbeen/Desktop')
Traceback (most recent call last):
File ~\anaconda3\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)
File c:\users\hbeen\desktop\unbenannt0.py:17
divnorm = TwoSlopeNorm (vmin=data.min(), vcenter=1.00, vmax=data.max())
File ~\anaconda3\Lib\site-packages\matplotlib\colors.py:1493 in __init__
super().__init__(vmin=vmin, vmax=vmax)
File ~\anaconda3\Lib\site-packages\matplotlib\colors.py:1279 in __init__
self._vmin = _sanitize_extrema(vmin)
File ~\anaconda3\Lib\site-packages\matplotlib\colors.py:208 in _sanitize_extrema
ret = ex.item()
File ~\anaconda3\Lib\site-packages\pandas\core\base.py:418 in item
raise ValueError("can only convert an array of size 1 to a Python scalar")
ValueError: can only convert an array of size 1 to a Python scalar
This is the error I get. Apparently it happens when it tries to run
TwoSlopeNorm (vmin=data.min(), vcenter=1.00, vmax=data.max())
The error indicates that data.min() and data.max() each return a pandas.Series (one value per column), not a scalar. TwoSlopeNorm expects vmin and vmax to each be a single number, but here it receives a Series holding several values, so the conversion to a Python scalar fails.
To fix that, you can use data.min().min() and data.max().max(). The first call reduces each column to its minimum or maximum, and the second call reduces that Series to a single scalar, so you get the smallest and largest value across all cells of the DataFrame.
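Here is a minimal sketch of that fix. The small DataFrame and its gene and cell-line names are made up to stand in for the CSV from the question, so the values are assumptions and not the real data. Note that TwoSlopeNorm also requires vmin < vcenter < vmax, so this only works if the data contains values both below and above 1.

```python
import pandas as pd
from matplotlib.colors import TwoSlopeNorm

# Hypothetical stand-in for the CSV in the question (made-up values).
data = pd.DataFrame(
    {"GeneA": [0.2, 1.0, 3.5], "GeneB": [0.8, 1.4, 2.0]},
    index=["Line1", "Line2", "Line3"],
)

# data.min() alone gives one minimum per column, i.e. a Series.
per_column_min = data.min()

# Chaining a second .min()/.max() collapses that Series to one number.
vmin = float(data.min().min())
vmax = float(data.max().max())

# Scalars are what TwoSlopeNorm expects; vcenter must lie between them.
divnorm = TwoSlopeNorm(vmin=vmin, vcenter=1.0, vmax=vmax)

# The center value maps to the middle of the colormap.
center_position = float(divnorm(1.0))
```

The resulting divnorm can then be passed to sns.heatmap(..., norm=divnorm) exactly as in the original script. An equivalent way to get the overall extremes would be data.to_numpy().min() and data.to_numpy().max().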