I have a pandas DataFrame that contains heterogeneous variables (strings and numerical values).
Is there a quick way of visualising the relationships between these variables in a single plot, where each row and column of the plot grid corresponds to one variable?
I have tried using sns.pairplot(df), but it ignores the string variables.
The categorical (string) columns need to be encoded as numbers before they can be plotted or correlated. For the visualization itself, you can use either scatterplots (a pairplot) or a correlation heatmap. Here is an example with synthetic data that you can adapt to your needs:
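```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

# Synthetic data mixing string and numerical columns
np.random.seed(42)
df = pd.DataFrame({
    'Category': np.random.choice(['A', 'B', 'C'], size=100),
    'Numerical1': np.random.randn(100) * 10 + 50,
    'Numerical2': np.random.randn(100) * 5 + 100,
    'String_Var': np.random.choice(['Low', 'Medium', 'High'], size=100),
    'Numerical3': np.random.randint(1, 100, size=100)
})

# Encode each string column as integers (LabelEncoder assigns codes in sorted,
# i.e. alphabetical, order of the labels) and keep the encoders for inverse_transform
label_encoders = {}
for col in ['Category', 'String_Var']:
    le = LabelEncoder()
    df[col + '_Encoded'] = le.fit_transform(df[col])
    label_encoders[col] = le

# Drop the original string columns so only numeric columns remain
df_encoded = df.drop(columns=['Category', 'String_Var'])

# Scatterplot matrix: one row and one column per variable, KDEs on the diagonal
sns.pairplot(df_encoded, diag_kind='kde')
plt.show()

# Pairwise Pearson correlations as an annotated heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(df_encoded.corr(), annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()
```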
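which renders
[Figure: pairplot of the numerical and label-encoded variables]
and
[Figure: correlation heatmap of the same variables]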
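One caveat with `LabelEncoder`: it assigns codes in sorted order of the labels, so `String_Var` becomes High=0, Low=1, Medium=2, and `Category` also gets an arbitrary order. Correlations involving those encoded columns depend on that accidental ordering. A minimal sketch of an alternative, assuming that Low < Medium < High is the intended order of `String_Var` and that `Category` has no natural order: map the ordinal column explicitly and one-hot encode the nominal one with `pd.get_dummies`. The DataFrame here is a reduced, hypothetical version of the example's data so the block runs on its own, and the `order` mapping is an assumption to adjust to your own labels.

```python
import numpy as np
import pandas as pd

# Reduced synthetic data (hypothetical, re-created so this block is self-contained)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'Category': rng.choice(['A', 'B', 'C'], size=100),
    'Numerical1': rng.normal(50, 10, size=100),
    'String_Var': rng.choice(['Low', 'Medium', 'High'], size=100),
})

# Ordinal column: explicit order (assumed Low < Medium < High) instead of alphabetical codes
order = {'Low': 0, 'Medium': 1, 'High': 2}
df['String_Var_Ordinal'] = df['String_Var'].map(order)

# Nominal column: one 0/1 indicator column per category, so no fake order is implied
dummies = pd.get_dummies(df['Category'], prefix='Category', dtype=int)
df_encoded = pd.concat([df.drop(columns=['Category', 'String_Var']), dummies], axis=1)

print(df_encoded.corr().round(2))
```

The resulting `df_encoded` can be passed to the same `sns.pairplot` and `sns.heatmap` calls as above; the indicator columns then show how each individual category relates to the numeric variables.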