python pandas dataframe seaborn visualization

Visualizing Relationships Between Heterogeneous Data Variables in a Pandas DataFrame


I have a pandas DataFrame that contains heterogeneous data variables (strings and numerical values).

Is there a quick way of visualising the relationships between these variables in a plot, where each column and row in the plot would correspond to the individual data variable?

I have tried using sns.pairplot(df), but it ignores the string variables.


Solution

  • Categorical (string) columns need to be encoded as numbers before they can be plotted this way. For the visualization, you can use either a scatterplot matrix or a correlation heatmap. Here is an example with made-up data that you can adapt to your needs:

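    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import LabelEncoder
    
    # Made-up data: two string columns and three numerical columns
    np.random.seed(42)
    df = pd.DataFrame({
        'Category': np.random.choice(['A', 'B', 'C'], size=100),
        'Numerical1': np.random.randn(100) * 10 + 50,
        'Numerical2': np.random.randn(100) * 5 + 100,
        'String_Var': np.random.choice(['Low', 'Medium', 'High'], size=100),
        'Numerical3': np.random.randint(1, 100, size=100)
    })
    
    # Encode each string column as integers; keep the encoders so the
    # codes can be mapped back to labels with le.inverse_transform().
    # Note: LabelEncoder numbers the classes in sorted (alphabetical) order,
    # so 'High' < 'Low' < 'Medium' here, not Low < Medium < High.
    label_encoders = {}
    for col in ['Category', 'String_Var']:
        le = LabelEncoder()
        df[col + '_Encoded'] = le.fit_transform(df[col])
        label_encoders[col] = le
    
    # Keep only the numeric columns (the originals plus the encoded ones)
    df_encoded = df.drop(columns=['Category', 'String_Var'])
    
    # Scatterplot matrix with kernel density estimates on the diagonal
    sns.pairplot(df_encoded, diag_kind='kde')
    plt.show()
    
    # Pairwise correlations between all numeric columns
    plt.figure(figsize=(8,6))
    sns.heatmap(df_encoded.corr(), annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
    plt.title("Correlation Heatmap")
    plt.show()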
    

    which renders a scatterplot matrix (pair plot) and a correlation heatmap.
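
One caveat with the label-encoding step: `LabelEncoder` assigns integers in alphabetical order, which gives a nominal column such as `Category` an order it does not have and puts an ordinal column such as `String_Var` in the wrong order. A minimal sketch of an alternative, assuming you know which columns are ordinal, is shown here. The helper name `encode_mixed` and its `ordered_levels` argument are hypothetical, introduced only for illustration. Ordinal columns get integer codes in the order you specify, and the remaining non-numeric columns are one-hot encoded.

```python
import pandas as pd


def encode_mixed(df, ordered_levels=None):
    """Return an all-numeric copy of df (hypothetical helper).

    ordered_levels maps an ordinal column name to its levels, lowest first.
    Every other non-numeric column is treated as nominal and one-hot encoded.
    """
    ordered_levels = ordered_levels or {}
    out = df.copy()

    # Ordinal columns: integer codes that follow the given level order
    for col, levels in ordered_levels.items():
        cat = pd.Categorical(out[col], categories=levels, ordered=True)
        out[col] = cat.codes

    # Nominal columns: one 0/1 indicator column per level
    nominal = list(out.select_dtypes(exclude="number").columns)
    if not nominal:
        return out
    return pd.get_dummies(out, columns=nominal, dtype=int)


# Small made-up example
df = pd.DataFrame({
    "Category": ["A", "B", "C", "A"],
    "String_Var": ["Low", "High", "Medium", "Low"],
    "Numerical1": [1.0, 2.0, 3.0, 4.0],
})
encoded = encode_mixed(df, {"String_Var": ["Low", "Medium", "High"]})
print(encoded)
```

The result can be passed to `sns.pairplot` or to `.corr()` in the same way as `df_encoded` above. If you would rather keep a string column as it is, `sns.pairplot(df, hue='Category')` colors the points by that column instead of encoding it.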