python matplotlib legend

Scatter plot legend with colors for a string attribute in complex dataframe using Matplotlib (Python)


I have a dataframe with thousands of depth values (y), associated porosity values (x), and layer names (layer). I've created a scatter plot of the y and x values and want to color the symbols by layer. A simple example is below:

#Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#Create Data
data = {'y': [1,2,3,4,5,6,7,8,9,10],
        'x': [1,2,3,4,5,6,7,8,9,10],
        'layer' : ['a','a','a','b','b','b','c','c','c','c']}
df=pd.DataFrame(data)

print(df)

#Plot Data
plt.scatter(df['x'],df['y'])
plt.show()

    y   x layer
0   1   1     a
1   2   2     a
2   3   3     a
3   4   4     b
4   5   5     b
5   6   6     b
6   7   7     c
7   8   8     c
8   9   9     c
9  10  10     c


I need a simple way to color the symbols by layer and create a legend.

I was able to color the symbols using multiple methods, e.g., adding a color attribute to the dataframe by layer. However, I have failed to create a good legend that maps each color to its layer name. Any suggestions would be appreciated.


Solution

  • Here are a few approaches that you can use to do this.

    We start by defining our dataframe:

    import pandas as pd
    import matplotlib.pyplot as plt

    data = {'y': [1,2,3,4,5,6,7,8,9,10],
            'x': [1,2,3,4,5,6,7,8,9,10],
            'layer' : ['a','a','a','b','b','b','c','c','c','c']}
    df = pd.DataFrame(data)
    

    Matplotlib approach

    Split the dataframe into one subset per layer and plot each subset with its own label. plt.legend() then builds the legend from those labels.

    The code looks like this:

    layerColours = {'a': 'blue', 'b': 'green', 'c': 'red'} # map the letters to a color, can be automated
    plt.figure()
    for layer in df['layer'].unique(): # group by layer
        subDf = df[df['layer'] == layer] # get subset of dataframe
        plt.scatter(subDf['x'], subDf['y'], label=layer, color=layerColours[layer]) # plot
    # add labels and such
    plt.xlabel('Porosity (x)')
    plt.ylabel('Depth (y)')
    plt.title('Scatter Plot Colored by Layer with matplotlib')
    plt.legend(title='Layer')
    plt.grid()
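
    The comment on layerColours says the mapping can be automated. A minimal sketch of one way to do that, assuming the 'tab10' colormap is acceptable (the colormap choice is an assumption, not part of the original answer). It only builds the dictionary, so it can replace the hand-written mapping in the loop above:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import pandas as pd

df = pd.DataFrame({'layer': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c']})

# One colour per distinct layer, taken from the 'tab10' colormap (an assumed choice)
layers = df['layer'].unique()
cmap = plt.get_cmap('tab10')

# Wrap around with the modulo if there are more layers than colours in the map
layerColours = {layer: to_hex(cmap(i % cmap.N)) for i, layer in enumerate(layers)}
print(layerColours)
```

    With more than ten layers the colours repeat, so a larger colormap such as 'tab20' may be a better fit in that case.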
    

    Plot looks like this:

    grouped matplotlib plot
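
    With thousands of points, a single scatter call may be preferable to one call per layer. A minimal sketch, assuming it is acceptable to encode the layer names as integers with pd.factorize and to let a colormap pick the colours ('viridis' here is an arbitrary choice). The legend handles come from the scatter's legend_elements() method:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'layer': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
})

# Encode each layer name as an integer code (0, 1, 2, ...) in order of first appearance
codes, layer_names = pd.factorize(df['layer'])

fig, ax = plt.subplots()
points = ax.scatter(df['x'], df['y'], c=codes, cmap='viridis')

# legend_elements() returns one handle per distinct code in ascending order,
# which matches the order of layer_names
handles, _ = points.legend_elements()
ax.legend(handles, list(layer_names), title='Layer')
ax.set_xlabel('Porosity (x)')
ax.set_ylabel('Depth (y)')
```

    Call plt.show() afterwards to display the figure.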


    With Seaborn

    Seaborn lets you plot data from a dataframe in a very convenient way. For this, you can use:

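    import seaborn as sns  # assumes seaborn is installed

    plt.figure()
    # hue='layer' colours the points by layer and creates the legend entries
    sns.scatterplot(data=df, x='x', y='y', hue='layer', s=100)
    plt.xlabel('Porosity (x)')
    plt.ylabel('Depth (y)')
    plt.title('Scatter Plot Colored by Layer')
    plt.legend(title='Layer')
    plt.grid()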
    

    The results:

    seaborn grouped plot


    With Pandas

    Technically, the following lines should work:

    plt.figure()
    df.groupby("layer").plot(x='x', y='y', ax= plt.gca(), kind='scatter')
    

    However, this does not change the colour of the scattered points:

    scatter

    Using kind="line" automatically changes the colour:

    lines

    I am not sure whether this can be done directly in pandas, at least with the version that I have.

    I might update this part of the answer later.
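
    One way that should work with pandas plotting is to loop over the groups and pass an explicit colour and label to each call. This is a sketch rather than a definitive fix: it assumes the default Matplotlib colour cycle ('C0', 'C1', ...) is acceptable, and it relies on each scatter call otherwise falling back to the same default colour, which would explain the single-colour plot above.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'layer': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
})

fig, ax = plt.subplots()

# groupby yields (layer name, sub-dataframe) pairs, sorted by layer name
for i, (layer, sub_df) in enumerate(df.groupby('layer')):
    # 'C0', 'C1', ... refer to the colours of the default property cycle (10 entries)
    sub_df.plot(kind='scatter', x='x', y='y', ax=ax,
                color=f'C{i % 10}', label=layer)

ax.legend(title='Layer')
```

    Each call draws onto the same axes, so the legend ends up with one entry per layer.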