I have a dataframe with thousands of depth values (y) and associated porosity values (x) and layer name (layer). I've created a scatter plot of the y and x values and want to color the symbols by layer. A simple example below:
#Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Create Data
data = {'y': [1,2,3,4,5,6,7,8,9,10],
'x': [1,2,3,4,5,6,7,8,9,10],
'layer' : ['a','a','a','b','b','b','c','c','c','c']}
df=pd.DataFrame(data)
print(df)
#Plot Data
plt.scatter(df['x'],df['y'])
plt.show()
    y   x layer
0   1   1     a
1   2   2     a
2   3   3     a
3   4   4     b
4   5   5     b
5   6   6     b
6   7   7     c
7   8   8     c
8   9   9     c
9  10  10     c
I need a simple way to color the symbols by layer and create a legend.
I was able to color the symbols using multiple methods, e.g., adding a color attribute to the dataframe by layer. However, I have failed to create a good legend that maps each color to its layer name. Any suggestions would be appreciated.
I'll post here some approaches that you can use to do this.
We start by defining our dataframe:
data = {'y': [1,2,3,4,5,6,7,8,9,10],
'x': [1,2,3,4,5,6,7,8,9,10],
'layer' : ['a','a','a','b','b','b','c','c','c','c']}
df = pd.DataFrame(data)
With the data in place, take the subset of the dataframe for each layer and plot it with its own label.
The code looks like this:
layerColours = {'a': 'blue', 'b': 'green', 'c': 'red'} # map the letters to a color, can be automated
plt.figure()
for layer in df['layer'].unique(): # group by layer
    subDf = df[df['layer'] == layer] # get subset of dataframe
    plt.scatter(subDf['x'], subDf['y'], label=layer, color=layerColours[layer]) # plot
# add labels and such
plt.xlabel('Porosity (x)')
plt.ylabel('Depth (y)')
plt.title('Scatter Plot Colored by Layer with matplotlib')
plt.legend(title='Layer')
plt.grid()
Plot looks like this:
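The comment on `layerColours` notes that the mapping can be automated. A minimal sketch of one way to do that, assuming the qualitative `tab10` colormap is acceptable (it only has 10 distinct colours, so another colormap would be needed for more layers):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'layer': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
})

# Build the layer -> colour mapping from the data instead of typing it by hand.
# Assumption: at most 10 layers, since 'tab10' has 10 distinct colours.
cmap = plt.get_cmap('tab10')
layerColours = {layer: cmap(i) for i, layer in enumerate(df['layer'].unique())}

fig, ax = plt.subplots()
for layer, subDf in df.groupby('layer'):  # one scatter call per layer
    ax.scatter(subDf['x'], subDf['y'], label=layer, color=layerColours[layer])
ax.set_xlabel('Porosity (x)')
ax.set_ylabel('Depth (y)')
ax.legend(title='Layer')
```

Iterating over `df.groupby('layer')` yields each layer name together with its subset, so the explicit boolean filter is not needed here.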
Seaborn lets you plot data from a dataframe very conveniently. For this, you can use:
import seaborn as sns
plt.figure()
sns.scatterplot(data=df, x='x', y='y', hue='layer', s=100)
plt.xlabel('Porosity (x)')
plt.ylabel('Depth (y)')
plt.title('Scatter Plot Colored by Layer')
plt.legend(title='Layer')
plt.grid()
The results:
Technically, the following lines should work:
plt.figure()
df.groupby("layer").plot(x='x', y='y', ax= plt.gca(), kind='scatter')
However, this does not change the colour of the scattered points:
Using kind="line" automatically changes the colour:
I am not sure whether this can be done directly with groupby in pandas, at least with the version that I have...
I might update this part of the answer later...
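In the meantime, one pandas-only route that should work is to loop over the groups and pass a colour and a label to each scatter call explicitly. This is a sketch rather than a built-in groupby feature; the `layerColours` mapping is the same hand-written one used in the matplotlib approach:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'layer': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
})
layerColours = {'a': 'blue', 'b': 'green', 'c': 'red'}

fig, ax = plt.subplots()
for layer, group in df.groupby('layer'):
    # Draw every group on the same axes, with its own colour and legend label
    group.plot(x='x', y='y', kind='scatter', ax=ax,
               color=layerColours[layer], label=layer)
ax.legend(title='Layer')
```

Unlike line plots, scatter plots drawn through pandas do not seem to advance the colour cycle on their own, which is why the colour is passed in on every call.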