This is a snippet of my dataframe:
   species    bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  predicted_species
0  Adelie     18              18             181                3750         Chinstrap
1  Adelie     17              17             186                3800         Adelie
2  Adelie     18              18             195                3250         Gentoo
3  Adelie     0               0              0                  0            Adelie
4  Chinstrap  19              19             193                3450         Chinstrap
5  Chinstrap  20              20             190                3650         Gentoo
6  Chinstrap  17              17             181                3625         Adelie
7  Gentoo     19              19             195                4675         Chinstrap
8  Gentoo     18              18             193                3475         Gentoo
9  Gentoo     20              20             190                4250         Gentoo
I want to make a biplot for my data, which would be something like this:
But I want a biplot for every cell of the species vs. predicted_species matrix, so 9 subplots in the same style as above. I am not sure how that can be achieved. One way would be to split the data into separate dataframes and make a biplot for each, but that isn't very efficient and makes comparison difficult.
Can anyone provide some suggestions on how this could be done?
Combining the answer by Qiyun Zhu on how to plot a biplot with my answer on how to split the plot into the true vs. predicted subsets, you could do it like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# Load iris data.
iris = sns.load_dataset('iris')
X = iris.iloc[:, :4].values
y = iris.iloc[:, 4].values
features = iris.columns[:4]
targets = ['setosa', 'versicolor', 'virginica']
# Mock up some predictions.
iris['species_pred'] = (40 * ['setosa'] + 5 * ['versicolor'] + 5 * ['virginica']
                        + 40 * ['versicolor'] + 5 * ['setosa'] + 5 * ['virginica']
                        + 40 * ['virginica'] + 5 * ['versicolor'] + 5 * ['setosa'])
# Reduce features to two dimensions.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)
X_reduced = pca.transform(X_scaled)
iris[['pc1', 'pc2']] = X_reduced
def biplot(x, y, data=None, **kwargs):
    # Plot data points.
    sns.scatterplot(data=data, x=x, y=y, **kwargs)

    # Calculate arrow parameters.
    loadings = pca.components_[:2].T
    pvars = pca.explained_variance_ratio_[:2] * 100
    arrows = loadings * np.ptp(X_reduced, axis=0)
    width = -0.0075 * np.min([np.subtract(*plt.xlim()), np.subtract(*plt.ylim())])

    # Plot arrows.
    horizontal_alignment = ['right', 'left', 'right', 'right']
    vertical_alignment = ['bottom', 'top', 'top', 'bottom']
    for (i, arrow), ha, va in zip(enumerate(arrows),
                                  horizontal_alignment, vertical_alignment):
        plt.arrow(0, 0, *arrow, color='k', alpha=0.5, width=width, ec='none',
                  length_includes_head=True)
        plt.text(*(arrow * 1.05), features[i], ha=ha, va=va,
                 fontsize='small', color='gray')
# Plot small multiples, corresponding to confusion matrix.
sns.set()
g = sns.FacetGrid(iris, row='species', col='species_pred',
                  hue='species', margin_titles=True)
g.map(biplot, 'pc1', 'pc2')
plt.show()
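Each facet in the grid corresponds to one cell of the true vs. predicted matrix, so it can help to count the rows per cell before plotting. The following is only a rough sketch using the standard library, with the (species, predicted_species) pairs typed in by hand from the ten rows shown in the question; the names `pairs`, `cell_counts` and `empty_cells` are made up for illustration. With a real dataframe, `pd.crosstab` on the two columns should give the same table.

```python
from collections import Counter
from itertools import product

# (species, predicted_species) pairs copied by hand from the ten rows in the question.
pairs = [
    ("Adelie", "Chinstrap"),
    ("Adelie", "Adelie"),
    ("Adelie", "Gentoo"),
    ("Adelie", "Adelie"),
    ("Chinstrap", "Chinstrap"),
    ("Chinstrap", "Gentoo"),
    ("Chinstrap", "Adelie"),
    ("Gentoo", "Chinstrap"),
    ("Gentoo", "Gentoo"),
    ("Gentoo", "Gentoo"),
]

# All labels that occur as either a true or a predicted species.
labels = sorted({label for pair in pairs for label in pair})

# Number of rows falling into each (true, predicted) cell.
cell_counts = Counter(pairs)

# Cells of the 3 x 3 grid that contain no rows at all.
empty_cells = [cell for cell in product(labels, repeat=2) if cell_counts[cell] == 0]

# Print the grid: rows are true species, columns are predicted species.
print("true \\ predicted".ljust(18) + "".join(name.ljust(11) for name in labels))
for true in labels:
    row = "".join(str(cell_counts[(true, pred)]).ljust(11) for pred in labels)
    print(true.ljust(18) + row)
print("empty cells:", empty_cells)
```

For the ten-row snippet, one of the nine cells has no rows, so I would expect that facet to stay blank. For the dataframe in the question, the grid call would presumably use `row='species'` and `col='predicted_species'` instead of the iris column names, after computing `pc1` and `pc2` from the four measurement columns.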