I'm currently working on a binary SVM classifier. To visualize how the classifier works, I want to create a histogram of the probability density function (calculated with scikit-learn) that displays the scalar value of each single data point (which indicates whether it belongs to class 0 or 1).
Note that in an SVM, the 'cutting edges' (margin boundaries) of the classifier are at -1 and 1. The graph nicely shows that there is some decision boundary within [-1, 1].
Back to my problem: I want to color the data points of labels 0 and 1 separately in order to analyze the soft margin (the area between -1 and 1).
The probability density function is stored as an np.array. The corresponding labels are stored in a pandas dataframe.
How do I link the array and the dataframe so that the numerical values of category 0 are plotted in, e.g., 'green' and those of category 1 in, e.g., 'blue'?
Code:
plt.hist(decisions_function_cv, bins=500, color='navy')
I tried to save both in the same dataframe, but I cannot figure out how to apply the coloring described above to the decision function values :(
Does anyone have a smart approach? Thanks in advance!
Edit: Sample code:
Scalars of the probability function:
np.array([.5,0.6,1,1,1,-1,-1,-1,-.5,-0.6])
Stylized dataframe with the corresponding labels:
df['labels']
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 0.0
7 0.0
8 1.0
9 1.0
Numerical values of category 1 should be plotted in blue.
Numerical values of category 0 should be plotted in green.
Using seaborn, you can easily build complex pyplot charts.
First, we reconstruct your dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['values'] = np.array([.5,0.6,1,1,1,-1,-1,-1,-.5,-0.6])
df['category'] = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
print(df)
values category
0 0.5 1.0
1 0.6 1.0
2 1.0 1.0
3 1.0 1.0
4 1.0 1.0
5 -1.0 1.0
6 -1.0 0.0
7 -1.0 0.0
8 -0.5 1.0
9 -0.6 1.0
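In your real code you would not retype the values by hand. A minimal sketch of linking the existing array to the existing labels, assuming both have the same length and the same row order (the name labels_df is a hypothetical stand-in for your dataframe, and decisions_function_cv is filled with the sample values from the question):

```python
import numpy as np
import pandas as pd

# Stand-ins for the objects from the question (sample data, names partly assumed)
decisions_function_cv = np.array([.5, 0.6, 1, 1, 1, -1, -1, -1, -.5, -0.6])
labels_df = pd.DataFrame({'labels': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]})

# The pairing is purely positional, so both must have the same length
assert len(decisions_function_cv) == len(labels_df)

df = pd.DataFrame({
    'values': decisions_function_cv,
    # .to_numpy() drops the index, so rows are matched by position, not by index label
    'category': labels_df['labels'].to_numpy(),
})
print(df)
```

Using .to_numpy() here is a deliberate choice: if your labels dataframe has a shuffled or non-default index (for example after a train/test split), assigning the Series directly would align on index labels instead of on position.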
Then we use your dataset and configure the color (hue parameter) with the "category" column (notice that you define the color palette independently from the actual values):
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('whitegrid')
sns.histplot(
data=df,
x="values",
hue="category",
palette=['green', 'blue']
)
This produces the following result:
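If you would rather stay with plain matplotlib, the same idea can be sketched with boolean masks: split the values by label and draw one histogram per category on shared bin edges. This is only an illustration on the sample data from the question; the variable names, the number of bins, and the dashed lines at -1 and 1 are assumptions, not part of the original code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data from the question (values and their labels, in the same order)
values = np.array([.5, 0.6, 1, 1, 1, -1, -1, -1, -.5, -0.6])
labels = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0])

# Shared bin edges so both histograms are directly comparable (20 bins is arbitrary)
bins = np.linspace(values.min(), values.max(), 21)

fig, ax = plt.subplots()
# Boolean masks select the values belonging to each category
counts0, _, _ = ax.hist(values[labels == 0], bins=bins,
                        color='green', alpha=0.6, label='category 0')
counts1, _, _ = ax.hist(values[labels == 1], bins=bins,
                        color='blue', alpha=0.6, label='category 1')

# Mark the margin at -1 and 1 to make the soft-margin region visible
for edge in (-1, 1):
    ax.axvline(edge, color='grey', linestyle='--')

ax.legend()
# Call plt.show() here when running interactively
```

With your real data you would replace values with decisions_function_cv and labels with df['labels'].to_numpy(), and raise the number of bins as needed.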