python scikit-learn svm binary-decision-diagram

How do I display numerical values that come from two labels (0,1) in two different colors in python?


I'm currently working on a binary SVM classifier. To visualize how the classifier works, I want to create a histogram of the probability density function (calculated with scikit-learn) that shows the scalar value of each data point and whether it belongs to class 0 or 1.

The plot: Histogram of probability density function (image)

Note that in an SVM, the margin edges of the classifier are at -1 and 1. The graph nicely shows that the decision boundary lies within [-1, 1].

Back to my problem: I want to color the data points of labels 0 and 1 separately in order to analyze the soft margin (the area between -1 and 1).

The probability density function is stored as an np.array. The corresponding labels are stored in a pandas dataframe.

How do I link the array and the dataframe so that the numerical values of category 0 are plotted in, e.g., 'green' and those of category 1 in, e.g., 'blue'?

Code:

plt.hist(decisions_function_cv, bins=500, color='navy')

I tried to save both in the same dataframe, but I cannot figure out how to apply this to the plot of the decision function in the way described above.

Does anyone have a smart approach? Thanks in advance!

Edit: Sample code:

Scalars of the probability function:

 np.array([.5,0.6,1,1,1,-1,-1,-1,-.5,-0.6])

Stylized dataframe with the corresponding labels:

df['labels']

0      1.0
1      1.0
2      1.0
3      1.0
4      1.0
5      1.0
6      0.0
7      0.0
8      1.0
9      1.0

Numerical values of category 1 should be plotted in blue.

Numerical values of category 0 should be plotted in green.


Solution

  • Using seaborn, you can easily build complex pyplot charts.

    First, we reconstruct your dataframe:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame()
    df['values'] = np.array([.5,0.6,1,1,1,-1,-1,-1,-.5,-0.6])
    df['category'] = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
    
    print(df)
    
       values  category
    0     0.5       1.0
    1     0.6       1.0
    2     1.0       1.0
    3     1.0       1.0
    4     1.0       1.0
    5    -1.0       1.0
    6    -1.0       0.0
    7    -1.0       0.0
    8    -0.5       1.0
    9    -0.6       1.0
    

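    Then we use your dataset and set the color (hue parameter) from the "category" column. Notice that the color palette is defined independently of the actual values: with a list palette, the colors are assigned to the sorted category levels, so 0.0 gets 'green' and 1.0 gets 'blue':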

    import seaborn as sns
    import matplotlib.pyplot as plt
    sns.set_style('whitegrid')
    
    sns.histplot(
        data=df,
        x="values",
        hue="category",
        palette=['green', 'blue']
    )
    

    This produces a histogram in which the two categories are drawn in different colors.
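
As an alternative that uses only matplotlib, a minimal sketch could split the array with boolean masks built from the label column and draw one histogram per label. It assumes the array and the dataframe are in the same row order. The names `scores`, `scores_0`, `scores_1` and `bin_edges`, the bin count, and the dashed lines at -1 and 1 are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question: one score per data point and its label.
scores = np.array([.5, 0.6, 1, 1, 1, -1, -1, -1, -.5, -0.6])
df = pd.DataFrame({"labels": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]})

# Assumption: row i of the dataframe belongs to scores[i].
labels = df["labels"].to_numpy()
scores_0 = scores[labels == 0]  # scores of category 0
scores_1 = scores[labels == 1]  # scores of category 1

# Shared bin edges keep the two histograms directly comparable.
bin_edges = np.linspace(scores.min(), scores.max(), 21)

fig, ax = plt.subplots()
ax.hist(scores_0, bins=bin_edges, color="green", alpha=0.6, label="category 0")
ax.hist(scores_1, bins=bin_edges, color="blue", alpha=0.6, label="category 1")

# Illustrative markers for the margin at -1 and 1.
ax.axvline(-1, color="gray", linestyle="--")
ax.axvline(1, color="gray", linestyle="--")
ax.set_xlabel("score")
ax.set_ylabel("count")
ax.legend()
```

Calling `plt.show()` afterwards displays the figure. With the real data, `scores` would be replaced by `decisions_function_cv` and the bin count can be raised as needed.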