from sklearn.cluster import DBSCAN
model = DBSCAN(eps=3.3, leaf_size=5, min_samples=3)
y_pred = model.fit_predict(df)
My silhouette score is:
from sklearn.metrics import silhouette_score
silhouette_score(df, y_pred)
output:
0.4432857434946073
However, my labels are as follows.
code:
set(model.labels_)
output:
{-1, 0}
What do the labels -1 and 0 mean, and how do I fix this?
Note: I don't know if this is important, but here are the first rows of my data:
df.head()
output:
Gender Age education satisfaction salary performance
----------------------------------------------------------------------
0 0.0 0.446350 -1.010909 -0.891688 1.153254 -0.108350
1 1.0 1.322365 -0.147150 -1.868426 -0.660853 -0.291719
2 1.0 0.008343 -0.887515 -0.891688 0.246200 -0.937654
3 0.0 -0.429664 -0.764121 1.061787 0.246200 -0.763634
4 1.0 -1.086676 -0.887515 -1.868426 -0.660853 -0.644858
As you can see, my data is multidimensional, and I can't reduce its dimensionality.
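As explained in the docs, -1 stands for noise: points that do not belong to any cluster. A point is labelled as noise when it has fewer than min_samples neighbors in its eps neighbourhood (so it is not a core point) and it does not lie within eps of any core point either.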
Here you have a single cluster (0) and some noise (points with label -1).
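To check how the points split between the cluster and the noise label, you can count the labels. This is only a sketch: I don't have your df, so X below is synthetic stand-in data (make_blobs plus StandardScaler are my assumptions, not part of your setup), and the counts will differ on your data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the asker's standardized df (6 features).
X, _ = make_blobs(n_samples=300, centers=3, n_features=6, random_state=0)
X = StandardScaler().fit_transform(X)

model = DBSCAN(eps=3.3, min_samples=3).fit(X)
labels = model.labels_

# Count how many points carry each label; -1 is noise, not a real cluster.
unique, counts = np.unique(labels, return_counts=True)
label_counts = dict(zip(unique.tolist(), counts.tolist()))

# Number of actual clusters (noise excluded) and number of noise points.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))

print("points per label:", label_counts)
print("clusters:", n_clusters, "noise points:", n_noise)
```

Keep in mind that silhouette_score(df, y_pred) treats -1 like any other label, so with labels {-1, 0} your score compares the single cluster against the noise points rather than measuring separation between real clusters.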
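If you expected more clusters, you should tweak eps and min_samples. Your features look standardized, so eps=3.3 is probably large enough to merge almost everything into one cluster; try smaller values.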
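One way to tweak eps is to try several values and look at how many clusters and noise points each one gives. The sketch below does that on synthetic stand-in data (again an assumption, since I don't have your df); the eps grid is arbitrary and only there for illustration. The silhouette score is computed on non-noise points only, and only when at least two clusters were found.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the asker's standardized df (6 features).
X, _ = make_blobs(n_samples=300, centers=3, n_features=6, random_state=0)
X = StandardScaler().fit_transform(X)

# Arbitrary illustrative grid; 3.3 is the value from the question.
eps_values = [0.5, 1.0, 1.5, 2.0, 2.5, 3.3]

results = []
for eps in eps_values:
    labels = DBSCAN(eps=eps, min_samples=3).fit_predict(X)
    mask = labels != -1                      # True for clustered points
    n_clusters = len(set(labels[mask].tolist()))
    n_noise = int((~mask).sum())

    # Silhouette needs at least 2 clusters and more points than clusters.
    score = None
    if n_clusters >= 2 and mask.sum() > n_clusters:
        score = float(silhouette_score(X[mask], labels[mask]))

    results.append((eps, n_clusters, n_noise, score))

for eps, n_clusters, n_noise, score in results:
    print(f"eps={eps}: clusters={n_clusters}, noise={n_noise}, silhouette={score}")
```

With min_samples fixed, the number of noise points can only stay the same or shrink as eps grows, so a very large eps tends to end up with one big cluster. Pick a value where the number of clusters matches what you expect and the noise count is acceptable.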