I'm trying to get the unique values of a categorical variable in a specific order using the code below, but without success.
import numpy as np
unique_values, unique_value_counts = np.unique(['Small', 'Medium', 'Large', 'Medium', 'Small', 'Large', 'Small', 'Medium'], return_counts = True)
print(unique_values)
which gives me the following output:
['Large' 'Medium' 'Small']
However, I'm expecting the output in ascending order of size, like this:
['Small', 'Medium', 'Large']
Is there a way to get the categorical values in this custom order using np.unique()?
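np.unique always returns its result sorted, and strings sort alphabetically, so it cannot apply a custom order on its own. You can first translate your strings to integer ranks using a dictionary mapping, then use return_index to recover the original labels: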
a = np.array(['Small', 'Medium', 'Large', 'Medium',
              'Small', 'Large', 'Small', 'Medium'])
order = ['Small', 'Medium', 'Large']
key = {k:v for v,k in enumerate(order)}
# {'Small': 0, 'Medium': 1, 'Large': 2}
# np.unique sorts the integer ranks; idx holds the first position of each rank in a
_, idx, unique_value_counts = np.unique(np.vectorize(key.get)(a),
                                        return_index=True,
                                        return_counts=True)
unique_values = a[idx]
unique_values
# array(['Small', 'Medium', 'Large'], dtype='<U6')
unique_value_counts
# array([3, 3, 2])
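As an alternative to the approach above, you could also call np.unique on the strings directly and reorder its output afterwards. The sketch below assumes every label in the data appears in `order` (otherwise the dictionary lookup raises a KeyError); the names `rank`, `perm`, `values_sorted`, and `counts_sorted` are only illustrative.

```python
import numpy as np

a = np.array(['Small', 'Medium', 'Large', 'Medium',
              'Small', 'Large', 'Small', 'Medium'])
order = ['Small', 'Medium', 'Large']

# Alphabetically sorted unique labels and their counts
values, counts = np.unique(a, return_counts=True)

# Map each label to its position in the desired order (assumed to cover all labels)
rank = {label: i for i, label in enumerate(order)}

# Permutation that reorders the unique labels by their rank
perm = np.argsort([rank[v] for v in values])

values_sorted = values[perm]
counts_sorted = counts[perm]
print(values_sorted)
print(counts_sorted)
```

This only looks up the rank once per unique label rather than once per element, which may matter for large arrays with few categories.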