I have image data (NumPy arrays) with labels, loaded from folders. The data is imbalanced, and I'm trying to balance it with smogn after building a DataFrame.
Here's the code:
import os
import cv2 as cv  # assumed: `cv` is OpenCV's cv2 module
import numpy as np
import pandas as pd
import smogn

r_labels = []
im = []
for filename in os.listdir(folder):
    img = cv.imread(os.path.join(folder, filename))
    if img is not None:
        # the third "_"-separated field of the filename holds the label
        aio_plant = filename.split("_")
        flowering_time = aio_plant[2].split(".")[0]
        im.append(np.asarray(img).astype(np.float32))
        r_labels.append(np.uint8(flowering_time))
df = pd.DataFrame({'images': im, 'labels': r_labels})
sm = smogn.smoter(
    data = df,     ## pandas dataframe
    y = 'labels'   ## string ('header name')
)
This raises an error:
TypeError: unhashable type: 'numpy.ndarray'
I tried changing the label type like this:
r_labels.append(flowering_time)
and that gives:
UFuncTypeError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U2'), dtype('<U2')) -> None
The data looks like this:
images labels
0 [[[0.0, 0.0, 255.0], [0.0, 255.0, 0.0], [0.0, ... 86
1 [[[255.0, 0.0, 0.0], [255.0, 0.0, 0.0], [0.0, ... 53
2 [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0... 46
3 [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [0.0, ... 44
4 [[[255.0, 0.0, 0.0], [255.0, 0.0, 0.0], [255.0... 63
... ... ...
998 [[[0.0, 0.0, 255.0], [0.0, 255.0, 0.0], [255.0... 86
999 [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0... 215
1000 [[[0.0, 0.0, 255.0], [0.0, 0.0, 255.0], [0.0, ... 92
1001 [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0... 61
1002 [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0... 183
I solved the problem by converting the labels to hashable integers and the images column to a string representation of each NumPy array, then converting them back after running SMOTER.
# Convert labels to hashable integers
df['labels'] = df['labels'].astype(int)
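# Convert images column to string representation of NumPy array.
# threshold=x.size stops array2string from abbreviating arrays of more than 1000 elements with "..."
df['images'] = df['images'].apply(
    lambda x: np.array2string(x.flatten(), separator=',', threshold=x.size))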
sm = smogn.smoter(
    data = df,      ## pandas dataframe
    y = 'labels',   ## string ('header name')
)
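# Convert the strings back to NumPy arrays (these are flat; reshape to the image shape if needed)
sm['images'] = sm['images'].apply(lambda x: np.fromstring(x[1:-1], sep=','))
# Cast the labels of the resampled frame back to integers
sm['labels'] = sm['labels'].astype(int)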
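There are two things to watch with the string round trip. By default `np.array2string` abbreviates arrays of more than 1000 elements with `...`, and `np.fromstring` returns a flat array, so the original image shape has to be restored by hand. Below is a minimal sketch of a lossless round trip. The helper names `image_to_string` and `string_to_image` and the 10x100x3 test image are only illustrative, and it assumes pixel values that print exactly (such as 0.0 to 255.0).

```python
import sys
import numpy as np

def image_to_string(img):
    """Serialize an image array to a comma-separated string (hypothetical helper)."""
    flat = img.ravel()
    # threshold=sys.maxsize disables the "..." summarization of large arrays
    return np.array2string(flat, separator=',', threshold=sys.maxsize)

def string_to_image(text, shape, dtype=np.float32):
    """Parse the string back and restore the original shape (hypothetical helper)."""
    flat = np.fromstring(text[1:-1], sep=',', dtype=dtype)  # strip "[" and "]"
    return flat.reshape(shape)

# Illustrative image: 10 x 100 x 3 with integer-valued float32 pixels in 0..255
img = (np.arange(10 * 100 * 3) % 256).astype(np.float32).reshape(10, 100, 3)

# Default settings abbreviate the 3000-element array, so data would be lost
truncated = np.array2string(img.ravel(), separator=',')

text = image_to_string(img)
restored = string_to_image(text, img.shape)
```

With this, `restored` has the same shape and values as `img`, while `truncated` contains `...` and cannot be parsed back in full. All images need a known shape (or the shape stored alongside the string) for the reshape step to work.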