I have code that generates two NumPy arrays (data_transform) in a for loop. The first iteration produces an array of shape (40, 2), and the second one of shape (175, 2). I want to concatenate these two arrays into a single array of shape (215, 2). I tried np.concatenate() and np.append(), but I get an error, since the arrays must be the same size. Here is an example of what my code looks like:
result_arr = np.array([])
for label in labels_set:
    data = [index for index, value in enumerate(labels_list) if value == label]
    for i in data:
        sub_corpus.append(corpus[i])
    data_sub_tfidf = vec.fit_transform(sub_corpus)
    data_transform = pca.fit_transform(data_sub_tfidf)
    # Append array
    sub_corpus = []
I have also tried np.row_stack(), but all I get is an array of shape (175, 2), which is just the second array I want to concatenate.
What @hpaulj was trying to say with "Stick with list append when doing loops." is:
# use a normal list
result_arr = []
for label in labels_set:
    # (build data_sub_tfidf as in the question)
    data_transform = pca.fit_transform(data_sub_tfidf)
    # append the data_transform object to that list
    # Note: this is not np.append(), which is slow here
    result_arr.append(data_transform)

# and join the pieces after the loop
# This avoids reallocating and copying the array on every iteration:
# only one large chunk of memory is allocated, since
# the final size of the concatenated array is known.
result_arr = np.concatenate(result_arr)
# or, equivalently for 2-D arrays:
# result_arr = np.vstack(result_arr)
# Note: np.stack(result_arr, axis=0) does not work here, because
# np.stack requires all arrays to have exactly the same shape.
Your arrays don't really have different dimensions. Only one dimension differs (the number of rows); the other one is identical. In that case you can always concatenate along the "different" dimension.
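To see this with the shapes from the question, here is a minimal sketch. The random arrays are only stand-ins for the real data_transform output (an assumption for illustration, as is the name chunks); the point is only what happens to the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two data_transform arrays from the question
# (random data, assumed here purely for illustration).
chunks = []
for n_rows in (40, 175):
    data_transform = rng.normal(size=(n_rows, 2))
    chunks.append(data_transform)  # plain list append inside the loop

# Join once, after the loop, along the axis whose length differs (rows).
result_arr = np.concatenate(chunks, axis=0)
print(result_arr.shape)  # (215, 2)

# np.vstack gives the same result for 2-D inputs.
stacked = np.vstack(chunks)

# np.stack creates a new axis and needs identical shapes, so it fails here.
try:
    np.stack(chunks, axis=0)
    stack_failed = False
except ValueError:
    stack_failed = True
```

Concatenating along axis=0 only requires the remaining dimension (here the 2 columns) to match, which is why the (40, 2) and (175, 2) arrays combine without complaint.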