python arrays numpy tensorflow valueerror

ValueError: setting an array element with a sequence in Python numpy array


I'm currently working on a project to build an AI chatbot using Python, and I'm encountering an error that I can't seem to resolve. When attempting to convert my training data into a numpy array, I'm getting the following error:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (29, 2) + inhomogeneous part.

Here's the relevant section of my code:

training = []
output_empty = [0] * len(classes)

for document in documents:
    bag = []
    word_patterns = document[0]
    word_patterns = [lematizer.lemmatize(word.lower()) for word in word_patterns]
    for word in words:
        bag.append(1) if word in word_patterns else bag.append(0)

    output_row = list(output_empty)
    output_row[classes.index(document[1])] = 1
    training.append([bag,output_row])

random.shuffle(training)
training = np.array(training)

train_x = list(training[:, 0])
train_y = list(training[:, 1])

My training variable is a list of lists, where each inner list contains a bag of words and an output row. I've checked the dimensions of the training list using np.shape(training), and it returns (29,), indicating that it's a 1-dimensional array. However, when I attempt to convert it into a numpy array, I encounter the aforementioned error.

I've double-checked the contents of my training list, and it seems to be formatted correctly. Each inner list contains a bag of words (a list of integers) and an output row (a list of integers), both of which have consistent lengths across all entries.

I'm not sure why I'm encountering this error or how to resolve it. Any insights or suggestions would be greatly appreciated. Thank you!

Code files:
https://github.com/GH0STH4CKER/AI_Chatbot/blob/main/training.py
https://github.com/GH0STH4CKER/AI_Chatbot/blob/main/intents.json


Solution

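  • When I ran your code, each entry of 'training' turned out to hold two lists of different lengths: a bag of 55 values and an output row of 5 values. Across the 29 documents that amounts to one 29x55 block and one 29x5 block. NumPy cannot pack these into a single rectangular array, which is what "(29, 2) + inhomogeneous part" in the error message refers to: 29 entries, 2 lists per entry, and then inner lengths that do not match. Recent NumPy versions (1.24 and later) raise a ValueError in this situation instead of quietly building an object array. The way around it is to collect the bags and the output rows in two separate lists and convert each one to its own array.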

    Try the following correction in your code:

    import numpy as np
    from sklearn.utils import shuffle

    training_bag = []
    training_output_row = []
    output_empty = [0] * len(classes)

    for document in documents:
        # Lemmatize the tokenized pattern of this document
        word_patterns = [lematizer.lemmatize(word.lower()) for word in document[0]]
        # Bag of words: 1 if the vocabulary word occurs in the pattern, else 0
        bag = [1 if word in word_patterns else 0 for word in words]

        # One-hot row for this document's class, built once per document
        # (outside the loop over words)
        output_row = list(output_empty)
        output_row[classes.index(document[1])] = 1

        training_bag.append(bag)
        training_output_row.append(output_row)

    # Each array is rectangular on its own, so np.array() succeeds
    training_bag = np.array(training_bag)
    training_output_row = np.array(training_output_row)

    # Shuffle both arrays with the same permutation so rows stay paired
    train_x, train_y = shuffle(training_bag, training_output_row, random_state=0)
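
If you would rather not pull in scikit-learn just for the shuffle, a variation that should also work is to shuffle the list of pairs first and split it into two arrays afterwards. The sketch below uses small made-up data in place of your real 'training' list (the values and the lengths 4 and 2 are assumptions for illustration only), but the pattern is the same for 55 and 5.

```python
import random

import numpy as np

# Hypothetical toy data standing in for the real `training` list:
# each entry is [bag, output_row], and the two inner lists differ in length.
training = [
    [[1, 0, 0, 1], [1, 0]],
    [[0, 1, 0, 0], [0, 1]],
    [[1, 1, 0, 0], [1, 0]],
]

# Shuffle the [bag, output_row] pairs while they are still a plain list,
# so every bag stays attached to its own label.
random.seed(0)
random.shuffle(training)

# Build one rectangular array per column instead of one ragged array.
train_x = np.array([bag for bag, _ in training])
train_y = np.array([output_row for _, output_row in training])

print(train_x.shape, train_y.shape)  # (3, 4) (3, 2)
```

This keeps the original random.shuffle(training) call and only replaces np.array(training) and the two slicing lines that follow it.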