
Many-to-many sequence prediction with variable-length input/output in Keras


I'm trying to predict a variable-length many-to-many sequence (variable-length input and output) using Keras. The dataframe below is a representation of the data: 5 feature columns and one target column.

    import pandas as pd

    df3={'email': [[0,0,0,1],[0,1,2],[0,3,1,5],[0,0,0,1],[0,1,2],[0,3,1,5]],
         'fax':[[0,1,0,1],[3,2],[0,2,1,5,4,6],[0,1,0,1],[3,2],[0,2,1,5,4,6]],
         'physical_mail':[[0,0,0,2],[0,2],[0,9,1,3,4,0],[0,0,3,0],[1,2],[0,2,0,2,4,6]],
         'cold_call':[[0,0,0,0,0,0],[0,2,0,0],[0,1,1,3,2,0,2,2],[0,0,3,0,0,0,0],[1,2,5,0,0,1,2],[0,2,0,2,4,3,9,0,6]],
         'in_person':[[0,0,0,0,0,0],[0,0,0,0],[0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,1],[1,0,0,0,0,0,0],[0,2,0,2,0,0,9,0,0,0,0,1]],
         'tar':[[0,1],[0,0,0,0],[0,0,0,0,1],[0,1],[0,0,0,0],[0,0,0,0,1]]
         }
    df4=pd.DataFrame(df3)

To reshape the data: there are six samples and 5 columns, which are fed in one column at a time, and y has 6 samples and 1 column, also fed in one at a time.

    x_train=df4[['email','fax','physical_mail','cold_call','in_person']].values.reshape(6,5,1)
    y_train=df4.tar.values.reshape(6,1,1)


 
    from keras.models import Sequential
    from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

    model = Sequential()
    ## 5 columns which are passed one at a time, so the input shape is (5, 1)
    model.add(LSTM(64, input_shape=(5, 1)))
    # not sure what the RepeatVector argument should be
    model.add(RepeatVector(10))
    model.add(LSTM(64, return_sequences=True))
    model.add(TimeDistributed(Dense(1)))
    model.add(Activation('linear'))
    model.compile(loss='mean_squared_error', optimizer='rmsprop')
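
About the `RepeatVector` argument: `RepeatVector(n)` takes the encoder's last output, of shape `(batch, units)`, and repeats it `n` times along a new time axis, giving `(batch, n, units)`. The decoder `LSTM(..., return_sequences=True)` and `TimeDistributed(Dense(1))` keep that time axis, so `n` is the length of the predicted sequence and the model outputs `(batch, n, 1)`. With `RepeatVector(10)` that is `(6, 10, 1)`, which does not match `y_train` reshaped to `(6, 1, 1)`. The NumPy sketch below only mimics the shape behaviour (it is not Keras code), and `n_steps = 5`, the length of the longest target list, is an assumed choice.

```python
import numpy as np

batch, units, n_steps = 6, 64, 5  # illustrative sizes (assumed, not from Keras)

# Stand-in for the encoder LSTM's output: one vector per sample.
encoded = np.random.rand(batch, units)

# What RepeatVector(n_steps) does to the shape: copy each vector n_steps times
# along a new axis 1 (the time axis).
repeated = np.repeat(encoded[:, np.newaxis, :], n_steps, axis=1)
print(repeated.shape)  # (6, 5, 64)

# Every decoder timestep starts from the same encoder vector.
assert np.array_equal(repeated[:, 0, :], repeated[:, -1, :])

# The decoder LSTM(return_sequences=True) + TimeDistributed(Dense(1)) keep the
# time axis, so predictions have this shape, and y has to match it.
expected_y_shape = (batch, n_steps, 1)
print(expected_y_shape)  # (6, 5, 1)
```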

I'm seeing the error "setting an array element with a sequence". Is it because the input is a mixture of lists? If so, how do I flatten it?
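
A minimal NumPy-only sketch of why the lists are the likely culprit: `.values` on these columns returns an `object` array whose cells are still Python lists of different lengths. `reshape` only rearranges those list references, but converting the array to a numeric dtype (which has to happen before a model can train on it) has no way to fit a whole list into a single float slot. The exact exception type may vary by NumPy version, so the sketch catches both `ValueError` and `TypeError`. `rows` is a hypothetical two-sample subset of `df3`.

```python
import numpy as np

# Hypothetical two-sample subset of the question's data: each cell is a list,
# and the lists have different lengths.
rows = [
    [[0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 0, 2], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
    [[0, 1, 2], [3, 2], [0, 2], [0, 2, 0, 0], [0, 0, 0, 0]],
]

# Build the object array explicitly, like df4[...].values does:
# shape (2, 5), each element a Python list.
x = np.empty((2, 5), dtype=object)
for i, row in enumerate(rows):
    for j, cell in enumerate(row):
        x[i, j] = cell

x = x.reshape(2, 5, 1)   # works: it only moves the list references around
print(x.dtype, x.shape)  # object (2, 5, 1)

try:
    x.astype("float32")  # a numeric conversion like this is what fails
except (ValueError, TypeError) as err:
    print(type(err).__name__, err)
```

Padding every list to a common length, as in the answer below, removes the raggedness so the data can become a regular numeric array.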


Solution

  • Try this:

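    import numpy as np
    from keras.utils import pad_sequences  # older Keras: from keras.preprocessing.sequence import pad_sequences

    np.array([np.concatenate(pad_sequences(list(v), maxlen=12)) for k,v in df4[['email','fax','physical_mail','cold_call','in_person']].items()])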
    
    array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 3, 1, 5],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            3, 2, 0, 0, 0, 0, 0, 0, 0, 2, 1, 5, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0,
            0, 2, 1, 5, 4, 6],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 2, 0, 0, 0, 0, 0, 0, 0, 9, 1, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
            0, 2, 0, 2, 4, 6],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
            0, 0, 0, 0, 0, 0, 0, 1, 1, 3, 2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 3,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 5, 0, 0, 1, 2, 0, 0, 0, 0, 2, 0,
            2, 4, 3, 9, 0, 6],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0,
            9, 0, 0, 0, 0, 1]], dtype=int32)
    
    

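Note that `DataFrame.items()` iterates over columns, not rows, so this gives one 1D array per column: each of that column's six lists is left-padded with zeros to length 12 (the longest list in the data), and the six padded rows are concatenated. If you need a 2D array per column instead, leave out the `np.concatenate` call.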

    # df4.items() here also includes 'tar', so the shape is (columns, samples, 12) = (6, 6, 12)
    np.array([pad_sequences(list(v), maxlen=12) for k,v in df4.items()])
    
    array([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 5],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 5]],
    
           [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
            [0, 0, 0, 0, 0, 0, 0, 2, 1, 5, 4, 6],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
            [0, 0, 0, 0, 0, 0, 0, 2, 1, 5, 4, 6]],
    
           [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
            [0, 0, 0, 0, 0, 0, 0, 9, 1, 3, 4, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
            [0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 4, 6]],
    
           [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0],
            [0, 0, 0, 0, 0, 1, 1, 3, 2, 0, 2, 2],
            [0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 1, 2, 5, 0, 0, 1, 2],
            [0, 0, 0, 0, 2, 0, 2, 4, 3, 9, 0, 6]],
    
           [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
            [0, 2, 0, 2, 0, 0, 9, 0, 0, 0, 0, 1]],
    
           [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]]], dtype=int32)
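
The 2D version comes out as `(columns, samples, 12)` because `items()` walks the columns, while an LSTM expects `(samples, timesteps, features)`. One way to get there, sketched under assumptions (this layout is a guess at how you want to feed the model, not part of the answer): pad each of the five feature columns to length 12, stack them on the last axis to get `X` of shape `(6, 12, 5)`, and pad the targets to the longest target length, 5. To keep the sketch free of Keras and pandas, `pad_pre` is a hypothetical NumPy stand-in for `pad_sequences` with its defaults (`padding='pre'`, `truncating='pre'`), and `df3` is the question's plain dict.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0.0):
    """Hypothetical NumPy stand-in for pad_sequences' defaults:
    padding='pre' (fill on the left) and truncating='pre' (drop from the front)."""
    out = np.full((len(seqs), maxlen), value, dtype="float32")
    for i, s in enumerate(seqs):
        s = list(s)[-maxlen:]      # keep only the last maxlen items
        if s:
            out[i, -len(s):] = s   # data goes at the end, padding at the start
    return out

# The question's data as a plain dict (same lists as df3).
df3 = {
    'email': [[0,0,0,1],[0,1,2],[0,3,1,5],[0,0,0,1],[0,1,2],[0,3,1,5]],
    'fax': [[0,1,0,1],[3,2],[0,2,1,5,4,6],[0,1,0,1],[3,2],[0,2,1,5,4,6]],
    'physical_mail': [[0,0,0,2],[0,2],[0,9,1,3,4,0],[0,0,3,0],[1,2],[0,2,0,2,4,6]],
    'cold_call': [[0,0,0,0,0,0],[0,2,0,0],[0,1,1,3,2,0,2,2],[0,0,3,0,0,0,0],
                  [1,2,5,0,0,1,2],[0,2,0,2,4,3,9,0,6]],
    'in_person': [[0,0,0,0,0,0],[0,0,0,0],[0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,1],
                  [1,0,0,0,0,0,0],[0,2,0,2,0,0,9,0,0,0,0,1]],
    'tar': [[0,1],[0,0,0,0],[0,0,0,0,1],[0,1],[0,0,0,0],[0,0,0,0,1]],
}
feature_cols = ['email', 'fax', 'physical_mail', 'cold_call', 'in_person']

maxlen_x = max(len(s) for c in feature_cols for s in df3[c])  # longest input list
maxlen_y = max(len(s) for s in df3['tar'])                    # longest target list

# Pad each column to (samples, maxlen_x), then stack the 5 columns as the
# feature axis: (samples, timesteps, features).
X = np.stack([pad_pre(df3[c], maxlen_x) for c in feature_cols], axis=-1)

# Targets: pad to (samples, maxlen_y) and add a trailing feature axis of size 1.
y = pad_pre(df3['tar'], maxlen_y)[..., np.newaxis]

print(maxlen_x, maxlen_y)  # 12 5
print(X.shape, y.shape)    # (6, 12, 5) (6, 5, 1)
```

With this layout the first layer would take `input_shape=(12, 5)`, and `RepeatVector(5)` would line up with `y`. Since 0 is also a real value in these lists, zero padding makes padded and genuine zeros indistinguishable; a `Masking` layer or a different pad value may be worth considering.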