I'm following the "Integrating CITEseq data with Deep Learning" example. The code worked until the third part of the example, where the autoencoder is trained. Since I'm new to Keras models, I'm basically just copying and pasting the code, so I don't understand why the version on the website works and mine does not.
I've tried changing the fit call from
estimator = autoencoder.fit([X_scRNAseq, X_scProteomics],
                            [X_scRNAseq, X_scProteomics],
                            epochs = 100, batch_size = 128,
                            validation_split = 0.2, shuffle = True, verbose = 1)
to
estimator = autoencoder.fit([X_scRNAseq, X_scRNAseq],
                            [X_scRNAseq, X_scRNAseq],
                            epochs = 100, batch_size = 128,
                            validation_split = 0.2, shuffle = True, verbose = 1)
to get around the "same number of samples" error. That ran, but it does not train the autoencoder the way it is supposed to, because the proteomics data is never used.
X_scRNAseq and X_scProteomics are both NumPy arrays, with shapes (36280, 8617) and (13, 8617), respectively. The model summary is:
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
scRNAseq (InputLayer)           (None, 8617)         0
__________________________________________________________________________________________________
scProteomics (InputLayer)       (None, 8617)         0
__________________________________________________________________________________________________
Encoder_scRNAseq (Dense)        (None, 50)           430900      scRNAseq[0][0]
__________________________________________________________________________________________________
Encoder_scProteomics (Dense)    (None, 10)           86180       scProteomics[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 60)           0           Encoder_scRNAseq[0][0]
                                                                 Encoder_scProteomics[0][0]
__________________________________________________________________________________________________
Bottleneck (Dense)              (None, 50)           3050        concatenate_1[0][0]
__________________________________________________________________________________________________
Concatenate_Inverse (Dense)     (None, 60)           3060        Bottleneck[0][0]
__________________________________________________________________________________________________
Decoder_scRNAseq (Dense)        (None, 8617)         525637      Concatenate_Inverse[0][0]
__________________________________________________________________________________________________
Decoder_scProteomics (Dense)    (None, 8617)         525637      Concatenate_Inverse[0][0]
==================================================================================================
Total params: 1,574,464
Trainable params: 1,574,464
Non-trainable params: 0
__________________________________________________________________________________________________
The error I get when I try to apply the fit function is:
ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(36280, 8617), (13, 8617)]
Thank you!
Keras expects the first axis of every input array to be the number of samples. As you said, X_scRNAseq has shape (36280, 8617) and X_scProteomics has shape (13, 8617), so the first axes (36280 and 13) disagree. The axis the two arrays share is the second one (8617), which suggests the samples are stored along the second axis instead of the first.
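To make the mismatch visible before calling fit, a small check along these lines can help. This is only a sketch: `check_same_samples` is a hypothetical helper, not part of Keras or NumPy, and the arrays are small stand-ins that mimic the layout described in the question.

```python
import numpy as np

def check_same_samples(*arrays):
    """Return the shared first-axis size, or raise if the arrays disagree."""
    counts = [a.shape[0] for a in arrays]
    if len(set(counts)) != 1:
        raise ValueError(f"First-axis sizes differ: {counts}")
    return counts[0]

# Stand-ins with the same layout as in the question (second axis is shared).
rna = np.zeros((40, 25))      # stands in for X_scRNAseq, shape (36280, 8617)
protein = np.zeros((13, 25))  # stands in for X_scProteomics, shape (13, 8617)

try:
    check_same_samples(rna, protein)
    message = None
except ValueError as err:
    # The first axes are 40 and 13, so the check fails like Keras does.
    message = str(err)

# The second axis is the one both arrays agree on.
shared_second_axis = rna.shape[1] == protein.shape[1]
```

Running the same check on your real arrays should fail in the same way until the sample axis comes first.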
The solution, I believe, is to swap the axes of both X_scRNAseq and X_scProteomics so that the samples come first:
X_scRNAseq = np.swapaxes(X_scRNAseq, 0, 1) #(8617, 36280)
X_scProteomics = np.swapaxes(X_scProteomics, 0, 1) #(8617, 13)
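As a quick sanity check of what the swap does, here is a sketch on small stand-in arrays (the sizes 40, 13 and 25 are made up for illustration; the real ones are 36280, 13 and 8617). For 2-D arrays, `np.swapaxes(x, 0, 1)` gives the same result as `x.T`, and it returns a view, not a copy.

```python
import numpy as np

# Stand-ins in the question's layout: (features, samples).
rna = np.arange(40 * 25, dtype=float).reshape(40, 25)      # real: (36280, 8617)
protein = np.arange(13 * 25, dtype=float).reshape(13, 25)  # real: (13, 8617)

rna_samples_first = np.swapaxes(rna, 0, 1)
protein_samples_first = np.swapaxes(protein, 0, 1)

# For 2-D arrays, swapping axes 0 and 1 is the same as a transpose.
same_as_transpose = np.array_equal(rna_samples_first, rna.T)

# swapaxes returns a view on the original buffer, so no data is copied.
is_view = np.shares_memory(rna, rna_samples_first)

# After the swap, the second axis holds the feature count of each modality.
n_rna_features = rna_samples_first.shape[1]
n_protein_features = protein_samples_first.shape[1]
```

After the swap both arrays have the same first-axis size, and the second axis differs per modality, which is the layout Keras expects for a two-input model.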
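Note that the summary in the question shows every input and decoder layer as 8617 wide, so the model was built on the old layout. After the swap, rebuild it so that those layers use the new feature widths (36280 for scRNAseq and 13 for scProteomics), then fit your model: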
estimator = autoencoder.fit([X_scRNAseq, X_scProteomics],
                            [X_scRNAseq, X_scProteomics],
                            epochs = 100, batch_size = 128,
                            validation_split = 0.2, shuffle = True, verbose = 1)
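For reference, a minimal sketch of a two-input autoencoder whose layer widths are taken from the second axis of each array is shown below. The layer names and the 50/10/50/60 sizes follow the summary in the question; the activations, optimizer, loss, and the tiny synthetic arrays are assumptions made only so the example runs quickly, and it assumes TensorFlow's bundled Keras is installed. It is not the tutorial's exact code.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

# Synthetic stand-ins already in (samples, features) layout.
rng = np.random.default_rng(0)
X_rna = rng.random((200, 30)).astype("float32")   # real data: (8617, 36280)
X_prot = rng.random((200, 5)).astype("float32")   # real data: (8617, 13)

# Input widths come from the second axis of each array.
in_rna = Input(shape=(X_rna.shape[1],), name="scRNAseq")
in_prot = Input(shape=(X_prot.shape[1],), name="scProteomics")

# Activations below are placeholder assumptions.
enc_rna = Dense(50, activation="linear", name="Encoder_scRNAseq")(in_rna)
enc_prot = Dense(10, activation="linear", name="Encoder_scProteomics")(in_prot)
merged = Concatenate()([enc_rna, enc_prot])
bottleneck = Dense(50, activation="linear", name="Bottleneck")(merged)
split = Dense(60, activation="linear", name="Concatenate_Inverse")(bottleneck)

# Each decoder reconstructs its own modality, so its width matches that input.
dec_rna = Dense(X_rna.shape[1], activation="sigmoid", name="Decoder_scRNAseq")(split)
dec_prot = Dense(X_prot.shape[1], activation="sigmoid", name="Decoder_scProteomics")(split)

autoencoder = Model(inputs=[in_rna, in_prot], outputs=[dec_rna, dec_prot])
autoencoder.compile(optimizer="adam", loss=["mse", "mse"])  # assumed loss

history = autoencoder.fit([X_rna, X_prot], [X_rna, X_prot],
                          epochs=1, batch_size=32,
                          validation_split=0.2, shuffle=True, verbose=0)
rec_rna, rec_prot = autoencoder.predict([X_rna, X_prot], verbose=0)
```

With the real data, the only lines that should need to change are the two array definitions; the input and decoder widths follow from `shape[1]`.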