Concept of mini-batch in a deep generative model using Pyro

I am new to probabilistic programming and ML. I am following the deep Markov model code given on Pyro's website. The link to the GitHub page for that code is:

I understand most of the code. The part I don't understand is the mini-batch idea they use starting from line 175.

Question 1: Could someone explain what they are doing there when they use a mini-batch?

In the Pyro documentation they say:

"mini_batch is a three dimensional tensor, with the first dimension being the batch dimension, the second dimension being the temporal dimension, and the final dimension being the features (88-dimensional in our case)"

Question 2: What does the temporal dimension mean here?

I ask because I want to use this code on my own dataset, which is sequential data. I have one-hot encoded my data so that its shape is (10000, 500, 20), where 10000 is the number of examples/sequences, 500 is the length of each sequence, and 20 is the number of features.

Question 3: How can I use my one-hot encoded data as a mini-batch here?

I'm sorry if this is a really basic question, but any insights will be appreciated.

The link to that documentation is:


  • Question 1: Could someone explain what they are doing there when they use a mini-batch?

    To optimize most deep learning models, we use mini-batch gradient descent. Here, a mini-batch refers to a small subset of the training examples. Say we have 10,000 training examples and want mini-batches of 50 examples each. That gives 200 mini-batches in total, so we perform 200 parameter updates during one pass (epoch) over the entire dataset.
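
    The arithmetic above can be sketched in a few lines of plain Python. The variable names are illustrative only and do not come from the Pyro example; the sketch just counts mini-batches and lists the index range each one would cover.

```python
import math

n_examples = 10_000   # total training examples (figure used in the answer)
batch_size = 50       # examples per mini-batch (figure used in the answer)

# Number of parameter updates in one pass (epoch) over the data.
# ceil() also covers dataset sizes that are not a multiple of batch_size.
n_batches = math.ceil(n_examples / batch_size)

# (start, end) index range covered by each mini-batch.
batch_bounds = [
    (start, min(start + batch_size, n_examples))
    for start in range(0, n_examples, batch_size)
]

print(n_batches)         # 200
print(batch_bounds[0])   # (0, 50)
print(batch_bounds[-1])  # (9950, 10000)
```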

    Question 2: What does the temporal dimension mean here?

    In your data, with shape (10000, 500, 20), the second dimension is the temporal dimension. You can think of each example as a sequence of 500 time steps (t1, t2, ..., t500), each described by 20 features.
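
    To make the three axes concrete, here is a small sketch using NumPy as a stand-in (basic slicing behaves the same way on a PyTorch tensor). The data is random one-hot data, and the array uses 8 sequences instead of 10000 to keep it small; both are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_seq, seq_len, n_feat = 8, 500, 20   # scaled-down stand-in for (10000, 500, 20)

# Random category index per time step, then one-hot encode it.
labels = rng.integers(0, n_feat, size=(n_seq, seq_len))
data = np.eye(n_feat, dtype=np.float32)[labels]   # shape (8, 500, 20)

# Axis 0 (batch): pick one whole sequence, i.e. all 500 time steps.
first_sequence = data[0]       # shape (500, 20)

# Axis 1 (temporal): pick one time step across every sequence.
step_10 = data[:, 10, :]       # shape (8, 20)

print(data.shape, first_sequence.shape, step_10.shape)
```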

    Question 3: How can I use my one-hot encoded data as mini-batch here?

    In your scenario, you can split your data of shape (10000, 500, 20) into 200 mini-batches of shape (50, 500, 20), where 50 is the number of examples/sequences in the mini-batch, 500 is the length of each sequence, and 20 is the number of features.
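
    A minimal sketch of that split, again with NumPy and random one-hot data as a stand-in. The helper `iterate_minibatches` is a hypothetical name, not part of Pyro or of the linked example, and the dataset is scaled down to 200 sequences so it stays small. Any extra inputs the actual Pyro model expects alongside the mini-batch are not covered here.

```python
import numpy as np

def iterate_minibatches(data, batch_size, rng):
    """Yield shuffled mini-batches sliced along the first (batch) axis.

    Hypothetical helper for illustration; not part of Pyro.
    """
    order = rng.permutation(data.shape[0])   # new random order each call
    for start in range(0, data.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield data[idx]                      # shape (<=batch_size, T, features)

rng = np.random.default_rng(0)

# Scaled-down stand-in: 200 sequences instead of 10000.
labels = rng.integers(0, 20, size=(200, 500))
data = np.eye(20, dtype=np.float32)[labels]  # shape (200, 500, 20)

batches = list(iterate_minibatches(data, batch_size=50, rng=rng))
print(len(batches))        # 4
print(batches[0].shape)    # (50, 500, 20)
```

    With the full (10000, 500, 20) array the same loop would yield 200 mini-batches per epoch. In a PyTorch setting, `torch.utils.data.DataLoader` with `batch_size=50` and `shuffle=True` is the usual way to get the same behavior.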

    How do we decide the mini-batch size? Basically, we can tune the batch size just like any other hyperparameter of the model.