I am new to probabilistic programming and ML. I am following the deep Markov model example code from Pyro's website. The GitHub link to that code is:
https://github.com/pyro-ppl/pyro/blob/dev/examples/dmm/dmm.py
I understand most of the code. The part I don't understand is the mini-batch idea they use starting at line 175.
Question 1: Could someone explain what they are doing there when they use mini-batches?
In the Pyro documentation they say:
"mini_batch is a three dimensional tensor, with the first dimension being the batch dimension, the second dimension being the temporal dimension, and the final dimension being the features (88-dimensional in our case)"
Question 2: What does the temporal dimension mean here?
I ask because I want to use this code on my own dataset, which is sequential data. I have one-hot encoded my data so that its shape is (10000, 500, 20), where 10000 is the number of examples (sequences), 500 is the length of each sequence, and 20 is the number of features.
Question 3: How can I use my one-hot encoded data as a mini-batch here?
I'm sorry if this is a really basic question, but any insights will be appreciated.
Link to that documentation is:
Question 1: Could someone explain what they are doing there when they use mini-batches?
Most deep learning models are optimized with mini-batch gradient descent. Here, a mini_batch refers to a small subset of the training examples. Say we have 10,000 training examples and want mini-batches of 50 examples each. Then there are 200 mini-batches in total, and we perform 200 parameter updates during one pass over the entire dataset (one epoch).
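The arithmetic above can be checked with a small sketch. The helper names `updates_per_epoch` and `last_batch_size` are made up for illustration and are not part of Pyro. The sketch assumes one parameter update per mini-batch and that a final, smaller batch is kept rather than dropped.

```python
import math


def updates_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of mini-batches, and hence parameter updates, in one pass over the data."""
    return math.ceil(num_examples / batch_size)


def last_batch_size(num_examples: int, batch_size: int) -> int:
    """Size of the final mini-batch; smaller than batch_size if it does not divide evenly."""
    remainder = num_examples % batch_size
    return remainder if remainder else batch_size


print(updates_per_epoch(10_000, 50))  # 200
print(updates_per_epoch(10_000, 64))  # 157
print(last_batch_size(10_000, 64))    # 16
```

When the batch size does not divide the dataset evenly, the last mini-batch is simply smaller, so it is worth making sure the training code can handle a variable batch dimension.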
Question 2: What does the temporal dimension mean here?
In your data, with shape (10000, 500, 20), the second dimension is the temporal dimension. You can think of each example as a sequence of 500 time steps (t1, t2, ..., t500).
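To make the three axes concrete, here is a minimal NumPy sketch on a scaled-down stand-in for your data: 4 sequences and 6 time steps instead of 10000 and 500, with the 20 features kept. The random category indices are invented purely for illustration. A PyTorch tensor of the same shape can be indexed in the same way.

```python
import numpy as np

# Stand-in data: one category index (0..19) per (sequence, time step).
rng = np.random.default_rng(0)
categories = rng.integers(0, 20, size=(4, 6))

# One-hot encode: each index selects a row of the 20x20 identity matrix.
data = np.eye(20, dtype=np.float32)[categories]
print(data.shape)  # (4, 6, 20) = (sequences, time steps, features)

# Axis 0 picks a sequence: all time steps of the first sequence.
first_sequence = data[0]
print(first_sequence.shape)  # (6, 20)

# Axis 1 picks a time step: the third time step (index 2) of every sequence.
step_t = data[:, 2, :]
print(step_t.shape)  # (4, 20)
```

A sequence model walks along axis 1, consuming one slice like `step_t` at a time.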
Question 3: How can I use my one-hot encoded data as a mini-batch here?
In your scenario, you can split your data of shape (10000, 500, 20) into 200 mini-batches of shape (50, 500, 20), where 50 is the number of examples (sequences) in each mini-batch, 500 is the length of each sequence, and 20 is the number of features.
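A minimal sketch of that split, using NumPy. `iterate_minibatches` is a hypothetical helper written for this answer, not a function from Pyro, and it only covers slicing fixed-length sequences along the first dimension; it does not reproduce the batching code in the linked example. To keep the sketch light, the stand-in array uses 5 time steps instead of 500.

```python
import numpy as np


def iterate_minibatches(data, batch_size, rng=None):
    """Yield mini-batches of shape (<= batch_size, T, F) from data of shape (N, T, F)."""
    n = data.shape[0]
    # Shuffle the example order if a random generator is given, else keep it as is.
    order = np.arange(n) if rng is None else rng.permutation(n)
    for start in range(0, n, batch_size):
        yield data[order[start:start + batch_size]]


# Stand-in for the one-hot data: 10000 sequences, 5 time steps, 20 features.
data = np.zeros((10_000, 5, 20), dtype=np.float32)

batches = list(iterate_minibatches(data, 50, rng=np.random.default_rng(0)))
print(len(batches), batches[0].shape)  # 200 (50, 5, 20)
```

Each yielded array can then be converted to a tensor and passed to the model as one mini-batch. Shuffling the order every epoch is a common choice, though not a requirement.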
How do we decide the mini-batch size? It can be tuned just like any other hyperparameter of the model.