I am trying to build a custom environment for a reinforcement learning (RL) project.
In common examples such as Pong, Atari games, and Super Mario, the action and observation spaces are really small.
In my project, however, the action and observation spaces are much larger than in those examples.
I expect to need at least 5000 actions and observations.
How can I handle action and observation spaces this large effectively?
Currently, I am using tabular Q-learning (a Q-table), so I use a wrapper function to handle the spaces.
But this seems very inefficient.
Yes, Q-table learning is quite old and requires an extremely large amount of memory, because it stores a Q-value for every state-action pair in a table. In your case, Q-table learning does not seem good enough. A better choice would be a Deep Q-Network (DQN), which replaces the table with a neural network, although it is not that efficient either.
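To give a rough feel for the memory argument, here is a small back-of-the-envelope sketch. It is only an illustration under assumptions of mine: the helper names `q_table_entries` and `mlp_q_params`, the 20 binary observation features, and the single hidden layer of 128 units are all hypothetical and not taken from your project.

```python
def q_table_entries(n_states, n_actions):
    """Number of Q-values a table must store: one per (state, action) pair."""
    return n_states * n_actions


def mlp_q_params(obs_dim, hidden, n_actions):
    """Parameter count of a one-hidden-layer network that maps an
    observation vector to one Q-value per action (weights + biases)."""
    return obs_dim * hidden + hidden + hidden * n_actions + n_actions


# Assumption: 5000 discrete states and 5000 discrete actions.
print(q_table_entries(5000, 5000))        # 25000000

# Assumption: the observation is a vector of 20 binary features,
# so the table needs one row per combination of feature values.
print(q_table_entries(2 ** 20, 5000))     # 5242880000

# The same 20 features fed to a small network (hypothetical sizes).
print(mlp_q_params(20, 128, 5000))        # 647688
```

The point is that the table grows with the number of distinct states, while the network grows with the size of the observation vector, which is why DQN can cope with observation spaces that a table cannot.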
As for the huge observation space, that is fine. But the action space (5000+) seems too large: with that many actions, training takes a long time to converge. To reduce training time, I would recommend PPO (Proximal Policy Optimization).
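In case it helps to see what PPO actually optimizes, below is a minimal sketch of its clipped surrogate objective in plain Python. It is not a full training loop: the function name `ppo_clip_objective` and the default `clip_eps=0.2` are my own illustrative choices, and in practice you would probably use an existing PPO implementation rather than write this yourself.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for those actions.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Policy unchanged (ratio = 1): the objective is just the mean advantage.
print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # 2.0
```

The clipping limits how far a single update can move the policy away from the one that collected the data, which is what lets PPO reuse each batch for several gradient steps.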