I would like to get some helpful instructions about how to use the Q-learning algorithm with function approximation. For the basic Q-learning algorithm I have found examples and I think I did understand it. In case of using function approximation I get into trouble. Can somebody give me an explanation through a short example how it works?
What I know:
I have checked this paper: Q-learning with function approximation
But I cant find any useful tutorial how to use it.
Thanks for help!
To my view, this is one of the best references to start with. It is well written with several pseudo-code examples. In your case, you can simplify the algorithms by ignoring eligibility traces.
Also, to my experience and depending on your use case, Q-Learning might not work very well (sometimes it needs huge amounts of experience data). You can try Fitted-Q value for example, which is a batch algorithm.