machine-learning, deep-learning, q-learning, function-approximation

In Q-learning with function approximation, is it possible to avoid hand-crafting features?


I have little background knowledge of Machine Learning, so please forgive me if my question seems silly.

Based on what I've read, the best model-free reinforcement learning algorithm to date is Q-learning, where each state-action pair in the agent's world is given a q-value, and at each state the action with the highest q-value is chosen. The q-value is then updated as follows:

Q(s,a) ← (1 - α)·Q(s,a) + α·(R(s,a,s') + γ·max_{a'} Q(s',a')), where α is the learning rate and γ is the discount factor.
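For concreteness, here is a minimal sketch of that tabular update in Python (the `ACTIONS` list and the (s, a, r, s') values are assumed to come from some environment loop; they are not part of any particular library):

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1     # learning rate, discount, exploration
ACTIONS = [0, 1, 2, 3]                     # hypothetical discrete action set
Q = defaultdict(float)                     # (state, action) -> q-value, defaults to 0

def choose_action(state):
    # Epsilon-greedy: at each state, usually pick the action with the highest q-value.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next, done):
    # Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s',a'))
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * best_next)
```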

Apparently, for problems with high dimensionality, the number of states becomes astronomically large, making it infeasible to store the q-value table.

So practical implementations of Q-learning use q-value approximation via generalization over states, also known as features. For example, if the agent were Pacman, the features might be things like the distance to the nearest ghost or the nearest food pellet.

And then, instead of q-values for every single state, you only need to learn a weight for every single feature.
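Concretely, a common way to do this is linear function approximation: Q(s,a) ≈ w · f(s,a), with one weight per feature, updated with the same TD error as before. A rough sketch, where `features(s, a)` is just a placeholder for whatever hand-crafted features you choose (e.g. the Pacman distances above):

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.99
NUM_FEATURES = 4
w = np.zeros(NUM_FEATURES)     # one weight per feature instead of one entry per state

def features(state, action):
    # Placeholder for f(s, a); in the Pacman example this might return
    # normalized distances to the nearest ghost, nearest pellet, etc.
    return np.zeros(NUM_FEATURES)

def q_value(state, action):
    return float(np.dot(w, features(state, action)))

def q_update(s, a, r, s_next, actions, done):
    global w
    # Same TD target as in the tabular case, but it adjusts feature weights
    # rather than a single table entry.
    best_next = 0.0 if done else max(q_value(s_next, a2) for a2 in actions)
    td_error = (r + GAMMA * best_next) - q_value(s, a)
    w = w + ALPHA * td_error * features(s, a)
```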

So my question is:

Is it possible for a reinforcement learning agent to create or generate additional features?

Some research I've done:

This post mentions A. Geramifard's iFDD method, which is a way of "discovering feature dependencies", but I'm not sure whether that counts as feature generation, since the paper assumes you start off with a set of binary features.

Another paper I found that seems apropos is Playing Atari with Deep Reinforcement Learning, which "extracts high level features using a range of neural network architectures".

I've read over the paper but still need to fully understand their algorithm. Is this what I'm looking for?

Thanks


Solution

  • It seems like you already answered your own question :)

    Feature generation is not part of the Q-learning (or SARSA) algorithm itself. However, in a step usually called preprocessing, you can use a wide array of algorithms (you listed some of them) to generate or extract features from your data. Combining different machine learning algorithms in this way results in hybrid architectures, a term you might look into when researching what works best for your problem; a toy sketch of this idea is included at the end of this answer.

    Here is an example of using features with SARSA (which is very similar to Q-learning). Whether the papers you cited are helpful for your scenario is something you'll have to decide for yourself; as always with machine learning, the right approach is highly problem-dependent. If you're in robotics and it's hard to define discrete states by hand, a neural network might be helpful (a rough sketch of that route also follows below). If you can come up with heuristics yourself (as in the Pacman example), then you probably won't need one.
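    To make the preprocessing point concrete, here is a toy sketch of such a hybrid setup: an off-the-shelf feature extractor (PCA here, purely for illustration) is fitted on logged raw observations, and a linear q-value update then runs on top of the extracted features. All sizes and names are made up for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

ALPHA, GAMMA, NUM_ACTIONS = 0.1, 0.99, 4

# 1) Preprocessing: learn a feature extractor from previously collected raw states.
raw_states = np.random.rand(1000, 32)            # stand-in for logged observations
extractor = PCA(n_components=8).fit(raw_states)

# 2) RL on top: one weight vector per action over the extracted features.
w = np.zeros((NUM_ACTIONS, 8))

def q_values(raw_state):
    f = extractor.transform(raw_state.reshape(1, -1))[0]
    return w @ f, f                              # q-value per action, feature vector

def q_update(s, a, r, s_next, done):
    q_s, f_s = q_values(s)
    q_next, _ = q_values(s_next)
    target = r if done else r + GAMMA * q_next.max()
    w[a] += ALPHA * (target - q_s[a]) * f_s      # TD update on the weights for action a
```

    And here is a heavily simplified sketch of the neural-network route from the Atari paper you cited: the network maps the raw state directly to one q-value per action, so its hidden layers play the role of learned features. This is not the paper's actual architecture, and it omits key ingredients such as experience replay and a target network.

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS, GAMMA = 8, 4, 0.99       # assumed sizes, for illustration only

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_ACTIONS),                  # one q-value per action
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_step(s, a, r, s_next, done):
    # s, s_next: float tensors of shape (STATE_DIM,); a: integer action index.
    with torch.no_grad():
        target = r if done else r + GAMMA * q_net(s_next).max()
    prediction = q_net(s)[a]
    loss = (prediction - target) ** 2            # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```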