What's the best way to implement real time operant conditioning (supervised reward/punishment-based learning) for an agent? Should I use a neural network (and what type)? Or something else?
I want the agent to be able to be trained to follow commands like a dog. The commands would be in the form of gestures on a touchscreen. I want the agent to be able to be trained to follow a path (in continuous 2D space), make behavioral changes on command (modeled by FSM state transitions), and perform sequences of actions.
The agent would be in a simulated physical environment.
Reinforcement Learning is a good machine learning algorithm for your problem.
The basic reinforcement learning model consists of:
S
(you have a 2d space discretized in some way, which is the dog's current position, if you want to do continuous 2d-space, you might need a neural network to serve as the value function mapper.)A
( you mentioned the dog performs sequences of actions, e.g., move, rotate)r
of a transition (When reaching the target position, you might want to give the dog a big reward, while small rewards are also welcomed at intermediate milestones)P
and the 4 neighboring cells that are viewable to the dog.)To find the optimal policy, you can start with the model-free technique - q-learning.