
Reinforcement Learning

Reinforcement learning addresses problems that are not one-shot decisions: instead, we make a sequence of decisions over time. For example, if we are flying a helicopter, we need to make a long sequence of good decisions for it to fly successfully.

The basic idea: the reward function. Every time something good happens, we give a positive reward; every time something bad happens, we give a negative reward (negative reinforcement). So we only have to specify what is good and what is bad, and it's up to the learning algorithm to figure out how to maximize the good outcomes and minimize the bad ones.
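As a rough illustration, here is a minimal sketch of a designer-specified reward function on a made-up 5-cell corridor. The environment, the `reward`/`run_episode` names, the reward values, and the step cost are all hypothetical assumptions for this example. Each episode is a sequence of decisions, and a policy is judged by the total reward it collects. The "learning" step here is only a brute-force comparison of two fixed policies, standing in for a real reinforcement learning algorithm (such as Q-learning), which would search over policies far more cleverly.

```python
from typing import Callable, Dict

# Hypothetical 5-cell corridor: cells 0..4, agent starts in the middle.
GOAL, PIT = 4, 0  # assumed terminal states: reaching GOAL is good, PIT is bad


def reward(state: int) -> float:
    """Designer-specified reward (assumed values): +1 at the goal,
    -1 in the pit, and a small -0.1 cost for every other step."""
    if state == GOAL:
        return 1.0
    if state == PIT:
        return -1.0
    return -0.1


def run_episode(policy: Callable[[int], int], start: int = 2, max_steps: int = 20) -> float:
    """Follow the policy for a sequence of decisions and return the
    total (undiscounted) reward collected in the episode."""
    state, total = start, 0.0
    for _ in range(max_steps):
        action = policy(state)                       # -1 = move left, +1 = move right
        state = min(max(state + action, PIT), GOAL)  # stay inside the corridor
        total += reward(state)
        if state in (GOAL, PIT):                     # episode ends at a terminal state
            break
    return total


def go_right(state: int) -> int:
    return +1


def go_left(state: int) -> int:
    return -1


# Stand-in for "learning": pick whichever candidate policy earns the most reward.
policies: Dict[str, Callable[[int], int]] = {"left": go_left, "right": go_right}
best = max(policies, key=lambda name: run_episode(policies[name]))
print(best)  # prints: right
```

Moving right collects -0.1 then +1.0 (total 0.9), while moving left collects -0.1 then -1.0 (total -1.1). We only wrote down what is good and bad; the comparison of returns decides which behaviour to prefer.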