Model-free vs. Model-based Reinforcement Learning
In model-based algorithms, the agent learns the transition probability T(s1 | s0, a) from the pair of current state s0 and action a to the next state s1. Once the transition model is learned, the agent can evaluate how likely it is to reach each possible next state given its current state and action, and plan accordingly. The main limitation of model-based algorithms is that they become impractical as the state and action spaces grow: in a tabular setup, the transition table alone requires |S| × |A| × |S| entries.
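In a tabular setting, for instance, the transition model can be estimated by simply counting observed (s0, a, s1) triples. Below is a minimal sketch; the state/action space sizes and the uniform fallback for unvisited pairs are illustrative assumptions. Note that the counts table itself is exactly the |S| × |A| × |S| object that makes this approach blow up on large problems.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2  # assumed sizes, for illustration only

# counts[s0, a, s1] = number of observed transitions (s0, a) -> s1
counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))

def record(s0, a, s1):
    """Record one observed transition."""
    counts[s0, a, s1] += 1

def transition_prob(s0, a):
    """Maximum-likelihood estimate of T(s1 | s0, a) from the counts."""
    total = counts[s0, a].sum()
    if total == 0:
        # No data yet for this (state, action) pair: fall back to uniform
        return np.full(N_STATES, 1.0 / N_STATES)
    return counts[s0, a] / total
```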
Model-free algorithms instead learn from trial-and-error experience, updating their value estimates directly; they therefore do not need the space to store an explicit model over every combination of states and actions.
On-policy vs. Off-policy Reinforcement Learning
In on-policy reinforcement learning, the agent learns the value of the policy it is actually executing: updates are based on the action a taken by the current policy. In contrast, an off-policy method learns the value of a different policy, using an action a* obtained from that other policy (for example, the greedy action in Q-learning) rather than the action actually executed.
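The distinction is easiest to see in the bootstrap targets of SARSA (on-policy) and Q-learning (off-policy). A minimal sketch, assuming a tabular Q array and the standard TD update Q[s, a] += alpha * (target - Q[s, a]):

```python
import numpy as np

def sarsa_target(Q, r, s_next, a_next, gamma=0.99):
    # On-policy: bootstrap with a_next, the action the current policy actually takes
    return r + gamma * Q[s_next, a_next]

def q_learning_target(Q, r, s_next, gamma=0.99):
    # Off-policy: bootstrap with the greedy action, regardless of the action executed
    return r + gamma * np.max(Q[s_next])
```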
A few examples of reinforcement learning algorithms are as follows:
Q-learning
State-Action-Reward-State-Action (SARSA)
Deep Q Network (DQN)
Deep Deterministic Policy Gradient (DDPG)
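To make the first example on the list concrete, here is a minimal tabular Q-learning loop on a toy chain environment. The environment, the reward of 1 at the rightmost state, and all hyperparameters are illustrative assumptions rather than a canonical benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 6, 2            # toy chain: action 0 = left, action 1 = right
Q = np.zeros((N_STATES, N_ACTIONS))   # tabular action-value estimates
alpha, gamma, eps = 0.1, 0.99, 0.1

def step(s, a):
    """Toy dynamics: reward 1 for reaching the rightmost state, else 0."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, float(done), done

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy behaviour policy (ties broken at random)
        if rng.random() < eps:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s_next, r, done = step(s, a)
        # Off-policy TD update toward the greedy bootstrap target
        target = r + gamma * np.max(Q[s_next]) * (not done)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
```

Because the update bootstraps from the greedy action while the behaviour policy remains epsilon-greedy, this loop is both model-free and off-policy.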