Reinforcement Learning Algorithms
As mentioned earlier, a typical reinforcement learning setup consists of two components: an agent and an environment. The environment is the world the agent acts on, and it begins by sending a state to the agent. The agent, based on its knowledge, takes an action in response to that state. The environment then sends back a pair consisting of the next state and a reward. The agent updates its knowledge with the reward returned by the environment, using it to evaluate its last action. This loop continues until the environment sends a terminal state, as the sketch below illustrates.
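Here is a minimal sketch of that loop, assuming hypothetical `Environment` and `Agent` objects; the method names (`reset`, `step`, `act`, `update`) are illustrative assumptions, not a specific library's API.

```python
# A minimal sketch of the agent-environment interaction loop described above.
# The env/agent interfaces here are assumptions made for illustration.

def run_episode(env, agent):
    state = env.reset()  # environment sends the initial state
    done = False
    while not done:
        action = agent.act(state)  # agent picks an action based on its knowledge
        next_state, reward, done = env.step(action)  # environment responds
        agent.update(state, action, reward, next_state)  # evaluate last action
        state = next_state  # continue until a terminal state is reached
```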
Some of the key terms used in reinforcement learning are explained below:
Action (A): This represents all the possible moves that the agent can take.
State (S): This represents the current situation returned by the environment.
Reward (R): This is an immediate return sent back from the environment to evaluate the last action.
Policy (π): This represents the strategy that the agent employs to determine the next action based on the current state.
Value (V): This is the expected long-term return with discount (as opposed to the short-term reward R); the sketch after this list makes the discounted return concrete. Vπ(s) is defined as the expected long-term return of the current state s under policy π.
Q-value or action-value (Q): This is similar to Value, but it takes an extra parameter, the current action a. Qπ(s, a) refers to the expected long-term return of taking action a in state s under policy π.
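To make the discounted long-term return that V and Q estimate concrete, here is a small sketch that computes it from a sequence of rewards. The discount factor gamma and its value of 0.9 are assumptions chosen for illustration.

```python
# Computing the discounted return that Value and Q-value estimate.
# A reward arriving t steps in the future is weighted by gamma ** t,
# so later rewards count for less than immediate ones.

def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    return g

# Example: a reward of 1 now and a reward of 10 two steps later.
print(discounted_return([1, 0, 10]))  # 1 + 0.9*0 + 0.81*10 = 9.1
```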