Markov Decision Process
Prof. Neeraj Bhargava
Kapil Chauhan
Department of Computer Science
School of Engineering & Systems Sciences
MDS University, Ajmer
Introduction
• Reinforcement Learning is a type of Machine Learning.
• It allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance.
Cont..
• In this problem, an agent must decide the best action to select based on its current state.
• When this decision step is repeated, the problem is known as a Markov Decision Process.
Process of MDP
A Markov Decision Process (MDP) model contains:
• A set of possible world states S.
• A set of models.
• A set of possible actions A.
• A real-valued reward function R(s, a).
• A policy, which is the solution of the Markov Decision Process.
Model:
• A State is a set of tokens that represents every state the agent can be in.
• A Model (sometimes called a Transition Model) gives an action's effect in a state. In particular, T(S, a, S') defines a transition T where being in state S and taking action 'a' takes us to state S' (S and S' may be the same).
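The transition model T(S, a, S') described above can be sketched as a nested dictionary. This is a minimal illustration with a hypothetical two-state MDP; the state names "s0"/"s1" and action names "stay"/"move" are invented for the example:

```python
# Hypothetical transition model T(S, a, S') for a two-state MDP.
# T[s][a] maps each successor state s' to the probability of reaching it.
T = {
    "s0": {"stay": {"s0": 0.9, "s1": 0.1},
           "move": {"s0": 0.2, "s1": 0.8}},
    "s1": {"stay": {"s1": 1.0},
           "move": {"s0": 0.7, "s1": 0.3}},
}

def transition_prob(s, a, s_next):
    """Return T(S, a, S'): the probability that taking action a in
    state s leads to state s_next (0.0 if unreachable)."""
    return T[s][a].get(s_next, 0.0)
```

Note that for each state-action pair the successor probabilities sum to 1, as required of a valid transition model.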
Cont..
• An Action A is the set of all possible actions. A(s) defines the set of actions that can be taken while in state S.
• A Reward is a real-valued reward function. R(s) indicates the reward for simply being in state S.
• A Policy is a solution to the Markov Decision Process. A policy is a mapping from states S to actions a; it indicates the action 'a' to be taken while in state S.
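The components above (states S, actions A, reward R(s), transition model T) can be combined to compute a policy. A minimal value-iteration sketch follows, using an invented two-state example; all state names, action names, rewards, and the discount factor are illustrative assumptions, not from the slides:

```python
# Minimal value-iteration sketch for a toy two-state MDP.
# All names and numbers here are illustrative assumptions.
S = ["s0", "s1"]                     # set of possible world states
A = ["stay", "move"]                 # set of possible actions
R = {"s0": 0.0, "s1": 1.0}           # R(s): reward for being in state s
T = {                                # T[s][a][s']: transition probability
    "s0": {"stay": {"s0": 1.0}, "move": {"s0": 0.2, "s1": 0.8}},
    "s1": {"stay": {"s1": 1.0}, "move": {"s0": 0.7, "s1": 0.3}},
}
gamma = 0.9                          # discount factor for future rewards

# Repeatedly back up state values until they (approximately) converge.
V = {s: 0.0 for s in S}
for _ in range(200):
    V = {s: R[s] + gamma * max(
            sum(p * V[s2] for s2, p in T[s][a].items()) for a in A)
         for s in S}

# The policy maps each state to the action with the highest expected value.
policy = {s: max(A, key=lambda a: sum(p * V[s2] for s2, p in T[s][a].items()))
          for s in S}
```

In this toy model the rewarding state is s1, so the computed policy moves toward s1 from s0 and stays once there.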
Assignment
• Explain the Markov Decision Process with an example.
