Q-value Iteration Update Rule

Recall the Q-value iteration update rule:
Q_{k+1}(s,a) = \sum_{s'} T(s,a,s') \left( R(s,a,s') + \gamma \max_{a'} Q_k(s',a') \right)
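The update rule above can be sketched in Python as one synchronous sweep over all state-action pairs. This is a minimal illustration, not the course's reference code: the function name `q_iteration_step`, the dictionary layout for `T`, and the three-state chain (`A`, `B`, `goal`) are all hypothetical, and the example assumes deterministic moves and that the 0.04 step cost adds to the +1 reward on entering the goal.

```python
def q_iteration_step(Q, T, R, gamma=1.0):
    """Apply one synchronous sweep of the Q-value iteration update.

    Q: dict mapping (state, action) -> current Q-value.
    T: dict mapping (state, action) -> list of (next_state, probability).
    R: function R(s, a, s_next) -> reward.
    """
    # Actions available in each state, derived from the transition table.
    actions = {}
    for (s, a) in T:
        actions.setdefault(s, []).append(a)

    new_Q = {}
    for (s, a), outcomes in T.items():
        total = 0.0
        for s_next, prob in outcomes:
            # A state with no outgoing actions (terminal) contributes 0.
            best_next = max(
                (Q[(s_next, a2)] for a2 in actions.get(s_next, [])),
                default=0.0,
            )
            total += prob * (R(s, a, s_next) + gamma * best_next)
        new_Q[(s, a)] = total
    return new_Q


# Hypothetical deterministic chain: A <-> B -> goal (goal is terminal).
T = {
    ("A", "right"): [("B", 1.0)],
    ("B", "left"): [("A", 1.0)],
    ("B", "right"): [("goal", 1.0)],
}


def R(s, a, s_next):
    # Assumption: entering the goal gives +1, and every move also costs 0.04.
    return (1.0 if s_next == "goal" else 0.0) - 0.04


Q0 = {key: 0.0 for key in T}                 # all Q-values start at 0
Q1 = q_iteration_step(Q0, T, R, gamma=1.0)   # after the 1st iteration
Q2 = q_iteration_step(Q1, T, R, gamma=1.0)   # after the 2nd iteration
```

In this toy chain the first sweep only records immediate rewards, and the second sweep propagates the goal reward one step further back, so `Q2[("A", "right")]` picks up the value of the best action in `B`.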
Let γ = 1 in this problem. In the figure below, from each box we can move up, down, left, or right unless the path is blocked, and the Q-values of all actions in all states are initialized to 0. The Q-values for the 4 directions are labeled in each box. Moving into the upper-right 2 boxes yields rewards of +1 and −1, and each move also costs 0.04, in other words a reward of −0.04.
After the 1st iteration, enter the Q-values at the positions labeled x, y, and z below:
x=
[Figure: Q-table]
After the 2nd iteration, enter the Q-values at the positions labeled a, b, and c below:
a=
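As a sanity check on the first iteration: because every Q-value starts at 0, the max term vanishes and the first sweep leaves only the expected immediate reward. The worked step below is a sketch under two assumptions not stated explicitly in the problem: moves are deterministic, and the 0.04 step cost adds to the +1 or −1 received on entering a reward box.

```latex
Q_1(s,a) = \sum_{s'} T(s,a,s') \left( R(s,a,s') + 1 \cdot \max_{a'} Q_0(s',a') \right)
         = \sum_{s'} T(s,a,s') \, R(s,a,s')

% Under the assumptions above:
% move into an ordinary box:  Q_1(s,a) = -0.04
% move into the +1 box:       Q_1(s,a) = 1 - 0.04 = 0.96
% move into the -1 box:       Q_1(s,a) = -1 - 0.04 = -1.04
```

Which of these cases applies to each labeled position depends on the figure, which determines the neighbors and blocked paths of each box.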
