Practice quiz: Reinforcement learning introduction

Question 1: You are using reinforcement learning to control a four-legged robot. The position of the robot would be its _____.

[Image: Andrew Ng, Coursera, Machine Learning Specialization]


[Correct] state

Question 2: You are controlling a Mars rover. You will be very happy if it gets to state 1 (significant scientific discovery), slightly happy if it gets to state 2 (small scientific discovery), and unhappy if it gets to state 3 (rover is permanently damaged). To reflect this, choose a reward function so that:

[Correct] R(1) > R(2) > R(3), where R(1) and R(2) are positive and R(3) is negative.

[Explanation] Good job!

Question 3: You are using reinforcement learning to fly a helicopter. Using a discount factor of 0.75, your helicopter starts in some state and receives rewards of -100 on the first step, -100 on the second step, and 1000 on the third and final step (where it has reached a terminal state). What is the return?

[Correct] -100 - 0.75×100 + 0.75^2×1000
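The expression above can be evaluated numerically. The following is a minimal Python sketch; the helper name `discounted_return` is hypothetical and not part of the course material.

```python
def discounted_return(rewards, gamma):
    """Sum the rewards, weighting the reward at step t by gamma ** t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Rewards and discount factor taken from the helicopter question.
helicopter_return = discounted_return([-100, -100, 1000], gamma=0.75)
print(helicopter_return)  # 387.5
```

The first reward is undiscounted (gamma ** 0 = 1), which matches the form of the answer: -100 - 75 + 562.5 = 387.5.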

Question 4: Given the rewards and actions below, compute the return from state 3 with a discount factor of \gamma = 0.25.

[Figure: rewards and actions for each state]


[Correct] 6.25

[Explanation] Starting from state 3, the rewards are collected in states 3, 2, and 1. The return is 0 + (0.25)×0 + (0.25)^2×100 = 6.25.

Practice quiz: State-action value function

Question 1: Which of the following accurately describes the state-action value function Q(s,a)?

[Correct] It is the return if you start from state s, take action a (once), then behave optimally after that.

Question 2: You are controlling a robot that has 3 actions: ← (left), → (right) and STOP. From a given state s, you have computed Q(s, ←) = -10, Q(s, →) = -20, Q(s, STOP) = 0. What is the optimal action to take in state s?

[Correct] STOP
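The optimal action is the one with the largest Q(s, a). A minimal Python sketch of that selection follows; the dictionary keys are illustrative labels for the three actions in the question.

```python
# Q-values for state s, as given in the question.
q_values = {"left": -10, "right": -20, "stop": 0}

# Pick the action whose Q-value is largest.
best_action = max(q_values, key=q_values.get)
print(best_action)  # stop
```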

Question 3: For this problem, \gamma = 0.25. The diagram below shows the return and the optimal action from each state. Please compute Q(5, ←).

[Figure: return and optimal action from each state]


[Correct] 0.625
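This value follows from Q(s, a) = R(s) + \gamma × (optimal return from the next state). The diagram is not reproduced in the text, so the sketch below assumes a reward of 0 in state 5 and an optimal return of 2.5 from state 4. Both numbers are assumptions chosen to be consistent with the stated answer.

```python
gamma = 0.25

# Assumed values; the original diagram is not available in the text.
reward_state_5 = 0.0        # assumption: no reward in state 5
return_from_state_4 = 2.5   # assumption: optimal return from state 4

# Take "left" once from state 5 (landing in state 4), then act optimally.
q_5_left = reward_state_5 + gamma * return_from_state_4
print(q_5_left)  # 0.625
```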

Practice quiz: Continuous state spaces

Question 1: The Lunar Lander is a continuous state MDP because:

[Correct] The state contains numbers such as position and velocity that are continuous-valued.

Question 2: In the learning algorithm described in the videos, we repeatedly create an artificial training set to which we apply supervised learning, where the input is x = (s,a) and the target, constructed using Bellman's equations, is y = _____?

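[Correct] y = R(s) + \gamma \max_{a'} Q(s', a'), where s' is the state reached after taking action a in state s (shown in the original figure).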
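For a single training example, the Bellman target is y = R(s) + \gamma \max_{a'} Q(s', a'). The following is a minimal Python sketch of that computation. The function name `bellman_target`, the `done` flag for terminal states, and the numbers used are all assumptions for illustration, not the course's implementation.

```python
def bellman_target(reward, next_q_values, gamma, done=False):
    """Return y = R(s) + gamma * max_a' Q(s', a').

    If s' is terminal (done=True), there is no future return,
    so the target is just the reward (an assumed convention).
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Made-up numbers: reward 1.0, current Q estimates for four actions in s'.
y = bellman_target(reward=1.0, next_q_values=[0.5, 2.0, -1.0, 0.0], gamma=0.5)
print(y)  # 2.0
```

Each pair (x, y) built this way, with x = (s, a), becomes one row of the artificial training set.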

Question 3: You have reached the final practice quiz of this class! What does that mean? (Please check all the answers, because all of them are correct!)

[Correct] The DeepLearning.AI and Stanford Online teams would like to give you a round of applause!

[Correct] You deserve to celebrate!

[Correct] Andrew sends his heartfelt congratulations to you!

[Correct] What an accomplishment -- you made it!

Author: 楚千羽