Andrew Ng Coursera, Machine Learning Specialization, Machine Learnin

Original

楚千羽 2022-12-10 15:35:04 © Copyright


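© Copyright belongs to the author: this is an original work by 51CTO blog author 楚千羽. Contact the author for permission to reprint; otherwise legal liability will be pursued.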

Practice quiz: Reinforcement learning introduction

Question 1: You are using reinforcement learning to control a four-legged robot. The position of the robot would be its _____.

【Correct】state

Question 2: You are controlling a Mars rover. You will be very, very happy if it gets to state 1 (significant scientific discovery), slightly happy if it gets to state 2 (small scientific discovery), and unhappy if it gets to state 3 (rover is permanently damaged). To reflect this, choose a reward function so that:

【Correct】R(1) > R(2) > R(3), where R(1) and R(2) are positive and R(3) is negative.

【Explanation】Good job!

Question 3: You are using reinforcement learning to fly a helicopter. Using a discount factor of 0.75, your helicopter starts in some state and receives rewards -100 on the first step, -100 on the second step, and 1000 on the third and final step (where it has reached a terminal state). What is the return?

【Correct】-100 - 0.75×100 + 0.75^2×1000
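The return in this question can be checked numerically. The following is a minimal sketch; the helper name `discounted_return` is illustrative and does not come from the course.

```python
def discounted_return(rewards, gamma):
    """Sum the rewards, weighting the reward at step t (0-based) by gamma**t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Helicopter example: rewards -100, -100, 1000 with discount factor 0.75.
helicopter_return = discounted_return([-100, -100, 1000], 0.75)
print(helicopter_return)  # 387.5
```

The first reward is not discounted, the second is multiplied by 0.75, and the third by 0.75^2, which matches the expression in the answer.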

Question 4: Given the rewards and actions below, compute the return from state 3 with a discount factor of \gamma = 0.25.

[Figure: rewards and actions for each state; image not reproduced]

【Correct】6.25

【Explanation】Starting from state 3, the rewards are collected in states 3, 2, and 1. The return is 0 + 0.25×0 + 0.25^2×100 = 6.25.
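The same return can be obtained by walking through the states one step at a time. This is a minimal sketch that assumes, following the explanation above, that states 3 and 2 give reward 0, that state 1 is terminal with reward 100, and that the agent moves one state to the left per step. The names `REWARDS`, `TERMINAL_STATE`, and `return_moving_left` are illustrative.

```python
GAMMA = 0.25
# Assumed from the explanation: reward 0 in states 3 and 2, reward 100 in terminal state 1.
REWARDS = {1: 100, 2: 0, 3: 0}
TERMINAL_STATE = 1


def return_moving_left(start_state):
    """Accumulate discounted rewards while stepping left until the terminal state."""
    total, discount, state = 0.0, 1.0, start_state
    while True:
        total += discount * REWARDS[state]
        if state == TERMINAL_STATE:
            return total
        state -= 1          # move one state to the left
        discount *= GAMMA   # each later reward is discounted once more


return_from_3 = return_moving_left(3)
print(return_from_3)  # 6.25
```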

Practice quiz: State-action value function

Question 1: Which of the following accurately describes the state-action value function Q(s,a)?

【Correct】It is the return if you start from state s, take action a (once), then behave optimally after that.

Question 2: You are controlling a robot that has 3 actions: ← (left), → (right) and STOP. From a given state s, you have computed Q(s, ←) = -10, Q(s, →) = -20, Q(s, STOP) = 0. What is the optimal action to take in state s?

【Correct】STOP
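Choosing the optimal action amounts to taking the action with the largest Q(s, a). A minimal sketch using the values from the question; the string keys are just illustrative labels for the three actions.

```python
# Q-values for state s, taken from the question.
q_values = {"left": -10, "right": -20, "stop": 0}

# The optimal action is the one with the largest Q(s, a).
best_action = max(q_values, key=q_values.get)
print(best_action)  # stop
```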

Question 3: For this problem, \gamma = 0.25. The diagram below shows the return and the optimal action from each state. Please compute Q(5, ←).

[Figure: return and optimal action for each state; image not reproduced]

【Correct】0.625
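Since Q(s, a) is the reward in s plus the discounted optimal return from the state that action a leads to, the answer can be reproduced with one line of arithmetic. Because the diagram is not reproduced here, the inputs below are assumptions chosen to be consistent with the stated answer rather than values read from the source: a reward of 0 in state 5 and an optimal return of 2.5 from state 4. The helper name `q_value` is illustrative.

```python
GAMMA = 0.25


def q_value(reward_now, optimal_return_next, gamma=GAMMA):
    """Q(s, a) = R(s) + gamma * (optimal return from the state that action a leads to)."""
    return reward_now + gamma * optimal_return_next


# Assumed values (the diagram is missing): R(5) = 0, optimal return from state 4 = 2.5.
q_5_left = q_value(reward_now=0, optimal_return_next=2.5)
print(q_5_left)  # 0.625
```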

Practice quiz: Continuous state spaces

Question 1: The Lunar Lander is a continuous state MDP because:

【Correct】The state contains numbers such as position and velocity that are continuous valued

Question 2: In the learning algorithm described in the videos, we repeatedly create an artificial training set to which we apply supervised learning where the input x = (s,a) and the target, constructed using Bellman's equations, is y = _____?

【Correct】y = R(s) + \gamma \max_{a'} Q(s', a'), where s' is the state reached after taking action a in state s. (The original post gave this answer only as a screenshot.)
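The training-set construction described in this question can be sketched as follows, using the Bellman target y = R(s) + γ max over a' of Q(s', a'). This is only an illustration under stated assumptions: the discount factor, the action set `ACTIONS`, the placeholder `q_estimate`, and the `done` flag (which zeroes the future term at a terminal state) are all hypothetical and not taken from the post.

```python
GAMMA = 0.995           # assumed discount factor, for illustration only
ACTIONS = [0, 1, 2, 3]  # hypothetical discrete action set


def q_estimate(state, action):
    """Placeholder for the current guess of Q(s, a); a neural network would go here."""
    return 0.0


def build_training_set(experiences, q_fn=q_estimate, gamma=GAMMA):
    """Turn (s, a, R(s), s', done) tuples into (x, y) pairs for supervised learning."""
    xs, ys = [], []
    for state, action, reward, next_state, done in experiences:
        # Assumption: no future return once the episode has ended.
        future = 0.0 if done else max(q_fn(next_state, a) for a in ACTIONS)
        xs.append((state, action))          # input x = (s, a)
        ys.append(reward + gamma * future)  # target y = R(s) + gamma * max_a' Q(s', a')
    return xs, ys


# Two made-up experience tuples, purely to show the shapes involved.
experiences = [
    ((0.0, 1.0), 2, 1.0, (0.1, 0.9), False),
    ((0.1, 0.9), 0, 10.0, (0.2, 0.8), True),
]
xs, ys = build_training_set(experiences)
```

A supervised learner would then be fit to map each x to its y, and the process repeated with the improved Q estimate.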

Question 3: You have reached the final practice quiz of this class! What does that mean? (Please check all the answers, because all of them are correct!)

【Correct】The DeepLearning.AI and Stanford Online teams would like to give you a round of applause!

【Correct】You deserve to celebrate!

【Correct】Andrew sends his heartfelt congratulations to you!

【Correct】What an accomplishment -- you made it!

Author: 楚千羽
