1.Delayed, sparse reward(feedback), Long-term planning
Hierarchical Deep Reinforcement Learning, Sub-goal, SAMDP, optoins, Thompson sampling, Boltzman exploration, Improving Exploration
2.Partial observability, Imperfect-Information
Memory, Nash equilibria, MCTS, self-play, LSTM, active perception, curiosity
3.Large state space, Large action space
Hardware, Distributon, Deeper Neural Network.
黄世宇/Shiyu Huang's Personal Page:https://huangshiyu13.github.io/