RL Q-Learning: Using Q-Learning to Solve a Maze (Training an Agent to Reach the Treasure in a Complex Maze)

## Implementation Code

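```python
from __future__ import print_function  # keeps print() working under Python 2

import time

import numpy as np
from reprint import output  # third-party package that redraws terminal lines in place

from env import Env  # the article's maze environment module (not shown here)

EPSILON = 0.1   # exploration rate for epsilon-greedy
ALPHA = 0.1     # learning rate
GAMMA = 0.9     # discount factor
MAX_STEP = 30   # step cap per episode

np.random.seed(0)


def epsilon_greedy(Q, state):
    """Pick a random action with probability EPSILON, or whenever the state's
    Q-values are still all zero; otherwise pick the greedy action."""
    if np.random.uniform() > 1 - EPSILON or (Q[state, :] == 0).all():
        action = np.random.randint(0, 4)  # one of the 4 actions, 0~3
    else:
        action = Q[state, :].argmax()
    return action


e = Env()
Q = np.zeros((e.state_num, 4))  # Q-table: one row per state, one column per action

with output(output_type="list", initial_len=len(e.map), interval=0) as output_list:
    for i in range(100):  # 100 training episodes
        e = Env()  # fresh maze for every episode
        while not e.is_end and e.step < MAX_STEP:
            state = e.present_state
            action = epsilon_greedy(Q, state)
            reward = e.interact(action)
            new_state = e.present_state
            # Q-learning update: blend the old estimate with the bootstrapped target
            Q[state, action] = (1 - ALPHA) * Q[state, action] + \
                ALPHA * (reward + GAMMA * Q[new_state, :].max())
            e.print_map_with_reprint(output_list)  # redraw the maze
            time.sleep(0.1)
        # After each episode, show a one-line summary and blank the remaining lines
        for line_num in range(len(e.map)):
            if line_num == 0:
                output_list[0] = 'Episode:{} Total Step:{}, Total Reward:{}'.format(
                    i, e.step, e.total_reward)
            else:
                output_list[line_num] = ''
        time.sleep(2)
```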
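The key line in the loop is the tabular Q-learning update. Written out, with $\alpha$ = `ALPHA` = 0.1 and $\gamma$ = `GAMMA` = 0.9, the code computes the first form below, which rearranges into the familiar "old value plus learning rate times TD error" form:

$$
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a')\right]
= Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]
$$

As a worked example, assume stepping onto the treasure returns $r = 100$ (the reward reported in the test log) and note that the treasure's row of the Q-table is never updated, because the episode ends there, so it stays at 0. The first time the agent moves onto the treasure from a neighbouring state $s_1$ with $Q(s_1,a_1) = 0$:

$$
Q(s_1,a_1) \leftarrow 0.9 \cdot 0 + 0.1\,(100 + 0.9 \cdot 0) = 10
$$

In a later episode, a move with reward 0 from a state $s_2$ (with $Q(s_2,a_2) = 0$) into $s_1$, assuming 10 is still the largest value in $s_1$'s row, gives:

$$
Q(s_2,a_2) \leftarrow 0.9 \cdot 0 + 0.1\,(0 + 0.9 \cdot 10) = 0.9
$$

So the treasure's value spreads backwards one state per visit, shrinking by $\gamma$ at each step, which is why early episodes wander and later ones find short paths.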

## Full Test Log

```
Start
.........
.  x    .
.A  x o .
.       .
.........
Episode:0 Total Step:17, Total Reward:100
……
Episode:98 Total Step:8, Total Reward:100
Episode:99 Total Step:11, Total Reward:100
```
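The script imports `Env` from an `env` module that the article does not include. Below is a minimal sketch of a compatible environment, written only against the attributes the training loop actually uses (`state_num`, `map`, `present_state`, `is_end`, `step`, `total_reward`, `interact()`, `print_map_with_reprint()`). The layout is copied from the maze printed in the test log; everything else is an assumption, not the author's implementation: reading `A` as the agent, `x` as a trap and `o` as the treasure, the action order 0=up/1=down/2=left/3=right, +100 for the treasure (matching the logged `Total Reward:100`), a hypothetical -100 for a trap, and bumping into the `.` border costing a step without moving. `render()` and `train()` are hypothetical helpers; `train()` reruns the same epsilon-greedy Q-learning loop without `reprint` or `time.sleep`.

```python
import numpy as np

# Maze layout copied from the article's test log.
# Assumed legend: A = agent start, x = trap, o = treasure, . = wall/border.
MAP = [
    ".........",
    ".  x    .",
    ".A  x o .",
    ".       .",
    ".........",
]

EPSILON, ALPHA, GAMMA, MAX_STEP = 0.1, 0.1, 0.9, 30  # same values as the article


class Env:
    """Hypothetical stand-in for env.Env; interface inferred from the training loop."""

    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed: 0=up, 1=down, 2=left, 3=right

    def __init__(self):
        self.map = [list(row) for row in MAP]
        self.cols = len(MAP[0])
        self.state_num = len(MAP) * self.cols  # one state per grid cell
        r = next(i for i, row in enumerate(MAP) if "A" in row)
        self.pos = (r, MAP[r].index("A"))
        self.map[r][self.pos[1]] = " "  # the agent is drawn separately in render()
        self.step = 0
        self.total_reward = 0
        self.is_end = False

    @property
    def present_state(self):
        """Flatten the (row, col) position into a Q-table row index."""
        return self.pos[0] * self.cols + self.pos[1]

    def interact(self, action):
        """Apply one action, update step/reward/end flags, and return the reward."""
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        self.step += 1
        cell = self.map[r][c]
        reward = 0
        if cell != ".":  # walls block movement, but the step still counts
            self.pos = (r, c)
            if cell == "o":
                reward, self.is_end = 100, True   # treasure
            elif cell == "x":
                reward, self.is_end = -100, True  # trap (penalty value assumed)
        self.total_reward += reward
        return reward

    def render(self):
        """Return the maze as a list of strings with the agent drawn as 'A'."""
        rows = ["".join(row) for row in self.map]
        r, c = self.pos
        rows[r] = rows[r][:c] + "A" + rows[r][c + 1:]
        return rows

    def print_map_with_reprint(self, output_list):
        """Write each maze line into a reprint-style list (any mutable list works)."""
        for i, line in enumerate(self.render()):
            output_list[i] = line


def epsilon_greedy(Q, state, rng):
    # Explore with probability EPSILON, or when nothing is known about this state yet.
    if rng.uniform() < EPSILON or (Q[state] == 0).all():
        return int(rng.integers(4))
    return int(Q[state].argmax())


def train(episodes=100, seed=0):
    """Headless version of the article's loop: same update rule, no redraw or sleep."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((Env().state_num, 4))
    history = []
    for _ in range(episodes):
        e = Env()
        while not e.is_end and e.step < MAX_STEP:
            s = e.present_state
            a = epsilon_greedy(Q, s, rng)
            reward = e.interact(a)
            # Terminal rows are never updated, so they stay 0 and add nothing here.
            target = reward + GAMMA * Q[e.present_state].max()
            Q[s, a] = (1 - ALPHA) * Q[s, a] + ALPHA * target
        history.append((e.step, e.total_reward))
    return Q, history


if __name__ == "__main__":
    Q, history = train()
    for ep, (steps, total) in enumerate(history[-3:], start=len(history) - 3):
        print("Episode:{} Total Step:{}, Total Reward:{}".format(ep, steps, total))
```

Running the file prints the last three episode summaries in the same format as the log. The exact numbers depend on the seed and will not match the article's run, since this sketch uses NumPy's `default_rng` Generator rather than the legacy `np.random.seed`. With this layout the shortest route to the treasure is 7 moves (down, five times right, up), so step counts near 7 to 11 in late episodes indicate the greedy policy has mostly converged, with the remaining extra steps coming from the 10% exploration.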