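The previous article, "Developing a multivariate sequence prediction model for ship trajectories at sea with XGBoost in Python", used the conventional machine learning model XGBoost to predict vessel trajectories at sea. As noted there, deep learning models such as LSTM, CNN, GRU and RNN are generally better suited to multivariate sequence data of this kind. This article develops and tests trajectory prediction with LSTM, a classic model in time-series modeling.
The data sample is the same as in the previous article: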
mmsi,lat,lon,Sog,Cog,timestamp
33,30.430935,121.840168,13.3,218,1530665006
33,30.431335,121.840587,13.2,33,1530665016
33,30.432252,121.84158,13.2,145,1530665036
33,30.432675,121.841973,13.2,30,1530665046
33,30.433992,121.84337,13.2,20,1530665076
33,30.434867,121.844305,13.1,39,1530665096
33,30.435257,121.844728,13.1,43,1530665106
33,30.434867,121.844305,13.1,23,1530665119
33,30.43619,121.845688,13.1,50,1530665126
33,30.437892,121.847477,13.1,53,1530665166
33,30.43838,121.84798,13.1,50,1530665176
33,30.43887,121.848482,13.2,28,1530665187
33,30.44061,121.850253,13.1,50,1530665226
33,30.441017,121.850657,13.1,42,1530665236
33,30.43749,121.84707,13.012,54,1530665238
33,30.44155,121.85121,13.1,117,1530665247
33,30.442865,121.852583,13.1,182,1530665276
33,30.443728,121.853523,13.2,0,1530665296
33,30.44415,121.854003,13.2,200,1530665307
33,30.444593,121.854473,13.2,140,1530665317
33,30.445012,121.854958,13.2,107,1530665327
33,30.44195,121.85162,13.212,275,1530665339
33,30.447153,121.857333,13.2,277,1530665377
33,30.447575,121.857832,13.2,267,1530665387
33,30.447575,121.857832,13.2,276,1530665410
33,30.448822,121.859192,13.2,272,1530665416
33,30.45052,121.861037,13.2,260,1530665456
33,30.451377,121.861993,13.2,274,1530665476
33,30.4519,121.86256,13.2,265,1530665486
33,30.452343,121.863042,13.2,250,1530665497
33,30.452773,121.863522,13.2,291,1530665507
33,30.454465,121.86536,13.1,104,1530665547
33,30.455717,121.866725,13.1,99,1530665576
33,30.456632,121.867703,13.1,92,1530665596
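The dataset records detailed observations along the voyage: the vessel identifier (mmsi), latitude (lat), longitude (lon), speed over ground (Sog), course over ground (Cog) and a Unix timestamp in seconds.
First, the raw dataset is parsed: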
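The parsing code unpacks each raw record as [mmsi, ts, lat, lon, Sog, Cog]. That order differs from the CSV column order shown above. If your raw data is in the CSV form, the sketch below converts the rows into that record layout. The function name csv_to_records and the inline sample are hypothetical. The sketch also assumes the raw JSON uses exactly this field order.

```python
import csv
import io

# Hypothetical sample: two rows in the CSV layout shown above
SAMPLE_CSV = """mmsi,lat,lon,Sog,Cog,timestamp
33,30.430935,121.840168,13.3,218,1530665006
33,30.431335,121.840587,13.2,33,1530665016
"""

def csv_to_records(text):
    """Convert CSV rows into [mmsi, ts, lat, lon, Sog, Cog] lists
    (the order the parsing code unpacks; assumed, not confirmed)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append([
            int(row["mmsi"]),
            int(row["timestamp"]),
            float(row["lat"]),
            float(row["lon"]),
            float(row["Sog"]),
            float(row["Cog"]),
        ])
    return records

records = csv_to_records(SAMPLE_CSV)
print(records[0])
```

The resulting list can be written with json.dumps and then fed to the parsing code.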
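import json

# Path to the raw JSON records (hypothetical placeholder; not given in the original)
data = "raw_data.json"
# Output file; it is loaded as feature.json in the next step
save_path = "feature.json"

with open(data) as f:
    data_list = json.load(f)

# Group records by vessel (MMSI); each record is unpacked as [mmsi, ts, lat, lon, Sog, Cog]
data_dict = {}
for one_list in data_list:
    mmsi, ts, lat, lon, Sog, Cog = one_list
    if mmsi in data_dict:
        data_dict[mmsi].append([ts, lat, lon, Sog, Cog])
    else:
        data_dict[mmsi] = [[ts, lat, lon, Sog, Cog]]

# Pick the vessel with the most records
sorted_list = sorted(data_dict.items(), key=lambda e: len(e[1]), reverse=True)
mmsi = sorted_list[0][0]
print("mmsi: ", mmsi)
datas = data_dict[mmsi]
print("datas_length: ", len(datas))

# Save this vessel's records as [ts, lat, lon, Sog, Cog] rows
with open(save_path, "w") as f:
    f.write(json.dumps(datas))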
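For convenience, I simply picked the vessel with the most records to develop the model.
After processing, the data is saved to feature.json. Next, feature.json is loaded and normalized: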
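import json
import numpy as np

def loadJsonData(data="feature.json"):
    """
    Load the dataset and normalize the multivariate series.
    multiColDataScalar is the author's helper (not shown); it presumably
    returns one scaler per column together with the scaled data.
    """
    with open(data) as f:
        feature = json.load(f)
    # Drop the timestamp and keep [lat, lon, Sog, Cog]
    D = [one_list[1:] for one_list in feature]
    scaler_list, D = multiColDataScalar(D)
    return np.array(D), scaler_list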
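The helper multiColDataScalar is not shown in the original. Below is a minimal sketch of what it might look like, assuming per-column min-max scaling to [0, 1]. It keeps one scaler per column so that predictions can later be mapped back to degrees and knots. The class MinMaxColumnScaler is hypothetical and written in plain NumPy. scikit-learn's MinMaxScaler would be a common alternative.

```python
import numpy as np

class MinMaxColumnScaler:
    """Hypothetical per-column min-max scaler mapping values to [0, 1]."""

    def __init__(self, values):
        self.min = float(np.min(values))
        self.max = float(np.max(values))
        # Guard against a constant column (max == min)
        self.span = (self.max - self.min) or 1.0

    def transform(self, values):
        return (np.asarray(values, dtype=float) - self.min) / self.span

    def inverse_transform(self, values):
        return np.asarray(values, dtype=float) * self.span + self.min

def multiColDataScalar(D):
    """Sketch: scale each column of D independently; return (scalers, scaled rows)."""
    arr = np.asarray(D, dtype=float)
    scaler_list = []
    scaled = np.empty_like(arr)
    for j in range(arr.shape[1]):
        scaler = MinMaxColumnScaler(arr[:, j])
        scaled[:, j] = scaler.transform(arr[:, j])
        scaler_list.append(scaler)
    return scaler_list, scaled.tolist()

# Usage with three made-up [lat, lon, Sog, Cog] rows
D = [[30.43, 121.84, 13.3, 218], [30.44, 121.85, 13.1, 33], [30.45, 121.86, 13.2, 145]]
scaler_list, scaled = multiColDataScalar(D)
print(scaled[0])
```

Like loadJsonData, this fits the scalers on the whole series before the train/test split. Fitting them on the training portion only would avoid leaking test-set statistics.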
Then the data is split into training and test sets:
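import os

# Number of training epochs
epochs = 100
# Split ratio; with 0.85, presumably 85% of the samples are used for training and 15% for testing
ratio = 0.85
dataset, scaler_list = loadJsonData(data="feature.json")
# Create the results directory if it does not exist
saveDir = "results/lstm/"
os.makedirs(saveDir, exist_ok=True)
# Train/test split; 7 is presumably the look-back window length
X_train, X_test, y_train, y_test = dataSplit(dataset, 7, ratio=ratio)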
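The dataSplit helper is also not shown. Below is a hedged sketch of a typical sliding-window split consistent with the call above. It assumes that `step` (7 here) is the look-back length, that the target is the full feature vector at the next time step, and that `ratio` is the training share. Samples are split chronologically, without shuffling.

```python
import numpy as np

def dataSplit(dataset, step, ratio=0.85):
    """Hypothetical sliding-window split (assumptions described above)."""
    dataset = np.asarray(dataset, dtype=float)
    X, y = [], []
    for i in range(len(dataset) - step):
        X.append(dataset[i:i + step])  # `step` consecutive rows as input
        y.append(dataset[i + step])    # the row right after the window as target
    X, y = np.array(X), np.array(y)
    n_train = int(len(X) * ratio)      # chronological split: earlier windows train
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

# Usage on a made-up series of 100 rows x 4 features
dataset = np.arange(400, dtype=float).reshape(100, 4) / 400
X_train, X_test, y_train, y_test = dataSplit(dataset, 7, ratio=0.85)
print(X_train.shape, X_test.shape)
```

Splitting chronologically rather than randomly keeps future windows out of training. This matters for trajectory data, where neighboring windows overlap heavily.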
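The model is deliberately simple: a single LSTM layer followed by two fully connected (dense) layers. Its structure is as follows:
Train the model and plot the loss curves once training finishes: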
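The architecture figure is not reproduced here, and the original does not give layer sizes or activations. As an illustration of the data flow only, here is a NumPy forward pass of one LSTM layer followed by two dense layers. In Keras the same stack would be an LSTM layer followed by two Dense layers. The sizes (64 LSTM units, 32 dense units, 4 outputs) and the ReLU on the first dense layer are assumptions, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_dense_forward(x, params):
    """x: (batch, timesteps, features) -> (batch, n_out).
    Only the last LSTM hidden state feeds the dense layers."""
    W, U, b = params["W"], params["U"], params["b"]
    units = U.shape[0]
    h = np.zeros((x.shape[0], units))
    c = np.zeros((x.shape[0], units))
    for t in range(x.shape[1]):
        z = x[:, t, :] @ W + h @ U + b
        i = sigmoid(z[:, :units])                # input gate
        f = sigmoid(z[:, units:2 * units])       # forget gate
        g = np.tanh(z[:, 2 * units:3 * units])   # candidate cell state
        o = sigmoid(z[:, 3 * units:])            # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    hidden = np.maximum(0.0, h @ params["W1"] + params["b1"])  # dense layer 1 (ReLU, assumed)
    return hidden @ params["W2"] + params["b2"]                # dense layer 2 (linear output)

def init_params(n_features, units, dense_units, n_out, seed=0):
    """Random (untrained) weights, for shape illustration only."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0, 0.1, (n_features, 4 * units)),
        "U": rng.normal(0, 0.1, (units, 4 * units)),
        "b": np.zeros(4 * units),
        "W1": rng.normal(0, 0.1, (units, dense_units)),
        "b1": np.zeros(dense_units),
        "W2": rng.normal(0, 0.1, (dense_units, n_out)),
        "b2": np.zeros(n_out),
    }

params = init_params(n_features=4, units=64, dense_units=32, n_out=4)
out = lstm_dense_forward(np.random.default_rng(1).random((16, 7, 4)), params)
print(out.shape)  # (16, 4)
```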
import matplotlib.pyplot as plt

# Train the model; Keras holds out the last 20% of the training samples for validation
history = model.fit(
    X_train, y_train, epochs=epochs, batch_size=16, validation_split=0.2
)
# Loss curves
print(history.history.keys())
historyResult = {}
plt.clf()
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("Model Loss Curve")
plt.ylabel("Loss")
plt.xlabel("Epochs")
plt.legend(["train", "validation"], loc="upper left")
plt.savefig(saveDir + "train_validation_loss.png")
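The resulting curves are shown below:
Next, let's compare the predicted and actual curves:
A common requirement in time-series modeling is forecasting values over a future period. Future prediction is implemented here as well: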
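The original does not show how the future values are produced. A common approach is recursive multi-step forecasting: predict one step, append it to the input window, drop the oldest row and repeat. The sketch below assumes this approach and assumes the model outputs a full feature row with the same layout as its input. The function forecast_future and the toy constant-velocity predictor are hypothetical stand-ins. With a trained Keras model, predict_fn could wrap model.predict on a batch of one window.

```python
import numpy as np

def forecast_future(predict_fn, last_window, n_steps=30):
    """Recursive multi-step forecast.

    predict_fn maps a window (timesteps, features) to the next row (features,).
    Each prediction is appended to the window and the oldest row dropped.
    """
    window = np.array(last_window, dtype=float)
    preds = []
    for _ in range(n_steps):
        nxt = np.asarray(predict_fn(window), dtype=float)
        preds.append(nxt)
        window = np.vstack([window[1:], nxt])
    return np.array(preds)

def constant_velocity(w):
    # Toy stand-in for a trained model: extrapolate the last displacement
    return w[-1] + (w[-1] - w[-2])

window = np.column_stack([np.arange(7.0), 2 * np.arange(7.0)])
future = forecast_future(constant_velocity, window, n_steps=30)
print(future[:3])
```

Because each step feeds on earlier predictions, errors accumulate over the horizon. This is one reason longer forecasts are less reliable.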
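Scenarios such as assessing a ship's navigation state or estimating collision probability place high demands on predicting vessel trajectories at sea. Accurate, real-time trajectory prediction helps assess how a ship is navigating. It also surfaces potential threats early enough for warnings to be issued in time.
To make the plot clearer, the data is thinned out (sparsified) here:
The green curve shows the predicted position trend over the next 30 time steps. Medium- to long-term predictions should not be fully trusted, but they still indicate the likely direction of travel.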
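The original does not say how the points were thinned. One simple option is to keep every k-th point and always keep the last one, so the curve still reaches its end. The sketch below assumes this approach. The function name sparsify and the step values are hypothetical.

```python
import numpy as np

def sparsify(series, step=5):
    """Keep every `step`-th row, always including the last row."""
    series = np.asarray(series)
    if len(series) == 0:
        return series
    idx = np.arange(0, len(series), step)
    if idx[-1] != len(series) - 1:
        idx = np.append(idx, len(series) - 1)
    return series[idx]

# Usage on a made-up (lat, lon) track of 100 points
points = np.column_stack([np.linspace(30.43, 30.46, 100), np.linspace(121.84, 121.87, 100)])
thin = sparsify(points, step=10)
print(thin.shape)  # (11, 2)
```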