Scenario

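  • Notes on reading data from MySQL with Spark and persisting it to JSON and Parquet files
  • Column resolution fails with an AnalysisException
  • The root cause is still to be analyzed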
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input columns: [
id, name, age, sex];;
'Project ['id, name#1, age#2, sex#3]
+- Relation[
id#0,name#1,age#2,sex#3] JDBCRelation(user)

at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
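The input-column list in the message prints a line break before `id`, and the DDL below also breaks inside the backquoted `id`. One plausible but unconfirmed explanation is that the column name itself carries an invisible character (for example a leading `\n`), so `"id"` matches nothing. A minimal sketch for checking this, assuming only the Scala standard library; `ColumnNameInspector`, `showInvisible` and `suspiciousNames` are hypothetical names:

```scala
object ColumnNameInspector {
  // Render one character, replacing line breaks, tabs and other ISO control
  // characters with a visible escape so they cannot hide inside a name.
  private def escapeChar(c: Char): String = c match {
    case '\n'             => "\\n"
    case '\r'             => "\\r"
    case '\t'             => "\\t"
    case _ if c.isControl => "<0x%02X>".format(c.toInt)
    case _                => c.toString
  }

  // Make every invisible character in a column name visible.
  def showInvisible(name: String): String =
    name.iterator.map(escapeChar).mkString

  // Names that change under trim(), i.e. carry leading/trailing whitespace
  // or line breaks, are the prime suspects for "cannot resolve" errors.
  def suspiciousNames(columns: Seq[String]): Seq[String] =
    columns.filter(c => c != c.trim)

  def main(args: Array[String]): Unit = {
    // Hypothetical input mirroring the error message above;
    // with Spark this list would come from jdbcDF.columns.
    val columns = Seq("\nid", "name", "age", "sex")
    columns.foreach(c => println("[" + showInvisible(c) + "]"))
    println(suspiciousNames(columns).map(showInvisible).mkString(", "))
  }
}
```

With Spark, the same check could be run as `jdbcDF.columns.map(ColumnNameInspector.showInvisible).foreach(println)`.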
  • Test data
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
`
id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT '',
`age` int(11) DEFAULT NULL,
`sex` varchar(255) COLLATE utf8_unicode_ci DEFAULT '',
PRIMARY KEY (`
id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- ----------------------------
-- Records of user
-- ----------------------------
INSERT INTO `user` VALUES ('1', 'kngines', '19', 'M');
INSERT INTO `user` VALUES ('2', 'li', '21', 'F');
INSERT INTO `user` VALUES ('3', 'wangw', '23', 'F');
INSERT INTO `user` VALUES ('4', 'mazi', '18', 'M');
INSERT INTO `user` VALUES ('6', 'xiaoli', '33', 'M');
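The post does not show how `jdbcDF` is created. A minimal sketch using Spark's built-in `jdbc` data source, assuming a local Spark session, a MySQL Connector/J jar on the classpath, and a hypothetical database `test` on `localhost` with placeholder credentials. The driver class `com.mysql.jdbc.Driver` is the Connector/J 5.x name; Connector/J 8.x uses `com.mysql.cj.jdbc.Driver`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadUserTable {
  // Hypothetical connection settings: adjust host, database and credentials.
  val jdbcOptions: Map[String, String] = Map(
    "url"      -> "jdbc:mysql://localhost:3306/test",
    "dbtable"  -> "user",
    "user"     -> "root",
    "password" -> "secret",
    "driver"   -> "com.mysql.jdbc.Driver"
  )

  // Load the whole `user` table as a DataFrame through the JDBC source.
  def load(spark: SparkSession): DataFrame =
    spark.read.format("jdbc").options(jdbcOptions).load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mysql-to-parquet")
      .master("local[*]") // assumption: local run for the experiment
      .getOrCreate()
    val jdbcDF = load(spark)
    jdbcDF.printSchema() // shows the column names exactly as Spark sees them
    spark.stop()
  }
}
```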
  • Selecting the columns and saving them to a Parquet file
  • This throws the exception shown above: `id` cannot be resolved
jdbcDF.select("id","name","age","sex")
.write
.format("parquet")
.save("./out/result/userp")
  • Working code: write the DataFrame as-is, without an explicit select
jdbcDF
.write
.format("parquet")
.save("./out/result/userp")
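The working code avoids naming `id` at all. If an explicit select is still needed, one possible workaround, assuming the failure comes from hidden characters in the column names (not confirmed by the original analysis), is to normalize every name first. A minimal sketch in plain Scala; `ColumnNameCleaner`, `cleanColumnName` and `cleanAll` are hypothetical helpers:

```scala
object ColumnNameCleaner {
  // String.trim drops leading/trailing chars <= U+0020, which covers
  // '\n', '\r' and '\t'; any control characters left inside are removed too.
  def cleanColumnName(name: String): String =
    name.trim.filterNot(_.isControl)

  // Clean every name and fail fast if two columns collapse to the same name.
  def cleanAll(columns: Seq[String]): Seq[String] = {
    val cleaned = columns.map(cleanColumnName)
    val duplicates = cleaned.groupBy(identity).collect {
      case (n, xs) if xs.size > 1 => n
    }
    require(duplicates.isEmpty,
      s"Duplicate column names after cleaning: ${duplicates.mkString(", ")}")
    cleaned
  }

  def main(args: Array[String]): Unit = {
    val raw = Seq("\nid", "name", "age", "sex") // hypothetical input
    println(cleanAll(raw).mkString(", "))       // id, name, age, sex
  }
}
```

If the hypothesis holds, the cleaned names could be applied positionally with `Dataset.toDF(colNames: String*)`, e.g. `jdbcDF.toDF(ColumnNameCleaner.cleanAll(jdbcDF.columns.toSeq): _*).select("id", "name", "age", "sex")`, before writing to Parquet.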
