作者行业新人,如果有不对的地方,希望可以指出,共同学习。
确定表名和字段
- 1.学生表
- Student(s_id,s_name,s_birth,s_sex) --学生编号,学生姓名, 出生年月,学生性别
- 2.课程表
- Course(c_id,c_name,t_id) – --课程编号, 课程名称, 教师编号
- 3.教师表
- Teacher(t_id,t_name) --教师编号,教师姓名
- 4.成绩表
- Score(s_id,c_id,s_score) --学生编号,课程编号,分数
建表和插入数据
--建表
--学生表
CREATE TABLE `Student`(
`s_id` VARCHAR(20),
`s_name` VARCHAR(20) NOT NULL DEFAULT '',
`s_birth` VARCHAR(20) NOT NULL DEFAULT '',
`s_sex` VARCHAR(10) NOT NULL DEFAULT '',
PRIMARY KEY(`s_id`)
);
--课程表
CREATE TABLE `Course`(
`c_id` VARCHAR(20),
`c_name` VARCHAR(20) NOT NULL DEFAULT '',
`t_id` VARCHAR(20) NOT NULL,
PRIMARY KEY(`c_id`)
);
--教师表
CREATE TABLE `Teacher`(
`t_id` VARCHAR(20),
`t_name` VARCHAR(20) NOT NULL DEFAULT '',
PRIMARY KEY(`t_id`)
);
--成绩表
CREATE TABLE `Score`(
`s_id` VARCHAR(20),
`c_id` VARCHAR(20),
`s_score` INT(3),
PRIMARY KEY(`s_id`,`c_id`)
);
--插入学生表测试数据
insert into Student values('01' , '赵雷' , '1990-01-01' , '男');
insert into Student values('02' , '钱电' , '1990-12-21' , '男');
insert into Student values('03' , '孙风' , '1990-05-20' , '男');
insert into Student values('04' , '李云' , '1990-08-06' , '男');
insert into Student values('05' , '周梅' , '1991-12-01' , '女');
insert into Student values('06' , '吴兰' , '1992-03-01' , '女');
insert into Student values('07' , '郑竹' , '1989-07-01' , '女');
insert into Student values('08' , '王菊' , '1990-01-20' , '女');
--课程表测试数据
insert into Course values('01' , '语文' , '02');
insert into Course values('02' , '数学' , '01');
insert into Course values('03' , '英语' , '03');
--教师表测试数据
insert into Teacher values('01' , '张三');
insert into Teacher values('02' , '李四');
insert into Teacher values('03' , '王五');
--成绩表测试数据
insert into Score values('01' , '01' , 80);
insert into Score values('01' , '02' , 90);
insert into Score values('01' , '03' , 99);
insert into Score values('02' , '01' , 70);
insert into Score values('02' , '02' , 60);
insert into Score values('02' , '03' , 80);
insert into Score values('03' , '01' , 80);
insert into Score values('03' , '02' , 80);
insert into Score values('03' , '03' , 80);
insert into Score values('04' , '01' , 50);
insert into Score values('04' , '02' , 30);
insert into Score values('04' , '03' , 20);
insert into Score values('05' , '01' , 76);
insert into Score values('05' , '02' , 87);
insert into Score values('06' , '01' , 31);
insert into Score values('06' , '03' , 34);
insert into Score values('07' , '02' , 89);
insert into Score values('07' , '03' , 98);
查看表
使用jdbc连接上数据库
val spark = SparkSession.builder().appName("mysql50") .master("local[*]").getOrCreate()
import spark.implicits._
import org.apache.spark.sql.functions._
val url = "jdbc:mysql://192.168.87.100:3306/school"
val user = "root"
val pwd = "ok"
val driver = "com.mysql.jdbc.Driver"
val prop = new Properties()
prop.setProperty("user",user)
prop.setProperty("password",pwd)
prop.setProperty("driver",driver)
val courseTableDF: DataFrame = spark.read.jdbc(url,"Course",prop)
val scoreTableDF: DataFrame = spark.read.jdbc(url,"Score",prop)
val studentTableDF: DataFrame = spark.read.jdbc(url,"Student",prop)
val teacherTableDF: DataFrame = spark.read.jdbc(url,"Teacher",prop)
1、查询"01"课程比"02"课程成绩高的学生的信息及课程分数
studentTableDF
.join(scoreTableDF.as("s1"), "s_id")
.where("s1.c_id = 01")
.join(scoreTableDF.as("s2"), "s_id")
.where("s2.c_id = 02 and s1.s_score > s2.s_score")
.show()
/**
+----+------+----------+-----+----+-------+----+-------+
|s_id|s_name| s_birth|s_sex|c_id|s_score|c_id|s_score|
+----+------+----------+-----+----+-------+----+-------+
| 02| 钱电|1990-12-21| 男| 01| 70| 02| 60|
| 04| 李云|1990-08-06| 男| 01| 50| 02| 30|
+----+------+----------+-----+----+-------+----+-------+
*/
2、查询"01"课程比"02"课程成绩低的学生的信息及课程分数
studentTableDF
.join(scoreTableDF.as("s1"),"s_id")
.join(scoreTableDF.as("s2"),"s_id")
.where("s1.c_id = 01 and s2.c_id = 02 and s1.s_score < s2.s_score")
.show()
/**
+----+------+----------+-----+----+-------+----+-------+
|s_id|s_name| s_birth|s_sex|c_id|s_score|c_id|s_score|
+----+------+----------+-----+----+-------+----+-------+
| 01| 赵雷|1990-01-01| 男| 01| 80| 02| 90|
| 05| 周梅|1991-12-01| 女| 01| 76| 02| 87|
+----+------+----------+-----+----+-------+----+-------+
*/
3、查询平均成绩大于等于60分的同学的学生编号和学生姓名和平均成绩
scoreTableDF.as("sc")
.groupBy("s_id")
.agg(avg("s_score").as("avgScore"))
.where(col("avgScore") > 60)
.join(studentTableDF, "s_id")
.select("s_id","s_name","avgScore")
.show()
/**
+----+------+-----------------+
|s_id|s_name| avgScore|
+----+------+-----------------+
| 07| 郑竹| 93.5|
| 01| 赵雷|89.66666666666667|
| 05| 周梅| 81.5|
| 03| 孙风| 80.0|
| 02| 钱电| 70.0|
+----+------+-----------------+
*/
4、查询平均成绩小于60分的同学的学生编号和学生姓名和平均成绩(包括有成绩的和无成绩的)
studentTableDF
.join(
(scoreTableDF.groupBy("s_id")
.avg("s_score")).as("sc"), Seq("s_id"), "left_outer")
.as("s")
.withColumnRenamed("avg(s_score)", "avgScore")
.where((col("avgScore") < 60 || col("avgScore").isNull))
.select("s_id","s_name","avgScore")
.show()
/**
+----+------+------------------+
|s_id|s_name| avgScore|
+----+------+------------------+
| 08| 王菊| null|
| 06| 吴兰| 32.5|
| 04| 李云|33.333333333333336|
+----+------+------------------+
*/
5、查询所有同学的学生编号、学生姓名、选课总数、所有课程的总成绩
scoreTableDF
.groupBy("s_id")
.agg(count("c_id").as("CountCourse"),
sum("s_score").as("SumScore"))
.join(studentTableDF, Seq("s_id"), "right_outer")
.select("s_id","s_name","CountCourse","SumScore")
.show()
/**
+----+------+-----------+--------+
|s_id|s_name|CountCourse|SumScore|
+----+------+-----------+--------+
| 07| 郑竹| 2| 187|
| 01| 赵雷| 3| 269|
| 05| 周梅| 2| 163|
| 08| 王菊| null| null|
| 03| 孙风| 3| 240|
| 02| 钱电| 3| 210|
| 06| 吴兰| 2| 65|
| 04| 李云| 3| 100|
+----+------+-----------+--------+
*/
6、查询"李"姓老师的数量
teacherTableDF
.where(col("t_name").startsWith("李"))
.agg(count("t_id"))
.show()
/**
+-----------+
|count(t_id)|
+-----------+
| 1|
+-----------+
*/
7、查询学过"张三"老师授课的同学的信息
studentTableDF
.join(scoreTableDF, "s_id")
.join(courseTableDF, "c_id")
.join(teacherTableDF, "t_id")
.where("t_name = '张三'")
.select("s_id","s_name","s_birth","s_sex")
.show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 07| 郑竹|1989-07-01| 女|
| 01| 赵雷|1990-01-01| 男|
| 05| 周梅|1991-12-01| 女|
| 03| 孙风|1990-05-20| 男|
| 02| 钱电|1990-12-21| 男|
| 04| 李云|1990-08-06| 男|
+----+------+----------+-----+
*/
8、查询没学过"张三"老师授课的同学的信息
studentTableDF
.join(scoreTableDF
.join(courseTableDF, "c_id")
.join(teacherTableDF, "t_id"),Seq("s_id"),"left_outer")
.where("t_name != '张三' or t_name is NULL")
.select("s_id","s_name","s_birth","s_sex")
.distinct()
.show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 07| 郑竹|1989-07-01| 女|
| 01| 赵雷|1990-01-01| 男|
| 05| 周梅|1991-12-01| 女|
| 08| 王菊|1990-01-20| 女|
| 03| 孙风|1990-05-20| 男|
| 02| 钱电|1990-12-21| 男|
| 06| 吴兰|1992-03-01| 女|
| 04| 李云|1990-08-06| 男|
+----+------+----------+-----+
*/
9、查询学过编号为"01"并且也学过编号为"02"的课程的同学的信息:
studentTableDF
.join(scoreTableDF.as("s1"), "s_id")
.join(scoreTableDF.as("s2"), "s_id")
.where("s1.c_id = 01 and s2.c_id = 02")
.select("s_id","s_name","s_birth","s_sex")
.show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 01| 赵雷|1990-01-01| 男|
| 05| 周梅|1991-12-01| 女|
| 03| 孙风|1990-05-20| 男|
| 02| 钱电|1990-12-21| 男|
| 04| 李云|1990-08-06| 男|
+----+------+----------+-----+
*/
10、查询学过编号为"01"但是没有学过编号为"02"的课程的同学的信息
studentTableDF
.join(scoreTableDF.where("c_id = 02"), Seq("s_id"), "left_outer")
.where("c_id is null").as("s1")
.join(scoreTableDF.as("sc2"), "s_id")
.where("sc2.c_id = 01")
.select("s_id","s_name","s_birth","s_sex")
.show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 06| 吴兰|1992-03-01| 女|
+----+------+----------+-----+
*/
11、查询没有学全所有课程的同学的信息
studentTableDF
.join(scoreTableDF.as("s"),Seq("s_id"),"left_outer")
.groupBy("s_id")
.agg(count("s.c_id").as("stuSubNum"))
.where("stuSubNum < (select count(c_id) from course)")
.show()
/**
+----+---------+
|s_id|stuSubNum|
+----+---------+
| 07| 2|
| 05| 2|
| 08| 0|
| 06| 2|
+----+---------+
*/
12、查询至少有一门课与学号为"01"的同学所学相同的同学的信息:
studentTableDF
.join(scoreTableDF,"s_id").as("s1")
.join(scoreTableDF
.where("s_id = 01").as("s2"),"c_id")
.select("s1.s_id").distinct()
.where("s_id != 1")
.join(studentTableDF,"s_id")
.show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 07| 郑竹|1989-07-01| 女|
| 05| 周梅|1991-12-01| 女|
| 03| 孙风|1990-05-20| 男|
| 02| 钱电|1990-12-21| 男|
| 06| 吴兰|1992-03-01| 女|
| 04| 李云|1990-08-06| 男|
+----+------+----------+-----+
*/
13、查询和"01"号的同学学习的课程完全相同的其他同学的信息:
studentTableDF
.join(scoreTableDF,"s_id").as("s1")
.join(scoreTableDF.where("s_id = 1").as("s2"),
"c_id")
.groupBy("s1.s_id")
.count.where(s"count = ${scoreTableDF.where("s_id = 1").count} and s1.s_id != 1")
.join(studentTableDF,"s_id")
.show()
/**
+----+-----+------+----------+-----+
|s_id|count|s_name| s_birth|s_sex|
+----+-----+------+----------+-----+
| 03| 3| 孙风|1990-05-20| 男|
| 02| 3| 钱电|1990-12-21| 男|
| 04| 3| 李云|1990-08-06| 男|
+----+-----+------+----------+-----+
*/
14、查询没学过"张三"老师讲授的任一门课程的学生姓名:
studentTableDF
.join(scoreTableDF,"s_id")
.join(courseTableDF,"c_id")
.join(teacherTableDF
.where("t_name = '张三'"),"t_id")
.as("s1").select("s_id")
.join(studentTableDF,Seq("s_id"),"right_outer")
.where("s1.s_id is NULL")
.show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 08| 王菊|1990-01-20| 女|
| 06| 吴兰|1992-03-01| 女|
+----+------+----------+-----+
*/
15、查询两门及其以上不及格课程的同学的学号,姓名及其平均成绩:
scoreTableDF
.where("s_score < 60")
.groupBy("s_id").count()
.where("count >= 2")
.join(scoreTableDF,"s_id")
.groupBy("s_id")
.avg("s_score").join(studentTableDF,"s_id")
.select("s_id","s_name","avg(s_score)")
.show()
/**
+----+------+------------------+
|s_id|s_name| avg(s_score)|
+----+------+------------------+
| 06| 吴兰| 32.5|
| 04| 李云|33.333333333333336|
+----+------+------------------+
*/
16、检索"01"课程分数小于60,按分数降序排列的学生信息:
scoreTableDF
.where("c_id = 01")
.join(studentTableDF,Seq("s_id"),"right_outer")
.where("s_score < 60 or s_score is null")
.orderBy($"s_score".desc)
.show()
/**
+----+----+-------+------+----------+-----+
|s_id|c_id|s_score|s_name| s_birth|s_sex|
+----+----+-------+------+----------+-----+
| 04| 01| 50| 李云|1990-08-06| 男|
| 06| 01| 31| 吴兰|1992-03-01| 女|
| 08|null| null| 王菊|1990-01-20| 女|
| 07|null| null| 郑竹|1989-07-01| 女|
+----+----+-------+------+----------+-----+
*/
17、按平均成绩从高到低显示所有学生的所有课程的成绩以及平均成绩:
scoreTableDF
.groupBy("s_id")
.avg("s_score")
.join(scoreTableDF,"s_id")
.join(studentTableDF,"s_id")
.orderBy($"avg(s_score)".desc)
.show()
/**
+----+------------------+----+-------+------+----------+-----+
|s_id| avg(s_score)|c_id|s_score|s_name| s_birth|s_sex|
+----+------------------+----+-------+------+----------+-----+
| 07| 93.5| 03| 98| 郑竹|1989-07-01| 女|
| 07| 93.5| 02| 89| 郑竹|1989-07-01| 女|
| 01| 89.66666666666667| 03| 99| 赵雷|1990-01-01| 男|
| 01| 89.66666666666667| 01| 80| 赵雷|1990-01-01| 男|
| 01| 89.66666666666667| 02| 90| 赵雷|1990-01-01| 男|
| 05| 81.5| 01| 76| 周梅|1991-12-01| 女|
| 05| 81.5| 02| 87| 周梅|1991-12-01| 女|
| 03| 80.0| 02| 80| 孙风|1990-05-20| 男|
| 03| 80.0| 01| 80| 孙风|1990-05-20| 男|
| 03| 80.0| 03| 80| 孙风|1990-05-20| 男|
| 02| 70.0| 01| 70| 钱电|1990-12-21| 男|
| 02| 70.0| 02| 60| 钱电|1990-12-21| 男|
| 02| 70.0| 03| 80| 钱电|1990-12-21| 男|
| 04|33.333333333333336| 03| 20| 李云|1990-08-06| 男|
| 04|33.333333333333336| 01| 50| 李云|1990-08-06| 男|
| 04|33.333333333333336| 02| 30| 李云|1990-08-06| 男|
| 06| 32.5| 01| 31| 吴兰|1992-03-01| 女|
| 06| 32.5| 03| 34| 吴兰|1992-03-01| 女|
+----+------------------+----+-------+------+----------+-----+
*/
18、查询各科成绩最高分、最低分和平均分:以如下形式显示:课程ID,课程name,最高分,最低分,平均分,及格率,中等率,优良率,优秀率:
val jige = scoreDF.rdd.map(x=>{if(x.getAs("score").toString.toInt > 60) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("course_id","jige")
val zhongdeng = scoreDF.rdd.map(x=>{if(x.getAs("score").toString.toInt > 70) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("course_id","zhongdeng")
val youliang = scoreDF.rdd.map(x=>{if(x.getAs("score").toString.toInt > 80) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("course_id","youliang")
val youxiu = scoreDF.rdd.map(x=>{if(x.getAs("score").toString.toInt > 90) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("course_id","youxiu")
val s1 = scoreDF.groupBy("course_id").agg("score"->"max","score"->"min","score"->"avg","score"->"count")
s1.join(jige,"course_id")
.join(zhongdeng,"course_id")
.join(youliang,"course_id")
.join(youxiu,"course_id")
.withColumn("jgl",$"jige"/$"count(score)")
.withColumn("zdl",$"zhongdeng"/$"count(score)")
.withColumn("yll",$"youliang"/$"count(score)")
.withColumn("yxl",$"youxiu"/$"count(score)")
.drop("jige","zhongdeng","youliang","youxiu")
.show
/**
+---------+----------+----------+-----------------+------------+------------------+------------------+------------------+------------------+
|course_id|max(score)|min(score)| avg(score)|count(score)| jgl| zdl| yll| yxl|
+---------+----------+----------+-----------------+------------+------------------+------------------+------------------+------------------+
| 1| 80| 31| 64.5| 6|0.6666666666666666| 0.5| 0.0| 0.0|
| 3| 99| 20| 68.5| 6|0.6666666666666666|0.6666666666666666|0.3333333333333333|0.3333333333333333|
| 2| 90| 30|72.66666666666667| 6|0.6666666666666666|0.6666666666666666| 0.5| 0.0|
+---------+----------+----------+-----------------+------------+------------------+------------------+------------------+------------------+
*/
19、按各科成绩进行排序,并显示排名:
scoreTableDF.selectExpr(
"*","Row_Number() over(partition by c_id order by s_score desc) rank"
).show()
/**
+----+----+-------+----+
|s_id|c_id|s_score|rank|
+----+----+-------+----+
| 01| 01| 80| 1|
| 03| 01| 80| 2|
| 05| 01| 76| 3|
| 02| 01| 70| 4|
| 04| 01| 50| 5|
| 06| 01| 31| 6|
| 01| 03| 99| 1|
| 07| 03| 98| 2|
| 02| 03| 80| 3|
| 03| 03| 80| 4|
| 06| 03| 34| 5|
| 04| 03| 20| 6|
| 01| 02| 90| 1|
| 07| 02| 89| 2|
| 05| 02| 87| 3|
| 03| 02| 80| 4|
| 02| 02| 60| 5|
| 04| 02| 30| 6|
+----+----+-------+----+
*/
20、查询学生的总成绩并进行排名:
scoreTableDF
.groupBy("s_id")
.agg(sum("s_score").as("sumScore"))
.selectExpr("*","Row_Number() over(order by sumScore desc) rank")
.show()
/**
+----+--------+----+
|s_id|sumScore|rank|
+----+--------+----+
| 01| 269| 1|
| 03| 240| 2|
| 02| 210| 3|
| 07| 187| 4|
| 05| 163| 5|
| 04| 100| 6|
| 06| 65| 7|
+----+--------+----+
*/
21、查询不同老师所教不同课程平均分从高到低显示:
scoreTableDF.join(
courseTableDF.join(teacherTableDF,"t_id")
,"c_id").groupBy("t_id","c_id")
.agg(avg("s_score").as("avgScore"))
.orderBy($"avgScore".desc)
.show()
/**
+----+----+-----------------+
|t_id|c_id| avgScore|
+----+----+-----------------+
| 01| 02|72.66666666666667|
| 03| 03| 68.5|
| 02| 01| 64.5|
+----+----+-----------------+
*/
22、查询所有课程的成绩第2 名到第3 名的学生信息及该课程成绩:
studentTableDF
.join(scoreTableDF,"s_id")
.selectExpr(
"*","Row_Number() over(partition by c_id order by s_score desc) rank"
)
.where("rank between 2 and 3")
.show()
/**
+----+------+----------+-----+----+-------+----+
|s_id|s_name| s_birth|s_sex|c_id|s_score|rank|
+----+------+----------+-----+----+-------+----+
| 03| 孙风|1990-05-20| 男| 01| 80| 2|
| 05| 周梅|1991-12-01| 女| 01| 76| 3|
| 07| 郑竹|1989-07-01| 女| 03| 98| 2|
| 03| 孙风|1990-05-20| 男| 03| 80| 3|
| 07| 郑竹|1989-07-01| 女| 02| 89| 2|
| 05| 周梅|1991-12-01| 女| 02| 87| 3|
+----+------+----------+-----+----+-------+----+
*/
23、统计各科成绩各分数段人数:课程编号,课程名称,[100-85],[85-70],[70-60],[0-60]及所占百分比
scoreTableDF.groupBy("c_id")
.agg(count("s_id")
.as("zong"))
.join(
scoreTableDF.where("s_score < 60")
.groupBy("c_id")
.agg(count("s_id")
.as("bujige")),Seq("c_id"),"left")
.join(
scoreTableDF.where("60 <= s_score and s_score < 70")
.groupBy("c_id")
.agg(count("s_id")
.as("jige")),Seq("c_id"),"left")
.join(
scoreTableDF.where("70 <= s_score and s_score < 85")
.groupBy("c_id")
.agg(count("s_id")
.as("liang")),Seq("c_id"),"left")
.join(
scoreTableDF.where("85 <= s_score and s_score <= 100")
.groupBy("c_id")
.agg(count("s_id")
.as("you")),Seq("c_id"),"left")
.withColumn("bjgl",col("bujige")/col("zong"))
.withColumn("jgl",col("jige")/col("zong"))
.withColumn("ll",col("liang")/col("zong"))
.withColumn("yl",col("you")/col("zong"))
.show()
/**
+----+----+------+----+-----+----+-------------------+-------------------+-------------------+------------------+
|c_id|zong|bujige|jige|liang| you| bjgl| jgl| ll| yl|
+----+----+------+----+-----+----+-------------------+-------------------+-------------------+------------------+
| 01| 6| 2|null| 4|null| 0.3333333333333333| null| 0.6666666666666666| null|
| 03| 6| 2|null| 2| 2| 0.3333333333333333| null| 0.3333333333333333|0.3333333333333333|
| 02| 6| 1| 1| 1| 3|0.16666666666666666|0.16666666666666666|0.16666666666666666| 0.5|
+----+----+------+----+-----+----+-------------------+-------------------+-------------------+------------------+
*/
24、查询学生平均成绩及其名次:
scoreTableDF
.groupBy("s_id")
.agg(avg("s_score").as("avgScore"))
.selectExpr("*","Row_Number() over(order by avgScore desc) rank")
.join(studentTableDF,"s_id")
.show()
/**
+----+------------------+----+------+----------+-----+
|s_id| avgScore|rank|s_name| s_birth|s_sex|
+----+------------------+----+------+----------+-----+
| 07| 93.5| 1| 郑竹|1989-07-01| 女|
| 01| 89.66666666666667| 2| 赵雷|1990-01-01| 男|
| 05| 81.5| 3| 周梅|1991-12-01| 女|
| 03| 80.0| 4| 孙风|1990-05-20| 男|
| 02| 70.0| 5| 钱电|1990-12-21| 男|
| 06| 32.5| 7| 吴兰|1992-03-01| 女|
| 04|33.333333333333336| 6| 李云|1990-08-06| 男|
+----+------------------+----+------+----------+-----+
*/
25、查询各科成绩前三名的记录
scoreTableDF
.selectExpr("*","Row_Number() over(partition by c_id order by s_score desc) rank")
.where("rank between 1 and 3")
.show()
/**
+----+----+-------+----+
|s_id|c_id|s_score|rank|
+----+----+-------+----+
| 01| 01| 80| 1|
| 03| 01| 80| 2|
| 05| 01| 76| 3|
| 01| 03| 99| 1|
| 07| 03| 98| 2|
| 02| 03| 80| 3|
| 01| 02| 90| 1|
| 07| 02| 89| 2|
| 05| 02| 87| 3|
+----+----+-------+----+
*/
26、查询每门课程被选修的学生数:
scoreTableDF.groupBy("c_id").agg(count("s_id")).show()
/**
+----+-----------+
|c_id|count(s_id)|
+----+-----------+
| 01| 6|
| 03| 6|
| 02| 6|
+----+-----------+
*/
27、查询出只有两门课程的全部学生的学号和姓名:
scoreTableDF
.groupBy("s_id")
.agg(count("c_id").as("sum"))
.where("sum = 2")
.join(studentTableDF,"s_id")
.show()
/**
+----+---+------+----------+-----+
|s_id|sum|s_name| s_birth|s_sex|
+----+---+------+----------+-----+
| 07| 2| 郑竹|1989-07-01| 女|
| 05| 2| 周梅|1991-12-01| 女|
| 06| 2| 吴兰|1992-03-01| 女|
+----+---+------+----------+-----+
*/
28、查询男生、女生人数:
studentTableDF.groupBy("s_sex").count().show()
/**
+-----+-----+
|s_sex|count|
+-----+-----+
| 男| 4|
| 女| 4|
+-----+-----+
*/
29、查询名字中含有"风"字的学生信息:
studentTableDF.where("s_name like '%风%'").show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 03| 孙风|1990-05-20| 男|
+----+------+----------+-----+
*/
30、查询同名同性学生名单,并统计同名人数:
studentTableDF
.groupBy("s_name").count()
.where("count > 1")
.show()
/**
+------+-----+
|s_name|count|
+------+-----+
+------+-----+
*/
31、查询1990年出生的学生名单:
studentTableDF.where(col("s_birth").startsWith("1990")).show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 01| 赵雷|1990-01-01| 男|
| 02| 钱电|1990-12-21| 男|
| 03| 孙风|1990-05-20| 男|
| 04| 李云|1990-08-06| 男|
| 08| 王菊|1990-01-20| 女|
+----+------+----------+-----+
*/
32、查询每门课程的平均成绩,结果按平均成绩降序排列,平均成绩相同时,按课程编号升序排列:
scoreTableDF
.groupBy("c_id")
.avg("s_score")
.orderBy($"avg(s_score)".desc,$"c_id")
.show()
/**
+----+-----------------+
|c_id| avg(s_score)|
+----+-----------------+
| 02|72.66666666666667|
| 03| 68.5|
| 01| 64.5|
+----+-----------------+
*/
33、查询平均成绩大于等于85的所有学生的学号、姓名和平均成绩:
scoreTableDF
.groupBy("s_id")
.agg(avg("s_score").as("avg"))
.where("avg >= 85")
.join(studentTableDF,"s_id")
.select("s_id","s_name","avg")
.show()
/**
+----+------+-----------------+
|s_id|s_name| avg|
+----+------+-----------------+
| 07| 郑竹| 93.5|
| 01| 赵雷|89.66666666666667|
+----+------+-----------------+
*/
34、查询课程名称为"数学",且分数低于60的学生姓名和分数:
scoreTableDF
.join(courseTableDF,"c_id")
.join(studentTableDF,"s_id")
.where("c_name = '数学' and s_score < 60")
.select("s_name","s_score")
.show()
/**
+------+-------+
|s_name|s_score|
+------+-------+
| 李云| 30|
+------+-------+
*/
35、查询所有学生的课程及分数情况:
studentTableDF.join(scoreTableDF,Seq("s_id"),"left").show()
/**
+----+------+----------+-----+----+-------+
|s_id|s_name| s_birth|s_sex|c_id|s_score|
+----+------+----------+-----+----+-------+
| 07| 郑竹|1989-07-01| 女| 02| 89|
| 07| 郑竹|1989-07-01| 女| 03| 98|
| 01| 赵雷|1990-01-01| 男| 01| 80|
| 01| 赵雷|1990-01-01| 男| 02| 90|
| 01| 赵雷|1990-01-01| 男| 03| 99|
| 05| 周梅|1991-12-01| 女| 01| 76|
| 05| 周梅|1991-12-01| 女| 02| 87|
| 08| 王菊|1990-01-20| 女|null| null|
| 03| 孙风|1990-05-20| 男| 01| 80|
| 03| 孙风|1990-05-20| 男| 02| 80|
| 03| 孙风|1990-05-20| 男| 03| 80|
| 02| 钱电|1990-12-21| 男| 01| 70|
| 02| 钱电|1990-12-21| 男| 02| 60|
| 02| 钱电|1990-12-21| 男| 03| 80|
| 06| 吴兰|1992-03-01| 女| 01| 31|
| 06| 吴兰|1992-03-01| 女| 03| 34|
| 04| 李云|1990-08-06| 男| 01| 50|
| 04| 李云|1990-08-06| 男| 02| 30|
| 04| 李云|1990-08-06| 男| 03| 20|
+----+------+----------+-----+----+-------+
*/
36、查询任何一门课程成绩在70分以上的学生姓名、课程名称和分数:
scoreTableDF
.where("s_score > 70")
.join(studentTableDF,"s_id")
.join(courseTableDF,"c_id")
.select("s_name","c_name","s_score")
.show()
/**
+------+------+-------+
|s_name|c_name|s_score|
+------+------+-------+
| 赵雷| 语文| 80|
| 周梅| 语文| 76|
| 孙风| 语文| 80|
| 郑竹| 英语| 98|
| 赵雷| 英语| 99|
| 孙风| 英语| 80|
| 钱电| 英语| 80|
| 郑竹| 数学| 89|
| 赵雷| 数学| 90|
| 周梅| 数学| 87|
| 孙风| 数学| 80|
+------+------+-------+
*/
37、查询课程不及格的学生:
scoreTableDF
.where("s_score < 60")
.join(studentTableDF,"s_id")
.show()
/**
+----+----+-------+------+----------+-----+
|s_id|c_id|s_score|s_name| s_birth|s_sex|
+----+----+-------+------+----------+-----+
| 06| 01| 31| 吴兰|1992-03-01| 女|
| 06| 03| 34| 吴兰|1992-03-01| 女|
| 04| 01| 50| 李云|1990-08-06| 男|
| 04| 02| 30| 李云|1990-08-06| 男|
| 04| 03| 20| 李云|1990-08-06| 男|
+----+----+-------+------+----------+-----+
*/
38、查询课程编号为01且课程成绩在80分以上的学生的学号和姓名:
scoreTableDF
.where("c_id = 01 and s_score >= 80")
.join(studentTableDF,"s_id")
.select("s_id","s_name")
.show()
/**
+----+------+
|s_id|s_name|
+----+------+
| 01| 赵雷|
| 03| 孙风|
+----+------+
*/
39、求每门课程的学生人数:
scoreTableDF.groupBy("c_id").count().show()
/**
+----+-----+
|c_id|count|
+----+-----+
| 01| 6|
| 03| 6|
| 02| 6|
+----+-----+
*/
40、查询选修"张三"老师所授课程的学生中,成绩最高的学生信息及其成绩:
studentTableDF
.join(scoreTableDF,"s_id")
.join(courseTableDF,"c_id")
.join(teacherTableDF.where("t_name = '张三'"),"t_id")
.groupBy("c_id")
.max("s_score")
.show()
/**
+----+------------+
|c_id|max(s_score)|
+----+------------+
| 02| 90|
+----+------------+
*/
41、查询不同课程成绩相同的学生的学生编号、课程编号、学生成绩:
scoreTableDF.as("s1")
.withColumnRenamed("c_id","s1_id")
.withColumnRenamed("s_score","s1_score")
.join(scoreTableDF.as("s2"),"s_id")
.withColumnRenamed("c_id","s2_id")
.withColumnRenamed("s_score","s2_score")
.where("s1_score = s2_score and s1_id != s2_id")
.select("s_id","s1_id","s2_id","s1_score")
.withColumnRenamed("s1_score","score")
.show()
/**
+----+-----+-----+-----+
|s_id|s1_id|s2_id|score|
+----+-----+-----+-----+
| 03| 01| 02| 80|
| 03| 01| 03| 80|
| 03| 02| 01| 80|
| 03| 02| 03| 80|
| 03| 03| 01| 80|
| 03| 03| 02| 80|
+----+-----+-----+-----+
*/
42、查询每门课程成绩最好的前三名:
scoreTableDF
.selectExpr(
"*","Row_Number() over(partition by c_id order by s_score desc) rank"
).where("rank between 1 and 3")
.show()
/**
+----+----+-------+----+
|s_id|c_id|s_score|rank|
+----+----+-------+----+
| 01| 01| 80| 1|
| 03| 01| 80| 2|
| 05| 01| 76| 3|
| 01| 03| 99| 1|
| 07| 03| 98| 2|
| 02| 03| 80| 3|
| 01| 02| 90| 1|
| 07| 02| 89| 2|
| 05| 02| 87| 3|
+----+----+-------+----+
*/
43、统计每门课程的学生选修人数(超过5人的课程才统计)要求输出课程号和选修人数,查询结果按人数降序排列,若人数相同,按课程号升序排列
scoreTableDF
.groupBy("c_id")
.agg(count("s_id").as("count"))
.where("count > 5")
.orderBy($"count".desc,$"c_id")
.select("c_id","count")
.show()
/**
+----+-----+
|c_id|count|
+----+-----+
| 01| 6|
| 02| 6|
| 03| 6|
+----+-----+
*/
44、检索至少选修两门课程的学生学号:
scoreTableDF
.groupBy("s_id")
.agg(count("c_id").as("count"))
.where("count >= 2")
.select("s_id")
.show()
/**
+----+
|s_id|
+----+
| 07|
| 01|
| 05|
| 03|
| 02|
| 06|
| 04|
+----+
*/
45、查询选修了全部课程的学生信息:
scoreTableDF
.groupBy("s_id")
.agg(count("c_id").as("count"))
.where(s"count = ${courseTableDF.select("c_id").count()}")
.join(studentTableDF,"s_id")
.show()
/**
+----+-----+------+----------+-----+
|s_id|count|s_name| s_birth|s_sex|
+----+-----+------+----------+-----+
| 01| 3| 赵雷|1990-01-01| 男|
| 03| 3| 孙风|1990-05-20| 男|
| 02| 3| 钱电|1990-12-21| 男|
| 04| 3| 李云|1990-08-06| 男|
+----+-----+------+----------+-----+
*/
46、查询各学生的年龄(周岁):
//-- 按照出生日期来算,当前月日 < 出生年月的月日则,年龄减一
def getNowTime():String = {
val time = new Date().getTime
val format = new SimpleDateFormat("yyyy-MM-dd")
format.format(time)
}
studentTableDF.map(
x=>{
val age = getNowTime().substring(0,4).toInt-x.get(2).toString.substring(0,4).toInt
if (getNowTime().substring(5,7).toInt > x.get(2).toString.substring(5,7).toInt) (x.get(0).toString,age)
else if (getNowTime().substring(5,7).toInt < x.get(2).toString.substring(5,7).toInt) (x.get(0).toString,age-1)
else {
if (getNowTime().substring(8,10).toInt >= x.get(2).toString.substring(8,10).toInt)(x.get(0).toString,age)
else (x.get(0).toString,age-1)
}
}
)
.toDF("s_id","age")
.join(studentTableDF,"s_id")
.show()
/**
+----+---+------+----------+-----+
|s_id|age|s_name| s_birth|s_sex|
+----+---+------+----------+-----+
| 07| 31| 郑竹|1989-07-01| 女|
| 01| 30| 赵雷|1990-01-01| 男|
| 05| 28| 周梅|1991-12-01| 女|
| 08| 30| 王菊|1990-01-20| 女|
| 03| 30| 孙风|1990-05-20| 男|
| 02| 29| 钱电|1990-12-21| 男|
| 06| 28| 吴兰|1992-03-01| 女|
| 04| 30| 李云|1990-08-06| 男|
+----+---+------+----------+-----+
*/
47、查询本周过生日的学生:找到下周一-1即为本周最后一天,开始时间为当前天(若今天就是星期天会不会出错?)
// where内语句为一条语句,我这边为了更好展示所以进行了折行处理
studentTableDF
.where(
"unix_timestamp(" +
"cast(concat_ws('-',date_format(" +
"current_date(),'yyyy'),date_format(s_birth,'MM')," +
"date_format(s_birth,'dd')) as date),'yyyy-MM-dd') " +
"between unix_timestamp(next_day(current_date(),'MON'),'yyyy-MM-dd') " +
"and unix_timestamp(date_add(next_day(current_date(),'MON'),6),'yyyy-MM-dd')")
.show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 05| 周梅|1991-12-01| 女|
+----+------+----------+-----+
*/
48、查询下周过生日的学生: 下周1到+6天
studentTableDF
.where(
"unix_timestamp(" +
"cast(concat_ws('-',date_format(" +
"current_date(),'yyyy'),date_format(s_birth,'MM')," +
"date_format(s_birth,'dd')) as date),'yyyy-MM-dd') " +
"between unix_timestamp(next_day(current_date(),'MON'),'yyyy-MM-dd')" +
"and unix_timestamp(date_add(next_day(current_date(),'MON'),6),'yyyy-MM-dd')")
.show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 05| 周梅|1991-12-01| 女|
+----+------+----------+-----+
*/
49、查询本月过生日的学生:
println("-49-")
studentTableDF
.where("month(s_birth) = month(current_date())")
.show()
/**
+----+------+-------+-----+
|s_id|s_name|s_birth|s_sex|
+----+------+-------+-----+
+----+------+-------+-----+
*/
50、查询12月份过生日的学生:
println("-50-")
studentTableDF
.where("month(s_birth) = 12")
.show()
/**
+----+------+----------+-----+
|s_id|s_name| s_birth|s_sex|
+----+------+----------+-----+
| 02| 钱电|1990-12-21| 男|
| 05| 周梅|1991-12-01| 女|
+----+------+----------+-----+
*/