作者行业新人,如果有不对的地方,希望可以指出,共同学习。

确定表名和字段

  • 1.学生表
  • Student(s_id,s_name,s_birth,s_sex) --学生编号,学生姓名, 出生年月,学生性别
  • 2.课程表
  • Course(c_id,c_name,t_id) – --课程编号, 课程名称, 教师编号
  • 3.教师表
  • Teacher(t_id,t_name) --教师编号,教师姓名
  • 4.成绩表
  • Score(s_id,c_id,s_score) --学生编号,课程编号,分数

建表和插入数据

--建表
--学生表
CREATE TABLE `Student`(
	`s_id` VARCHAR(20),
	`s_name` VARCHAR(20) NOT NULL DEFAULT '',
	`s_birth` VARCHAR(20) NOT NULL DEFAULT '',
	`s_sex` VARCHAR(10) NOT NULL DEFAULT '',
	PRIMARY KEY(`s_id`)
);
--课程表
CREATE TABLE `Course`(
	`c_id`  VARCHAR(20),
	`c_name` VARCHAR(20) NOT NULL DEFAULT '',
	`t_id` VARCHAR(20) NOT NULL,
	PRIMARY KEY(`c_id`)
);
--教师表
CREATE TABLE `Teacher`(
	`t_id` VARCHAR(20),
	`t_name` VARCHAR(20) NOT NULL DEFAULT '',
	PRIMARY KEY(`t_id`)
);
--成绩表
CREATE TABLE `Score`(
	`s_id` VARCHAR(20),
	`c_id`  VARCHAR(20),
	`s_score` INT(3),
	PRIMARY KEY(`s_id`,`c_id`)
);
--插入学生表测试数据
insert into Student values('01' , '赵雷' , '1990-01-01' , '男');
insert into Student values('02' , '钱电' , '1990-12-21' , '男');
insert into Student values('03' , '孙风' , '1990-05-20' , '男');
insert into Student values('04' , '李云' , '1990-08-06' , '男');
insert into Student values('05' , '周梅' , '1991-12-01' , '女');
insert into Student values('06' , '吴兰' , '1992-03-01' , '女');
insert into Student values('07' , '郑竹' , '1989-07-01' , '女');
insert into Student values('08' , '王菊' , '1990-01-20' , '女');
--课程表测试数据
insert into Course values('01' , '语文' , '02');
insert into Course values('02' , '数学' , '01');
insert into Course values('03' , '英语' , '03');

--教师表测试数据
insert into Teacher values('01' , '张三');
insert into Teacher values('02' , '李四');
insert into Teacher values('03' , '王五');

--成绩表测试数据
insert into Score values('01' , '01' , 80);
insert into Score values('01' , '02' , 90);
insert into Score values('01' , '03' , 99);
insert into Score values('02' , '01' , 70);
insert into Score values('02' , '02' , 60);
insert into Score values('02' , '03' , 80);
insert into Score values('03' , '01' , 80);
insert into Score values('03' , '02' , 80);
insert into Score values('03' , '03' , 80);
insert into Score values('04' , '01' , 50);
insert into Score values('04' , '02' , 30);
insert into Score values('04' , '03' , 20);
insert into Score values('05' , '01' , 76);
insert into Score values('05' , '02' , 87);
insert into Score values('06' , '01' , 31);
insert into Score values('06' , '03' , 34);
insert into Score values('07' , '02' , 89);
insert into Score values('07' , '03' , 98);

查看表

sparksql修改schema_spark

使用jdbc连接上数据库

val spark = SparkSession.builder().appName("mysql50") .master("local[*]").getOrCreate()
    import spark.implicits._
    import org.apache.spark.sql.functions._

    val url = "jdbc:mysql://192.168.87.100:3306/school"
    val user = "root"
    val pwd = "ok"
    val driver = "com.mysql.jdbc.Driver"
    val prop = new Properties()
    
    prop.setProperty("user",user)
    prop.setProperty("password",pwd)
    prop.setProperty("driver",driver)

    val courseTableDF: DataFrame = spark.read.jdbc(url,"Course",prop)
    val scoreTableDF: DataFrame = spark.read.jdbc(url,"Score",prop)
    val studentTableDF: DataFrame = spark.read.jdbc(url,"Student",prop)
    val teacherTableDF: DataFrame = spark.read.jdbc(url,"Teacher",prop)

1、查询"01"课程比"02"课程成绩高的学生的信息及课程分数

studentTableDF
      .join(scoreTableDF.as("s1"), "s_id")
      .where("s1.c_id = 01")
      .join(scoreTableDF.as("s2"), "s_id")
      .where("s2.c_id = 02 and s1.s_score > s2.s_score")
      .show()
/**
+----+------+----------+-----+----+-------+----+-------+
|s_id|s_name|   s_birth|s_sex|c_id|s_score|c_id|s_score|
+----+------+----------+-----+----+-------+----+-------+
|  02|    钱电|1990-12-21|    男|  01|     70|  02|     60|
|  04|    李云|1990-08-06|    男|  01|     50|  02|     30|
+----+------+----------+-----+----+-------+----+-------+
*/

2、查询"01"课程比"02"课程成绩低的学生的信息及课程分数

studentTableDF
      .join(scoreTableDF.as("s1"),"s_id")
      .join(scoreTableDF.as("s2"),"s_id")
      .where("s1.c_id = 01 and s2.c_id = 02 and s1.s_score < s2.s_score")
      .show()

/**
+----+------+----------+-----+----+-------+----+-------+
|s_id|s_name|   s_birth|s_sex|c_id|s_score|c_id|s_score|
+----+------+----------+-----+----+-------+----+-------+
|  01|    赵雷|1990-01-01|    男|  01|     80|  02|     90|
|  05|    周梅|1991-12-01|    女|  01|     76|  02|     87|
+----+------+----------+-----+----+-------+----+-------+
*/

3、查询平均成绩大于等于60分的同学的学生编号和学生姓名和平均成绩

scoreTableDF.as("sc")
      .groupBy("s_id")
      .agg(avg("s_score").as("avgScore"))
      .where(col("avgScore") > 60)
      .join(studentTableDF, "s_id")
      .select("s_id","s_name","avgScore")
      .show()

/**
+----+------+-----------------+
|s_id|s_name|         avgScore|
+----+------+-----------------+
|  07|    郑竹|             93.5|
|  01|    赵雷|89.66666666666667|
|  05|    周梅|             81.5|
|  03|    孙风|             80.0|
|  02|    钱电|             70.0|
+----+------+-----------------+
*/

4、查询平均成绩小于60分的同学的学生编号和学生姓名和平均成绩(包括有成绩的和无成绩的)

studentTableDF
      .join(
        (scoreTableDF.groupBy("s_id")
          .avg("s_score")).as("sc"), Seq("s_id"), "left_outer")
      .as("s")
      .withColumnRenamed("avg(s_score)", "avgScore")
      .where((col("avgScore") < 60 || col("avgScore").isNull))
      .select("s_id","s_name","avgScore")
      .show()

/**
+----+------+------------------+
|s_id|s_name|          avgScore|
+----+------+------------------+
|  08|    王菊|              null|
|  06|    吴兰|              32.5|
|  04|    李云|33.333333333333336|
+----+------+------------------+
*/

5、查询所有同学的学生编号、学生姓名、选课总数、所有课程的总成绩

scoreTableDF
      .groupBy("s_id")
      .agg(count("c_id").as("CountCourse"),
        sum("s_score").as("SumScore"))
      .join(studentTableDF, Seq("s_id"), "right_outer")
      .select("s_id","s_name","CountCourse","SumScore")
      .show()

/**
+----+------+-----------+--------+
|s_id|s_name|CountCourse|SumScore|
+----+------+-----------+--------+
|  07|    郑竹|          2|     187|
|  01|    赵雷|          3|     269|
|  05|    周梅|          2|     163|
|  08|    王菊|       null|    null|
|  03|    孙风|          3|     240|
|  02|    钱电|          3|     210|
|  06|    吴兰|          2|      65|
|  04|    李云|          3|     100|
+----+------+-----------+--------+
*/

6、查询"李"姓老师的数量

teacherTableDF
      .where(col("t_name").startsWith("李"))
      .agg(count("t_id"))
      .show()

/**
+-----------+
|count(t_id)|
+-----------+
|          1|
+-----------+
*/

7、查询学过"张三"老师授课的同学的信息

studentTableDF
      .join(scoreTableDF, "s_id")
      .join(courseTableDF, "c_id")
      .join(teacherTableDF, "t_id")
      .where("t_name = '张三'")
      .select("s_id","s_name","s_birth","s_sex")
      .show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  07|    郑竹|1989-07-01|    女|
|  01|    赵雷|1990-01-01|    男|
|  05|    周梅|1991-12-01|    女|
|  03|    孙风|1990-05-20|    男|
|  02|    钱电|1990-12-21|    男|
|  04|    李云|1990-08-06|    男|
+----+------+----------+-----+
*/

8、查询没学过"张三"老师授课的同学的信息

studentTableDF
      .join(scoreTableDF
        .join(courseTableDF, "c_id")
      .join(teacherTableDF, "t_id"),Seq("s_id"),"left_outer")
      .where("t_name != '张三' or t_name is NULL")
      .select("s_id","s_name","s_birth","s_sex")
      .distinct()
      .show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  07|    郑竹|1989-07-01|    女|
|  01|    赵雷|1990-01-01|    男|
|  05|    周梅|1991-12-01|    女|
|  08|    王菊|1990-01-20|    女|
|  03|    孙风|1990-05-20|    男|
|  02|    钱电|1990-12-21|    男|
|  06|    吴兰|1992-03-01|    女|
|  04|    李云|1990-08-06|    男|
+----+------+----------+-----+
*/

9、查询学过编号为"01"并且也学过编号为"02"的课程的同学的信息:

studentTableDF
      .join(scoreTableDF.as("s1"), "s_id")
      .join(scoreTableDF.as("s2"), "s_id")
      .where("s1.c_id = 01 and s2.c_id = 02")
      .select("s_id","s_name","s_birth","s_sex")
      .show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  01|    赵雷|1990-01-01|    男|
|  05|    周梅|1991-12-01|    女|
|  03|    孙风|1990-05-20|    男|
|  02|    钱电|1990-12-21|    男|
|  04|    李云|1990-08-06|    男|
+----+------+----------+-----+
*/

10、查询学过编号为"01"但是没有学过编号为"02"的课程的同学的信息

studentTableDF
      .join(scoreTableDF.where("c_id = 02"), Seq("s_id"), "left_outer")
      .where("c_id is null").as("s1")
      .join(scoreTableDF.as("sc2"), "s_id")
      .where("sc2.c_id = 01")
      .select("s_id","s_name","s_birth","s_sex")
      .show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  06|    吴兰|1992-03-01|    女|
+----+------+----------+-----+
*/

11、查询没有学全所有课程的同学的信息

studentTableDF
      .join(scoreTableDF.as("s"),Seq("s_id"),"left_outer")
      .groupBy("s_id")
      .agg(count("s.c_id").as("stuSubNum"))
      .where("stuSubNum < (select count(c_id) from course)")
      .show()

/**
+----+---------+
|s_id|stuSubNum|
+----+---------+
|  07|        2|
|  05|        2|
|  08|        0|
|  06|        2|
+----+---------+
*/

12、查询至少有一门课与学号为"01"的同学所学相同的同学的信息:

studentTableDF
      .join(scoreTableDF,"s_id").as("s1")
      .join(scoreTableDF
        .where("s_id = 01").as("s2"),"c_id")
      .select("s1.s_id").distinct()
      .where("s_id != 1")
      .join(studentTableDF,"s_id")
      .show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  07|    郑竹|1989-07-01|    女|
|  05|    周梅|1991-12-01|    女|
|  03|    孙风|1990-05-20|    男|
|  02|    钱电|1990-12-21|    男|
|  06|    吴兰|1992-03-01|    女|
|  04|    李云|1990-08-06|    男|
+----+------+----------+-----+
*/

13、查询和"01"号的同学学习的课程完全相同的其他同学的信息:

studentTableDF
      .join(scoreTableDF,"s_id").as("s1")
      .join(scoreTableDF.where("s_id = 1").as("s2"),
        "c_id")
      .groupBy("s1.s_id")
      .count.where(s"count = ${scoreTableDF.where("s_id = 1").count} and s1.s_id != 1")
      .join(studentTableDF,"s_id")
      .show()

/**
+----+-----+------+----------+-----+
|s_id|count|s_name|   s_birth|s_sex|
+----+-----+------+----------+-----+
|  03|    3|    孙风|1990-05-20|    男|
|  02|    3|    钱电|1990-12-21|    男|
|  04|    3|    李云|1990-08-06|    男|
+----+-----+------+----------+-----+
*/

14、查询没学过"张三"老师讲授的任一门课程的学生姓名:

studentTableDF
      .join(scoreTableDF,"s_id")
      .join(courseTableDF,"c_id")
      .join(teacherTableDF
        .where("t_name = '张三'"),"t_id")
      .as("s1").select("s_id")
      .join(studentTableDF,Seq("s_id"),"right_outer")
      .where("s1.s_id is NULL")
      .show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  08|    王菊|1990-01-20|    女|
|  06|    吴兰|1992-03-01|    女|
+----+------+----------+-----+
*/

15、查询两门及其以上不及格课程的同学的学号,姓名及其平均成绩:

scoreTableDF
      .where("s_score < 60")
      .groupBy("s_id").count()
      .where("count >= 2")
      .join(scoreTableDF,"s_id")
      .groupBy("s_id")
      .avg("s_score").join(studentTableDF,"s_id")
      .select("s_id","s_name","avg(s_score)")
      .show()

/**
+----+------+------------------+
|s_id|s_name|      avg(s_score)|
+----+------+------------------+
|  06|    吴兰|              32.5|
|  04|    李云|33.333333333333336|
+----+------+------------------+
*/

16、检索"01"课程分数小于60,按分数降序排列的学生信息:

scoreTableDF
      .where("c_id = 01")
      .join(studentTableDF,Seq("s_id"),"right_outer")
      .where("s_score < 60 or s_score is null")
      .orderBy($"s_score".desc)
      .show()

/**
+----+----+-------+------+----------+-----+
|s_id|c_id|s_score|s_name|   s_birth|s_sex|
+----+----+-------+------+----------+-----+
|  04|  01|     50|    李云|1990-08-06|    男|
|  06|  01|     31|    吴兰|1992-03-01|    女|
|  08|null|   null|    王菊|1990-01-20|    女|
|  07|null|   null|    郑竹|1989-07-01|    女|
+----+----+-------+------+----------+-----+
*/

17、按平均成绩从高到低显示所有学生的所有课程的成绩以及平均成绩:

scoreTableDF
      .groupBy("s_id")
      .avg("s_score")
      .join(scoreTableDF,"s_id")
      .join(studentTableDF,"s_id")
      .orderBy($"avg(s_score)".desc)
      .show()

/**
+----+------------------+----+-------+------+----------+-----+
|s_id|      avg(s_score)|c_id|s_score|s_name|   s_birth|s_sex|
+----+------------------+----+-------+------+----------+-----+
|  07|              93.5|  03|     98|    郑竹|1989-07-01|    女|
|  07|              93.5|  02|     89|    郑竹|1989-07-01|    女|
|  01| 89.66666666666667|  03|     99|    赵雷|1990-01-01|    男|
|  01| 89.66666666666667|  01|     80|    赵雷|1990-01-01|    男|
|  01| 89.66666666666667|  02|     90|    赵雷|1990-01-01|    男|
|  05|              81.5|  01|     76|    周梅|1991-12-01|    女|
|  05|              81.5|  02|     87|    周梅|1991-12-01|    女|
|  03|              80.0|  02|     80|    孙风|1990-05-20|    男|
|  03|              80.0|  01|     80|    孙风|1990-05-20|    男|
|  03|              80.0|  03|     80|    孙风|1990-05-20|    男|
|  02|              70.0|  01|     70|    钱电|1990-12-21|    男|
|  02|              70.0|  02|     60|    钱电|1990-12-21|    男|
|  02|              70.0|  03|     80|    钱电|1990-12-21|    男|
|  04|33.333333333333336|  03|     20|    李云|1990-08-06|    男|
|  04|33.333333333333336|  01|     50|    李云|1990-08-06|    男|
|  04|33.333333333333336|  02|     30|    李云|1990-08-06|    男|
|  06|              32.5|  01|     31|    吴兰|1992-03-01|    女|
|  06|              32.5|  03|     34|    吴兰|1992-03-01|    女|
+----+------------------+----+-------+------+----------+-----+
*/

18、查询各科成绩最高分、最低分和平均分:以如下形式显示:课程ID,课程name,最高分,最低分,平均分,及格率,中等率,优良率,优秀率:

val jige = scoreDF.rdd.map(x=>{if(x.getAs("score").toString.toInt > 60) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("course_id","jige")
	val zhongdeng = scoreDF.rdd.map(x=>{if(x.getAs("score").toString.toInt > 70) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("course_id","zhongdeng")
	val youliang = scoreDF.rdd.map(x=>{if(x.getAs("score").toString.toInt > 80) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("course_id","youliang")
	val youxiu = scoreDF.rdd.map(x=>{if(x.getAs("score").toString.toInt > 90) (x(1).toString,1) else (x(1).toString,0)}).reduceByKey(_+_).toDF("course_id","youxiu")
	
	val s1 = scoreDF.groupBy("course_id").agg("score"->"max","score"->"min","score"->"avg","score"->"count")
	
	s1.join(jige,"course_id")
	  .join(zhongdeng,"course_id")
	  .join(youliang,"course_id")
	  .join(youxiu,"course_id")
	  .withColumn("jgl",$"jige"/$"count(score)")
	  .withColumn("zdl",$"zhongdeng"/$"count(score)")
  	  .withColumn("yll",$"youliang"/$"count(score)")
	  .withColumn("yxl",$"youxiu"/$"count(score)")
	  .drop("jige","zhongdeng","youliang","youxiu")
	  .show

/**
+---------+----------+----------+-----------------+------------+------------------+------------------+------------------+------------------+
|course_id|max(score)|min(score)|       avg(score)|count(score)|               jgl|               zdl|               yll|               yxl|
+---------+----------+----------+-----------------+------------+------------------+------------------+------------------+------------------+
|        1|        80|        31|             64.5|           6|0.6666666666666666|               0.5|               0.0|               0.0|
|        3|        99|        20|             68.5|           6|0.6666666666666666|0.6666666666666666|0.3333333333333333|0.3333333333333333|
|        2|        90|        30|72.66666666666667|           6|0.6666666666666666|0.6666666666666666|               0.5|               0.0|
+---------+----------+----------+-----------------+------------+------------------+------------------+------------------+------------------+
*/

19、按各科成绩进行排序,并显示排名:

scoreTableDF.selectExpr(
      "*","Row_Number() over(partition by c_id order by s_score desc) rank"
    ).show()

/**
+----+----+-------+----+
|s_id|c_id|s_score|rank|
+----+----+-------+----+
|  01|  01|     80|   1|
|  03|  01|     80|   2|
|  05|  01|     76|   3|
|  02|  01|     70|   4|
|  04|  01|     50|   5|
|  06|  01|     31|   6|
|  01|  03|     99|   1|
|  07|  03|     98|   2|
|  02|  03|     80|   3|
|  03|  03|     80|   4|
|  06|  03|     34|   5|
|  04|  03|     20|   6|
|  01|  02|     90|   1|
|  07|  02|     89|   2|
|  05|  02|     87|   3|
|  03|  02|     80|   4|
|  02|  02|     60|   5|
|  04|  02|     30|   6|
+----+----+-------+----+
*/

20、查询学生的总成绩并进行排名:

scoreTableDF
      .groupBy("s_id")
      .agg(sum("s_score").as("sumScore"))
      .selectExpr("*","Row_Number() over(order by sumScore desc) rank")
      .show()

/**
+----+--------+----+
|s_id|sumScore|rank|
+----+--------+----+
|  01|     269|   1|
|  03|     240|   2|
|  02|     210|   3|
|  07|     187|   4|
|  05|     163|   5|
|  04|     100|   6|
|  06|      65|   7|
+----+--------+----+
*/

21、查询不同老师所教不同课程平均分从高到低显示:

scoreTableDF.join(
      courseTableDF.join(teacherTableDF,"t_id")
      ,"c_id").groupBy("t_id","c_id")
      .agg(avg("s_score").as("avgScore"))
      .orderBy($"avgScore".desc)
      .show()

/**
+----+----+-----------------+
|t_id|c_id|         avgScore|
+----+----+-----------------+
|  01|  02|72.66666666666667|
|  03|  03|             68.5|
|  02|  01|             64.5|
+----+----+-----------------+
*/

22、查询所有课程的成绩第2 名到第3 名的学生信息及该课程成绩:

studentTableDF
      .join(scoreTableDF,"s_id")
      .selectExpr(
        "*","Row_Number() over(partition by c_id order by s_score desc) rank"
      )
      .where("rank between 2 and 3")
      .show()

/**
+----+------+----------+-----+----+-------+----+
|s_id|s_name|   s_birth|s_sex|c_id|s_score|rank|
+----+------+----------+-----+----+-------+----+
|  03|    孙风|1990-05-20|    男|  01|     80|   2|
|  05|    周梅|1991-12-01|    女|  01|     76|   3|
|  07|    郑竹|1989-07-01|    女|  03|     98|   2|
|  03|    孙风|1990-05-20|    男|  03|     80|   3|
|  07|    郑竹|1989-07-01|    女|  02|     89|   2|
|  05|    周梅|1991-12-01|    女|  02|     87|   3|
+----+------+----------+-----+----+-------+----+
*/

23、统计各科成绩各分数段人数:课程编号,课程名称,[100-85],[85-70],[70-60],[0-60]及所占百分比

scoreTableDF.groupBy("c_id")
      .agg(count("s_id")
      .as("zong"))
      .join(
      scoreTableDF.where("s_score < 60")
      .groupBy("c_id")
      .agg(count("s_id")
        .as("bujige")),Seq("c_id"),"left")
      .join(
      scoreTableDF.where("60 <= s_score and s_score < 70")
        .groupBy("c_id")
        .agg(count("s_id")
          .as("jige")),Seq("c_id"),"left")
      .join(
      scoreTableDF.where("70 <= s_score and s_score < 85")
        .groupBy("c_id")
        .agg(count("s_id")
          .as("liang")),Seq("c_id"),"left")
      .join(
      scoreTableDF.where("85 <= s_score and s_score <= 100")
        .groupBy("c_id")
        .agg(count("s_id")
          .as("you")),Seq("c_id"),"left")
      .withColumn("bjgl",col("bujige")/col("zong"))
      .withColumn("jgl",col("jige")/col("zong"))
      .withColumn("ll",col("liang")/col("zong"))
      .withColumn("yl",col("you")/col("zong"))
      .show()

/**
+----+----+------+----+-----+----+-------------------+-------------------+-------------------+------------------+
|c_id|zong|bujige|jige|liang| you|               bjgl|                jgl|                 ll|                yl|
+----+----+------+----+-----+----+-------------------+-------------------+-------------------+------------------+
|  01|   6|     2|null|    4|null| 0.3333333333333333|               null| 0.6666666666666666|              null|
|  03|   6|     2|null|    2|   2| 0.3333333333333333|               null| 0.3333333333333333|0.3333333333333333|
|  02|   6|     1|   1|    1|   3|0.16666666666666666|0.16666666666666666|0.16666666666666666|               0.5|
+----+----+------+----+-----+----+-------------------+-------------------+-------------------+------------------+
*/

24、查询学生平均成绩及其名次:

scoreTableDF
      .groupBy("s_id")
      .agg(avg("s_score").as("avgScore"))
      .selectExpr("*","Row_Number() over(order by avgScore desc) rank")
      .join(studentTableDF,"s_id")
      .show()

/**
+----+------------------+----+------+----------+-----+
|s_id|          avgScore|rank|s_name|   s_birth|s_sex|
+----+------------------+----+------+----------+-----+
|  07|              93.5|   1|    郑竹|1989-07-01|    女|
|  01| 89.66666666666667|   2|    赵雷|1990-01-01|    男|
|  05|              81.5|   3|    周梅|1991-12-01|    女|
|  03|              80.0|   4|    孙风|1990-05-20|    男|
|  02|              70.0|   5|    钱电|1990-12-21|    男|
|  06|              32.5|   7|    吴兰|1992-03-01|    女|
|  04|33.333333333333336|   6|    李云|1990-08-06|    男|
+----+------------------+----+------+----------+-----+
*/

25、查询各科成绩前三名的记录

scoreTableDF
      .selectExpr("*","Row_Number() over(partition by c_id order by s_score desc) rank")
      .where("rank between 1 and 3")
      .show()

/**
+----+----+-------+----+
|s_id|c_id|s_score|rank|
+----+----+-------+----+
|  01|  01|     80|   1|
|  03|  01|     80|   2|
|  05|  01|     76|   3|
|  01|  03|     99|   1|
|  07|  03|     98|   2|
|  02|  03|     80|   3|
|  01|  02|     90|   1|
|  07|  02|     89|   2|
|  05|  02|     87|   3|
+----+----+-------+----+
*/

26、查询每门课程被选修的学生数:

scoreTableDF.groupBy("c_id").agg(count("s_id")).show()

/**
+----+-----------+
|c_id|count(s_id)|
+----+-----------+
|  01|          6|
|  03|          6|
|  02|          6|
+----+-----------+
*/

27、查询出只有两门课程的全部学生的学号和姓名:

scoreTableDF
      .groupBy("s_id")
      .agg(count("c_id").as("sum"))
      .where("sum = 2")
      .join(studentTableDF,"s_id")
      .show()

/**
+----+---+------+----------+-----+
|s_id|sum|s_name|   s_birth|s_sex|
+----+---+------+----------+-----+
|  07|  2|    郑竹|1989-07-01|    女|
|  05|  2|    周梅|1991-12-01|    女|
|  06|  2|    吴兰|1992-03-01|    女|
+----+---+------+----------+-----+
*/

28、查询男生、女生人数:

studentTableDF.groupBy("s_sex").count().show()

/**
+-----+-----+
|s_sex|count|
+-----+-----+
|    男|    4|
|    女|    4|
+-----+-----+
*/

29、查询名字中含有"风"字的学生信息:

studentTableDF.where("s_name like '%风%'").show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  03|    孙风|1990-05-20|    男|
+----+------+----------+-----+
*/

30、查询同名同性学生名单,并统计同名人数:

studentTableDF
      .groupBy("s_name").count()
      .where("count > 1")
      .show()

/**
+------+-----+
|s_name|count|
+------+-----+
+------+-----+
*/

31、查询1990年出生的学生名单:

studentTableDF.where(col("s_birth").startsWith("1990")).show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  01|    赵雷|1990-01-01|    男|
|  02|    钱电|1990-12-21|    男|
|  03|    孙风|1990-05-20|    男|
|  04|    李云|1990-08-06|    男|
|  08|    王菊|1990-01-20|    女|
+----+------+----------+-----+
*/

32、查询每门课程的平均成绩,结果按平均成绩降序排列,平均成绩相同时,按课程编号升序排列:

scoreTableDF
      .groupBy("c_id")
      .avg("s_score")
      .orderBy($"avg(s_score)".desc,$"c_id")
      .show()

/**
+----+-----------------+
|c_id|     avg(s_score)|
+----+-----------------+
|  02|72.66666666666667|
|  03|             68.5|
|  01|             64.5|
+----+-----------------+
*/

33、查询平均成绩大于等于85的所有学生的学号、姓名和平均成绩:

scoreTableDF
      .groupBy("s_id")
      .agg(avg("s_score").as("avg"))
      .where("avg >= 85")
      .join(studentTableDF,"s_id")
      .select("s_id","s_name","avg")
      .show()

/**
+----+------+-----------------+
|s_id|s_name|              avg|
+----+------+-----------------+
|  07|    郑竹|             93.5|
|  01|    赵雷|89.66666666666667|
+----+------+-----------------+
*/

34、查询课程名称为"数学",且分数低于60的学生姓名和分数:

scoreTableDF
      .join(courseTableDF,"c_id")
      .join(studentTableDF,"s_id")
      .where("c_name = '数学' and s_score < 60")
      .select("s_name","s_score")
      .show()

/**
+------+-------+
|s_name|s_score|
+------+-------+
|    李云|     30|
+------+-------+
*/

35、查询所有学生的课程及分数情况:

studentTableDF.join(scoreTableDF,Seq("s_id"),"left").show()

/**
+----+------+----------+-----+----+-------+
|s_id|s_name|   s_birth|s_sex|c_id|s_score|
+----+------+----------+-----+----+-------+
|  07|    郑竹|1989-07-01|    女|  02|     89|
|  07|    郑竹|1989-07-01|    女|  03|     98|
|  01|    赵雷|1990-01-01|    男|  01|     80|
|  01|    赵雷|1990-01-01|    男|  02|     90|
|  01|    赵雷|1990-01-01|    男|  03|     99|
|  05|    周梅|1991-12-01|    女|  01|     76|
|  05|    周梅|1991-12-01|    女|  02|     87|
|  08|    王菊|1990-01-20|    女|null|   null|
|  03|    孙风|1990-05-20|    男|  01|     80|
|  03|    孙风|1990-05-20|    男|  02|     80|
|  03|    孙风|1990-05-20|    男|  03|     80|
|  02|    钱电|1990-12-21|    男|  01|     70|
|  02|    钱电|1990-12-21|    男|  02|     60|
|  02|    钱电|1990-12-21|    男|  03|     80|
|  06|    吴兰|1992-03-01|    女|  01|     31|
|  06|    吴兰|1992-03-01|    女|  03|     34|
|  04|    李云|1990-08-06|    男|  01|     50|
|  04|    李云|1990-08-06|    男|  02|     30|
|  04|    李云|1990-08-06|    男|  03|     20|
+----+------+----------+-----+----+-------+
*/

36、查询任何一门课程成绩在70分以上的学生姓名、课程名称和分数:

scoreTableDF
      .where("s_score > 70")
      .join(studentTableDF,"s_id")
      .join(courseTableDF,"c_id")
      .select("s_name","c_name","s_score")
      .show()

/**
+------+------+-------+
|s_name|c_name|s_score|
+------+------+-------+
|    赵雷|    语文|     80|
|    周梅|    语文|     76|
|    孙风|    语文|     80|
|    郑竹|    英语|     98|
|    赵雷|    英语|     99|
|    孙风|    英语|     80|
|    钱电|    英语|     80|
|    郑竹|    数学|     89|
|    赵雷|    数学|     90|
|    周梅|    数学|     87|
|    孙风|    数学|     80|
+------+------+-------+
*/

37、查询课程不及格的学生:

scoreTableDF
      .where("s_score < 60")
      .join(studentTableDF,"s_id")
      .show()

/**
+----+----+-------+------+----------+-----+
|s_id|c_id|s_score|s_name|   s_birth|s_sex|
+----+----+-------+------+----------+-----+
|  06|  01|     31|    吴兰|1992-03-01|    女|
|  06|  03|     34|    吴兰|1992-03-01|    女|
|  04|  01|     50|    李云|1990-08-06|    男|
|  04|  02|     30|    李云|1990-08-06|    男|
|  04|  03|     20|    李云|1990-08-06|    男|
+----+----+-------+------+----------+-----+
*/

38、查询课程编号为01且课程成绩在80分以上的学生的学号和姓名:

scoreTableDF
      .where("c_id = 01 and s_score >= 80")
      .join(studentTableDF,"s_id")
      .select("s_id","s_name")
      .show()

/**
+----+------+
|s_id|s_name|
+----+------+
|  01|    赵雷|
|  03|    孙风|
+----+------+
*/

39、求每门课程的学生人数:

scoreTableDF.groupBy("c_id").count().show()

/**
+----+-----+
|c_id|count|
+----+-----+
|  01|    6|
|  03|    6|
|  02|    6|
+----+-----+
*/

40、查询选修"张三"老师所授课程的学生中,成绩最高的学生信息及其成绩:

studentTableDF
      .join(scoreTableDF,"s_id")
      .join(courseTableDF,"c_id")
      .join(teacherTableDF.where("t_name = '张三'"),"t_id")
      .groupBy("c_id")
      .max("s_score")
      .show()

/**
+----+------------+
|c_id|max(s_score)|
+----+------------+
|  02|          90|
+----+------------+
*/

41、查询不同课程成绩相同的学生的学生编号、课程编号、学生成绩:

scoreTableDF.as("s1")
      .withColumnRenamed("c_id","s1_id")
      .withColumnRenamed("s_score","s1_score")
      .join(scoreTableDF.as("s2"),"s_id")
      .withColumnRenamed("c_id","s2_id")
      .withColumnRenamed("s_score","s2_score")
      .where("s1_score = s2_score and s1_id != s2_id")
      .select("s_id","s1_id","s2_id","s1_score")
      .withColumnRenamed("s1_score","score")
      .show()

/**
+----+-----+-----+-----+
|s_id|s1_id|s2_id|score|
+----+-----+-----+-----+
|  03|   01|   02|   80|
|  03|   01|   03|   80|
|  03|   02|   01|   80|
|  03|   02|   03|   80|
|  03|   03|   01|   80|
|  03|   03|   02|   80|
+----+-----+-----+-----+
*/

42、查询每门课程成绩最好的前三名:

scoreTableDF
      .selectExpr(
        "*","Row_Number() over(partition by c_id order by s_score desc) rank"
      ).where("rank between 1 and 3")
      .show()

/**
+----+----+-------+----+
|s_id|c_id|s_score|rank|
+----+----+-------+----+
|  01|  01|     80|   1|
|  03|  01|     80|   2|
|  05|  01|     76|   3|
|  01|  03|     99|   1|
|  07|  03|     98|   2|
|  02|  03|     80|   3|
|  01|  02|     90|   1|
|  07|  02|     89|   2|
|  05|  02|     87|   3|
+----+----+-------+----+
*/

43、统计每门课程的学生选修人数(超过5人的课程才统计)要求输出课程号和选修人数,查询结果按人数降序排列,若人数相同,按课程号升序排列

scoreTableDF
      .groupBy("c_id")
      .agg(count("s_id").as("count"))
      .where("count > 5")
      .orderBy($"count".desc,$"c_id")
      .select("c_id","count")
      .show()

/**
+----+-----+
|c_id|count|
+----+-----+
|  01|    6|
|  02|    6|
|  03|    6|
+----+-----+
*/

44、检索至少选修两门课程的学生学号:

scoreTableDF
      .groupBy("s_id")
      .agg(count("c_id").as("count"))
      .where("count >= 2")
      .select("s_id")
      .show()

/**
+----+
|s_id|
+----+
|  07|
|  01|
|  05|
|  03|
|  02|
|  06|
|  04|
+----+
*/

45、查询选修了全部课程的学生信息:

scoreTableDF
      .groupBy("s_id")
      .agg(count("c_id").as("count"))
      .where(s"count = ${courseTableDF.select("c_id").count()}")
      .join(studentTableDF,"s_id")
      .show()

/**
+----+-----+------+----------+-----+
|s_id|count|s_name|   s_birth|s_sex|
+----+-----+------+----------+-----+
|  01|    3|    赵雷|1990-01-01|    男|
|  03|    3|    孙风|1990-05-20|    男|
|  02|    3|    钱电|1990-12-21|    男|
|  04|    3|    李云|1990-08-06|    男|
+----+-----+------+----------+-----+
*/

46、查询各学生的年龄(周岁):

//-- 按照出生日期来算,当前月日 < 出生年月的月日则,年龄减一
    def getNowTime():String = {
      val time = new Date().getTime
      val format = new SimpleDateFormat("yyyy-MM-dd")
      format.format(time)
    }

    studentTableDF.map(
      x=>{
        val age = getNowTime().substring(0,4).toInt-x.get(2).toString.substring(0,4).toInt
        if (getNowTime().substring(5,7).toInt > x.get(2).toString.substring(5,7).toInt) (x.get(0).toString,age)
        else if (getNowTime().substring(5,7).toInt < x.get(2).toString.substring(5,7).toInt) (x.get(0).toString,age-1)
        else {
          if (getNowTime().substring(8,10).toInt >= x.get(2).toString.substring(8,10).toInt)(x.get(0).toString,age)
          else (x.get(0).toString,age-1)
        }
      }
    )
      .toDF("s_id","age")
      .join(studentTableDF,"s_id")
      .show()

/**
+----+---+------+----------+-----+
|s_id|age|s_name|   s_birth|s_sex|
+----+---+------+----------+-----+
|  07| 31|    郑竹|1989-07-01|    女|
|  01| 30|    赵雷|1990-01-01|    男|
|  05| 28|    周梅|1991-12-01|    女|
|  08| 30|    王菊|1990-01-20|    女|
|  03| 30|    孙风|1990-05-20|    男|
|  02| 29|    钱电|1990-12-21|    男|
|  06| 28|    吴兰|1992-03-01|    女|
|  04| 30|    李云|1990-08-06|    男|
+----+---+------+----------+-----+
*/

47、查询本周过生日的学生:找到下周一-1即为本周最后一天,开始时间为当前天(若今天就是星期天会不会出错?)

//  where内语句为一条语句,我这边为了更好展示所以进行了折行处理
    studentTableDF
      .where(
        "unix_timestamp(" +
          "cast(concat_ws('-',date_format(" +
          "current_date(),'yyyy'),date_format(s_birth,'MM')," +
          "date_format(s_birth,'dd')) as date),'yyyy-MM-dd') " +
          "between unix_timestamp(next_day(current_date(),'MON'),'yyyy-MM-dd') " +
          "and unix_timestamp(date_add(next_day(current_date(),'MON'),6),'yyyy-MM-dd')")
      .show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  05|    周梅|1991-12-01|    女|
+----+------+----------+-----+
*/

48、查询下周过生日的学生: 下周1到+6天

studentTableDF
      .where(
        "unix_timestamp(" +
          "cast(concat_ws('-',date_format(" +
          "current_date(),'yyyy'),date_format(s_birth,'MM')," +
          "date_format(s_birth,'dd')) as date),'yyyy-MM-dd') " +
          "between unix_timestamp(next_day(current_date(),'MON'),'yyyy-MM-dd')" +
          "and unix_timestamp(date_add(next_day(current_date(),'MON'),6),'yyyy-MM-dd')")
      .show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  05|    周梅|1991-12-01|    女|
+----+------+----------+-----+
*/

49、查询本月过生日的学生:

println("-49-")
    studentTableDF
      .where("month(s_birth) = month(current_date())")
      .show()

/**
+----+------+-------+-----+
|s_id|s_name|s_birth|s_sex|
+----+------+-------+-----+
+----+------+-------+-----+
*/

50、查询12月份过生日的学生:

println("-50-")
studentTableDF
  .where("month(s_birth) = 12")
  .show()

/**
+----+------+----------+-----+
|s_id|s_name|   s_birth|s_sex|
+----+------+----------+-----+
|  02|    钱电|1990-12-21|    男|
|  05|    周梅|1991-12-01|    女|
+----+------+----------+-----+
*/