题目:

1、 创建员工信息数据文件employess.txt,数据文件中包含内容如下(姓名、年龄、薪资、):

Lilith Hardy,30,6000,50,Finance Department

Byron Green,36,5000,25,Personnel Department

Yvette Ward,21,4500,15.5,

Arlen Esther,28,8000,20,Finance Department

Rupert Gold,39,10000,66,R&D Department

Deborah Madge,41,6500,0,R&D Department

Tim Springhall,22,6000,36.5,R&D Department

Olga Belloc,36,5600,10,Sales Department

Bruno Wallis,43,6700,0,Personnel Department

Flora Dan,27,4000,35,Sales Department

要求:查询员工信息表中员工年龄小于等于25岁或大于等于35岁的员工信息。

2、 创建学生数据文件students.txt,数据文件中包含内容如下(学号、姓名、专业、成绩):

1,student1, history,88

2,student2,geography,97

3,student3,chinese,65

4,student4,physics,63

5,student5, history,84

6,student6,psychology,72

7,student7, geography,69

8,student8, chinese,89

9,student9, history,86

10,student10, chinese,91

创建教师数据文件teacher.txt,数据文件中包含如下内容

history,teacher01

geography,teacher02

chinese,teacher03

physics,teacher04

psychology,teacher05

要求:

1) 对汉语专业同学按照课程成绩进行升序排序;

2) 使用左外连接,连接学生名单表与教师名单表;

3) 将历史专业的学生与教师合并在一起;

4) 计算地理专业学生的平均成绩

一.第一题

1.显示当前所有数据库



Show databases;






hive计算车龄 hive sql 计算年龄_hive



2.创建数据库study



CREATE DATABASE IF NOT EXISTS study 
COMMENT "This is study database"
LOCATION '/user/hive_db/create_db/';






hive计算车龄 hive sql 计算年龄_数据库_02



3.切换数据库



USE study;






hive计算车龄 hive sql 计算年龄_hive计算车龄_03



4.创建数据表employess_table



CREATE EXTERNAL TABLE IF NOT EXISTS 
study.employess_table(
name STRING COMMENT "This is name",
age INT COMMENT "This is age",
money INT COMMENT "This is money",
time FLOAT COMMENT "This is emm,I do not know",
department STRING COMMENT "This is department ")
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n
STORED AS textfile
LOCATION '/hive_data/class '
TBLPROPERTIES("comment"="This is a employess table");






hive计算车龄 hive sql 计算年龄_hive计算车龄_04



5.导入数据

(1)创建数据文件employess




hive计算车龄 hive sql 计算年龄_数据文件_05



(2)准备数据文件

在HDFS创建目录/hive_data/class



hdfs dfs -mkdir /hive_data/class



将文件employess上传到HDFS的/hive_data/class目录



hdfs dfs -put /export/data/hive_data/employess /hive_data/class



(3)加载数据文件

(或者在这里创建数据表,指定/hive_data/class/为location)

向外部表employess_table加载文件employess



LOAD DATA INPATH '/hive_data/class/employess' OVERWRITE INTO TABLE study.employess_table;






hive计算车龄 hive sql 计算年龄_数据库_06



6. 查询员工信息表中员工年龄小于等于25岁或大于等于35岁的员工信息



Select * from employess_table where age<=25 or age>=35;






hive计算车龄 hive sql 计算年龄_hive_07



二.第二题

1.创建数据文件




hive计算车龄 hive sql 计算年龄_hdfs_08






hive计算车龄 hive sql 计算年龄_数据文件_09




hdfs dfs -mkdir /hive_data/class
hdfs dfs -put /export/data/hive_data/student /hive_data/class
hdfs dfs -put /export/data/hive_data/teacher /hive_data/class






hive计算车龄 hive sql 计算年龄_数据文件_10



2.创建student_table和teacher_table

Student表手动导入数据(location填一个没有数据文件的地址,不能为空),Teacher表直接加载数据

(浏览器输入192.168.121.130及时查看HDFS文件系统的变化,便于理解)

导入原理参考:





CREATE EXTERNAL TABLE IF NOT EXISTS 
study.student_table(
id INT COMMENT "This is id",
name STRING COMMENT "This is name",
subject STRING COMMENT "This is subject",
score INT COMMENT "This is score")
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION '/hive_data/hive'
TBLPROPERTIES("comment"="This is a student table");
LOAD DATA INPATH '/hive_data/class/student' OVERWRITE INTO TABLE study.student_table;






hive计算车龄 hive sql 计算年龄_hive_11



可以看到student文件已经不见了,转移到了表的location地址,hive文件夹下




hive计算车龄 hive sql 计算年龄_hive计算车龄_12






hive计算车龄 hive sql 计算年龄_数据库_13




CREATE EXTERNAL TABLE IF NOT EXISTS 
study.teacher_table(
subject STRING COMMENT "This is subject",
name STRING COMMENT "This is name")
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION '/hive_data/class'
TBLPROPERTIES("comment"="This is a teacher table");



这里teacher表就直接将class文件夹下的所有文件载入表中,原来的文件还是在class中(location地址)




hive计算车龄 hive sql 计算年龄_hive计算车龄_14



3.HiveQL操作语句

(1)对汉语专业同学按照课程成绩进行升序排序

这里发现导入的数据有空格..所以直接用=会匹配不上..,可以使用模糊匹配或者trim函数去除两边空格



Select * from student_table where subject like '%chinese%' order by score asc;






hive计算车龄 hive sql 计算年龄_hdfs_15



(2)使用左外连接,连接学生名单表与教师名单表



select * from student_table s left join teacher_table t on trim(s.subject)=trim(t.subject);






hive计算车龄 hive sql 计算年龄_数据文件_16



(3)将历史专业的学生与教师合并在一起



select * from student_table s left join teacher_table t on trim(s.subject)=trim(t.subject) where trim(s.subject)= 'history';






hive计算车龄 hive sql 计算年龄_hdfs_17



(4)计算地理专业学生的平均成绩



Select avg(score) from student_table where trim(subject)='geography';






hive计算车龄 hive sql 计算年龄_hive计算车龄_18