题目:
1、 创建员工信息数据文件employess.txt,数据文件中包含内容如下(姓名、年龄、薪资、):
Lilith Hardy,30,6000,50,Finance Department
Byron Green,36,5000,25,Personnel Department
Yvette Ward,21,4500,15.5,
Arlen Esther,28,8000,20,Finance Department
Rupert Gold,39,10000,66,R&D Department
Deborah Madge,41,6500,0,R&D Department
Tim Springhall,22,6000,36.5,R&D Department
Olga Belloc,36,5600,10,Sales Department
Bruno Wallis,43,6700,0,Personnel Department
Flora Dan,27,4000,35,Sales Department
要求:查询员工信息表中员工年龄小于等于25岁或大于等于35岁的员工信息。
2、 创建学生数据文件students.txt,数据文件中包含内容如下(学号、姓名、专业、成绩):
1,student1, history,88
2,student2,geography,97
3,student3,chinese,65
4,student4,physics,63
5,student5, history,84
6,student6,psychology,72
7,student7, geography,69
8,student8, chinese,89
9,student9, history,86
10,student10, chinese,91
创建教师数据文件teacher.txt,数据文件中包含如下内容
history,teacher01
geography,teacher02
chinese,teacher03
physics,teacher04
psychology,teacher05
要求:
1) 对汉语专业同学按照课程成绩进行升序排序;
2) 使用左外连接,连接学生名单表与教师名单表;
3) 将历史专业的学生与教师合并在一起;
4) 计算地理专业学生的平均成绩
一.第一题
1.显示当前所有数据库
Show databases;
2.创建数据库study
CREATE DATABASE IF NOT EXISTS study
COMMENT "This is study database"
LOCATION '/user/hive_db/create_db/';
3.切换数据库
USE study;
4.创建数据表employess_table
CREATE EXTERNAL TABLE IF NOT EXISTS
study.employess_table(
name STRING COMMENT "This is name",
age INT COMMENT "This is age",
money INT COMMENT "This is money",
time FLOAT COMMENT "This is emm,I do not know",
department STRING COMMENT "This is department ")
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n
STORED AS textfile
LOCATION '/hive_data/class '
TBLPROPERTIES("comment"="This is a employess table");
5.导入数据
(1)创建数据文件employess
(2)准备数据文件
在HDFS创建目录/hive_data/class
hdfs dfs -mkdir /hive_data/class
将文件employess上传到HDFS的/hive_data/class目录
hdfs dfs -put /export/data/hive_data/employess /hive_data/class
(3)加载数据文件
(或者在这里创建数据表,指定/hive_data/class/为location)
向外部表employess_table加载文件employess
LOAD DATA INPATH '/hive_data/class/employess' OVERWRITE INTO TABLE study.employess_table;
6. 查询员工信息表中员工年龄小于等于25岁或大于等于35岁的员工信息
Select * from employess_table where age<=25 or age>=35;
二.第二题
1.创建数据文件
hdfs dfs -mkdir /hive_data/class
hdfs dfs -put /export/data/hive_data/student /hive_data/class
hdfs dfs -put /export/data/hive_data/teacher /hive_data/class
2.创建student_table和teacher_table
Student表手动导入数据(location填一个没有数据文件的地址,不能为空),Teacher表直接加载数据
(浏览器输入192.168.121.130及时查看HDFS文件系统的变化,便于理解)
导入原理参考:
CREATE EXTERNAL TABLE IF NOT EXISTS
study.student_table(
id INT COMMENT "This is id",
name STRING COMMENT "This is name",
subject STRING COMMENT "This is subject",
score INT COMMENT "This is score")
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION '/hive_data/hive'
TBLPROPERTIES("comment"="This is a student table");
LOAD DATA INPATH '/hive_data/class/student' OVERWRITE INTO TABLE study.student_table;
可以看到student文件已经不见了,转移到了表的location地址,hive文件夹下
CREATE EXTERNAL TABLE IF NOT EXISTS
study.teacher_table(
subject STRING COMMENT "This is subject",
name STRING COMMENT "This is name")
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION '/hive_data/class'
TBLPROPERTIES("comment"="This is a teacher table");
这里teacher表就直接将class文件夹下的所有文件载入表中,原来的文件还是在class中(location地址)
3.HiveQL操作语句
(1)对汉语专业同学按照课程成绩进行升序排序
这里发现导入的数据有空格..所以直接用=会匹配不上..,可以使用模糊匹配或者trim函数去除两边空格
Select * from student_table where subject like '%chinese%' order by score asc;
(2)使用左外连接,连接学生名单表与教师名单表
select * from student_table s left join teacher_table t on trim(s.subject)=trim(t.subject);
(3)将历史专业的学生与教师合并在一起
select * from student_table s left join teacher_table t on trim(s.subject)=trim(t.subject) where trim(s.subject)= 'history';
(4)计算地理专业学生的平均成绩
Select avg(score) from student_table where trim(subject)='geography';