Table of Contents
- 1 Running shell and HDFS commands in Hive
- Running shell commands
- Running HDFS commands
- 2 DDL Operations
- 1. Databases in Hive
- 2. Creating tables
- 3. Altering tables
- 3 DML Operations
- 1. Loading data
- 2. Exporting data
- 3. Querying data
- 3.1 Simple queries
- 3.2 Complex queries
Start Hadoop and Hive
[yao@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/yao/hadoop-2.7.7/logs/hadoop-yao-namenode-master.out
slave: starting datanode, logging to /home/yao/hadoop-2.7.7/logs/hadoop-yao-datanode-slave.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/yao/hadoop-2.7.7/logs/hadoop-yao-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/yao/hadoop-2.7.7/logs/yarn-yao-resourcemanager-master.out
slave: starting nodemanager, logging to /home/yao/hadoop-2.7.7/logs/yarn-yao-nodemanager-slave.out
[yao@master ~]$ jps
19123 NameNode
19960 Jps
19532 ResourceManager
19359 SecondaryNameNode
[yao@master ~]$ hive
hive>
1 Running shell and HDFS commands in Hive
Running shell commands
hive> ! pwd;
/home/yao
hive> ! echo "hello world!";
"hello world!"
hive> ! ls /home/yao/hadoopdata;
dfs
You can run simple bash shell commands by prefixing them with ! and ending them with a semicolon. The Hive CLI cannot run interactive commands that require user input, and it supports neither shell pipes nor filename auto-completion.
Running HDFS commands
hive> dfs -ls /dou;
Found 2 items
-rw-r--r-- 1 yao supergroup 0 2019-03-19 17:20 /dou/_SUCCESS
-rw-r--r-- 1 yao supergroup 23 2019-03-19 17:20 /dou/part-r-00000
hive> dfs -rmr /out*;
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted /out
Deleted /out01
Deleted /out02
Deleted /out1
Drop the hadoop keyword, prefix the command with dfs, and end it with a semicolon. (Running HDFS commands this way is actually more efficient than the equivalent hadoop fs command in a bash shell, because Hive executes them inside its already-running JVM instead of starting a new one.)
2 DDL Operations
1. Databases in Hive
hive> show databases;
OK
default
Time taken: 0.801 seconds, Fetched: 1 row(s)
- Create a database
hive> create database yr;
OK
Time taken: 0.878 seconds
hive> create database yr_test;
OK
Time taken: 0.111 seconds
hive> create database yr_2019;
OK
Time taken: 0.265 seconds
hive> show databases;
OK
default
yr
yr_2019
yr_test
Time taken: 0.03 seconds, Fetched: 4 row(s)
Creating a database that already exists throws an error; add if not exists to suppress it:
hive> create database if not exists yr;
OK
Time taken: 0.08 seconds
When there are many databases, you can use a wildcard pattern to match names:
hive> show databases like 'yr*';
OK
yr
yr_2019
yr_test
Time taken: 0.051 seconds, Fetched: 3 row(s)
- Show a database's description
hive> describe database yr;
OK
yr hdfs://master:9000/user/hive/warehouse/yr.db yao USER
Time taken: 0.104 seconds, Fetched: 1 row(s)
hive> dfs -ls /user/hive/warehouse;
Found 3 items
drwxr-xr-x - yao supergroup 0 2019-03-21 10:20 /user/hive/warehouse/yr.db
drwxr-xr-x - yao supergroup 0 2019-03-21 10:20 /user/hive/warehouse/yr_2019.db
drwxr-xr-x - yao supergroup 0 2019-03-21 10:20 /user/hive/warehouse/yr_test.db
- Drop a database
hive> drop database yr_2019;
OK
Time taken: 0.627 seconds
hive> drop database yr_2019;
FAILED: SemanticException [Error 10072]: Database does not exist: yr_2019
hive> drop database if exists yr_2019;
OK
Time taken: 0.045 seconds
- Drop a database that contains tables
1) Drop the tables first, then the database
2) Force-drop it directly with cascade
hive> drop database yr_test;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database yr_test is not empty. One or more tables exist.)
hive> drop database yr_test cascade;
OK
Time taken: 2.873 seconds
2. Creating Tables
- Table creation examples
- Managed and external tables
A table created without the external keyword is a managed table; one created with external is an external table.
- Differences:
1) Hive manages both the structure and the data of a managed table, so dropping a managed table deletes its data as well. For an external table, Hive manages only the table structure while HDFS holds the data; dropping the table removes the structure but leaves the data in place.
2) Dropping a managed table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are untouched.
3) Changes to a managed table are synced to the metastore directly, whereas after changing an external table's structure or partitions you need to run a repair (MSCK REPAIR TABLE table_name;).
4) Managed tables are stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse), but both managed and external tables can be given a custom location.
- Create a managed table
hive> use yr;
OK
Time taken: 0.035 seconds
hive> create table student(
> id int,
> name string,
> age int)
    > row format delimited //Hive's default row-format clause
    > fields terminated by '\t' //fields are separated by tabs
    > stored as textfile; //storage file format
OK
Time taken: 0.198 seconds
hive> desc student;
OK
id int
name string
age int
Time taken: 0.192 seconds, Fetched: 3 row(s)
hive> show create table student;
OK
CREATE TABLE `student`( //the generated DDL
`id` int,
`name` string,
`age` int)
ROW FORMAT DELIMITED //field delimiter format
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT //input format
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT //output format
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION //storage location
'hdfs://master:9000/user/hive/warehouse/yr.db/student'
TBLPROPERTIES ( //last-DDL timestamp
'transient_lastDdlTime'='1553136169')
Time taken: 0.256 seconds, Fetched: 14 row(s)
- Create an external table
hive> create external table student2(
> id int,
> name string,
> age int)
> row format delimited
> fields terminated by '\t'
> stored as textfile
    > location '/yr_1/student2'; //specify your own storage path
OK
Time taken: 0.671 seconds
The location should be an HDFS path of your choosing (Hive creates the directory if it does not exist); if you don't set one, the data is stored under the default path /user/hive/warehouse.
Table-creation strategy
If the data is critical and must not be deleted casually, it is best to create an external table. Even after the table is dropped and its metadata and structure are gone, the data still lives on HDFS.
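The external-table behavior just described can be checked with a short sketch. The table name ext_demo and the path /yr_1/ext_demo below are hypothetical, chosen only for illustration:

```sql
-- Hypothetical external table: Hive tracks only the schema, HDFS holds the data.
create external table ext_demo(
  id int,
  name string)
row format delimited fields terminated by '\t'
stored as textfile
location '/yr_1/ext_demo';

load data local inpath '/home/yao/data/student1.txt' into table ext_demo;

-- Dropping the table removes only the metadata...
drop table ext_demo;

-- ...so the file should still be there (run inside the Hive CLI):
dfs -ls /yr_1/ext_demo;
```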
- Partitioned tables
Static partitioning
hive> create table student3(
> id int,
> name string,
> age int)
    > partitioned by(sex string) //partition by sex
> row format delimited
> fields terminated by '\t'
> stored as textfile;
OK
Time taken: 0.276 seconds
- Bucketed tables
hive> create table student4(
> id int,
> name string,
> age int)
    > clustered by(id) sorted by(id desc) into 3 buckets //sort by id in descending order and split into 3 buckets
> row format delimited
> fields terminated by '\t';
OK
Time taken: 0.274 seconds
3. Altering Tables
- Rename a table
hive> alter table student4 rename to student_buck;
OK
Time taken: 0.241 seconds
- Add columns
hive> alter table student add columns(address string);
OK
Time taken: 0.395 seconds
- Change a column
hive> alter table student change column id number int; //id is the column to change, number is its new name, int is the new type
OK
Time taken: 0.2 seconds
hive> desc student;
OK
number int
name string
age int
address string
Time taken: 0.159 seconds, Fetched: 4 row(s)
- Replace columns
hive> alter table student replace columns(id int,name string,age int);
OK
Time taken: 0.213 seconds
hive> desc student;
OK
id int
name string
age int
- Add a partition
hive> alter table student3
> add partition(sex='male')
> location '/user/hive/warehouse/ducl.db/student3/sex=male';
OK
Time taken: 0.909 seconds
Adding a partition only succeeds on a table that was defined with that partition column.
- Drop a partition
hive> dfs -ls /user/hive/warehouse/ducl.db/student3;
Found 1 items
drwxr-xr-x - yao supergroup 0 2019-03-21 11:47 /user/hive/warehouse/ducl.db/student3/sex=male
hive> alter table student3 drop partition(sex='male');
Dropped the partition sex=male
OK
Time taken: 0.484 seconds
hive> dfs -ls /user/hive/warehouse/ducl.db/student3;
hive>
Adding the partition creates a sex=male directory under /user/hive/warehouse/ducl.db/student3/; dropping the partition removes that directory.
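Two related statements are useful when working with partitions; a short sketch against the student3 table above (standard Hive statements):

```sql
-- List the partitions the metastore currently knows about.
show partitions student3;

-- If partition directories were created directly on HDFS (e.g. via dfs -mkdir),
-- sync the metastore with what is actually on disk:
msck repair table student3;
```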
3 DML Operations
- Source data
[yao@master ~]$ mkdir data
[yao@master ~]$ cd data/
[yao@master data]$ vi student1.txt
[yao@master data]$ vi student2.txt
student1.txt
1001 shiny 23
1002 cendy 22
1003 angel 23
1009 ella 21
1012 eva 24
student2.txt
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
1. Loading Data
- put
hive> dfs -put /home/yao/data/student1.txt /yr_1/student2;
hive> dfs -ls /yr_1/student2;
Found 1 items
-rw-r--r-- 1 yao supergroup 67 2019-03-21 11:58 /yr_1/student2/student1.txt
hive> select * from student2;
OK
1001 shiny 23
1002 cendy 22
1003 angel 23
1009 ella 21
1012 eva 24
Time taken: 1.612 seconds, Fetched: 5 row(s)
- load
1. From the local filesystem
hive> load data local inpath '/home/yao/data/student2.txt' into table student; //path of the file on the local filesystem
Loading data to table yr.student
Table yr.student stats: [numFiles=1, numRows=0, totalSize=64, rawDataSize=0]
OK
Time taken: 0.665 seconds
hive> select * from student;
OK
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Time taken: 0.086 seconds, Fetched: 5 row(s)
2. From HDFS
hive> dfs -mkdir /data;
hive> dfs -put /home/yao/student1.txt /data; //local data path, then the HDFS destination
hive> load data inpath '/data/student1.txt' into table student3 partition(sex='male'); //path of the file on HDFS
Loading data to table yr.student3 partition (sex=male)
Partition yr.student3{sex=male} stats: [numFiles=1, numRows=0, totalSize=67, rawDataSize=0]
OK
Time taken: 1.939 seconds
hive> select * from student3;
OK
1001 shiny 23 male
1002 cendy 22 male
1003 angel 23 male
1009 ella 21 male
1012 eva 24 male
Time taken: 0.984 seconds, Fetched: 5 row(s)
- Insert data into a table from a query
hive> insert into table student_buck select * from student;
Query ID = yao_20190321141401_e646741e-547f-4295-90f8-1aead7e01fe1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0001, Tracking URL = http://master:18088/proxy/application_1553134178907_0001/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:14:25,729 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:14:37,827 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.35 sec
MapReduce Total cumulative CPU time: 3 seconds 350 msec
Ended Job = job_1553134178907_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://master:9000/user/hive/warehouse/yr.db/student_buck/.hive-staging_hive_2019-03-21_14-14-01_064_5299820726034498907-1/-ext-10000
Loading data to table yr.student_buck
Table yr.student_buck stats: [numFiles=1, numRows=5, totalSize=64, rawDataSize=59]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 3.35 sec HDFS Read: 3535 HDFS Write: 135 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 350 msec
OK
Time taken: 40.045 seconds
hive> select * from student_buck;
OK
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Time taken: 0.143 seconds, Fetched: 5 row(s)
- Create a table from a single query (CTAS)
hive> create table student5 as select * from student2;
Query ID = yao_20190321142009_e6168e54-1da2-4cbb-ac22-c9764f9afb65
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0002, Tracking URL = http://master:18088/proxy/application_1553134178907_0002/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:20:27,609 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:20:41,053 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.61 sec
MapReduce Total cumulative CPU time: 4 seconds 610 msec
Ended Job = job_1553134178907_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://master:9000/user/hive/warehouse/yr.db/.hive-staging_hive_2019-03-21_14-20-09_630_2868054019913652632-1/-ext-10001
Moving data to: hdfs://master:9000/user/hive/warehouse/yr.db/student5
Table yr.student5 stats: [numFiles=1, numRows=5, totalSize=67, rawDataSize=62]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.61 sec HDFS Read: 2881 HDFS Write: 134 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 610 msec
OK
Time taken: 34.194 seconds
hive> select * from student5;
OK
1001 shiny 23
1002 cendy 22
1003 angel 23
1009 ella 21
1012 eva 24
Time taken: 0.091 seconds, Fetched: 5 row(s)
The table is created and the data loaded in a single step.
2. Exporting Data
- Export data to the local filesystem
[yao@master data]$ ls
student1.txt student2.txt
hive> insert overwrite local directory '/home/yao/data/student3' select * from student3;
Query ID = yao_20190321142609_afcf0a6a-1796-426f-b621-6a7af73acb0a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0003, Tracking URL = http://master:18088/proxy/application_1553134178907_0003/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:26:24,776 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:26:34,428 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.86 sec
MapReduce Total cumulative CPU time: 2 seconds 860 msec
Ended Job = job_1553134178907_0003
Copying data to local directory /home/yao/data/student3
Copying data to local directory /home/yao/data/student3
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.86 sec HDFS Read: 3409 HDFS Write: 92 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 860 msec
OK
Time taken: 27.62 seconds
[yao@master data]$ ls
student1.txt student2.txt student3
[yao@master data]$ cd student3/
[yao@master student3]$ ls
000000_0
[yao@master student3]$ cat 000000_0
1001shiny23male
1002cendy22male
1003angel23male
1009ella21male
1012eva24male
Export with an explicit delimiter
The fields above appear run together because Hive's default output delimiter is the invisible ^A (Ctrl-A) character; specify a delimiter to get readable output:
hive> insert overwrite local directory '/home/yao/data/student' row format delimited fields terminated by ',' select * from student;
Query ID = yao_20190321143106_4f9ce2f6-d1f8-482b-aedf-dae5f55e6b84
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0004, Tracking URL = http://master:18088/proxy/application_1553134178907_0004/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:31:20,532 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:31:29,208 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.98 sec
MapReduce Total cumulative CPU time: 2 seconds 980 msec
Ended Job = job_1553134178907_0004
Copying data to local directory /home/yao/data/student
Copying data to local directory /home/yao/data/student
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.98 sec HDFS Read: 3199 HDFS Write: 64 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 980 msec
OK
Time taken: 24.758 seconds
[yao@master data]$ ls
student student1.txt student2.txt student3
[yao@master data]$ cd student
[yao@master student]$ ls
000000_0
[yao@master student]$ cat 000000_0
1005,bob,24
1006,mark,23
1007,leo,22
1011,JACK,23
1014,JAMES,24
- Export data to HDFS
hive> insert overwrite directory '/data/student' row format delimited fields terminated by '\t' select * from student;
Query ID = yao_20190321143628_5dda4ab4-9854-4fff-9174-fab6d05899ef
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0005, Tracking URL = http://master:18088/proxy/application_1553134178907_0005/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:36:44,677 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:36:54,423 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.95 sec
MapReduce Total cumulative CPU time: 2 seconds 950 msec
Ended Job = job_1553134178907_0005
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://master:9000/data/student/.hive-staging_hive_2019-03-21_14-36-28_119_7183395134216927212-1/-ext-10000
Moving data to: /data/student
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.95 sec HDFS Read: 3155 HDFS Write: 64 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 950 msec
OK
Time taken: 28.785 seconds
hive> dfs -ls /data/student;
Found 1 items
-rwxr-xr-x 1 yao supergroup 64 2019-03-21 14:36 /data/student/000000_0
hive> dfs -cat /data/student/000000_0;
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
3. Querying Data
3.1 Simple Queries
- Data
/home/yao/data/stu_messages.txt
005 男 16 北京 13754554587
006 女 17 北京 13872374170
001 男 15 北京 13903030485
002 女 16 北京 17938844884
023 男 15 北京 13700000033
004 男 17 北京 15257575773
003 女 15 北京 15885888889
010 男 16 北京 14098593989
015 女 14 北京 14938983000
/home/yao/data/stu_scores.txt
001 赵四 89 46 90
002 张晓梅 95 89 95
003 小明 89 98 88
004 小舞 89 69 84
005 吴帅帅 94 90 95
006 王晓晓 89 43 89
007 孙强 99 98 97
008 小英 69 85 89
009 王五 89 59 98
- Create the tables and load the data
hive> create table stu_message(
> id int,
> sex string,
> age int,
> location string,
> tel_num string)
> row format delimited fields terminated by '\t'
> stored as textfile;
OK
Time taken: 0.074 seconds
hive> create table stu_scores(
> id int,
> name string,
> math int,
> chinese int,
> eng int)
> row format delimited fields terminated by '\t'
> stored as textfile;
OK
Time taken: 0.107 seconds
hive> load data local inpath '/home/yao/data/stu_messages.txt' overwrite into table stu_message;
hive> load data local inpath '/home/yao/data/stu_scores.txt' overwrite into table stu_scores;
1. Selecting specific columns
//query each student's name and Chinese score from the scores table
hive> select name,chinese from stu_scores;
OK
赵四 46
张晓梅 89
小明 98
小舞 69
吴帅帅 90
王晓晓 43
孙强 98
小英 85
王五 59
Time taken: 0.495 seconds, Fetched: 9 row(s)
//query each student's age and phone number from the info table
hive> select age,tel_num from stu_message;
OK
16 北京
17 北京
15 北京
16 北京
15 北京
17 北京
15 北京
16 北京
14 北京
Time taken: 0.133 seconds, Fetched: 9 row(s)
2. Arithmetic in queries
//compute each student's total score
hive> select name,chinese+math+eng from stu_scores;
OK
赵四 225
张晓梅 279
小明 275
小舞 242
吴帅帅 279
王晓晓 221
孙强 294
小英 243
王五 246
Time taken: 0.222 seconds, Fetched: 9 row(s)
//compute the average score for each subject
hive> select avg(chinese),avg(math),avg(eng) from stu_scores;
//result
Total MapReduce CPU Time Spent: 8 seconds 60 msec
OK
75.22222222222223 89.11111111111111 91.66666666666667
//query the highest score in each subject
hive> select max(chinese),max(math),max(eng) from stu_scores;
//result
Total MapReduce CPU Time Spent: 5 seconds 320 msec
OK
98 99 98
3. Filtering
- Conditional queries
Format: if(condition, value_if_true, value_if_false)
The first argument is the condition to test; the second is what to return when the condition holds; the third is what to return when it does not. Each of the last two can be a literal value or an expression.
//query the IDs of students whose age is 15
hive> select if(age=15,id,0) from stu_message;
OK
0
0
1
0
23 //rows where age is 15 return the id; others return 0
0
3
0
0
Time taken: 0.112 seconds, Fetched: 9 row(s)
- Where filters
//query the names of students whose Chinese, math, and English scores are all above 80
hive> select name from stu_scores
> where chinese>80
> and math >80
> and eng>80;
OK
张晓梅
小明
吴帅帅
孙强
Time taken: 0.178 seconds, Fetched: 4 row(s)
//query the sex of students older than 15
hive> select sex from stu_message where age>15;
OK
男
女
女
男
男
Time taken: 0.076 seconds, Fetched: 5 row(s)
//query the names of students whose name contains 小
hive> select name from stu_scores where name like '%小%';
OK
小明
小舞
小英
Time taken: 0.099 seconds, Fetched: 3 row(s)
4. Grouping
//count the numbers of male and female students
hive> select sex,count(*) from stu_message group by sex;
//result
Total MapReduce CPU Time Spent: 4 seconds 670 msec
OK
女 4
男 5
//count the numbers of male and female students older than 15
select sex,count(*) from stu_message where age>15 group by sex;
5. Sorting
order by
distribute by
sort by
cluster by
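The four clauses differ in scope; the sketch below reuses stu_scores to illustrate each (semantics per standard Hive behavior):

```sql
-- order by: a single global sort over all rows (funnels through one reducer).
select name, math from stu_scores order by math desc;

-- sort by: sorts within each reducer only, so output is not globally ordered.
select name, math from stu_scores sort by math desc;

-- distribute by: decides which reducer each row goes to; combined with sort by,
-- rows are grouped by name and sorted by score within each reducer.
select name, math from stu_scores distribute by name sort by math desc;

-- cluster by: shorthand for distribute by + sort by on the same column
-- (ascending order only).
select name, math from stu_scores cluster by math;
```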
//list each student's name and total score in descending order of total
select name,(chinese+math+eng)as cnt from stu_scores order by cnt desc;
//result
Total MapReduce CPU Time Spent: 6 seconds 110 msec
OK
孙强 294
吴帅帅 279
张晓梅 279
小明 275
王五 246
小英 243
小舞 242
赵四 225
王晓晓 221
Time taken: 30.053 seconds, Fetched: 9 row(s)
3.2 Complex Queries
1. JOIN operations
- Inner join: join ... on
//inner-join the scores table with the info table to get the names and English scores of students present in both
select a.name,a.eng from stu_scores a join stu_message b on a.id=b.id;
//result
Total MapReduce CPU Time Spent: 3 seconds 620 msec
OK
吴帅帅 95
王晓晓 89
赵四 90
张晓梅 95
小舞 84
小明 88
- Outer joins: left outer, right outer, full outer
//left-outer-join the scores table with the info table to get each student's name and sex
select a.name,b.sex from stu_scores a left join stu_message b on a.id=b.id;
//result
Total MapReduce CPU Time Spent: 2 seconds 110 msec
OK
赵四 男
张晓梅 女
小明 女
小舞 男
吴帅帅 男
王晓晓 女
孙强 NULL
小英 NULL
王五 NULL
- Left semi join:
//use a left semi join to get the names of students that also appear in the info table
select a.name, id from stu_scores a left semi join stu_message b on a.id=b.id;
//result
Total MapReduce CPU Time Spent: 3 seconds 270 msec
OK
赵四 1
张晓梅 2
小明 3
小舞 4
吴帅帅 5
王晓晓 6
2. UNION ALL
The unioned queries must have the same schema.
Each branch can carry its own filter conditions (they need not be identical).
select name,'chinese',chinese from stu_scores where chinese in (select max(chinese) from stu_scores)
union all
select name,'math',math from stu_scores where math in (select max(math) from stu_scores)
union all
select name,'eng',eng from stu_scores where eng in (select max(eng) from stu_scores);
//result
Total MapReduce CPU Time Spent: 37 seconds 570 msec
OK
小明 chinese 98
孙强 chinese 98
孙强 math 99
王五 eng 98
3. Nested queries
Format:
select * from (select * from (select * from tablename) b) a;
How to write one?
Simplify the complicated: work out the innermost query first, then compose outward.
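As a sketch of that advice using the scores table above: to find the top student by total score, first compute the totals in an inner query, then sort and trim in the outer one (the alias names are arbitrary):

```sql
select a.name, a.total
from (select name, chinese + math + eng as total  -- innermost: per-student totals
      from stu_scores) a
order by a.total desc   -- outer: rank the derived rows
limit 1;                -- keep only the top scorer
```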