Table of Contents
- 1 Running shell and HDFS commands in Hive
- Running shell commands
- Running HDFS commands
- 2 DDL Operations
- 1. Databases in Hive
- 2. Creating tables
- 3. Altering tables
- 3 DML Operations
- 1. Loading data
- 2. Exporting data
- 3. Querying data
- 3.1 Simple queries
- 3.2 Complex queries
Start Hadoop and Hive
[yao@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/yao/hadoop-2.7.7/logs/hadoop-yao-namenode-master.out
slave: starting datanode, logging to /home/yao/hadoop-2.7.7/logs/hadoop-yao-datanode-slave.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/yao/hadoop-2.7.7/logs/hadoop-yao-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/yao/hadoop-2.7.7/logs/yarn-yao-resourcemanager-master.out
slave: starting nodemanager, logging to /home/yao/hadoop-2.7.7/logs/yarn-yao-nodemanager-slave.out
[yao@master ~]$ jps
19123 NameNode
19960 Jps
19532 ResourceManager
19359 SecondaryNameNode
[yao@master ~]$ hive
hive>
1 Running shell and HDFS commands in Hive
Running shell commands
hive> ! pwd;
/home/yao
hive> ! echo "hello world!";
"hello world!"
hive> ! ls /home/yao/hadoopdata;
dfs
You can run simple bash shell commands by prefixing them with ! and ending them with a semicolon. The Hive CLI cannot run interactive commands that require user input, and it supports neither shell pipes nor filename auto-completion.
Running HDFS commands
hive> dfs -ls /dou;
Found 2 items
-rw-r--r-- 1 yao supergroup 0 2019-03-19 17:20 /dou/_SUCCESS
-rw-r--r-- 1 yao supergroup 23 2019-03-19 17:20 /dou/part-r-00000
hive> dfs -rmr /out*;
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted /out
Deleted /out01
Deleted /out02
Deleted /out1
Drop the hadoop keyword, prefix the command with dfs, and end it with a semicolon. (Running HDFS commands this way is actually more efficient than the equivalent hadoop fs command in a bash shell, because Hive executes them inside its already-running JVM instead of starting a new one.)
2 DDL Operations
1. Databases in Hive
hive> show databases;
OK
default
Time taken: 0.801 seconds, Fetched: 1 row(s)
- Create a database
hive> create database yr;
OK
Time taken: 0.878 seconds
hive> create database yr_test;
OK
Time taken: 0.111 seconds
hive> create database yr_2019;
OK
Time taken: 0.265 seconds
hive> show databases;
OK
default
yr
yr_2019
yr_test
Time taken: 0.03 seconds, Fetched: 4 row(s)
Creating a database that already exists throws an error; add if not exists to suppress it:
hive> create database if not exists yr;
OK
Time taken: 0.08 seconds
When there are many databases, you can use a wildcard pattern to match names:
hive> show databases like 'yr*';
OK
yr
yr_2019
yr_test
Time taken: 0.051 seconds, Fetched: 3 row(s)
- Show a database's description
hive> describe database yr;
OK
yr hdfs://master:9000/user/hive/warehouse/yr.db yao USER
Time taken: 0.104 seconds, Fetched: 1 row(s)
hive> dfs -ls /user/hive/warehouse;
Found 3 items
drwxr-xr-x - yao supergroup 0 2019-03-21 10:20 /user/hive/warehouse/yr.db
drwxr-xr-x - yao supergroup 0 2019-03-21 10:20 /user/hive/warehouse/yr_2019.db
drwxr-xr-x - yao supergroup 0 2019-03-21 10:20 /user/hive/warehouse/yr_test.db
- Drop a database
hive> drop database yr_2019;
OK
Time taken: 0.627 seconds
hive> drop database yr_2019;
FAILED: SemanticException [Error 10072]: Database does not exist: yr_2019
hive> drop database if exists yr_2019;
OK
Time taken: 0.045 seconds
- Drop a database that contains tables
1) Drop the tables first, then the database
2) Force-drop it directly with cascade
hive> drop database yr_test;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database yr_test is not empty. One or more tables exist.)
hive> drop database yr_test cascade;
OK
Time taken: 2.873 seconds
2. Creating Tables
- Table creation examples
- Managed and external tables
A table created without the external keyword is a managed table; one created with external is an external table.
- Differences:
1) Hive manages both the structure and the data of a managed table, so dropping a managed table deletes its data as well. For an external table, Hive manages only the table structure while HDFS holds the data; dropping the table removes the structure but leaves the data in place.
2) Dropping a managed table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are untouched.
3) Changes to a managed table are synced to the metastore directly, whereas after changing an external table's structure or partitions you need to run a repair (MSCK REPAIR TABLE table_name;).
4) Managed tables are stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse), but both managed and external tables can be given a custom location.
- Create a managed table
hive> use yr;
OK
Time taken: 0.035 seconds
hive> create table student(
> id int,
> name string,
> age int)
    > row format delimited //Hive's default row-format clause
    > fields terminated by '\t' //fields are separated by tabs
    > stored as textfile; //storage file format
OK
Time taken: 0.198 seconds
hive> desc student;
OK
id int
name string
age int
Time taken: 0.192 seconds, Fetched: 3 row(s)
hive> show create table student;
OK
CREATE TABLE `student`( //the generated DDL
`id` int,
`name` string,
`age` int)
ROW FORMAT DELIMITED //field delimiter format
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT //input format
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT //output format
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION //storage location
'hdfs://master:9000/user/hive/warehouse/yr.db/student'
TBLPROPERTIES ( //last-DDL timestamp
'transient_lastDdlTime'='1553136169')
Time taken: 0.256 seconds, Fetched: 14 row(s)
- Create an external table
hive> create external table student2(
> id int,
> name string,
> age int)
> row format delimited
> fields terminated by '\t'
> stored as textfile
    > location '/yr_1/student2'; //specify your own storage path
OK
Time taken: 0.671 seconds
The location should be an HDFS path of your choosing (Hive creates the directory if it does not exist); if you don't set one, the data is stored under the default path /user/hive/warehouse.
Table-creation strategy
If the data is critical and must not be deleted casually, it is best to create an external table. Even after the table is dropped and its metadata and structure are gone, the data still lives on HDFS.
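The external-table behavior just described can be checked with a short sketch. The table name ext_demo and the path /yr_1/ext_demo below are hypothetical, chosen only for illustration:

```sql
-- Hypothetical external table: Hive tracks only the schema, HDFS holds the data.
create external table ext_demo(
  id int,
  name string)
row format delimited fields terminated by '\t'
stored as textfile
location '/yr_1/ext_demo';

load data local inpath '/home/yao/data/student1.txt' into table ext_demo;

-- Dropping the table removes only the metadata...
drop table ext_demo;

-- ...so the file should still be there (run inside the Hive CLI):
dfs -ls /yr_1/ext_demo;
```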
- Partitioned tables
Static partitioning
hive> create table student3(
> id int,
> name string,
> age int)
    > partitioned by(sex string) //partition by sex
> row format delimited
> fields terminated by '\t'
> stored as textfile;
OK
Time taken: 0.276 seconds
- Bucketed tables
hive> create table student4(
> id int,
> name string,
> age int)
    > clustered by(id) sorted by(id desc) into 3 buckets //sort by id in descending order and split into 3 buckets
> row format delimited
> fields terminated by '\t';
OK
Time taken: 0.274 seconds
3. Altering Tables
- Rename a table
hive> alter table student4 rename to student_buck;
OK
Time taken: 0.241 seconds
- Add columns
hive> alter table student add columns(address string);
OK
Time taken: 0.395 seconds
- Change a column
hive> alter table student change column id number int; //id is the column to change, number is its new name, int is the new type
OK
Time taken: 0.2 seconds
hive> desc student;
OK
number int
name string
age int
address string
Time taken: 0.159 seconds, Fetched: 4 row(s)
- Replace columns
hive> alter table student replace columns(id int,name string,age int);
OK
Time taken: 0.213 seconds
hive> desc student;
OK
id int
name string
age int
- Add a partition
hive> alter table student3
> add partition(sex='male')
> location '/user/hive/warehouse/ducl.db/student3/sex=male';
OK
Time taken: 0.909 seconds
Adding a partition only succeeds on a table that was defined with that partition column.
- Drop a partition
hive> dfs -ls /user/hive/warehouse/ducl.db/student3;
Found 1 items
drwxr-xr-x - yao supergroup 0 2019-03-21 11:47 /user/hive/warehouse/ducl.db/student3/sex=male
hive> alter table student3 drop partition(sex='male');
Dropped the partition sex=male
OK
Time taken: 0.484 seconds
hive> dfs -ls /user/hive/warehouse/ducl.db/student3;
hive>
Adding the partition creates a sex=male directory under /user/hive/warehouse/ducl.db/student3/; dropping the partition removes that directory.
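Two related statements are useful when working with partitions; a short sketch against the student3 table above (standard Hive statements):

```sql
-- List the partitions the metastore currently knows about.
show partitions student3;

-- If partition directories were created directly on HDFS (e.g. via dfs -mkdir),
-- sync the metastore with what is actually on disk:
msck repair table student3;
```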
3 DML Operations
- Source data
[yao@master ~]$ mkdir data
[yao@master ~]$ cd data/
[yao@master data]$ vi student1.txt
[yao@master data]$ vi student2.txt
student1.txt
1001 shiny 23
1002 cendy 22
1003 angel 23
1009 ella 21
1012 eva 24
student2.txt
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
1. Loading Data
- put
hive> dfs -put /home/yao/data/student1.txt /yr_1/student2;
hive> dfs -ls /yr_1/student2;
Found 1 items
-rw-r--r-- 1 yao supergroup 67 2019-03-21 11:58 /yr_1/student2/student1.txt
hive> select * from student2;
OK
1001 shiny 23
1002 cendy 22
1003 angel 23
1009 ella 21
1012 eva 24
Time taken: 1.612 seconds, Fetched: 5 row(s)
- load
1. From the local filesystem
hive> load data local inpath '/home/yao/data/student2.txt' into table student; //path of the file on the local filesystem
Loading data to table yr.student
Table yr.student stats: [numFiles=1, numRows=0, totalSize=64, rawDataSize=0]
OK
Time taken: 0.665 seconds
hive> select * from student;
OK
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Time taken: 0.086 seconds, Fetched: 5 row(s)
2. From HDFS
hive> dfs -mkdir /data;
hive> dfs -put /home/yao/student1.txt /data; //local data path, then the HDFS destination
hive> load data inpath '/data/student1.txt' into table student3 partition(sex='male'); //path of the file on HDFS
Loading data to table yr.student3 partition (sex=male)
Partition yr.student3{sex=male} stats: [numFiles=1, numRows=0, totalSize=67, rawDataSize=0]
OK
Time taken: 1.939 seconds
hive> select * from student3;
OK
1001 shiny 23 male
1002 cendy 22 male
1003 angel 23 male
1009 ella 21 male
1012 eva 24 male
Time taken: 0.984 seconds, Fetched: 5 row(s)
- Insert data into a table from a query
hive> insert into table student_buck select * from student;
Query ID = yao_20190321141401_e646741e-547f-4295-90f8-1aead7e01fe1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0001, Tracking URL = http://master:18088/proxy/application_1553134178907_0001/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:14:25,729 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:14:37,827 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.35 sec
MapReduce Total cumulative CPU time: 3 seconds 350 msec
Ended Job = job_1553134178907_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://master:9000/user/hive/warehouse/yr.db/student_buck/.hive-staging_hive_2019-03-21_14-14-01_064_5299820726034498907-1/-ext-10000
Loading data to table yr.student_buck
Table yr.student_buck stats: [numFiles=1, numRows=5, totalSize=64, rawDataSize=59]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 3.35 sec HDFS Read: 3535 HDFS Write: 135 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 350 msec
OK
Time taken: 40.045 seconds
hive> select * from student_buck;
OK
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Time taken: 0.143 seconds, Fetched: 5 row(s)
- Create a table from a single query (CTAS)
hive> create table student5 as select * from student2;
Query ID = yao_20190321142009_e6168e54-1da2-4cbb-ac22-c9764f9afb65
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0002, Tracking URL = http://master:18088/proxy/application_1553134178907_0002/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:20:27,609 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:20:41,053 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.61 sec
MapReduce Total cumulative CPU time: 4 seconds 610 msec
Ended Job = job_1553134178907_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://master:9000/user/hive/warehouse/yr.db/.hive-staging_hive_2019-03-21_14-20-09_630_2868054019913652632-1/-ext-10001
Moving data to: hdfs://master:9000/user/hive/warehouse/yr.db/student5
Table yr.student5 stats: [numFiles=1, numRows=5, totalSize=67, rawDataSize=62]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.61 sec HDFS Read: 2881 HDFS Write: 134 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 610 msec
OK
Time taken: 34.194 seconds
hive> select * from student5;
OK
1001 shiny 23
1002 cendy 22
1003 angel 23
1009 ella 21
1012 eva 24
Time taken: 0.091 seconds, Fetched: 5 row(s)
The table is created and the data loaded in a single step.
2. Exporting Data
- Export data to the local filesystem
[yao@master data]$ ls
student1.txt student2.txt
hive> insert overwrite local directory '/home/yao/data/student3' select * from student3;
Query ID = yao_20190321142609_afcf0a6a-1796-426f-b621-6a7af73acb0a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0003, Tracking URL = http://master:18088/proxy/application_1553134178907_0003/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:26:24,776 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:26:34,428 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.86 sec
MapReduce Total cumulative CPU time: 2 seconds 860 msec
Ended Job = job_1553134178907_0003
Copying data to local directory /home/yao/data/student3
Copying data to local directory /home/yao/data/student3
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.86 sec HDFS Read: 3409 HDFS Write: 92 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 860 msec
OK
Time taken: 27.62 seconds
[yao@master data]$ ls
student1.txt student2.txt student3
[yao@master data]$ cd student3/
[yao@master student3]$ ls
000000_0
[yao@master student3]$ cat 000000_0
1001shiny23male
1002cendy22male
1003angel23male
1009ella21male
1012eva24male
Export with an explicit delimiter
The fields above appear run together because Hive's default output delimiter is the invisible ^A (Ctrl-A) character; specify a delimiter to get readable output:
hive> insert overwrite local directory '/home/yao/data/student' row format delimited fields terminated by ',' select * from student;
Query ID = yao_20190321143106_4f9ce2f6-d1f8-482b-aedf-dae5f55e6b84
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0004, Tracking URL = http://master:18088/proxy/application_1553134178907_0004/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:31:20,532 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:31:29,208 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.98 sec
MapReduce Total cumulative CPU time: 2 seconds 980 msec
Ended Job = job_1553134178907_0004
Copying data to local directory /home/yao/data/student
Copying data to local directory /home/yao/data/student
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.98 sec HDFS Read: 3199 HDFS Write: 64 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 980 msec
OK
Time taken: 24.758 seconds
[yao@master data]$ ls
student student1.txt student2.txt student3
[yao@master data]$ cd student
[yao@master student]$ ls
000000_0
[yao@master student]$ cat 000000_0
1005,bob,24
1006,mark,23
1007,leo,22
1011,JACK,23
1014,JAMES,24
- Export data to HDFS
hive> insert overwrite directory '/data/student' row format delimited fields terminated by '\t' select * from student;
Query ID = yao_20190321143628_5dda4ab4-9854-4fff-9174-fab6d05899ef
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553134178907_0005, Tracking URL = http://master:18088/proxy/application_1553134178907_0005/
Kill Command = /home/yao/hadoop-2.7.7/bin/hadoop job -kill job_1553134178907_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-21 14:36:44,677 Stage-1 map = 0%, reduce = 0%
2019-03-21 14:36:54,423 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.95 sec
MapReduce Total cumulative CPU time: 2 seconds 950 msec
Ended Job = job_1553134178907_0005
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://master:9000/data/student/.hive-staging_hive_2019-03-21_14-36-28_119_7183395134216927212-1/-ext-10000
Moving data to: /data/student
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.95 sec HDFS Read: 3155 HDFS Write: 64 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 950 msec
OK
Time taken: 28.785 seconds
hive> dfs -ls /data/student;
Found 1 items
-rwxr-xr-x 1 yao supergroup 64 2019-03-21 14:36 /data/student/000000_0
hive> dfs -cat /data/student/000000_0;
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
3. Querying Data
3.1 Simple Queries
- Data
/home/yao/data/stu_messages.txt
005 男 16 北京 13754554587
006 女 17 北京 13872374170
001 男 15 北京 13903030485
002 女 16 北京 17938844884
023 男 15 北京 13700000033
004 男 17 北京 15257575773
003 女 15 北京 15885888889
010 男 16 北京 14098593989
015 女 14 北京 14938983000
/home/yao/data/stu_scores.txt
001 赵四 89 46 90
002 张晓梅 95 89 95
003 小明 89 98 88
004 小舞 89 69 84
005 吴帅帅 94 90 95
006 王晓晓 89 43 89
007 孙强 99 98 97
008 小英 69 85 89
009 王五 89 59 98
- Create the tables and load the data
hive> create table stu_message(
> id int,
> sex string,
> age int,
> location string,
> tel_num string)
> row format delimited fields terminated by '\t'
> stored as textfile;
OK
Time taken: 0.074 seconds
hive> create table stu_scores(
> id int,
> name string,
> math int,
> chinese int,
> eng int)
> row format delimited fields terminated by '\t'
> stored as textfile;
OK
Time taken: 0.107 seconds
hive> load data local inpath '/home/yao/data/stu_messages.txt' overwrite into table stu_message;
hive> load data local inpath '/home/yao/data/stu_scores.txt' overwrite into table stu_scores;
1. Selecting specific columns
//query each student's name and Chinese score from the scores table
hive> select name,chinese from stu_scores;
OK
赵四 46
张晓梅 89
小明 98
小舞 69
吴帅帅 90
王晓晓 43
孙强 98
小英 85
王五 59
Time taken: 0.495 seconds, Fetched: 9 row(s)
//query each student's age and phone number from the info table
hive> select age,tel_num from stu_message;
OK
16 北京
17 北京
15 北京
16 北京
15 北京
17 北京
15 北京
16 北京
14 北京
Time taken: 0.133 seconds, Fetched: 9 row(s)
2. Arithmetic in queries
//compute each student's total score
hive> select name,chinese+math+eng from stu_scores;
OK
赵四 225
张晓梅 279
小明 275
小舞 242
吴帅帅 279
王晓晓 221
孙强 294
小英 243
王五 246
Time taken: 0.222 seconds, Fetched: 9 row(s)
//compute the average score for each subject
hive> select avg(chinese),avg(math),avg(eng) from stu_scores;
//result
Total MapReduce CPU Time Spent: 8 seconds 60 msec
OK
75.22222222222223 89.11111111111111 91.66666666666667
//query the highest score in each subject
hive> select max(chinese),max(math),max(eng) from stu_scores;
//result
Total MapReduce CPU Time Spent: 5 seconds 320 msec
OK
98 99 98
3. Filtering
- Conditional queries
Format: if(condition, value_if_true, value_if_false)
The first argument is the condition to test; the second is what to return when the condition holds; the third is what to return when it does not. Each of the last two can be a literal value or an expression.
//query the IDs of students whose age is 15
hive> select if(age=15,id,0) from stu_message;
OK
0
0
1
0
23 //rows where age is 15 return the id; others return 0
0
3
0
0
Time taken: 0.112 seconds, Fetched: 9 row(s)
- Where filters
//query the names of students whose Chinese, math, and English scores are all above 80
hive> select name from stu_scores
> where chinese>80
> and math >80
> and eng>80;
OK
张晓梅
小明
吴帅帅
孙强
Time taken: 0.178 seconds, Fetched: 4 row(s)
//query the sex of students older than 15
hive> select sex from stu_message where age>15;
OK
男
女
女
男
男
Time taken: 0.076 seconds, Fetched: 5 row(s)
//query the names of students whose name contains 小
hive> select name from stu_scores where name like '%小%';
OK
小明
小舞
小英
Time taken: 0.099 seconds, Fetched: 3 row(s)
4. Grouping
//count the numbers of male and female students
hive> select sex,count(*) from stu_message group by sex;
//result
Total MapReduce CPU Time Spent: 4 seconds 670 msec
OK
女 4
男 5
//count the numbers of male and female students older than 15
select sex,count(*) from stu_message where age>15 group by sex;
5. Sorting
order by
distribute by
sort by
cluster by
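The four clauses differ in scope; the sketch below reuses stu_scores to illustrate each (semantics per standard Hive behavior):

```sql
-- order by: a single global sort over all rows (funnels through one reducer).
select name, math from stu_scores order by math desc;

-- sort by: sorts within each reducer only, so output is not globally ordered.
select name, math from stu_scores sort by math desc;

-- distribute by: decides which reducer each row goes to; combined with sort by,
-- rows are grouped by name and sorted by score within each reducer.
select name, math from stu_scores distribute by name sort by math desc;

-- cluster by: shorthand for distribute by + sort by on the same column
-- (ascending order only).
select name, math from stu_scores cluster by math;
```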
//list each student's name and total score in descending order of total
select name,(chinese+math+eng)as cnt from stu_scores order by cnt desc;
//result
Total MapReduce CPU Time Spent: 6 seconds 110 msec
OK
孙强 294
吴帅帅 279
张晓梅 279
小明 275
王五 246
小英 243
小舞 242
赵四 225
王晓晓 221
Time taken: 30.053 seconds, Fetched: 9 row(s)
3.2 Complex Queries
1. JOIN operations
- Inner join: join ... on
//inner-join the scores table with the info table to get the names and English scores of students present in both
select a.name,a.eng from stu_scores a join stu_message b on a.id=b.id;
//result
Total MapReduce CPU Time Spent: 3 seconds 620 msec
OK
吴帅帅 95
王晓晓 89
赵四 90
张晓梅 95
小舞 84
小明 88
- Outer joins: left outer, right outer, full outer
//left-outer-join the scores table with the info table to get each student's name and sex
select a.name,b.sex from stu_scores a left join stu_message b on a.id=b.id;
//result
Total MapReduce CPU Time Spent: 2 seconds 110 msec
OK
赵四 男
张晓梅 女
小明 女
小舞 男
吴帅帅 男
王晓晓 女
孙强 NULL
小英 NULL
王五 NULL
- Left semi join:
//use a left semi join to get the names of students that also appear in the info table
select a.name, id from stu_scores a left semi join stu_message b on a.id=b.id;
//result
Total MapReduce CPU Time Spent: 3 seconds 270 msec
OK
赵四 1
张晓梅 2
小明 3
小舞 4
吴帅帅 5
王晓晓 6
2. UNION ALL
The unioned queries must have the same schema.
Each branch can carry its own filter conditions (they need not be identical).
select name,'chinese',chinese from stu_scores where chinese in (select max(chinese) from stu_scores)
union all
select name,'math',math from stu_scores where math in (select max(math) from stu_scores)
union all
select name,'eng',eng from stu_scores where eng in (select max(eng) from stu_scores);
//result
Total MapReduce CPU Time Spent: 37 seconds 570 msec
OK
小明 chinese 98
孙强 chinese 98
孙强 math 99
王五 eng 98
3. Nested queries
Format:
select * from (select * from (select * from tablename) b) a;
How to write one?
Simplify the complicated: work out the innermost query first, then compose outward.
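As a sketch of that advice using the scores table above: to find the top student by total score, first compute the totals in an inner query, then sort and trim in the outer one (the alias names are arbitrary):

```sql
select a.name, a.total
from (select name, chinese + math + eng as total  -- innermost: per-student totals
      from stu_scores) a
order by a.total desc   -- outer: rank the derived rows
limit 1;                -- keep only the top scorer
```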