概念
因为我们使用的是关系型数据库,每张表表示的都是独立的单元(对象),而该单元(对象)所涉及到的其他信息通常都存储在其他表中,例如:
MariaDB [world]> DESC city;
+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Name | char(35) | NO | | | |
| CountryCode | char(3) | NO | MUL | | |
| District | char(20) | NO | | | |
| Population | int(11) | NO | | 0 | |
+-------------+----------+------+-----+---------+----------------+
5 rows in set (0.61 sec)
MariaDB [world]> DESC countrylanguage;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| CountryCode | char(3) | NO | PRI | | |
| Language | char(30) | NO | PRI | | |
| IsOfficial | enum('T','F') | NO | | F | |
| Percentage | float(4,1) | NO | | 0.0 | |
+-------------+---------------+------+-----+---------+-------+
4 rows in set (0.06 sec)
比如其上两张表,我们想知道某一城市所使用的语言,就可以分为两个步骤:
1.在City表中查询该城市的CountryCode。
2.使用查询到的这个CountryCode在CountryLanguage表中查询该国家所使用的语言。
虽然,可以分两步完成,但是,需要两次查询和两次传输,在带宽和性能的对比下,我们更希望让Mysql(MariaDB)来帮助我们完成这件事不是吗?
连接(JOIN):也叫连结,是指将两张表按照一定规则连成一张表,将两张表中不同的数据(行)连成一行来看待。
又可以将连接分为如下几类:
- 内连接
- 外连接
- 左外连接
- 右外连接
- 交叉连接
在连接查询中,一个列可能出现在多张表中,为了避免引起歧义,通常在列名前面加上表名或表别名作为前缀(例:s.sid、x.sid)---使用表别名作为前缀,可以使得SQL代码较短,使用的内存更少(例:stu s,xuanke as x)。
内连接
内连接语法如下:
SELECT tb1_name.column,tb2_name.column FROM tb1 INNER JOIN tb2 ON 约束条件;
SELECT tb1_name.column,tb2_name.column FROM tb1,tb2 WHERE 约束条件;
查询每一个城市可能使用的语言有哪些:
MariaDB [world]> SELECT Name,District,Language FROM city,countrylanguage WHERE city.CountryCode = countrylanguage.CountryCode LIMIT 10;
+----------+----------+------------+
| Name | District | Language |
+----------+----------+------------+
| Kabul | Kabol | Balochi |
| Kabul | Kabol | Dari |
| Kabul | Kabol | Pashto |
| Kabul | Kabol | Turkmenian |
| Kabul | Kabol | Uzbek |
| Qandahar | Qandahar | Balochi |
| Qandahar | Qandahar | Dari |
| Qandahar | Qandahar | Pashto |
| Qandahar | Qandahar | Turkmenian |
| Qandahar | Qandahar | Uzbek |
+----------+----------+------------+
10 rows in set (0.00 sec)
内连接是怎样工作的
我们来看一下,这些数据是怎么连接起来的,具体可以看如下这张图(放大看):
所以所谓内连接,就是仅将多表中符合条件的行进行连接且返回结果。
比如这样,就将三张表连接了起来:
MariaDB [world]> SELECT * FROM city INNER JOIN countrylanguage INNER JOIN country ON city.CountryCode = countrylanguage.CountryCode AND city.CountryCode = country.Code WHERE city.Name='Kabul'\G;
*************************** 1. row ***************************
ID: 1
Name: Kabul
CountryCode: AFG
District: Kabol
Population: 1780000
CountryCode: AFG
Language: Balochi
IsOfficial: F
Percentage: 0.9
Code: AFG
Name: Afghanistan
Continent: Asia
Region: Southern and Central Asia
SurfaceArea: 652090.00
IndepYear: 1919
Population: 22720000
LifeExpectancy: 45.9
GNP: 5976.00
GNPOld: NULL
LocalName: Afganistan/Afqanestan
GovernmentForm: Islamic Emirate
HeadOfState: Mohammad Omar
Capital: 1
Code2: AF
....仅截取了第一条记录
5 rows in set (0.01 sec)
这里比较推荐SQL的标准写法,也就是如下格式:
SELECT tb1_name.column,tb2_name.column FROM tb1 INNER JOIN tb2 ON 约束条件;
为什么呢?因为在ON子句后还可以跟WHERE子句多连接出来的表进行过滤呀,且此语法结构更清晰不是吗?
外连接
使用内连接会将多表中符合条件的行连接到一起,而不符合条件的行则忽略,而外连接则会将一些不符合条件的行也输出出来。
例如,我们有如下数据:
MariaDB [world]> SELECT * FROM user;
+----+-------+----------+---------------------+--------+
| id | name | password | regtime | deptid |
+----+-------+----------+---------------------+--------+
| 1 | test | test | 2018-03-05 17:25:26 | 1 |
| 2 | test1 | test1 | 2018-03-05 17:25:26 | 1 |
| 3 | lucy | lucy | 2018-03-05 17:25:26 | 2 |
| 4 | mars | mars | 2018-03-05 17:25:26 | 3 |
| 5 | mark | mark | 2018-03-05 17:26:05 | NULL |
+----+-------+----------+---------------------+--------+
5 rows in set (0.01 sec)
MariaDB [world]> SELECT * FROM department;
+----+------------+---------+----------+
| id | name | comment | adminids |
+----+------------+---------+----------+
| 1 | Sales | NULL | NULL |
| 2 | Tech | NULL | NULL |
| 3 | administra | NULL | NULL |
| 4 | Secretaria | NULL | NULL |
+----+------------+---------+----------+
4 rows in set (0.01 sec)
//其中deptid是用户所属部门的编号
我们有如下需求,显示用户及用户所在部门名称,根据我们上面所说的内连接,我们可以写出如下语句:
MariaDB [world]> SELECT user.id,user.name,department.name FROM user INNER JOIN department ON user.deptid = department.id;
+----+-------+------------+
| id | name | name |
+----+-------+------------+
| 1 | test | Sales |
| 2 | test1 | Sales |
| 3 | lucy | Tech |
| 4 | mars | administra |
+----+-------+------------+
4 rows in set (0.14 sec)
但是,结果对吗?虽说我们的mark先生还没有被分到任何部门,但是也不能不显示人家了吧?
这时候,外连接就派上用场了:
在JOIN左面的表叫左表,而在右面的表叫右表
左外连接,FROM tb1_name LEFT OUTER JOIN tb2_name
除将符合条件的行显示出来,还显示左表的全部行,而右表的字段拼接过去全为NULL。如下所示:
MariaDB [world]> SELECT * FROM user LEFT OUTER JOIN department ON user.deptid = department.id;
+----+-------+----------+---------------------+--------+------+------------+---------+----------+
| id | name | password | regtime | deptid | id | name | comment | adminids |
+----+-------+----------+---------------------+--------+------+------------+---------+----------+
| 1 | test | test | 2018-03-05 17:25:26 | 1 | 1 | Sales | NULL | NULL |
| 2 | test1 | test1 | 2018-03-05 17:25:26 | 1 | 1 | Sales | NULL | NULL |
| 3 | lucy | lucy | 2018-03-05 17:25:26 | 2 | 2 | Tech | NULL | NULL |
| 4 | mars | mars | 2018-03-05 17:25:26 | 3 | 3 | administra | NULL | NULL |
| 5 | mark | mark | 2018-03-05 17:26:05 | NULL | NULL | NULL | NULL | NULL |
+----+-------+----------+---------------------+--------+------+------------+---------+----------+
5 rows in set (0.00 sec)
右外连接,FROM tb1_name RIGHT OUTER JOIN tb2_name
顾名思义,就是显示右表的所有行,而未符合连接条件的行,左表字段全为NULL,如下所示:
MariaDB [world]> SELECT * FROM user RIGHT OUTER JOIN department ON user.deptid = department.id;
+------+-------+----------+---------------------+--------+----+------------+---------+----------+
| id | name | password | regtime | deptid | id | name | comment | adminids |
+------+-------+----------+---------------------+--------+----+------------+---------+----------+
| 1 | test | test | 2018-03-05 17:25:26 | 1 | 1 | Sales | NULL | NULL |
| 2 | test1 | test1 | 2018-03-05 17:25:26 | 1 | 1 | Sales | NULL | NULL |
| 3 | lucy | lucy | 2018-03-05 17:25:26 | 2 | 2 | Tech | NULL | NULL |
| 4 | mars | mars | 2018-03-05 17:25:26 | 3 | 3 | administra | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | 4 | Secretaria | NULL | NULL |
+------+-------+----------+---------------------+--------+----+------------+---------+----------+
5 rows in set (0.00 sec)
交叉连接与笛卡尔积
当没有连接条件的表进行连接的结果为笛卡儿积,检索出的行的数目将是第一个表中的行数乘以第二个表中的行数,如下图所示:
如果有使用笛卡尔积的必要时,可以使用交叉连接(CROSS JOIN)如下例所示:
MariaDB [world]> SELECT user.Name,department.name FROM user CROSS JOIN department;
+-------+------------+
| Name | name |
+-------+------------+
| test | Sales |
| test | Tech |
| test | administra |
| test | Secretaria |
| test1 | Sales |
| test1 | Tech |
| test1 | administra |
| test1 | Secretaria |
| lucy | Sales |
| lucy | Tech |
| lucy | administra |
| lucy | Secretaria |
| mars | Sales |
| mars | Tech |
| mars | administra |
| mars | Secretaria |
| mark | Sales |
| mark | Tech |
| mark | administra |
| mark | Secretaria |
+-------+------------+
20 rows in set (0.00 sec)
多表连接的条件过滤
当我们的想要过滤多表连接查询结果时,我们可以将过滤条件放在ON子句或者WHERE子句,ON子句和WHERE子句得到的结果可能会不太一样。
过滤条件放ON子句:使用AND逻辑与操作将过滤条件放在连接条件前或后->在连接前进行条件过滤。
过滤条件放WHERE子句:使用单独的WHERE子句进行数据过滤->在连接后进行条件过滤。
对于内连接而言,过滤条件放在ON子句或WHERE子句是相同的,比较推荐在ON子句过滤。
而对于外连接而言,有以下情况参考:
//过滤条件放连接条件前或后
MariaDB [world]> SELECT user.name,department.name FROM user LEFT OUTER JOIN department ON user.name='mars' AND user.deptid = department.id;
MariaDB [world]> SELECT user.name,department.name FROM user LEFT OUTER JOIN department ON user.deptid = department.id AND user.name='mars';
+-------+------------+
| name | name |
+-------+------------+
| test | NULL |
| test1 | NULL |
| lucy | NULL |
| mars | administra |
| mark | NULL |
+-------+------------+
5 rows in set (0.00 sec)
//因为ON user.name='mars'会将左表变为一条数据,但AND要求第二个表达式也为真,user.deptid = department.id;这条又仅过滤了mars的deptid和其部门表中对应的id,但左连接又要求左表显示所有数据,所以右表字段为NULL
//过滤条件放WHERE子句,因为是连接后进行过滤,就是说对连接生成的这个新表过滤,所以只会显示符合条件的这条数据。
MariaDB [world]> SELECT user.name,department.name FROM user LEFT OUTER JOIN department ON user.deptid = department.id WHERE user.name = 'mars';
+------+------------+
| name | name |
+------+------------+
| mars | administra |
+------+------------+
1 row in set (0.00 sec)
表的重命名
在多表连接查询时,通常会对表进行重命名操作,与列的重命名一样使用AS关键字,对表重命名主要是引用表时使用方便。
如下所示,对user表重命名为U,对department重命名为D:
MariaDB [world]> SELECT U.name,D.name FROM user AS U LEFT OUTER JOIN department AS D ON
U.deptid = D.id;
+-------+------------+
| name | name |
+-------+------------+
| test | Sales |
| test1 | Sales |
| lucy | Tech |
| mars | administra |
| mark | NULL |
+-------+------------+
5 rows in set (0.01 sec)
多表连接与聚合函数的使用
多表连接查询说白了就是产生一张临时的新表,所以使用分组和聚合函数就像平常一样简单,参考如下例子:
统计每个部门的人数:
MariaDB [world]> SELECT D.name,COUNT(U.name) FROM user AS U LEFT OUTER JOIN department AS D ON U.deptid = D.id GROUP BY D.name;
+------------+---------------+
| name | COUNT(U.name) |
+------------+---------------+
| NULL | 1 |
| administra | 1 |
| Sales | 2 |
| Tech | 1 |
+------------+---------------+
4 rows in set (0.00 sec)
统计每个城市所能说的官方语言的数量:
MariaDB [world]> SELECT C.Name,COUNT(CL.Language) FROM city AS C INNER JOIN countrylanguage AS CL ON C.CountryCode = CL.CountryCode AND CL.IsOfficial = 'T' GROUP BY C.Name;
+-------------------------+--------------------+
| Name | COUNT(CL.Language) |
+-------------------------+--------------------+
| A Coruña (La Coruña) | 1 |
| Aachen | 1 |
................................................
| Alicante [Alacant] | 1 |
| Aligarh | 1 |
+-------------------------+--------------------+