mysql8多值索引 Multi-Valued Indexes.
- 1、多值索引使用示例(引用官方示例)
- 1.1 创建表及多值索引
- 1.2 查询时使用索引
- 2、性能测试
- 2.1、不加索引的情况
- 2.2、添加索引情况
- 2.3 、测试结果
- 3、扩展使用
- 4、字符类型多值索引
- 4.1、字符集验证示例一(utf8mb4_0900_ai_ci)
- 4.1、字符集验证示例二(utf8mb4_0900_as_cs)
多值索引是基于json类型的数组进行设置使用,json 数据类型从msyql5.7就已经支持,而多值索引从mysql8.0.17才开始支持。需要对json数据类型有一定了解,网上有很多例子,这里不做介绍。
以下测试使用的mysql版本为:8.0.22
1、多值索引使用示例(引用官方示例)
官方链接 https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued
1.1 创建表及多值索引
数据示例:客户表(customers)有一个用户信息字段 custinfo,为json类型,其中zipcode有多个值,如下:
{
"user":"Bob",
"user_id":31,
"zipcode":[94477,94536]
}
针对zipcode进行索引查询
方式一:创建表时同时创建索引
CREATE TABLE customers (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
custinfo JSON,
INDEX zips ( ( CAST( custinfo -> '$.zipcode' AS UNSIGNED ARRAY )) )
);
方式二:先创建表,再通过alter table添加索引
CREATE TABLE customers (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
custinfo JSON
);
ALTER TABLE customers ADD INDEX zips( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );
方式三:先创建表,再通过create index添加索引
CREATE TABLE customers (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
custinfo JSON
);
CREATE INDEX zips ON customers ( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );
PS:多值索引也可以作为复合索引的一部分,如下:
CREATE TABLE customers (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
custinfo JSON
);
ALTER TABLE customers ADD INDEX comp(id, modified,
(CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );
1.2 查询时使用索引
插入数据
mysql> INSERT INTO customers VALUES
-> (NULL, NOW(), '{"user":"Jack","user_id":37,"zipcode":[94582,94536]}'),
-> (NULL, NOW(), '{"user":"Jill","user_id":22,"zipcode":[94568,94507,94582]}'),
-> (NULL, NOW(), '{"user":"Bob","user_id":31,"zipcode":[94477,94507]}'),
-> (NULL, NOW(), '{"user":"Mary","user_id":72,"zipcode":[94536]}'),
-> (NULL, NOW(), '{"user":"Ted","user_id":56,"zipcode":[94507,94582]}');
查询,提供了三个查询函数
MEMBER OF() 数组中是否存在某一值
JSON_CONTAINS() 数组中是否包含某些值,交集(必须都包含)
JSON_OVERLAPS() 数组中是否包含某些值,并集(只要包含一个就行)
mysql> SELECT * FROM customers
-> WHERE 94507 MEMBER OF(custinfo->'$.zipcode');
+----+---------------------+-------------------------------------------------------------------+
| id | modified | custinfo |
+----+---------------------+-------------------------------------------------------------------+
| 2 | 2019-06-29 22:23:12 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
| 3 | 2019-06-29 22:23:12 | {"user": "Bob", "user_id": 31, "zipcode": [94477, 94507]} |
| 5 | 2019-06-29 22:23:12 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]} |
+----+---------------------+-------------------------------------------------------------------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM customers
-> WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+---------------------+-------------------------------------------------------------------+
| id | modified | custinfo |
+----+---------------------+-------------------------------------------------------------------+
| 2 | 2019-06-29 22:23:12 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
| 5 | 2019-06-29 22:23:12 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]} |
+----+---------------------+-------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM customers
-> WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+---------------------+-------------------------------------------------------------------+
| id | modified | custinfo |
+----+---------------------+-------------------------------------------------------------------+
| 1 | 2019-06-29 22:23:12 | {"user": "Jack", "user_id": 37, "zipcode": [94582, 94536]} |
| 2 | 2019-06-29 22:23:12 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
| 3 | 2019-06-29 22:23:12 | {"user": "Bob", "user_id": 31, "zipcode": [94477, 94507]} |
| 5 | 2019-06-29 22:23:12 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]} |
+----+---------------------+-------------------------------------------------------------------+
4 rows in set (0.00 sec)
explain分析上述查询索引使用情况,能看出都是走了索引的
mysql> EXPLAIN SELECT * FROM customers
-> WHERE 94507 MEMBER OF(custinfo->'$.zipcode');
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | customers | NULL | ref | zips | zips | 9 | const | 1 | 100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM customers
-> WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | customers | NULL | range | zips | zips | 9 | NULL | 6 | 100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM customers
-> WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | customers | NULL | range | zips | zips | 9 | NULL | 6 | 100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
2、性能测试
2.1、不加索引的情况
创建测试表,不加索引
CREATE TABLE `test_int_array` (
`a` bigint NOT NULL AUTO_INCREMENT,
`b` json NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=19264 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
存储过程创建测试数据(10万条数据)
DROP PROCEDURE IF EXISTS proc_initData;
DELIMITER $;
CREATE PROCEDURE proc_initData () BEGIN
DECLARE i INT DEFAULT 1;
DECLARE jsondoc JSON ;
DECLARE CODE INT;
WHILE i <= 1 DO
SET jsondoc = '{"user":"Jack","user_id":37,"type":[1100]}';
SET CODE = CEILING( RAND()* 10 )*100;
SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
SET CODE = CEILING( RAND()* 10 )*100;
SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
SET CODE = CEILING( RAND()* 10 )*100;
SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
SET CODE = CEILING( RAND()* 10 )*100;
SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
SET CODE = CEILING( RAND()* 10 )*100;
SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
INSERT INTO test_int_array VALUES ( NULL, jsondoc );
SET i = i + 1;
END WHILE;
END $;
CALL proc_initData ();
查询测试
select count(*) from test_int_array where 500 member of(b->'$.type')
> OK
> Query Time: 0.155s
count(*)
40800
select count(*) from test_int_array where JSON_CONTAINS(b->'$.type', '[500]')
> OK
> Query Time: 0.344s
count(*)
40800
select count(*) from test_int_array where JSON_CONTAINS(b->'$.type', '[500,600,900]')
> OK
> Query Time: 0.435s
count(*)
4095
select count(*) from test_int_array where JSON_OVERLAPS(b->'$.type', '[500]')
> OK
> Query Time: 0.409s
count(*)
40800
select count(*) from test_int_array where JSON_OVERLAPS(b->'$.type', '[500,600,900]')
> OK
> Query Time: 0.453s
count(*)
83504
2.2、添加索引情况
#添加索引
CREATE INDEX types ON test_int_array ( (CAST(b->'$.type' AS UNSIGNED ARRAY)) );
select count(*) from test_int_array where 500 member of(b->'$.type')
> OK
> Query Time: 0.248s
select count(*) from test_int_array where JSON_CONTAINS(b->'$.type', '[500]')
> OK
> Query Time: 0.678s
select count(*) from test_int_array where JSON_CONTAINS(b->'$.type', '[500,600,900]')
> OK
> Query Time: 0.855s
select count(*) from test_int_array where JSON_OVERLAPS(b->'$.type', '[500]')
> OK
> Query Time: 0.764s
select count(*) from test_int_array where JSON_OVERLAPS(b->'$.type', '[500,600,900]')
> OK
> Query Time: 0.81s
2.3 、测试结果
加完索引查询反而慢了…
3、扩展使用
直接把zipcode提出来作为表字段
CREATE TABLE `customers2` (
`id` bigint NOT NULL AUTO_INCREMENT,
`modified` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`user` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`user_id` int DEFAULT NULL,
`zipcode` json DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `zips` ((cast(`zipcode` as unsigned array)))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO customers2 VALUES
(NULL, NOW(), "Jack",37,'[94582,94536]'),
(NULL, NOW(),"Jill",22, json_array(94568,94507,94582)),
(NULL, NOW(), "Bob",31,'[94477,94507]'),
(NULL, NOW(),"Mary",72, '[94536]'),
(NULL, NOW(), "Ted",56,'[94507,94582]');
SELECT * FROM customers2 WHERE 94507 MEMBER OF(zipcode);
SELECT * FROM customers2 WHERE JSON_CONTAINS(zipcode, CAST('[94507,94582]' AS JSON));
查询计划
explain SELECT * FROM customers2 WHERE 94507 MEMBER OF(zipcode);
explain SELECT * FROM customers2 WHERE JSON_CONTAINS(zipcode, CAST('[94507,94582]' AS JSON));
查询功能和查询结果都可以,但是索引不生效,不知道是索引加的方式有问题还是不支持这种使用方式。
4、字符类型多值索引
以上zipcode是整形的多值索引使用,字符类型多值索引使用有局限性
(摘自官方文档)
Character sets and collations other than the following two combinations of character set and collation are not supported for multi-valued indexes:
a.The binary character set with the default binary collation
b.The utf8mb4 character set with the default utf8mb4_0900_as_cs collation.
多值索引只支持以下两种字符集和排序规则的组合:
a.具有默认binary排序规则的binary字符集。
b.具有默认utf8mb4_0900_as_cs排序规则的utf8mb4字符集。
字符多值索引的创建需要把 unsigned array 改为 char array;
4.1、字符集验证示例一(utf8mb4_0900_ai_ci)
CREATE TABLE `customers3` (
`id` bigint NOT NULL AUTO_INCREMENT,
`modified` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`custinfo` json DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `zips` ((cast(json_extract(`custinfo`,_utf8mb4'$.zipcode') as char(6) array)))
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO customers3 VALUES
(NULL, NOW(), '{"user":"Jack","user_id":37,"zipcode":["94582","94536"]}'),
(NULL, NOW(), '{"user":"Jill","user_id":22,"zipcode":["94568","94507","94582"]}'),
(NULL, NOW(), '{"user":"Bob","user_id":31,"zipcode":["94477","94507"]}'),
(NULL, NOW(), '{"user":"Mary","user_id":72,"zipcode":["94536"]}'),
(NULL, NOW(), '{"user":"Ted","user_id":56,"zipcode":["94507","94582"]}');
SELECT * FROM customers3 WHERE "94507" MEMBER OF(custinfo->'$.zipcode');
SELECT * FROM customers3 WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('["94507","94582"]' AS JSON));
SELECT * FROM customers3 WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('["94507","94582"]' AS JSON));
4.1、字符集验证示例二(utf8mb4_0900_as_cs)
CREATE TABLE `customers4` (
`id` bigint NOT NULL AUTO_INCREMENT,
`modified` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`custinfo` json DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `zips` ((cast(json_extract(`custinfo`,_utf8mb4'$.zipcode') as char(6) array)))
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_as_cs;
其他步骤与验证示例一相同
结果:示例一和示例二的结果一样,从执行计划里面看都走了索引,跟字符集排序规则好像没关系。不知道实际的性能有没有区别。