mysql8多值索引 Multi-Valued Indexes.

  • 1、多值索引使用示例(引用官方示例)
  • 1.1 创建表及多值索引
  • 1.2 查询时使用索引
  • 2、性能测试
  • 2.1、不加索引的情况
  • 2.2、添加索引情况
  • 2.3 、测试结果
  • 3、扩展使用
  • 4、字符类型多值索引
  • 4.1、字符集验证示例一(utf8mb4_0900_ai_ci)
  • 4.1、字符集验证示例二(utf8mb4_0900_as_cs)



多值索引是基于json类型的数组进行设置使用,json 数据类型从msyql5.7就已经支持,而多值索引从mysql8.0.17才开始支持。需要对json数据类型有一定了解,网上有很多例子,这里不做介绍。

以下测试使用的mysql版本为:8.0.22

1、多值索引使用示例(引用官方示例)

官方链接 https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued

1.1 创建表及多值索引

数据示例:客户表(customers)有一个用户信息字段 custinfo,为json类型,其中zipcode有多个值,如下:

{
    "user":"Bob",
    "user_id":31,
    "zipcode":[94477,94536]
}

针对zipcode进行索引查询

方式一:创建表时同时创建索引

CREATE TABLE customers (
	id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
	modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
	custinfo JSON,
	INDEX zips ( ( CAST( custinfo -> '$.zipcode' AS UNSIGNED ARRAY )) ) 
);

方式二:先创建表,再通过alter table添加索引

CREATE TABLE customers (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    custinfo JSON
    );

ALTER TABLE customers ADD INDEX zips( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );

方式三:先创建表,再通过create index添加索引

CREATE TABLE customers (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    custinfo JSON
    );

CREATE INDEX zips ON customers ( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );

PS:多值索引也可以作为复合索引的一部分,如下:

CREATE TABLE customers (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    custinfo JSON
    );

ALTER TABLE customers ADD INDEX comp(id, modified,
    (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );

1.2 查询时使用索引

插入数据

mysql> INSERT INTO customers VALUES
    ->     (NULL, NOW(), '{"user":"Jack","user_id":37,"zipcode":[94582,94536]}'),
    ->     (NULL, NOW(), '{"user":"Jill","user_id":22,"zipcode":[94568,94507,94582]}'),
    ->     (NULL, NOW(), '{"user":"Bob","user_id":31,"zipcode":[94477,94507]}'),
    ->     (NULL, NOW(), '{"user":"Mary","user_id":72,"zipcode":[94536]}'),
    ->     (NULL, NOW(), '{"user":"Ted","user_id":56,"zipcode":[94507,94582]}');

查询,提供了三个查询函数

MEMBER OF() 数组中是否存在某一值

JSON_CONTAINS() 数组中是否包含某些值,交集(必须都包含)

JSON_OVERLAPS() 数组中是否包含某些值,并集(只要包含一个就行)

mysql> SELECT * FROM customers
    ->     WHERE 94507 MEMBER OF(custinfo->'$.zipcode');
+----+---------------------+-------------------------------------------------------------------+
| id | modified            | custinfo                                                          |
+----+---------------------+-------------------------------------------------------------------+
|  2 | 2019-06-29 22:23:12 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
|  3 | 2019-06-29 22:23:12 | {"user": "Bob", "user_id": 31, "zipcode": [94477, 94507]}         |
|  5 | 2019-06-29 22:23:12 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]}         |
+----+---------------------+-------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM customers
    ->     WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+---------------------+-------------------------------------------------------------------+
| id | modified            | custinfo                                                          |
+----+---------------------+-------------------------------------------------------------------+
|  2 | 2019-06-29 22:23:12 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
|  5 | 2019-06-29 22:23:12 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]}         |
+----+---------------------+-------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM customers
    ->     WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+---------------------+-------------------------------------------------------------------+
| id | modified            | custinfo                                                          |
+----+---------------------+-------------------------------------------------------------------+
|  1 | 2019-06-29 22:23:12 | {"user": "Jack", "user_id": 37, "zipcode": [94582, 94536]}        |
|  2 | 2019-06-29 22:23:12 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} |
|  3 | 2019-06-29 22:23:12 | {"user": "Bob", "user_id": 31, "zipcode": [94477, 94507]}         |
|  5 | 2019-06-29 22:23:12 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]}         |
+----+---------------------+-------------------------------------------------------------------+
4 rows in set (0.00 sec)

explain分析上述查询索引使用情况,能看出都是走了索引的

mysql> EXPLAIN SELECT * FROM customers
    ->     WHERE 94507 MEMBER OF(custinfo->'$.zipcode');
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | customers | NULL       | ref  | zips          | zips | 9       | const |    1 |   100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM customers
    ->     WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | customers | NULL       | range | zips          | zips | 9       | NULL |    6 |   100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM customers
    ->     WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | customers | NULL       | range | zips          | zips | 9       | NULL |    6 |   100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

2、性能测试

2.1、不加索引的情况

创建测试表,不加索引

CREATE TABLE `test_int_array` (
  `a` bigint NOT NULL AUTO_INCREMENT,
  `b` json NOT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=19264 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

存储过程创建测试数据(10万条数据)

DROP PROCEDURE IF EXISTS proc_initData;
DELIMITER $;
CREATE PROCEDURE proc_initData () BEGIN
	DECLARE	i INT DEFAULT 1;
	DECLARE jsondoc JSON ;
	DECLARE CODE INT;
	WHILE i <= 1 DO
			
		SET jsondoc = '{"user":"Jack","user_id":37,"type":[1100]}';
		
		SET CODE = CEILING( RAND()* 10 )*100;
		SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
		
		SET CODE = CEILING( RAND()* 10 )*100;
		SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
		
		SET CODE = CEILING( RAND()* 10 )*100;
		SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
		
		SET CODE = CEILING( RAND()* 10 )*100;
		SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
		
		SET CODE = CEILING( RAND()* 10 )*100;
		SET jsondoc = JSON_SET(jsondoc, "$.type", JSON_ARRAY_INSERT( jsondoc->'$.type', '$[0]', CODE ));
		INSERT INTO test_int_array	VALUES	( NULL, jsondoc );

		SET i = i + 1;
	END WHILE;
END $;
CALL proc_initData ();

查询测试

select count(*) from test_int_array where 500 member of(b->'$.type')
> OK
> Query Time: 0.155s
count(*)
40800

select count(*) from test_int_array where JSON_CONTAINS(b->'$.type', '[500]')
> OK
> Query Time: 0.344s
count(*)
40800

select count(*) from test_int_array where JSON_CONTAINS(b->'$.type', '[500,600,900]')
> OK
> Query Time: 0.435s
count(*)
4095

select count(*) from test_int_array where JSON_OVERLAPS(b->'$.type', '[500]')
> OK
> Query Time: 0.409s
count(*)
40800

select count(*) from test_int_array where JSON_OVERLAPS(b->'$.type', '[500,600,900]')
> OK
> Query Time: 0.453s
count(*)
83504

2.2、添加索引情况

#添加索引
CREATE INDEX types ON test_int_array ( (CAST(b->'$.type' AS UNSIGNED ARRAY)) );

select count(*) from test_int_array where 500 member of(b->'$.type')
> OK
> Query Time: 0.248s

select count(*) from test_int_array where JSON_CONTAINS(b->'$.type', '[500]')
> OK
> Query Time: 0.678s

select count(*) from test_int_array where JSON_CONTAINS(b->'$.type', '[500,600,900]')
> OK
> Query Time: 0.855s

select count(*) from test_int_array where JSON_OVERLAPS(b->'$.type', '[500]')
> OK
> Query Time: 0.764s

select count(*) from test_int_array where JSON_OVERLAPS(b->'$.type', '[500,600,900]')
> OK
> Query Time: 0.81s

2.3 、测试结果

加完索引查询反而慢了…

3、扩展使用

直接把zipcode提出来作为表字段

CREATE TABLE `customers2` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `modified` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `user` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `user_id` int DEFAULT NULL,
  `zipcode` json DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `zips` ((cast(`zipcode` as unsigned array)))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO customers2 VALUES
       (NULL, NOW(), "Jack",37,'[94582,94536]'),
       (NULL, NOW(),"Jill",22, json_array(94568,94507,94582)),
       (NULL, NOW(), "Bob",31,'[94477,94507]'),
       (NULL, NOW(),"Mary",72, '[94536]'),
       (NULL, NOW(), "Ted",56,'[94507,94582]');

SELECT * FROM customers2 WHERE 94507 MEMBER OF(zipcode);

MySQL8 json检索 mysql8 json索引_database

SELECT * FROM customers2 WHERE JSON_CONTAINS(zipcode, CAST('[94507,94582]' AS JSON));

MySQL8 json检索 mysql8 json索引_MySQL8 json检索_02

查询计划 
explain SELECT * FROM customers2 WHERE 94507 MEMBER OF(zipcode);

MySQL8 json检索 mysql8 json索引_MySQL8 json检索_03

explain SELECT * FROM customers2 WHERE JSON_CONTAINS(zipcode, CAST('[94507,94582]' AS JSON));

MySQL8 json检索 mysql8 json索引_MySQL8 json检索_04


查询功能和查询结果都可以,但是索引不生效,不知道是索引加的方式有问题还是不支持这种使用方式。

4、字符类型多值索引

以上zipcode是整形的多值索引使用,字符类型多值索引使用有局限性

(摘自官方文档)
Character sets and collations other than the following two combinations of character set and collation are not supported for multi-valued indexes:
	a.The binary character set with the default binary collation
	b.The utf8mb4 character set with the default utf8mb4_0900_as_cs collation.

多值索引只支持以下两种字符集和排序规则的组合:
	a.具有默认binary排序规则的binary字符集。
	b.具有默认utf8mb4_0900_as_cs排序规则的utf8mb4字符集。

字符多值索引的创建需要把 unsigned array 改为 char array;

4.1、字符集验证示例一(utf8mb4_0900_ai_ci)

CREATE TABLE `customers3` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `modified` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `custinfo` json DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `zips` ((cast(json_extract(`custinfo`,_utf8mb4'$.zipcode') as char(6) array)))
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO customers3 VALUES
       (NULL, NOW(), '{"user":"Jack","user_id":37,"zipcode":["94582","94536"]}'),
       (NULL, NOW(), '{"user":"Jill","user_id":22,"zipcode":["94568","94507","94582"]}'),
       (NULL, NOW(), '{"user":"Bob","user_id":31,"zipcode":["94477","94507"]}'),
       (NULL, NOW(), '{"user":"Mary","user_id":72,"zipcode":["94536"]}'),
       (NULL, NOW(), '{"user":"Ted","user_id":56,"zipcode":["94507","94582"]}');

SELECT * FROM customers3 WHERE "94507" MEMBER OF(custinfo->'$.zipcode');

MySQL8 json检索 mysql8 json索引_数据库_05


MySQL8 json检索 mysql8 json索引_MySQL8 json检索_06

SELECT * FROM customers3 WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('["94507","94582"]' AS JSON));

MySQL8 json检索 mysql8 json索引_JSON_07


MySQL8 json检索 mysql8 json索引_数据库_08

SELECT * FROM customers3 WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('["94507","94582"]' AS JSON));

MySQL8 json检索 mysql8 json索引_JSON_09


MySQL8 json检索 mysql8 json索引_JSON_10

4.1、字符集验证示例二(utf8mb4_0900_as_cs)

CREATE TABLE `customers4` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `modified` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `custinfo` json DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `zips` ((cast(json_extract(`custinfo`,_utf8mb4'$.zipcode') as char(6) array)))
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_as_cs;

其他步骤与验证示例一相同

结果:示例一和示例二的结果一样,从执行计划里面看都走了索引,跟字符集排序规则好像没关系。不知道实际的性能有没有区别。