When the data volume is large, MySQL like '%...%' queries become slow, so MySQL full-text search can be used instead.

Creating the table

CREATE TABLE `training`.`t` (
  `id` INT NOT NULL,
  `content` LONGTEXT NOT NULL,
  PRIMARY KEY (`id`));

INSERT INTO `training`.`t` (`id`, `content`) VALUES (1, '我爱');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (2, '我京');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (3, '北京');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (4, '爱北');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (5, '我');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (6, '爱');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (7, '北');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (8, '京');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (9, '京我');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (10, '我爱北');
INSERT INTO `training`.`t` (`id`, `content`) VALUES (11, '我爱北京');

If the content is English, the default parser works:

ALTER TABLE `t` ADD FULLTEXT INDEX `idx`(`content`);
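
The default parser tokenizes on whitespace and punctuation, which is also why it cannot split Chinese text into searchable words. A quick hypothetical illustration (the t_en table and its data are not part of the setup above):

CREATE TABLE `training`.`t_en` (
  `id` INT NOT NULL,
  `content` LONGTEXT NOT NULL,
  PRIMARY KEY (`id`));
INSERT INTO `training`.`t_en` (`id`, `content`) VALUES (1, 'I love Beijing');
ALTER TABLE `t_en` ADD FULLTEXT INDEX `idx_en`(`content`);
select * from t_en where MATCH (content) AGAINST ('Beijing'); -- 1 row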

If the content is Chinese, use the ngram parser:

ALTER TABLE `t` ADD FULLTEXT INDEX `idx`(`content`) WITH PARSER ngram;

The ngram parser defaults to 2-character tokens, so "我爱北京" is tokenized into "我爱", "爱北", "北京", and queries must match against these tokens. You can adjust the ngram token size to change the tokenization:

show variables like 'ngram_token_size';
SET ngram_token_size = 1;
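
To see the stored tokens for yourself, you can query the InnoDB full-text auxiliary tables (a sketch, assuming the training.t index from above and sufficient privileges; rows inserted after index creation sit in the cache table until the index is optimized):

SET GLOBAL innodb_ft_aux_table = 'training/t';
SELECT word, doc_id, position FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
UNION ALL
SELECT word, doc_id, position FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;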

However, this variable cannot be changed at runtime; the SET fails with:

Error Code: 1238. Variable 'ngram_token_size' is a read only variable

So the variable has to be changed in the MySQL server's startup configuration.
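
For example, in my.cnf (or my.ini on Windows) under the [mysqld] section, then restart the server:

[mysqld]
ngram_token_size=1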

Querying

select * from t where MATCH (content) AGAINST ('我');     -- 0 rows
select * from t where MATCH (content) AGAINST ('我爱');   -- 3 rows: 1, 10, 11
select * from t where MATCH (content) AGAINST ('北京');   -- 2 rows: 3, 11
select * from t where MATCH (content) AGAINST ('爱北');   -- 3 rows: 4, 10, 11
select * from t where MATCH (content) AGAINST ('我爱北'); -- 2 rows: 10, 11
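
If you need strict phrase semantics for a multi-token term, the MySQL ngram documentation notes that boolean-mode search terms are converted into an ngram phrase search, e.g.:

select * from t where MATCH (content) AGAINST ('我爱北' IN BOOLEAN MODE);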


As these results show, matching is done strictly against the 2-character tokens: even though the table contains id=5 ("我"), searching for "我" returns nothing. For that you need LIKE:

select * from t where content like '%我%'; -- 6 rows: 1, 2, 5, 9, 10, 11

So if single-character Chinese searches are needed, ngram_token_size must be changed to 1.
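
Note that after changing ngram_token_size and restarting the server, an existing FULLTEXT index still reflects the old token size and must be rebuilt, e.g. for the table above:

ALTER TABLE `t` DROP INDEX `idx`;
ALTER TABLE `t` ADD FULLTEXT INDEX `idx`(`content`) WITH PARSER ngram;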

References

https://dev.mysql.com/doc/refman/8.3/en/fulltext-search-ngram.html