第18章 全文搜索
搜索引擎:MyISAM支持全文本搜索(InnoDB不支持)
- 使用LIKE关键字或者正则表达式要求MySQL匹配表中所有行,随着行数增加,会很耗时,且不容易明确匹配什么不匹配什么,例如指定一个词必须匹配,一个词必须不匹配且结果不够智能化,如不会区分单个匹配的行和多个匹配的行(一行中匹配同一个行多次)
这些问题可以用全文本搜索解决,此时MySQL不需要分别查看每个行,也不需要分别分析和处理每个词
- 全文本搜索的前提条件,必须索引被搜索的列,在CREATE TABLE时,MySQL根据子句FULLTEXT(note_text)的指示对该列进行索引,如果需要也可以指定多个列,可以在创建表时指定FULLTEXT,或者稍后,
- 不要在导入数据时使用FULLTEXT
match()指定被搜索的列
against()指定要使用的搜索表达式
1.简单的全文本搜索 match against
#全文本搜索
mysql> select note_text from productnotes where match(note_text) against('rabbit');
+---------------------------------------------------------------------------------------------------------------------+
| note_text |
+---------------------------------------------------------------------------------------------------------------------+
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now. |
| Quantity varies, sold by the sack load.
All guaranteed to be bright and orange, and suitable for use as rabbit bait. |
+---------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)
#使用通配符进行查询
mysql> select note_text from productnotes where note_text like '%rabbit%';
+---------------------------------------------------------------------------------------------------------------------+
| note_text |
+---------------------------------------------------------------------------------------------------------------------+
| Quantity varies, sold by the sack load.
All guaranteed to be bright and orange, and suitable for use as rabbit bait. |
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now. |
+---------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)
2.查询扩展 with query expansion
mysql> select note_text from productnotes where match(note_text) against('anvils' with query expansion);
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| note_text |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Multiple customer returns, anvils failing to drop fast enough or falling backwards on purchaser. Recommend that customer considers using heavier anvils. |
| Customer complaint:
Sticks not individually wrapped, too easy to mistakenly detonate all at once.
Recommend individual wrapping. |
| Customer complaint:
Not heavy enough to generate flying stars around head of victim. If being purchased for dropping, recommend ANV02 or ANV03 instead. |
| Please note that no returns will be accepted if safe opened using explosives. |
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now. |
| Customer complaint:
Circular hole in safe floor can apparently be easily cut with handsaw. |
| Matches not included, recommend purchase of matches or detonator (item DTNTR). |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
7 rows in set (0.03 sec)
# 查询扩展会搜索所有记录(相关性查询)
#查询database,同时也会把含有mysql但不含database的记录也找到。
#此功能会带来诸多非相关性的查询(谨慎使用)
3.布尔文本搜索 in boolean mode
布尔操作符 | 说明 |
+ | 包含,词必须存在 |
- | 排除,词必须不出现 |
> | 包含,而且增加等级值 |
< | 包含,且减少等级值 |
() | 把词组成子表达式(允许这些表达式作为一个组被包含、排除、排列等) |
~ | 取消一个词的排序值 |
* | 词尾的通配符 |
“ ” | 定义一个短语(与单个词的列表不一样,它匹配整个短语一边包含或排除这个短语) |
–关键字 in boolean mode
#全文本搜索包含此heavy的所有行
mysql> select note_text from productnotes where match(note_text) against('heavy' in boolean mode);
#全文本搜索heavy ,排除rope(以rope开头的词)的行
mysql> select note_text from productnotes where match(note_text) against('heavy -rope*' in boolean mode);
#匹配包含词rabbit和bait的行
mysql> select note_text from productnotes where match(note_text) against('+rabbit +bait' in boolean mode);
#搜索匹配短语rabbit bait而不是匹配两个词rabbit和bait
mysql> select note_text from productnotes where match(note_text) against('"rabbit bait"' in boolean mode);
#匹配rabbit和carrot,增加前者的等级,降低后者的等级
mysql> select note_text from productnotes where match(note_text) against('>rabbit <carrot' in boolean mode);
#必须匹配词safe和combination,降低后者的等级
mysql> select note_text from productnotes where match(note_text) against('>rabbit +(<combination)' in boolean mode);
#排列而不排序:在布尔方式中,不按等级值降序排序返回的行
演示排序
rank在mysql 8.0版本中数据保留字,使用时必须用’'括起来或者更改名字;
#普通全文本搜索
mysql> select note_text from productnotes where match(note_text) against('rabbit');
+---------------------------------------------------------------------------------------------------------------------+
| note_text |
+---------------------------------------------------------------------------------------------------------------------+
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now. |
| Quantity varies, sold by the sack load.
All guaranteed to be bright and orange, and suitable for use as rabbit bait. |
+---------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)
#演示排序工作
mysql> select note_text,match(note_text) against('rabbit') as 'rank' from productnotes;
+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| note_text | rank |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| Customer complaint:
Sticks not individually wrapped, too easy to mistakenly detonate all at once.
Recommend individual wrapping. | 0 |
| Can shipped full, refills not available.
Need to order new can if refill needed. | 0 |
| Safe is combination locked, combination not provided with safe.
This is rarely a problem as safes are typically blown up or dropped by customers. | 0 |
| Quantity varies, sold by the sack load.
All guaranteed to be bright and orange, and suitable for use as rabbit bait. | 1.5905543565750122 |
| Included fuses are short and have been known to detonate too quickly for some customers.
Longer fuses are available (item FU1) and should be recommended. | 0 |
| Matches not included, recommend purchase of matches or detonator (item DTNTR). | 0 |
| Please note that no returns will be accepted if safe opened using explosives. | 0 |
| Multiple customer returns, anvils failing to drop fast enough or falling backwards on purchaser. Recommend that customer considers using heavier anvils. | 0 |
| Item is extremely heavy. Designed for dropping, not recommended for use with slings, ropes, pulleys, or tightropes. | 0 |
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now. | 1.6408053636550903 |
| Shipped unassembled, requires common tools (including oversized hammer). | 0 |
| Customer complaint:
Circular hole in safe floor can apparently be easily cut with handsaw. | 0 |
| Customer complaint:
Not heavy enough to generate flying stars around head of victim. If being purchased for dropping, recommend ANV02 or ANV03 instead. | 0 |
| Call from individual trapped in safe plummeting to the ground, suggests an escape hatch be added.
Comment forwarded to vendor. | 0 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
14 rows in set (0.03 sec)
#rank中大于1的就是全文搜索匹配的行
注意事项
- 在索引全文本数据时,短词被忽略且从索引中排除,短词定义为3个即以下字符的词
- MySQL有一个内建的非用词列表,这些词在索引全文本数据时总是被忽略,有需要可以覆盖
- 如果一个词出现在50%以上的行中,则将它作为一个非用词忽略,这个规则不适用于IN BOOLEAN MODE
- 表中行数<=2行,全文本搜索不返回结果,因为50%的规则
- 忽略单引号 如don’t索引为dont