第18章 全文搜索

搜索引擎:MyISAM支持全文本搜索(InnoDB不支持)

  • 使用LIKE关键字或者正则表达式要求MySQL匹配表中所有行,随着行数增加,会很耗时,且不容易明确匹配什么不匹配什么,例如指定一个词必须匹配,一个词必须不匹配且结果不够智能化,如不会区分单个匹配的行和多个匹配的行(一行中匹配同一个行多次)

这些问题可以用全文本搜索解决,此时MySQL不需要分别查看每个行,也不需要分别分析和处理每个词

  • 全文本搜索的前提条件,必须索引被搜索的列,在CREATE TABLE时,MySQL根据子句FULLTEXT(note_text)的指示对该列进行索引,如果需要也可以指定多个列,可以在创建表时指定FULLTEXT,或者稍后,
  • 不要在导入数据时使用FULLTEXT

match()指定被搜索的列
against()指定要使用的搜索表达式

1.简单的全文本搜索 match against

#全文本搜索
mysql> select note_text from productnotes where match(note_text) against('rabbit');
+---------------------------------------------------------------------------------------------------------------------+
| note_text                                                                                                           |
+---------------------------------------------------------------------------------------------------------------------+
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now.                        |
| Quantity varies, sold by the sack load.
All guaranteed to be bright and orange, and suitable for use as rabbit bait. |
+---------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)


#使用通配符进行查询

mysql> select note_text from productnotes where note_text like '%rabbit%';
+---------------------------------------------------------------------------------------------------------------------+
| note_text                                                                                                           |
+---------------------------------------------------------------------------------------------------------------------+
| Quantity varies, sold by the sack load.
All guaranteed to be bright and orange, and suitable for use as rabbit bait. |
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now.                        |
+---------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)

2.查询扩展 with query expansion

mysql> select note_text from productnotes where match(note_text) against('anvils' with query expansion);
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| note_text                                                                                                                                                |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Multiple customer returns, anvils failing to drop fast enough or falling backwards on purchaser. Recommend that customer considers using heavier anvils. |
| Customer complaint:
Sticks not individually wrapped, too easy to mistakenly detonate all at once.
Recommend individual wrapping.                           |
| Customer complaint:
Not heavy enough to generate flying stars around head of victim. If being purchased for dropping, recommend ANV02 or ANV03 instead.   |
| Please note that no returns will be accepted if safe opened using explosives.                                                                            |
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now.                                                             |
| Customer complaint:
Circular hole in safe floor can apparently be easily cut with handsaw.                                                                |
| Matches not included, recommend purchase of matches or detonator (item DTNTR).                                                                           |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
7 rows in set (0.03 sec)
# 查询扩展会搜索所有记录(相关性查询)
#查询database,同时也会把含有mysql但不含database的记录也找到。
#此功能会带来诸多非相关性的查询(谨慎使用)

3.布尔文本搜索 in boolean mode

布尔操作符

说明

+

包含,词必须存在

-

排除,词必须不出现

>

包含,而且增加等级值

<

包含,且减少等级值

()

把词组成子表达式(允许这些表达式作为一个组被包含、排除、排列等)

~

取消一个词的排序值

*

词尾的通配符

“ ”

定义一个短语(与单个词的列表不一样,它匹配整个短语一边包含或排除这个短语)

–关键字 in boolean mode

#全文本搜索包含此heavy的所有行

mysql> select note_text from productnotes where match(note_text) against('heavy' in boolean mode);
#全文本搜索heavy ,排除rope(以rope开头的词)的行
mysql> select note_text from productnotes where match(note_text) against('heavy -rope*' in boolean mode);
#匹配包含词rabbit和bait的行
mysql> select note_text from productnotes where match(note_text) against('+rabbit +bait' in boolean mode);
#搜索匹配短语rabbit bait而不是匹配两个词rabbit和bait
mysql> select note_text from productnotes where match(note_text) against('"rabbit bait"' in boolean mode);

#匹配rabbit和carrot,增加前者的等级,降低后者的等级
mysql> select note_text from productnotes where match(note_text) against('>rabbit <carrot' in boolean mode);

#必须匹配词safe和combination,降低后者的等级
mysql> select note_text from productnotes where match(note_text) against('>rabbit +(<combination)' in boolean mode);

#排列而不排序:在布尔方式中,不按等级值降序排序返回的行

演示排序

rank在mysql 8.0版本中数据保留字,使用时必须用’'括起来或者更改名字;

#普通全文本搜索
mysql> select note_text from productnotes where match(note_text) against('rabbit');
+---------------------------------------------------------------------------------------------------------------------+
| note_text                                                                                                           |
+---------------------------------------------------------------------------------------------------------------------+
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now.                        |
| Quantity varies, sold by the sack load.
All guaranteed to be bright and orange, and suitable for use as rabbit bait. |
+---------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)
#演示排序工作
mysql> select note_text,match(note_text) against('rabbit') as 'rank' from productnotes;
+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| note_text                                                                                                                                                | rank               |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
| Customer complaint:
Sticks not individually wrapped, too easy to mistakenly detonate all at once.
Recommend individual wrapping.                           |                  0 |
| Can shipped full, refills not available.
Need to order new can if refill needed.                                                                          |                  0 |
| Safe is combination locked, combination not provided with safe.
This is rarely a problem as safes are typically blown up or dropped by customers.         |                  0 |
| Quantity varies, sold by the sack load.
All guaranteed to be bright and orange, and suitable for use as rabbit bait.                                      | 1.5905543565750122 |
| Included fuses are short and have been known to detonate too quickly for some customers.
Longer fuses are available (item FU1) and should be recommended. |                  0 |
| Matches not included, recommend purchase of matches or detonator (item DTNTR).                                                                           |                  0 |
| Please note that no returns will be accepted if safe opened using explosives.                                                                            |                  0 |
| Multiple customer returns, anvils failing to drop fast enough or falling backwards on purchaser. Recommend that customer considers using heavier anvils. |                  0 |
| Item is extremely heavy. Designed for dropping, not recommended for use with slings, ropes, pulleys, or tightropes.                                      |                  0 |
| Customer complaint: rabbit has been able to detect trap, food apparently less effective now.                                                             | 1.6408053636550903 |
| Shipped unassembled, requires common tools (including oversized hammer).                                                                                 |                  0 |
| Customer complaint:
Circular hole in safe floor can apparently be easily cut with handsaw.                                                                |                  0 |
| Customer complaint:
Not heavy enough to generate flying stars around head of victim. If being purchased for dropping, recommend ANV02 or ANV03 instead.   |                  0 |
| Call from individual trapped in safe plummeting to the ground, suggests an escape hatch be added.
Comment forwarded to vendor.                            |                  0 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
14 rows in set (0.03 sec)
#rank中大于1的就是全文搜索匹配的行

注意事项

  • 在索引全文本数据时,短词被忽略且从索引中排除,短词定义为3个即以下字符的词
  • MySQL有一个内建的非用词列表,这些词在索引全文本数据时总是被忽略,有需要可以覆盖
  • 如果一个词出现在50%以上的行中,则将它作为一个非用词忽略,这个规则不适用于IN BOOLEAN MODE
  • 表中行数<=2行,全文本搜索不返回结果,因为50%的规则
  • 忽略单引号 如don’t索引为dont