Elasticsearch search: added time range condition has no effect

Reposted

mob64ca1405d568 2024-09-14 13:13:52

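Tags: Elasticsearch search time range condition has no effect, elasticsearch, analyzer, inverted index search. Category: Architecture, Backend development

Elasticsearch Core Technology and Practice, Day 02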

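I. Inverted index

1.Search engines

Forward index: maps a document ID to the document's content and the words it contains
Inverted index: maps a word to the IDs of the documents that contain it

2.Core components of an inverted index

An inverted index has two parts

Term Dictionary: records every word that appears in any document, and maps each word to its posting list

The term dictionary is usually large, so it can be implemented with a B+ tree or a hash table with separate chaining to keep inserts and lookups fast

Posting List: records the set of documents that contain a word; it is made up of postings
Posting
1.Document ID
2.Term frequency (TF): the number of times the word appears in the document, used for relevance scoring
3.Position: the position of the word among the document's tokens, used for phrase queries
4.Offset: the start and end character offsets of the word, used for highlighting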
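
To make the posting structure concrete, here is a minimal Python sketch that builds an inverted index in memory. It is an illustration only, not how Lucene stores its index: the function name `build_inverted_index`, the regex tokenizer, and the three sample documents are all assumptions made for this example.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a posting list; each posting holds doc ID, TF, positions, offsets."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        per_term = defaultdict(lambda: {"positions": [], "offsets": []})
        # Tokenize on runs of word characters and lowercase every token.
        for position, match in enumerate(re.finditer(r"\w+", text)):
            entry = per_term[match.group().lower()]
            entry["positions"].append(position)
            entry["offsets"].append((match.start(), match.end()))
        for term, entry in per_term.items():
            index[term].append({
                "doc_id": doc_id,
                "tf": len(entry["positions"]),   # term frequency in this document
                "positions": entry["positions"],  # token positions, for phrase queries
                "offsets": entry["offsets"],      # character offsets, for highlighting
            })
    return dict(index)


# Hypothetical sample documents.
docs = {
    1: "Elasticsearch server",
    2: "Mastering Elasticsearch",
    3: "Elasticsearch essentials and Elasticsearch",
}
index = build_inverted_index(docs)
print(index["elasticsearch"])
```

Looking up `index["elasticsearch"]` returns one posting per document, and the posting for document 3 records a term frequency of 2 with both positions and both offset pairs.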

II. Tokenizing text with an analyzer

GET _analyze
{
  "analyzer": "standard",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

1.standard analyzer

The default analyzer
Splits text on word boundaries
Lowercases tokens
Result

{
  "tokens" : [
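    {
      "token" : "2",// the token text
      "start_offset" : 0,// character offset where the token starts
      "end_offset" : 1,// character offset where the token ends
      "type" : "<NUM>",// token type
      "position" : 0// position of the token in the token stream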
    },
    {
      "token" : "running",
      "start_offset" : 2,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "quick",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "brown",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "foxes",
      "start_offset" : 22,
      "end_offset" : 27,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "leap",
      "start_offset" : 28,
      "end_offset" : 32,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "over",
      "start_offset" : 33,
      "end_offset" : 37,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "lazy",
      "start_offset" : 38,
      "end_offset" : 42,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "dogs",
      "start_offset" : 43,
      "end_offset" : 47,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "in",
      "start_offset" : 48,
      "end_offset" : 50,
      "type" : "<ALPHANUM>",
      "position" : 9
    },
    {
      "token" : "the",
      "start_offset" : 51,
      "end_offset" : 54,
      "type" : "<ALPHANUM>",
      "position" : 10
    },
    {
      "token" : "summer",
      "start_offset" : 55,
      "end_offset" : 61,
      "type" : "<ALPHANUM>",
      "position" : 11
    },
    {
      "token" : "evening",
      "start_offset" : 62,
      "end_offset" : 69,
      "type" : "<ALPHANUM>",
      "position" : 12
    }
  ]
}

2.simple analyzer

Splits text on non-letter characters
Drops everything that is not a letter
Lowercases tokens
Result

{
  "tokens" : [
    {
      "token" : "running",
      "start_offset" : 2,
      "end_offset" : 9,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "quick",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "brown",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "foxes",
      "start_offset" : 22,
      "end_offset" : 27,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "leap",
      "start_offset" : 28,
      "end_offset" : 32,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "over",
      "start_offset" : 33,
      "end_offset" : 37,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "lazy",
      "start_offset" : 38,
      "end_offset" : 42,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "dogs",
      "start_offset" : 43,
      "end_offset" : 47,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "in",
      "start_offset" : 48,
      "end_offset" : 50,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "the",
      "start_offset" : 51,
      "end_offset" : 54,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "summer",
      "start_offset" : 55,
      "end_offset" : 61,
      "type" : "word",
      "position" : 10
    },
    {
      "token" : "evening",
      "start_offset" : 62,
      "end_offset" : 69,
      "type" : "word",
      "position" : 11
    }
  ]
}
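
The behavior above can be approximated in a few lines of Python. This is a rough sketch rather than the analyzer's actual implementation: the function name `simple_analyze` is made up for this example, and the regex is an assumption about what counts as a letter.

```python
import re


def simple_analyze(text):
    """Keep runs of letters only, lowercase them, and record offsets and positions."""
    tokens = []
    # [^\W\d_]+ matches word characters minus digits and underscore, i.e. letters.
    for position, match in enumerate(re.finditer(r"[^\W\d_]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
simple_tokens = simple_analyze(TEXT)
print([t["token"] for t in simple_tokens])
```

For the sample sentence this drops the leading 2, splits brown-foxes into two tokens, and numbers positions from 0, which agrees with the response shown above.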

3.whitespace analyzer

Splits text on whitespace
Result

{
  "tokens" : [
    {
      "token" : "2",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "running",
      "start_offset" : 2,
      "end_offset" : 9,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "Quick",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "brown-foxes",
      "start_offset" : 16,
      "end_offset" : 27,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "leap",
      "start_offset" : 28,
      "end_offset" : 32,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "over",
      "start_offset" : 33,
      "end_offset" : 37,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "lazy",
      "start_offset" : 38,
      "end_offset" : 42,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "dogs",
      "start_offset" : 43,
      "end_offset" : 47,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "in",
      "start_offset" : 48,
      "end_offset" : 50,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "the",
      "start_offset" : 51,
      "end_offset" : 54,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "summer",
      "start_offset" : 55,
      "end_offset" : 61,
      "type" : "word",
      "position" : 10
    },
    {
      "token" : "evening.",
      "start_offset" : 62,
      "end_offset" : 70,
      "type" : "word",
      "position" : 11
    }
  ]
}
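
A whitespace split is easy to reproduce locally, which helps when checking why a term such as brown-foxes or evening. fails to match a query. The sketch below is an approximation; `whitespace_analyze` is a hypothetical helper name.

```python
import re


def whitespace_analyze(text):
    """Split on whitespace only; case and punctuation are left untouched."""
    return [
        {"token": m.group(), "start_offset": m.start(), "end_offset": m.end(), "position": i}
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]


TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
whitespace_tokens = whitespace_analyze(TEXT)
print([t["token"] for t in whitespace_tokens])
```

Note that Quick keeps its capital letter and the final token still carries the trailing period, exactly as in the response above.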

4.stop analyzer

Same as the simple analyzer
plus a stop filter
which removes stop words such as the, a, and is
Result

{
  "tokens" : [
    {
      "token" : "running",
      "start_offset" : 2,
      "end_offset" : 9,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "quick",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "brown",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "foxes",
      "start_offset" : 22,
      "end_offset" : 27,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "leap",
      "start_offset" : 28,
      "end_offset" : 32,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "over",
      "start_offset" : 33,
      "end_offset" : 37,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "lazy",
      "start_offset" : 38,
      "end_offset" : 42,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "dogs",
      "start_offset" : 43,
      "end_offset" : 47,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "summer",
      "start_offset" : 55,
      "end_offset" : 61,
      "type" : "word",
      "position" : 10
    },
    {
      "token" : "evening",
      "start_offset" : 62,
      "end_offset" : 69,
      "type" : "word",
      "position" : 11
    }
  ]
}
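
One detail worth noticing in the response above is that summer and evening keep positions 10 and 11 even though in and the were removed. The following sketch reproduces that behavior under stated assumptions: `stop_analyze` is a hypothetical name, and `STOP_WORDS` is a small illustrative subset, not the analyzer's real default list.

```python
import re

# Illustrative subset of English stop words; an assumption, not the full default list.
STOP_WORDS = {"a", "an", "and", "in", "is", "the"}


def stop_analyze(text, stop_words=STOP_WORDS):
    """Letter-only tokenization plus lowercasing, then stop-word removal."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[^\W\d_]+", text)):
        term = match.group().lower()
        if term in stop_words:
            continue  # the token is dropped, but its position is still consumed
        tokens.append((term, position))
    return tokens


TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
stop_tokens = stop_analyze(TEXT)
print(stop_tokens)
```

Keeping the gap in the position sequence matters for phrase queries, because the distance between dogs and summer in the original text is preserved.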

5.keyword analyzer
Does not tokenize; emits the entire input as a single term

6.pattern analyzer

Tokenizes with a regular expression
The default pattern is \W+, which splits on runs of non-word characters
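
As a rough local approximation, the same split can be done with Python's `re.split`. The helper name `pattern_analyze` and its `lowercase` flag are assumptions for this sketch, and Python's `\W` is Unicode-aware, so results can differ from the Java regex engine on non-ASCII text.

```python
import re


def pattern_analyze(text, pattern=r"\W+", lowercase=True):
    """Split text on a regular expression and discard empty pieces."""
    terms = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in terms] if lowercase else terms


TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
pattern_tokens = pattern_analyze(TEXT)
print(pattern_tokens)

# A custom pattern: split on commas and semicolons only.
csv_tokens = pattern_analyze("red,green;Blue", pattern=r"[,;]")
print(csv_tokens)
```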

7.english analyzer

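III. Search API and URI Search in detail

1.URI Search

Pass query parameters in the URI

2.Request Body Search

Use the more complete, JSON-based Query Domain Specific Language (DSL) provided by Elasticsearch

3.Search relevance

Search is a conversation between the user and the search engine
What the user cares about is the relevance of the results

Can all relevant content be found
How much irrelevant content was returned
Are the documents scored sensibly
Balance the result ranking against business requirements

PageRank algorithm

Not just the content itself
More importantly, how trustworthy the content is

4.Measuring relevance

Information Retrieval

Precision: return as few irrelevant documents as possible
Recall: return as many relevant documents as possible
Ranking: can the results be ordered by relevance?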
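
Precision and recall have simple set-based definitions: precision is the share of returned documents that are relevant, and recall is the share of relevant documents that were returned. The sketch below computes both; the function name and the document IDs are made up for illustration.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned IDs against the relevant IDs."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents returned, 5 documents are truly relevant, 2 overlap.
precision, recall = precision_recall(returned=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7])
print(precision, recall)
```

Here two of the four returned documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4).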

5.URI Search

a.Specifying a field

Find documents whose title field matches 2012

GET /movies/_search?q=2012&df=title
{
  "profile": "true"
}

b.Searching all fields

Find documents in which any field matches 2012

GET /movies/_search?q=2012
{
  "profile": "true"
}

c.Term and Phrase

Beautiful Mind is equivalent to Beautiful OR Mind
"Beautiful Mind" is equivalent to Beautiful AND Mind; as a phrase query, it also requires the terms to appear in that order

// Quotes make this a phrase query
GET /movies/_search?q=title:"Beautiful Mind"
{
   "profile": "true"
}

d.Grouping

// Grouping with parentheses, Bool query
GET /movies/_search?q=title:(Beautiful Mind)
{
   "profile": "true"
}

Must contain both Beautiful and Mind

// Find "A Beautiful Mind"
GET /movies/_search?q=title:(Beautiful AND Mind)
{
   "profile": "true"
}

// Find "A Beautiful Mind"; %2B is the URL-encoded +, which marks Mind as required
GET /movies/_search?q=title:(Beautiful %2BMind)
{
   "profile": "true"
}
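
The + sign has to be written as %2B because a raw + in a URL query string is commonly decoded as a space. When building such URLs from code, percent-encoding the query avoids the problem. A small sketch using only the Python standard library:

```python
from urllib.parse import quote, unquote, unquote_plus

query = "title:(Beautiful +Mind)"

# Percent-encode the query, leaving ':' and parentheses readable.
encoded = quote(query, safe=":()")
print(encoded)

# Decoding the encoded form gives back the original query.
round_trip = unquote(encoded)

# Form-style decoding of a raw '+' turns it into a space, losing the operator.
lost_plus = unquote_plus("title:(Beautiful+Mind)")
print(lost_plus)
```

The `safe` argument is a readability choice for this example; encoding every reserved character would work as well.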

Must contain Beautiful but must not contain Mind

// Find titles that contain Beautiful but not Mind
GET /movies/_search?q=title:(Beautiful NOT Mind)
{
   "profile": "true"
}

e.Range queries

Year greater than or equal to 1980

// Range query; both interval notation and mathematical notation are supported
GET /movies/_search?q=year:>=1980
{
   "profile": "true"
}

f.Wildcard queries

? matches exactly one character, * matches zero or more characters

title:mi?d

title:be*
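
The two wildcard rules can be tried out locally with Python's `fnmatch` module, which uses the same meaning for ? and *. This is only an analogy for experimenting with patterns: `fnmatch` also gives special meaning to square brackets, and the term list below is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical indexed terms from a title field.
terms = ["mind", "mild", "mid", "minded", "be", "bee", "beautiful"]

# '?' must consume exactly one character, so "mid" and "minded" do not match.
one_char = [t for t in terms if fnmatchcase(t, "mi?d")]

# '*' may consume zero or more characters, so the bare prefix "be" matches too.
prefix = [t for t in terms if fnmatchcase(t, "be*")]

print(one_char, prefix)
```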

IV. Request body and Query DSL, plus Query String and Simple Query String queries

1.Request Body Search

Sends the query to Elasticsearch in the HTTP request body
Query DSL

2.Query expression: Match

POST /movies/_search
{
  "query": {
    "match": {
      "title": "Last Christmas"
    }
  }
}

POST /movies/_search
{
  "query": {
    "match": {
      "title":{
        "query": "Last Christmas",
        "operator": "and"
      }
    }
  }
}
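
The difference between the two requests above is the operator applied to the analyzed query terms: the default is OR, while "operator": "and" requires every term. The sketch below simulates only that yes/no matching logic in Python and ignores scoring entirely; the `analyze` and `match` helpers and the sample titles are assumptions for illustration.

```python
import re


def analyze(text):
    """Lowercase word tokens, a stand-in for the field's analyzer."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def match(title, query, operator="or"):
    """True if the title contains any (or) or all (and) of the query terms."""
    title_terms = set(analyze(title))
    found = [term in title_terms for term in analyze(query)]
    return all(found) if operator == "and" else any(found)


# Hypothetical titles.
titles = ["Last Christmas", "The Last Samurai", "White Christmas"]
or_hits = [t for t in titles if match(t, "Last Christmas")]
and_hits = [t for t in titles if match(t, "Last Christmas", operator="and")]
print(or_hits, and_hits)
```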

3.Phrase search: Match Phrase

POST /movies/_search
{
  "query": {
    "match_phrase": {
      "title":{
     // the terms must appear in this order
        "query": "one love",
     // up to 1 other term may appear between them
        "slop": 1
      }
    }
  }
}
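
To illustrate how slop relaxes a phrase query, here is a deliberately simplified Python sketch. It only considers terms that appear in query order and treats slop as the total number of extra positions allowed between them; the real implementation also allows reordered terms at a higher slop cost. The names `analyze` and `phrase_match` are hypothetical.

```python
import re


def analyze(text):
    """Lowercase word tokens, a stand-in for the field's analyzer."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def phrase_match(text, phrase, slop=0):
    """In-order phrase match allowing at most `slop` extra positions in total."""
    doc_terms = analyze(text)
    query_terms = analyze(phrase)
    if not query_terms:
        return False

    def search(term_index, prev_position, budget):
        if term_index == len(query_terms):
            return True  # every query term has been placed
        for position, term in enumerate(doc_terms):
            if term != query_terms[term_index]:
                continue
            if term_index == 0:
                gap = 0
            elif position > prev_position:
                gap = position - prev_position - 1
            else:
                continue  # out-of-order occurrences are ignored in this sketch
            if gap <= budget and search(term_index + 1, position, budget - gap):
                return True
        return False

    return search(0, -1, slop)


print(phrase_match("One Crazy Love", "one love"))          # strict phrase
print(phrase_match("One Crazy Love", "one love", slop=1))  # one intervening term allowed
```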

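4.Simple Query String Query

Similar to Query String, but it ignores invalid syntax and supports only part of the query syntax
AND, OR, and NOT are not supported as operators; they are treated as plain strings
The default relationship between terms is OR; a different operator can be specified
Some boolean logic is supported

+ replaces AND

| replaces OR

- replaces NOT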
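
As an example of putting these operators into a request, the sketch below assembles a `simple_query_string` body as a Python dictionary and prints it as JSON. The helper name `simple_query_string_body` is made up for this example, and the title field is borrowed from the movies examples earlier in this article; sending the body to a cluster is left out.

```python
import json


def simple_query_string_body(query, fields, default_operator="OR"):
    """Build a request body for a simple_query_string search."""
    return {
        "query": {
            "simple_query_string": {
                "query": query,
                "fields": list(fields),
                "default_operator": default_operator,
            }
        }
    }


# '+' stands for AND and '-' negates a term inside the query text.
body = simple_query_string_body("Beautiful +Mind", fields=["title"], default_operator="AND")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a POST /movies/_search request.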
