Sorting by string fields in Elasticsearch (and sorting strings vs. numbers)

Repost

mob6454cc72ae38 2024-04-02 22:45:38


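These notes cover sorting: (1) how relevance scores are computed - v42; (2) the string sorting problem - v41; (3) DocValues - v44.

1. How relevance scores are computed

2. The string sorting problem

3. DocValues

1. How relevance scores are computed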

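Elasticsearch relevance scoring is built on the TF/IDF model (Term Frequency & Inverse Document Frequency). Since Elasticsearch 5.0 the default similarity is BM25, which refines TF/IDF but relies on the same three factors:

(1) Term Frequency: how often the query term appears in the field of the document. The higher the frequency, the higher the relevance.

(2) Inverse Document Frequency: how often the query term appears across all documents in the index. The more documents contain the term, the lower the relevance.

(3) Field-length norm: the longer the field, the lower the relevance.

Adding explain=true to a search request returns the details of how each relevance score was computed.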
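The three factors above can be sketched in a few lines of Python. This is only an illustration: the formulas follow the classic Lucene TF/IDF description (square root of the term frequency, a logarithmic IDF, and an inverse square root length norm), the function names are made up for this example, and the final product leaves out boosts, query normalization and everything BM25 adds.

```python
import math

def tf(freq: int) -> float:
    """Term frequency factor: grows with the number of occurrences."""
    return math.sqrt(freq)

def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: shrinks as more documents contain the term."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms: int) -> float:
    """Field-length norm: shrinks as the field gets longer."""
    return 1.0 / math.sqrt(num_terms)

def simple_weight(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Simplified single-term weight (assumption: no boosts, no query norm)."""
    return tf(freq) * idf(num_docs, doc_freq) * field_norm(num_terms)

# A rare term in a short field outweighs a common term in a long field.
rare_short = simple_weight(freq=1, num_docs=100, doc_freq=2, num_terms=4)
common_long = simple_weight(freq=1, num_docs=100, doc_freq=80, num_terms=100)
print(rare_short > common_long)
```

The absolute numbers do not match what Elasticsearch reports; only the direction of each factor is meant to carry over.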

GET /mauanx/user/_search?explain=true
{
    "query":{
        "match":{
            "name":"Abbey"
        }
    }
}

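To check whether one specific document matches a query, use the _explain endpoint. If the document matches, the response shows the detailed relevance calculation; otherwise it reports that the document does not match:

GET /mauanx/user/1/_explain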
{
    "query":{
        "match":{
            "name":"Abbey"
        }
    }
}
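The explanation returned by these requests is a nested tree in which each node carries a "value", a "description" and a list of "details". A small Python helper can flatten it for reading. This is a sketch: `walk_explanation` is a made-up name, and the `sample` dictionary below is hypothetical data shaped like an explanation node, not real Elasticsearch output.

```python
from typing import Iterator

def walk_explanation(node: dict, depth: int = 0) -> Iterator[str]:
    """Yield one indented line per node of an explanation tree."""
    yield f"{'  ' * depth}{node['value']}  {node['description']}"
    for child in node.get("details", []):
        yield from walk_explanation(child, depth + 1)

# Hypothetical explanation node; the numbers and descriptions are invented.
sample = {
    "value": 1.5,
    "description": "hypothetical: weight(name:abbey), product of:",
    "details": [
        {"value": 3.0, "description": "hypothetical: idf", "details": []},
        {"value": 0.5, "description": "hypothetical: tf and norm", "details": []},
    ],
}

lines = list(walk_explanation(sample))
print("\n".join(lines))
```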

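2. The string sorting problem

Elasticsearch can sort on numeric fields directly. As covered in part (3) of the "Elasticsearch 6 hands-on tutorial" study notes, section 1.(4) "Other queries", this is done with "sort".

For example: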

GET /mauanx/user/_search
{
    "query":{
        "match_all":{}
    },
    "sort":[{
        "age":{
            "order":"desc"
        }
    }]
}

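Sorting directly on a text field, however, raises an error, because fielddata is disabled on text fields by default.

To sort on a text field, change the field's mapping and give it a keyword sub-field. The request below creates the index mauanx with type user, 5 primary shards and 1 replica; "interests" gets a sub-field "raw" of type keyword:

PUT /mauanx
{
  "settings":{
    "number_of_shards":5,
    "number_of_replicas":1
  },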
  "mappings":{
    "user":{
      "properties":{
        "name": {
          "type": "text"
        },
        "age": {
           "type": "integer"
        },
        "sex":{
           "type": "text"
        },
        "interests":{
           "type": "text",
           "fields": {
             "raw":{
               "type": "keyword"
             }
           }
        }
      }
    }
  }
}

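When querying, sort on the "interests.raw" sub-field. Because it is a keyword field, the whole original string is the sort key. Sorting on "interests" itself is rejected by default; with fielddata enabled it would sort by individual analyzed terms rather than by the whole string:

GET /mauanx/user/_search
{
    "query":{
        "match_all":{}
    },
    "sort":[{
        "interests.raw":{
            "order":"desc"
        }
    }]
}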
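The difference between the two sort keys can be imitated in plain Python. This is a rough sketch: `analyze` is a stand-in for a real analyzer (it only lowercases and splits on whitespace), the sample documents are invented, and picking the largest term for a descending sort is an assumption that mirrors the default "max" sort mode for multi-valued fields.

```python
from typing import Dict, List

def analyze(text: str) -> List[str]:
    """Toy analyzer (assumption): lowercase and split on whitespace."""
    return text.lower().split()

# Hypothetical "interests" values keyed by document id.
docs: Dict[int, str] = {
    1: "hiking and cooking",
    2: "Zumba dancing",
    3: "chess",
}

# keyword sub-field: the whole original string is the sort key.
by_keyword = sorted(docs, key=lambda d: docs[d], reverse=True)

# analyzed text field: only single terms are available as sort keys.
by_term = sorted(docs, key=lambda d: max(analyze(docs[d])), reverse=True)

print(by_keyword)  # [1, 3, 2]
print(by_term)     # [2, 1, 3]
```

The two orders differ because "Zumba dancing" sorts last as a whole string (uppercase letters sort before lowercase ones), but first once it has been lowercased and split into terms.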

3. DocValues

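When Lucene builds the inverted index, it also builds an ordered forward index at index time: a list that maps document → field value. This is DocValues.

DocValues are stored on disk, which saves heap memory. By default they are enabled for every field type that supports them, such as numeric, date and keyword fields. They are not available for text fields, because text is analyzed into terms. To sort or aggregate on a text field you either set fielddata to true, which builds an equivalent structure in memory at query time, or, preferably, add a keyword sub-field. DocValues greatly speed up sorting, grouping and some aggregations.

To turn DocValues off for a field, set "doc_values" to false for that field in the mapping. This can be done when the mapping is created. The request below creates the index mauanx2 with type user, 5 primary shards and 1 replica, and disables DocValues on "age":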
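The two directions of lookup can be pictured with ordinary Python containers. This is a toy model, not how Lucene stores data on disk: the documents and the names `inverted` and `doc_values` are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical documents keyed by document id.
docs = {0: {"age": 31}, 1: {"age": 25}, 2: {"age": 31}}

# Inverted index: value -> document ids. Good for "which docs have age 31?".
inverted = defaultdict(list)
for doc_id, source in docs.items():
    inverted[source["age"]].append(doc_id)

# Doc values: document id -> value, stored as one column per field.
# Good for "what is the age of doc 2?", which sorting asks for every hit.
doc_values = [docs[doc_id]["age"] for doc_id in sorted(docs)]

# Sorting the hits of a query only needs the column, not the inverted index.
hits = [0, 1, 2]
sorted_hits = sorted(hits, key=lambda d: doc_values[d], reverse=True)

print(dict(inverted))  # {31: [0, 2], 25: [1]}
print(sorted_hits)     # [0, 2, 1]
```

Answering the sort from the inverted index alone would mean scanning every value to find the ones that belong to each hit, which is why the column-oriented structure exists.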

PUT /mauanx2
{
  "settings":{
    "number_of_shards":5,
    "number_of_replicas":1
  },
  "mappings":{
    "user":{
      "properties":{
        "name": {
          "type": "text"
        },
        "age": {
           "type": "integer",
           "doc_values": false
        },
        "sex":{
           "type": "text"
        },
        "interests":{
           "type": "text"
        }
      }
    }
  }
}

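I tried changing doc_values directly on an existing mapping and got an error, roughly: the change conflicts with the existing doc_values setting. So it is best to set doc_values correctly when the mapping is first created.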
