Storing arrays in ES fields: ES stores JSON text data

Reposted

mob6454cc67e023 2024-06-05 06:24:30

Tags: storing arrays in ES fields, es, indexing / documents / search. Category: architecture, back-end development

Document-oriented

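Elasticsearch is a document-oriented database: it stores entire objects, or documents. It does not just store them; it also indexes them, which is what makes them searchable. In Elasticsearch you index, search, sort, and filter whole documents rather than rows and columns of data. This is a very different way of thinking about data, and it is part of why Elasticsearch can perform complex full-text search.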

JSON

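Elasticsearch uses JSON (JavaScript Object Notation) as the serialization format for documents. JSON is supported by most programming languages and has become a de facto standard format in the NoSQL world. It is simple, concise, and easy to read.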
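A minimal Python sketch of what "documents serialized as JSON" means in practice, using only the standard json module. The record is the employee document from the demo below. Note that an array field such as interests is simply a JSON array: Elasticsearch has no dedicated array type, and any field may hold zero or more values, as long as they share one data type.

```python
import json

# The employee document from the demo, as a plain Python dict.
employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    # No special array type is needed: a Python list becomes a JSON array.
    "interests": ["sports", "music"],
}

# Serialize to the JSON text that would be sent as the request body.
body = json.dumps(employee)

# Round-tripping shows the structure, including the list, survives intact.
restored = json.loads(body)
print(restored["interests"])  # ['sports', 'music']
```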

Indexing

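In Elasticsearch, the act of storing data is called indexing. A rough analogy with a relational database:

Relational DB  ⇒ Databases ⇒ Tables ⇒ Rows      ⇒ Columns
Elasticsearch  ⇒ Indices   ⇒ Types  ⇒ Documents ⇒ Fields

(Mapping types were deprecated in Elasticsearch 7.x and removed in 8.0, so the "type" level only applies to older versions such as the one used in the demo.)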

demo

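Add a document (note: Elasticsearch 6.0 and later require the header Content-Type: application/json on every request that sends a body):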

curl -XPUT 'http://localhost:9200/megacorp/employee/1' -H 'Content-Type: application/json' -d'{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}'
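For readers calling Elasticsearch from code rather than curl, here is a hedged Python sketch that builds the same PUT request with the standard urllib.request module. It assumes an unsecured local cluster on port 9200, and index_request is a hypothetical helper name. The request is only constructed here; sending it needs a running cluster.

```python
import json
import urllib.request

# Assumption: an unsecured local cluster on the default port.
ES_URL = "http://localhost:9200"

def index_request(index, doc_type, doc_id, doc):
    """Hypothetical helper: build (but do not send) a PUT that indexes doc under doc_id."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        # Required by Elasticsearch 6.0+ for requests with a body.
        headers={"Content-Type": "application/json"},
    )

req = index_request("megacorp", "employee", 1, {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
})

# To actually send it (requires a running cluster):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```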

Retrieve a document by ID

curl -XGET 'http://localhost:9200/megacorp/employee/1/?pretty'

Result:

{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [
      "sports",
      "music"
    ]
  }
}
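A small Python sketch of reading that response: metadata fields start with an underscore, and the stored document comes back unchanged under _source. The response text is copied from above; note that _id is returned as a string even though the URL used 1.

```python
import json

# The response shown above (pretty-printing whitespace does not matter to the parser).
response_text = """
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : ["sports", "music"]
  }
}
"""

resp = json.loads(response_text)

# Check "found" before reading _source: a missing ID returns found=false and no _source.
if resp["found"]:
    source = resp["_source"]
    print(source["first_name"], source["interests"])  # John ['sports', 'music']
```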

Search all documents (using the _search endpoint; by default it returns the first 10 hits)

curl -XGET 'http://localhost:9200/megacorp/employee/_search/?pretty'

Query-string search

Search by field

curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith'
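When the q parameter contains spaces or other special characters it must be URL-encoded. A minimal Python sketch using urllib.parse.urlencode; the host and index are assumed to match the curl examples, and query_string_url is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

# Assumption: same local cluster, index, and type as the curl examples.
SEARCH_URL = "http://localhost:9200/megacorp/employee/_search"

def query_string_url(q):
    """Hypothetical helper: build a URI-search URL; q uses Lucene query-string syntax."""
    # quote_via=quote encodes ':' as %3A and spaces as %20.
    return SEARCH_URL + "?" + urlencode({"q": q}, quote_via=quote)

print(query_string_url("last_name:Smith"))
print(query_string_url("last_name:Smith AND first_name:John"))
```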

Query DSL (searching with a JSON request body)

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}'

Multi-condition query

Query:

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            },
            "query" : {
                "match" : {
                    "last_name" : "Smith" 
                }
            }
        }
    }
}'

Error:

no [query] registered for [filtered]

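Cause:
The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. Use a bool query instead: put the range condition in filter (it only includes or excludes documents and does not affect the score) and the match in must (it is required and contributes to the relevance score):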

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            },
            "must" : {
                "match" : {
                    "last_name" : "Smith" 
                }
            }
        }
    }
}'
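If the query body is generated in code, a small builder keeps the filter/must split explicit. This is a hedged Python sketch; bool_query is a hypothetical helper name, and it produces the same JSON body as the curl command above.

```python
import json

def bool_query(last_name, min_age_exclusive):
    """Hypothetical helper: bool query with a scored match and an unscored range filter."""
    return {
        "query": {
            "bool": {
                # filter: include/exclude only; does not change _score
                "filter": {"range": {"age": {"gt": min_age_exclusive}}},
                # must: required, and contributes to the relevance score
                "must": {"match": {"last_name": last_name}},
            }
        }
    }

body = bool_query("Smith", 30)
print(json.dumps(body, indent=2))
```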

Full-text search

Match query on about (documents containing any of the words match, not only exact phrases)

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}'

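The results are sorted by relevance (_score), from most to least relevant. Relevance is central to ES, and it is the biggest difference from how a traditional database returns matching rows.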
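To make "sorted by relevance" concrete, here is a toy Python model, not how Elasticsearch actually scores (it uses BM25 by default). It assumes a crude analyzer (lowercase and split into words), OR semantics between query terms as in a default match query, and a score equal to the number of distinct query terms found. The three documents are hypothetical sample data.

```python
import re

def analyze(text):
    """Very rough stand-in for the standard analyzer: lowercase and split into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def toy_match(docs, field, query):
    """Toy match query: a doc matches if it has ANY query term; score = distinct terms found."""
    terms = set(analyze(query))
    hits = []
    for doc in docs:
        score = len(terms & set(analyze(doc[field])))
        if score > 0:
            hits.append((score, doc["first_name"]))
    # Highest score first, like ES hits sorted by _score.
    return sorted(hits, key=lambda h: -h[0])

docs = [  # hypothetical sample documents
    {"first_name": "John", "about": "I love to go rock climbing"},
    {"first_name": "Jane", "about": "I like to collect rock albums"},
    {"first_name": "Douglas", "about": "I like to build cabinets"},
]
print(toy_match(docs, "about", "rock climbing"))  # [(2, 'John'), (1, 'Jane')]
```

Jane is returned even though her about field only mentions "rock": partial matches are included, just ranked lower.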

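Exact phrase search on about

Use match_phrase, which only matches documents in which the words appear next to each other and in the same order: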

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}'
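A toy Python model of the phrase check, assuming a crude lowercase/word-split analyzer. Elasticsearch does this with term positions stored in the inverted index; this sketch simply scans the token list for the phrase as a consecutive run (equivalent to the default slop of 0).

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split into word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def toy_match_phrase(text, phrase):
    """True if the phrase's tokens appear consecutively and in the same order."""
    tokens, target = analyze(text), analyze(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

print(toy_match_phrase("I love to go rock climbing", "rock climbing"))     # True
print(toy_match_phrase("I like to collect rock albums", "rock climbing"))  # False
print(toy_match_phrase("climbing rock is fun", "rock climbing"))           # False: wrong order
```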

Highlighting

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}'

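Each hit in the response gets an additional highlight section in which the matching words from the about field are wrapped in HTML tags (<em> and </em> by default).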
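A toy Python highlighter showing the idea: every word that matches a query term, case-insensitively, is wrapped in tags. The <em>/</em> defaults mirror Elasticsearch's default tags, but this is an illustration only; real highlighters also split text into fragments and may group phrase matches differently.

```python
import re

def toy_highlight(text, query, pre="<em>", post="</em>"):
    """Wrap each word of text that equals a query term (case-insensitive) in pre/post tags."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}

    def wrap(m):
        word = m.group(0)
        return f"{pre}{word}{post}" if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)

print(toy_highlight("I love to go rock climbing", "rock climbing"))
# I love to go <em>rock</em> <em>climbing</em>
```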

Aggregations (analytics)

For example, find the most popular interests among the employees:

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "aggs": {
        "all_interests": {
            "terms": { "field": "interests" }
        }
    }
}'

Error:

{
    "error":{
        "root_cause":[
            {
                "type":"illegal_argument_exception",
                "reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
            }
        ],
        "type":"search_phase_execution_exception",
        "reason":"all shards failed",
        "phase":"query",
        "grouped":true,
        "failed_shards":[
            {
                "shard":0,
                "index":"megacorp",
                "node":"NozbGBsLTRmuOuP8fHwYmA",
                "reason":{
                    "type":"illegal_argument_exception",
                    "reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
                }
            }
        ]
    },
    "status":400
}

Cause:
"Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default."
In other words, aggregating on a text field would require building fielddata in the JVM heap, which is memory-hungry and slow to load, so Elasticsearch refuses unless you explicitly enable it.
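Part of why a keyword field is the better choice for aggregations: a text field is analyzed into separate lowercase terms, so a terms aggregation on it (even with fielddata enabled) would bucket individual words, while a keyword field keeps each value as one exact term. A toy Python sketch with a crude stand-in analyzer and hypothetical values:

```python
import re
from collections import Counter

def text_terms(value):
    # Rough stand-in for the standard analyzer a text field uses: lowercase + split.
    return re.findall(r"[a-z0-9]+", value.lower())

def keyword_terms(value):
    # A keyword field indexes the whole value as one exact term.
    return [value]

interests = ["rock climbing", "Music"]  # hypothetical values of one field

text_buckets = Counter(t for v in interests for t in text_terms(v))
keyword_buckets = Counter(t for v in interests for t in keyword_terms(v))
print(sorted(text_buckets))     # ['climbing', 'music', 'rock']
print(sorted(keyword_buckets))  # ['Music', 'rock climbing']
```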
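Fix: aggregate on the interests.keyword sub-field instead. With dynamic mapping (Elasticsearch 5.0+), a string field is mapped as text plus a sub-field named keyword of type keyword, which is not analyzed and supports aggregations through doc values: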

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "aggs": {
        "all_interests": {
            "terms": {
                "field": "interests.keyword"
            }
        }
    }
}'

Result (partial):

"aggregations":{
        "all_interests":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
                {
                    "key":"music",
                    "doc_count":2
                },
                {
                    "key":"forestry",
                    "doc_count":1
                },
                {
                    "key":"sports",
                    "doc_count":1
                }
            ]
        }
    }
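A toy Python model of how those buckets arise from array fields: every element of an interests array is a separate value, and doc_count is the number of documents containing that value (a duplicate inside one document counts once). The three arrays are hypothetical, chosen to reproduce the partial result above; buckets are ordered by doc_count descending with ties broken alphabetically, which matches the order shown.

```python
from collections import Counter

# Hypothetical interests arrays, one list per document.
docs_interests = [
    ["sports", "music"],
    ["music"],
    ["forestry"],
]

def toy_terms_agg(values_per_doc):
    """Toy terms aggregation: count documents per distinct value of an array field."""
    counts = Counter()
    for values in values_per_doc:
        counts.update(set(values))  # set(): a document counts once per value
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered]

print(toy_terms_agg(docs_interests))
```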

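Interest breakdown for employees whose last name is Smith. Aggregations run only over the documents matched by the query, and because match is a full-text (analyzed) query, the lowercase "smith" still matches "Smith":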

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query": {
        "match": {
            "last_name": "smith"
        }
    },
    "aggs": {
        "all_interests": {
            "terms": {
                "field": "interests.keyword"
            }
        }
    }
}'
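A toy Python model of "query first, then aggregate": only the matching documents contribute to the buckets. Lowercasing both sides stands in for the analyzed match query; the employee data is hypothetical.

```python
from collections import Counter

# Hypothetical employees; only last_name and interests matter here.
employees = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]

def toy_query_then_agg(docs, last_name):
    """Count interests (per document) over the docs whose last_name matches, case-insensitively."""
    matched = [d for d in docs if d["last_name"].lower() == last_name.lower()]
    counts = Counter()
    for d in matched:
        counts.update(set(d["interests"]))
    return dict(counts)

print(toy_query_then_agg(employees, "smith"))
```

The "forestry" bucket disappears because the only document with that interest does not match the query.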

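This article is reposted content, and we respect the original author's copyright. If there are errors or copyright issues, the original author is welcome to contact us to have the content corrected or removed.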