Document-oriented

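Elasticsearch is a document-oriented database: it stores entire objects, or documents. It not only stores them but also indexes their contents so that they can be searched. In Elasticsearch you index, search, sort, and filter documents rather than rows and columns of data. This is a fundamentally different way of thinking about data, and it is one of the reasons Elasticsearch can perform complex full-text search.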

JSON

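Elasticsearch uses JSON (JavaScript Object Notation) as the serialization format for documents. JSON is supported by most programming languages and has become a standard format in the NoSQL world. It is simple, concise, and easy to read.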
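
As a small illustration of that serialization step, the sketch below turns an in-memory employee record into the JSON text that would be sent to Elasticsearch, then parses it back. It uses only Python's standard json module; the variable names are illustrative, and the field values are the ones used by the demo document in this section.

```python
import json

# Illustrative employee record; field names mirror the demo document.
employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}

# Serialize the record to JSON text (what a request body would contain).
body = json.dumps(employee)

# Parse the text back to confirm nothing is lost in the round trip.
restored = json.loads(body)
```

Nested values such as the interests list survive the round trip unchanged, which is why whole objects can be stored without flattening them into columns.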

Indexing

In Elasticsearch, the act of storing data is called indexing. The terms map roughly onto those of a relational database:

Relational DB  ⇒ Databases ⇒ Tables ⇒ Rows      ⇒ Columns
Elasticsearch  ⇒ Indices   ⇒ Types  ⇒ Documents ⇒ Fields
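
The index ⇒ type ⇒ document hierarchy is also how the requests in this section address a document: the URL path is /index/type/id. The sketch below builds such a path. document_path is a hypothetical helper written for this note, not part of any Elasticsearch client, and it assumes a version of Elasticsearch that still exposes mapping types, as the requests in this section do.

```python
from urllib.parse import quote


def document_path(index, doc_type, doc_id):
    """Build '/{index}/{type}/{id}' (hypothetical helper, not a client API)."""
    # Percent-encode each segment so that spaces or slashes in a name
    # cannot change the structure of the path.
    parts = (index, doc_type, doc_id)
    return "/" + "/".join(quote(part, safe="") for part in parts)


# Path of the demo document: index "megacorp", type "employee", id "1".
path = document_path("megacorp", "employee", "1")
```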

Demo

  • Index a document
curl -XPUT 'http://localhost:9200/megacorp/employee/1' -H 'Content-Type: application/json' -d'{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}'
  • Retrieve a document by ID
curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'

Result:

{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [
      "sports",
      "music"
    ]
  }
}
  • Search all documents
    (using the _search endpoint)
curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty'
  • Query-string search
  • Search by field
curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith'
  • Query DSL (the query is written as JSON)
curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}'
  • Query with multiple conditions
  • Query:
curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            },
            "query" : {
                "match" : {
                    "last_name" : "Smith" 
                }
            }
        }
    }
}'
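  • Error:
no [query] registered for [filtered]
  • Cause:
    The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. Use a bool query instead, with the scored clause under must and the non-scoring clause under filter:
curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{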
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            },
            "must" : {
                "match" : {
                    "last_name" : "Smith" 
                }
            }
        }
    }
}'
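
The rewrite from filtered to bool is mechanical, and it can be sketched as a small function: the old query clause becomes must (it still contributes to scoring) and the old filter clause stays a filter. filtered_to_bool is a hypothetical helper written for this note; it handles only the simple shape used in this section and is not a general migration tool.

```python
import json


def filtered_to_bool(query):
    """Rewrite {"filtered": {...}} as {"bool": {...}} (hypothetical helper)."""
    filtered = query.get("filtered")
    if filtered is None:
        return query  # not a filtered query, nothing to rewrite

    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = filtered["query"]     # scored clause
    if "filter" in filtered:
        bool_query["filter"] = filtered["filter"]  # non-scoring clause
    return {"bool": bool_query}


# The filtered query from this section that triggers the error.
old_query = {
    "filtered": {
        "filter": {"range": {"age": {"gt": 30}}},
        "query": {"match": {"last_name": "Smith"}},
    }
}

new_query = filtered_to_bool(old_query)

# Request body equivalent to the bool query in this section.
print(json.dumps({"query": new_query}, indent=2))
```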

Full-text search

Full-text match on the about field

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}'

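Results are sorted by relevance, from most to least relevant. Relevance scoring is central to Elasticsearch, and it is the biggest difference from how a traditional database returns matching rows.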

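Exact phrase match on the about field

Use match_phrase, which requires the terms to appear together and in order:

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{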
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}'
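
The difference between match and match_phrase can be pictured with a deliberately simplified model: match succeeds when any query term appears in the field, whereas match_phrase requires all terms to appear adjacent and in the same order. The sketch below assumes lowercasing and whitespace tokenization, which only approximates a real analyzer, and it ignores relevance scoring entirely. The function names and the second sample sentence are hypothetical.

```python
def tokens(text):
    """Toy analyzer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def toy_match(field_text, query):
    """True if at least one query term occurs in the field."""
    field_terms = set(tokens(field_text))
    return any(term in field_terms for term in tokens(query))


def toy_match_phrase(field_text, query):
    """True if the query terms occur adjacent and in order."""
    field_terms = tokens(field_text)
    query_terms = tokens(query)
    n = len(query_terms)
    if n == 0:
        return False
    # Slide a window of n terms across the field and compare.
    return any(
        field_terms[i:i + n] == query_terms
        for i in range(len(field_terms) - n + 1)
    )


climber = "I love to go rock climbing"
collector = "I like to collect rock albums"  # hypothetical second document

both_match = toy_match(climber, "rock climbing") and toy_match(collector, "rock climbing")
only_climber_has_phrase = (
    toy_match_phrase(climber, "rock climbing")
    and not toy_match_phrase(collector, "rock climbing")
)
```

In this model both sentences satisfy match because each contains "rock", but only the first satisfies match_phrase.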

Highlighting

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}'

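Each hit in the result gains a highlight section in which the matching text from the about field is wrapped in tags (<em></em> by default).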
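
To show where the highlighted text lives in a response, the sketch below pulls the highlight fragments out of a search result. The sample_response dictionary is a hand-written, abbreviated example of the response shape rather than output captured from a real cluster, and highlighted_fragments is a hypothetical helper.

```python
# Hand-written, abbreviated sample of a search response (not real output).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {"about": "I love to go rock climbing"},
                "highlight": {
                    "about": ["I love to go <em>rock</em> <em>climbing</em>"]
                },
            }
        ]
    }
}


def highlighted_fragments(response, field):
    """Map each hit's _id to its highlight fragments for one field."""
    fragments = {}
    for hit in response.get("hits", {}).get("hits", []):
        # Hits without a highlight section yield an empty list.
        fragments[hit["_id"]] = hit.get("highlight", {}).get(field, [])
    return fragments


fragments = highlighted_fragments(sample_response, "about")
```

Note that the _source of the hit is left untouched; the tagged text appears only under highlight.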

Aggregations

For example, find the most popular interests among employees:

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
   "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}'
  • Error:
{
    "error":{
        "root_cause":[
            {
                "type":"illegal_argument_exception",
                "reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
            }
        ],
        "type":"search_phase_execution_exception",
        "reason":"all shards failed",
        "phase":"query",
        "grouped":true,
        "failed_shards":[
            {
                "shard":0,
                "index":"megacorp",
                "node":"NozbGBsLTRmuOuP8fHwYmA",
                "reason":{
                    "type":"illegal_argument_exception",
                    "reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
                }
            }
        ]
    },
    "status":400
}
  • Cause:
    Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment.
    Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default.
  • Fix: aggregate on the keyword sub-field (interests.keyword) instead of the analyzed text field:
curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
   "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests.keyword"
      }
    }
  }
}'
  • Result (excerpt):
"aggregations":{
        "all_interests":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
                {
                    "key":"music",
                    "doc_count":2
                },
                {
                    "key":"forestry",
                    "doc_count":1
                },
                {
                    "key":"sports",
                    "doc_count":1
                }
            ]
        }
    }
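
Reading the answer out of that response is a matter of walking the buckets list. The sketch below copies the aggregation excerpt above into a Python dictionary and reduces it to a mapping from interest to document count; bucket_counts is a hypothetical helper.

```python
# The aggregation excerpt shown above, as a Python dictionary.
aggregations = {
    "all_interests": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {"key": "music", "doc_count": 2},
            {"key": "forestry", "doc_count": 1},
            {"key": "sports", "doc_count": 1},
        ],
    }
}


def bucket_counts(aggs, name):
    """Return {bucket key: doc_count} for one terms aggregation."""
    return {b["key"]: b["doc_count"] for b in aggs[name]["buckets"]}


counts = bucket_counts(aggregations, "all_interests")

# The interest shared by the most employees in this excerpt.
most_popular = max(counts, key=counts.get)
```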

Aggregate the interests of employees whose last name is Smith

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -H 'Content-Type: application/json' -d'{
"query": {
    "match": {
      "last_name": "smith"
    }
  },
   "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests.keyword"
      }
    }
  }
}'
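
Combining a query with an aggregation simply means putting both keys in the same request body: the aggregation is then computed only over the documents that the query matches. The sketch below builds that body for an arbitrary last name. interests_for_last_name is a hypothetical helper, and the default field name interests.keyword is taken from the request above.

```python
import json


def interests_for_last_name(last_name, field="interests.keyword"):
    """Build a search body that aggregates interests for one last name."""
    return {
        # Restrict the documents that feed the aggregation.
        "query": {"match": {"last_name": last_name}},
        # Count how many matching documents carry each interest.
        "aggs": {"all_interests": {"terms": {"field": field}}},
    }


body = interests_for_last_name("smith")

# JSON text that could be passed to curl with -d.
payload = json.dumps(body)
```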