Elasticsearch Getting Started Tutorial

This article follows the official guide to quickly set up an Elasticsearch environment in a Docker container, and summarizes the Elasticsearch quick start.

Installation

Official installation guide:

https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html

Basic concepts

1. Node and Cluster

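Elastic is essentially a distributed database that lets multiple servers work together, and each server can run one or more Elastic instances. A single Elastic instance is called a node, and a group of nodes forms a cluster.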

2. Index

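Elastic indexes every field and, after processing, writes the result into an inverted index. A search looks up this index directly instead of scanning the documents.

Accordingly, the top-level unit of data management in Elastic is the Index, which is roughly equivalent to a single database. Every Index (that is, database) name must be lowercase.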
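
To illustrate the inverted-index idea, here is a toy Python sketch. It is not how Lucene actually stores an index; the whitespace tokenizer and the three documents are assumptions made for the example. The point is that a search becomes a single lookup from a term to document IDs rather than a scan over every document.

```python
from __future__ import annotations

from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every lowercase token to the set of document IDs that contain it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (an assumption): lowercase, then split on whitespace.
        for token in text.lower().split():
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], term: str) -> set[str]:
    """A lookup is a single dictionary access, with no scan over the documents."""
    return index.get(term.lower(), set())


# Hypothetical documents, for illustration only.
docs = {"1": "GET /favicon.ico", "2": "GET /images/bg.jpg", "3": "POST /login"}
idx = build_inverted_index(docs)
print(sorted(search(idx, "get")))  # ['1', '2']
```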

The following command lists all indices in the cluster:

$ curl -X GET 'http://localhost:9200/_cat/indices?v'

3. Add a single document

POST logs-my_app-default/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "event": {
    "original": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
  }
}

Result:

{
  "_index": ".ds-logs-my_app-default-2099-05-06-000001",
  "_type": "_doc",
  "_id": "gl5MJXMBMk1dGnErnBW8",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
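
For readers who prefer to call the REST API from code rather than from the Kibana console, the request above can be built with Python's standard library. This is a minimal sketch assuming a local, unsecured cluster at http://localhost:9200 (the ES_URL constant and the index_document_request helper are hypothetical names). It only builds the request; actually sending it needs a running cluster.

```python
import json
import urllib.request

# Assumption: a local cluster on port 9200 with security disabled.
ES_URL = "http://localhost:9200"


def index_document_request(target: str, doc: dict) -> urllib.request.Request:
    """Build (without sending) the equivalent of POST <target>/_doc."""
    return urllib.request.Request(
        f"{ES_URL}/{target}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


doc = {
    "@timestamp": "2099-05-06T16:21:15.000Z",
    "event": {
        "original": '192.0.2.42 - - [06/May/2099:16:21:15 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736'
    },
}
req = index_document_request("logs-my_app-default", doc)

# Sending it requires a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["result"])  # "created" on success
```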

4. Add multiple documents

PUT logs-my_app-default/_bulk
{ "create": { } }
{ "@timestamp": "2099-05-07T16:24:32.000Z", "event": { "original": "192.0.2.242 - - [07/May/2020:16:24:32 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0" } }
{ "create": { } }
{ "@timestamp": "2099-05-08T16:25:42.000Z", "event": { "original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" } }
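
The _bulk body is newline-delimited JSON (NDJSON): each document is preceded by an action line, and the whole body must end with a newline. Because logs-my_app-default is a data stream, the action must be create. A small sketch that builds such a body in Python (build_bulk_body is a hypothetical helper, and the log lines are shortened); when sending it yourself, the Content-Type application/x-ndjson is appropriate.

```python
from __future__ import annotations

import json


def build_bulk_body(docs: list[dict]) -> str:
    """Serialize documents as NDJSON: one action line, then one source line each."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {}}))  # data streams accept only "create"
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    {"@timestamp": "2099-05-07T16:24:32.000Z", "event": {"original": "192.0.2.242 ..."}},
    {"@timestamp": "2099-05-08T16:25:42.000Z", "event": {"original": "192.0.2.255 ..."}},
])
print(body)
```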

5. Search data

Match all documents in logs-my_app-default and sort them by @timestamp in descending order:

GET logs-my_app-default/_search
{
  "query": {
    "match_all": { }
  },
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": ".ds-logs-my_app-default-2099-05-06-000001",
        "_type": "_doc",
        "_id": "PdjWongB9KPnaVm2IyaL",
        "_score": null,
        "_source": {
          "@timestamp": "2099-05-08T16:25:42.000Z",
          "event": {
            "original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638"
          }
        },
        "sort": [
          4081940742000
        ]
      },
      ...
    ]
  }
}
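
The value in sort is @timestamp expressed as milliseconds since the Unix epoch, which is how date fields are sorted. A quick Python check (to_epoch_millis is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts: str) -> int:
    """Convert an ISO-8601 UTC timestamp to integer milliseconds since the epoch."""
    # fromisoformat() before Python 3.11 does not accept a trailing "Z".
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # Dividing a timedelta by a timedelta with // yields an exact int.
    return (dt - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2099-05-08T16:25:42.000Z"))  # 4081940742000
```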

6. Return only selected fields and omit _source:

GET logs-my_app-default/_search
{
  "query": {
    "match_all": { }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}

Result:

{
  ...
  "hits": {
    ...
    "hits": [
      {
        "_index": ".ds-logs-my_app-default-2099-05-06-000001",
        "_type": "_doc",
        "_id": "PdjWongB9KPnaVm2IyaL",
        "_score": null,
        "fields": {
          "@timestamp": [
            "2099-05-08T16:25:42.000Z"
          ]
        },
        "sort": [
          4081940742000
        ]
      },
      ...
    ]
  }
}

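"fields" selects which fields to retrieve (parsed according to the mapping), and "_source": false stops the original _source from being returned.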
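
Values under fields are always returned as arrays, even for single-valued fields, so client code usually reads the first element or iterates. A small sketch that pulls them out of a parsed response (extract_field is a hypothetical helper; the dictionary mirrors the response above with the other hits omitted):

```python
def extract_field(response: dict, name: str) -> list:
    """Return the values of `name` from every hit's "fields" section."""
    values = []
    for hit in response["hits"]["hits"]:
        # Each entry under "fields" is an array, even for single-valued fields.
        values.extend(hit.get("fields", {}).get(name, []))
    return values


# Shape taken from the response above (other hits omitted).
response = {
    "hits": {
        "hits": [
            {
                "_id": "PdjWongB9KPnaVm2IyaL",
                "fields": {"@timestamp": ["2099-05-08T16:25:42.000Z"]},
                "sort": [4081940742000],
            }
        ]
    }
}
print(extract_field(response, "@timestamp"))  # ['2099-05-08T16:25:42.000Z']
```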

7. Range search (range)

GET logs-my_app-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2099-05-05",
        "lt": "2099-05-08"
      }
    }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
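
The range above is half-open: gte includes its bound and lt excludes it. Assuming the date-only bounds resolve to midnight UTC, this Python sketch applies the same filter to the three timestamps indexed earlier and keeps the two documents from May 6 and May 7 (range_filter is a hypothetical helper):

```python
from datetime import datetime, timezone

UTC = timezone.utc

# The three @timestamp values indexed in steps 3 and 4.
timestamps = [
    datetime(2099, 5, 6, 16, 21, 15, tzinfo=UTC),
    datetime(2099, 5, 7, 16, 24, 32, tzinfo=UTC),
    datetime(2099, 5, 8, 16, 25, 42, tzinfo=UTC),
]

# Assumption: "2099-05-05" and "2099-05-08" resolve to midnight UTC.
gte = datetime(2099, 5, 5, tzinfo=UTC)
lt = datetime(2099, 5, 8, tzinfo=UTC)


def range_filter(values, gte, lt):
    """Keep values with gte <= v < lt, newest first (like "sort": "desc")."""
    return sorted((v for v in values if gte <= v < lt), reverse=True)


for ts in range_filter(timestamps, gte, lt):
    print(ts.isoformat())
```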

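Query the previous calendar day. "now-1d/d" rounds down to midnight at the start of yesterday, and "lt": "now/d" excludes everything from midnight today onward (rounding is done in UTC unless a time_zone is specified):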

GET logs-my_app-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
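
The date-math window can be reproduced in Python for a fixed moment. This sketch assumes UTC (no time_zone parameter) and a hypothetical "now" of 2099-05-08T16:25:42Z; previous_day_window is an illustrative helper, not an Elasticsearch API.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def previous_day_window(now: datetime) -> tuple[datetime, datetime]:
    """Return [start, end) for gte "now-1d/d" and lt "now/d", rounding in UTC."""
    # "now/d": round down to midnight UTC of the current day.
    today = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    # "now-1d/d": in UTC this equals midnight of the previous day.
    return today - timedelta(days=1), today


now = datetime(2099, 5, 8, 16, 25, 42, tzinfo=timezone.utc)  # hypothetical "now"
start, end = previous_day_window(now)
print(start.isoformat(), end.isoformat())
```

With this "now", only the document stamped 2099-05-07T16:24:32Z would fall inside the window.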

8. Create an index

PUT my_index
{
  "mappings": {
    "properties": {
      "address": {
        "type": "ip"
      },
      "port": {
        "type": "long"
      }
    }
  }
}

Result:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}
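
With this mapping, Elasticsearch parses address as an IP address (IPv4 or IPv6) and port as a 64-bit integer. If you want to catch bad values before indexing, a rough local pre-check could look like the sketch below. It only approximates Elasticsearch's own parsing: for example, Elasticsearch coerces numeric strings by default, and int() is stricter about inputs like "80.5". check_doc is a hypothetical helper.

```python
import ipaddress

LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1  # range of the Elasticsearch long type


def check_doc(doc: dict) -> dict:
    """Rough local pre-check for the my_index mapping (address: ip, port: long)."""
    # ip fields accept IPv4 and IPv6; ip_address() raises ValueError otherwise.
    address = str(ipaddress.ip_address(doc["address"]))
    # Numeric strings such as "80" are coerced by default; int() is a stricter stand-in.
    port = int(doc["port"])
    if not LONG_MIN <= port <= LONG_MAX:
        raise ValueError(f"port out of long range: {port}")
    return {"address": address, "port": port}


print(check_doc({"address": "1.2.3.4", "port": "80"}))  # {'address': '1.2.3.4', 'port': 80}
```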

9. Load some documents into it:

POST my_index/_bulk
{"index":{"_id":"1"}}
{"address":"1.2.3.4","port":"80"}
{"index":{"_id":"2"}}
{"address":"1.2.3.4","port":"8080"}
{"index":{"_id":"3"}}
{"address":"2.4.8.16","port":"80"}

Response:

{
  "took" : 8,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
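
A _bulk request can return HTTP 200 even when some items fail; the top-level errors flag and a per-item error object report the failures. A minimal sketch for collecting them from a parsed response (failed_bulk_items is a hypothetical helper, and the failing item in the usage example is invented for illustration):

```python
def failed_bulk_items(response: dict) -> list:
    """Return (action, _id, status, error type) for each failed item of a _bulk response."""
    # "errors" is true if at least one item failed.
    if not response.get("errors"):
        return []
    failures = []
    for item in response["items"]:
        # Each item holds exactly one key: the action name ("index", "create", ...).
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append(
                (action, result.get("_id"), result.get("status"), result["error"].get("type"))
            )
    return failures


# Hypothetical response in which the second item was rejected.
bad = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "4", "status": 400, "error": {"type": "mapper_parsing_exception"}}},
    ],
}
print(failed_bulk_items(bad))  # [('index', '4', 400, 'mapper_parsing_exception')]
```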

10. Combine two fields with a static string (runtime field defined in the search request)

GET my_index/_search
{
  "runtime_mappings": {
    "socket": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['address'].value + ':' + doc['port'].value)"
      }
    }
  },
  "fields": [
    "socket"
  ],
  "query": {
    "match": {
      "socket": "1.2.3.4:8080"
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "address" : "1.2.3.4",
          "port" : "8080"
        },
        "fields" : {
          "socket" : [
            "1.2.3.4:8080"
          ]
        }
      }
    ]
  }
}

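In the response above, took is how long the operation took (in milliseconds), timed_out indicates whether the request timed out, and hits contains the matching documents. Its subfields are:

  • total: the number of matching documents, 1 in this example (relation "eq" means the count is exact).
  • max_score: the highest relevance score, 1.0 in this example.
  • hits: the array of returned documents.

Within each hit, _source contains the original document and fields contains the requested fields, here the runtime field socket.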

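We defined the field socket in the runtime_mappings section. A short Painless script defines how socket is computed for each document: + concatenates the value of the address field, the static string ":", and the value of the port field. We then use socket in the query. socket is a temporary runtime field that exists only for this query and is computed when the query runs. When defining a Painless script for a runtime field, you must call emit to return the computed value.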
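
To make the per-document computation concrete, here is a Python stand-in for what the Painless script does at query time. It is an illustration only, not Painless and not how Elasticsearch executes scripts; the documents are the three from step 9, and socket / match_socket are hypothetical helpers.

```python
# The three documents indexed in step 9.
docs = {
    "1": {"address": "1.2.3.4", "port": "80"},
    "2": {"address": "1.2.3.4", "port": "8080"},
    "3": {"address": "2.4.8.16", "port": "80"},
}


def socket(doc: dict) -> str:
    """Python stand-in for emit(doc['address'].value + ':' + doc['port'].value)."""
    return f"{doc['address']}:{doc['port']}"


def match_socket(docs: dict, value: str) -> dict:
    """Compute socket per document at query time and keep exact matches (keyword field)."""
    hits = {}
    for doc_id, doc in docs.items():
        computed = socket(doc)  # calculated on the fly, never stored
        if computed == value:
            hits[doc_id] = {"fields": {"socket": [computed]}}
    return hits


print(match_socket(docs, "1.2.3.4:8080"))  # {'2': {'fields': {'socket': ['1.2.3.4:8080']}}}
```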

socket: the field added at runtime. Its script is given either inline via source or as a stored script via id.

Official documentation: The script itself, which you specify as source for an inline script or id for a stored script. Use the stored script APIs to create and manage stored scripts.

"source": "emit(doc['address'].value + ':' + doc['port'].value)" is an inline script.

11. If socket turns out to be a field we want to use in many queries without defining it in each one, we can simply add it to the index mapping:

PUT my_index/_mapping
{
  "runtime": {
    "socket": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['address'].value + ':' + doc['port'].value)"
      }
    }
  }
}

Result:

{
  "acknowledged" : true
}

Because socket is now part of the index mapping, queries can use it without defining it in runtime_mappings, for example:

GET my_index/_search
{
  "fields": [
    "socket"
  ],
  "query": {
    "match": {
      "socket": "1.2.3.4:8080"
    }
  }
}

Result (identical to the result of step 10):

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "address" : "1.2.3.4",
          "port" : "8080"
        },
        "fields" : {
          "socket" : [
            "1.2.3.4:8080"
          ]
        }
      }
    ]
  }
}

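The clause "fields": ["socket"] is needed only if you want the value of socket returned in the response. socket can now be used in any query, but it is not stored in the index and does not increase the index size. It is computed only when a query needs it, and only for the documents that need it.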

12. Difference between runtime and runtime_mappings:

Fields defined under runtime are stored in the index mapping, whereas fields defined under runtime_mappings exist only in the search request that defines them.
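
The difference shows up in where the same field definition goes. This Python sketch builds both request bodies from one definition (the helper names are hypothetical): mapping_update_body() corresponds to what step 11 sends to PUT my_index/_mapping, and search_body() to what step 10 sends to GET my_index/_search.

```python
import json

# The runtime field definition used in steps 10 and 11.
SOCKET_FIELD = {
    "type": "keyword",
    "script": {"source": "emit(doc['address'].value + ':' + doc['port'].value)"},
}


def mapping_update_body() -> dict:
    """Body for PUT my_index/_mapping: the field is stored in the index mapping."""
    return {"runtime": {"socket": SOCKET_FIELD}}


def search_body(value: str) -> dict:
    """Body for GET my_index/_search: the field exists only for this request."""
    return {
        "runtime_mappings": {"socket": SOCKET_FIELD},
        "fields": ["socket"],
        "query": {"match": {"socket": value}},
    }


print(json.dumps(mapping_update_body(), indent=2))
print(json.dumps(search_body("1.2.3.4:8080"), indent=2))
```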

Runtime fields in the mapping: https://www.elastic.co/guide/en/elasticsearch/reference/7.11/runtime-mapping-fields.html#runtime-mapping-fields

Runtime fields in a search request: https://www.elastic.co/guide/en/elasticsearch/reference/7.11/runtime-search-request.html#runtime-search-request

13. Override field values at query time

PUT my_raw_index
{
  "mappings": {
    "properties": {
      "raw_message": {
        "type": "keyword"
      },
      "address": {
        "type": "ip"
      }
    }
  }
}

Result:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_raw_index"
}