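Overview: basic structure
- Relational database vs. Elasticsearch (ES)
Relational database | ES |
---|---|
Database | Index |
Table | Type |
Column | Field |
Row (record), identified by a primary key | Document, identified by `_id` |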
- Data storage
Relational database | ES |
---|---|
Rows made up of typed columns (varchar, int, and so on) | Documents in JSON format |
- How you operate on the data
Relational database | ES |
---|---|
SQL statements | RESTful HTTP requests with a JSON query DSL |
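
To make the three comparisons concrete, here is a minimal Python sketch that turns one relational row into a JSON document and puts a SQL condition next to a DSL body expressing the same filter. The column layout and the SQL text are assumptions made for illustration; nothing is sent to a server.

```python
import json

# A relational row: fixed, typed columns (hypothetical table layout).
columns = ("id", "name", "age", "height")
row = ("xiaohong", "小红", 18, 165)

# The same record as an ES-style JSON document.
# The primary key leaves the body and becomes the document _id.
record = dict(zip(columns, row))
doc_id = record.pop("id")
document = json.dumps(record, ensure_ascii=False)

# SQL text versus a JSON query DSL body for the condition "age >= 18".
sql = "SELECT * FROM class_1 WHERE age >= 18"
dsl = {"query": {"bool": {"filter": [{"range": {"age": {"gte": 18}}}]}}}

print(doc_id, document)
print(sql)
print(json.dumps(dsl))
```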
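Scenario: a school has several classes, and each class has several students.
1. Add a document
Add Xiaohong (小红) to class 1, supplying the id explicitly: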
POST http://192.168.1.104:9200/school/class_1/xiaohong { "name":"小红", "age":18, "height":165, "tags":["学习认真","学霸","漂亮"] }
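Here `school` is the index, `class_1` is the type, and `xiaohong` is the document id. Mapping types are deprecated as of Elasticsearch 7.0, so typed URLs like this one produce deprecation warnings there.
Insert without an id, and ES generates one automatically: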
POST http://192.168.1.104:9200/school/class_1/ { "name":"无名", "age":17, "height":175, "tags":["学习认真","学霸"] }
`"_id": "qoFDdHUB4DF8xfvRk1av"` in the response is the automatically generated id.
Add one more student, Xiaobai (小白), to be deleted in the next step:
POST http://192.168.1.104:9200/school/class_1/xiaobai { "name":"小白", "age":18, "height":165 }
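
The two insert styles differ only in the URL. As a rough sketch, the helper below builds (but does not send) the corresponding requests with Python's standard library. The function name `build_index_request` is hypothetical; the host, index, and type are the ones used in this post.

```python
import json
import urllib.request

BASE_URL = "http://192.168.1.104:9200"  # host taken from the examples in this post


def build_index_request(index, doc_type, doc, doc_id=None):
    """Build (but do not send) a POST request that indexes one document."""
    # With an id, the document is stored under that id.
    # Without one, ES generates the id and returns it as "_id".
    path = f"/{index}/{doc_type}/" + (doc_id or "")
    body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


with_id = build_index_request(
    "school", "class_1", {"name": "小白", "age": 18, "height": 165}, "xiaobai"
)
auto_id = build_index_request(
    "school", "class_1", {"name": "无名", "age": 17, "height": 175}
)

print(with_id.get_method(), with_id.full_url)
print(auto_id.get_method(), auto_id.full_url)
```

Passing either object to `urllib.request.urlopen` would actually send it, which requires a reachable cluster.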
2. Delete Xiaobai
DELETE http://192.168.1.104:9200/school/class_1/xiaobai
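The `result` field of the response is `deleted`, which confirms that the document was removed.
3. Update Xiaohong's information (full replacement)
Change the age to 19 and the height to 170: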
PUT http://192.168.1.104:9200/school/class_1/xiaohong { "name":"小红", "age":19, "height":170, "tags":["学习认真","学霸","漂亮"] }
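This is a plain PUT, which replaces the entire stored document rather than part of it. If the body contained only `age` and `height`, the other fields (`name`, `tags`) would be lost.
4. Change Xiaohong's age to 20 (partial update)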
POST http://192.168.1.104:9200/school/class_1/xiaohong/_update { "doc":{ "age":20 } }
After each modification, `_version` in the response increases by 1.
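
The difference between a full replacement and a partial update can be modelled in a few lines of Python. This is an in-memory illustration only, not how ES is implemented: the `store`, `put`, and `update` names are hypothetical, the merge is shallow, and every call bumps the version, which simplifies the real behaviour.

```python
store = {}  # doc_id -> {"_version": int, "_source": dict}


def put(doc_id, source):
    """Full replacement: the stored _source becomes exactly `source`."""
    version = store.get(doc_id, {"_version": 0})["_version"] + 1
    store[doc_id] = {"_version": version, "_source": dict(source)}


def update(doc_id, partial):
    """Partial update: merge `partial` into the existing _source."""
    current = store[doc_id]
    merged = {**current["_source"], **partial}
    store[doc_id] = {"_version": current["_version"] + 1, "_source": merged}


put("xiaohong", {"name": "小红", "age": 18, "height": 165, "tags": ["学霸"]})

update("xiaohong", {"age": 20})  # only age changes, other fields survive
after_update = store["xiaohong"]

put("xiaohong", {"age": 19, "height": 170})  # name and tags are gone
after_put = store["xiaohong"]

print(after_update)
print(after_put)
```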
5. Query the document after the updates
GET http://192.168.1.104:9200/school/class_1/xiaohong
6. Query all documents
Prepare some data first:
POST http://192.168.1.104:9200/school/class_1/xiaoli
{
"name":"小李",
"age":22,
"height":176,
"tags":["调皮","学渣"]
}
GET http://192.168.1.104:9200/school/class_1/_search
Query result:
{ "took": 138, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 3, "relation": "eq" }, "max_score": 1.0, "hits": [ { "_index": "school", "_type": "class_1", "_id": "xiaohong", "_score": 1.0, "_source": { "name": "小红", "age": 20, "height": 170, "tags": [ "学习认真", "学霸", "漂亮" ] } }, { "_index": "school", "_type": "class_1", "_id": "qoFDdHUB4DF8xfvRk1av", "_score": 1.0, "_source": { "name": "无名", "age": 17, "height": 175, "tags": [ "学习认真", "学霸" ] } }, { "_index": "school", "_type": "class_1", "_id": "xiaoli", "_score": 1.0, "_source": { "name": "小李", "age": 22, "height": 176, "tags": [ "调皮", "学渣" ] } } ] } }
Find students aged 18 or older:
POST http://192.168.1.104:9200/school/class_1/_search { "query":{ "bool":{ "filter":[{ "range":{ "age":{ "gte":18 } } }] } }, "from":0, "size":10 }
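`query` holds the search condition. `bool` is a compound query that combines several clauses. Clauses under `filter` run in filter context, which only includes or excludes documents and does not compute a relevance score. `range` matches values that fall within a range.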
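
To see what this request selects, the sketch below applies the same `bool`/`filter`/`range` body to the three students from the earlier search result, entirely in Python. The `search` and `matches_range` helpers are hypothetical and handle only `range` filters plus `from`/`size`; they imitate the semantics and are not a client for ES.

```python
import operator

students = [
    {"name": "小红", "age": 20, "height": 170},
    {"name": "无名", "age": 17, "height": 175},
    {"name": "小李", "age": 22, "height": 176},
]

query = {
    "query": {"bool": {"filter": [{"range": {"age": {"gte": 18}}}]}},
    "from": 0,
    "size": 10,
}

OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def matches_range(doc, clause):
    """True when every bound of a {"range": {field: {...}}} clause holds."""
    (field, bounds), = clause["range"].items()
    return all(OPS[op](doc[field], bound) for op, bound in bounds.items())


def search(docs, body):
    """Keep documents passing all filters, then apply from/size paging."""
    filters = body["query"]["bool"]["filter"]
    hits = [d for d in docs if all(matches_range(d, f) for f in filters)]
    start = body.get("from", 0)
    return hits[start:start + body.get("size", 10)]


names = [d["name"] for d in search(students, query)]
print(names)
```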
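The DSL is fairly complex and comes in many variations, so a worked example is the easiest way to understand it. Because queries and aggregations can be combined in so many ways, the full details are left for a later post.
Queries divide into scoring clauses (such as `match`) and non-scoring filters. Inside `bool` you combine `must`, `must_not`, `should`, and `filter`; common leaf queries are `term`, `terms`, and `range`.
Metric aggregations include `avg`, `max`, `sum`, `min`, `value_count` (the number of values) and `cardinality` (the approximate number of distinct values, like `COUNT(DISTINCT ...)` in a relational database).
The `terms` aggregation puts documents into one bucket per distinct value. There is far more than fits here, and both content and structure go well beyond SQL. The features used most often are querying, aggregation, full-text search, relevance scoring, and highlighting. The example below combines several of them: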
{ "from":0, "size":10, "query":{ "bool":{ "must":[{ "range":{ "age":{ "gte":0 } } }], "must_not":[{ "range":{ "age":{ "lte":0 } } }], "should":[{ "term":{ "name.keyword":"小白" } }], "filter":[{ "range":{ "age":{ "gte":18 } } }] } }, "aggs":{ "names":{ "terms":{ "field":"age" }, "aggs":{ "stu_avg_height":{ "avg":{ "field":"height" } } } }, "age_gte_18_count":{ "value_count":{ "field":"age" } } } }
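
The aggregation half of the request above is easier to follow once the bucket idea is spelled out. The sketch below emulates in plain Python a `terms` aggregation on `age` with an `avg` sub-aggregation on `height`, plus `value_count` and `cardinality`. The input list stands for the documents that passed the `age >= 18` filter; the third student (小明) is a hypothetical addition so that one bucket holds more than one document.

```python
from collections import defaultdict

hits = [  # documents that passed the age >= 18 filter
    {"name": "小红", "age": 20, "height": 170},
    {"name": "小李", "age": 22, "height": 176},
    {"name": "小明", "age": 20, "height": 180},  # hypothetical extra student
]

# terms aggregation on "age": one bucket per distinct value.
groups = defaultdict(list)
for doc in hits:
    groups[doc["age"]].append(doc)

buckets = [
    {
        "key": age,
        "doc_count": len(docs),
        # avg sub-aggregation, computed inside each bucket
        "stu_avg_height": sum(d["height"] for d in docs) / len(docs),
    }
    for age, docs in groups.items()
]
# terms buckets come back ordered by doc_count, largest first.
buckets.sort(key=lambda b: b["doc_count"], reverse=True)

# value_count: how many values exist for the field.
age_value_count = sum(1 for d in hits if d.get("age") is not None)
# cardinality: how many distinct values (exact here, approximate in ES).
age_cardinality = len({d["age"] for d in hits})

print(buckets)
print(age_value_count, age_cardinality)
```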
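A few other commonly used clauses
A field equals an exact value. The `term` query does not analyze its input, so on a dynamically mapped string field it should target the `keyword` sub-field; the analyzed `name` field itself stores 小红 as separate single-character tokens under the default analyzer:
{ "term":{ "name.keyword":"小红" } }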
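A field matches any of several values
Add a clause like this inside `filter` (`field_name` is a placeholder for the field to match):
{ "terms":{ "field_name":["value1","value2"] } }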
Range filter
{ "range":{ "age":{ "gte":18 } } }
Time range (`d` is days, `s` is seconds, `w` is weeks, and so on; `now` is the current time)
{ "range":{ "finished_time":{ "gt":"now-1d" } } }
{ "range":{ "finished_time":{ "gte":"2020-09-27 16:10:10", "lte":"2020-10-27 16:10:10", "format":"yyyy-MM-dd HH:mm:ss", "time_zone":"+08:00" } } }
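
When the bounds come from application code, the explicit-format variant can be generated from `datetime` values. A small sketch, assuming the field name `finished_time` from the examples above; the helper `time_range` is hypothetical, and the strftime pattern is chosen to match the ES pattern `yyyy-MM-dd HH:mm:ss`.

```python
from datetime import datetime, timedelta

ES_FORMAT = "yyyy-MM-dd HH:mm:ss"  # date pattern sent to ES in "format"
PY_FORMAT = "%Y-%m-%d %H:%M:%S"    # strftime pattern producing matching strings


def time_range(field, start, end, time_zone="+08:00"):
    """Build a range clause with explicit, formatted bounds."""
    return {
        "range": {
            field: {
                "gte": start.strftime(PY_FORMAT),
                "lte": end.strftime(PY_FORMAT),
                "format": ES_FORMAT,
                "time_zone": time_zone,
            }
        }
    }


end = datetime(2020, 10, 27, 16, 10, 10)
start = end - timedelta(days=30)
clause = time_range("finished_time", start, end)

# Relative bounds need no formatting at all: ES resolves date math itself.
last_day = {"range": {"finished_time": {"gt": "now-1d"}}}

print(clause)
print(last_day)
```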
Bucketing by time (date histogram)
"aggs":{ "my_date_buckets":{ "date_histogram":{ "field":"finished_time", "fixed_interval":"10m" } } }
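Find the largest bucket (the example below finds the bucket with the most documents). `buckets_path` has to reference the sibling aggregation by name, here `my_date_buckets`:
"aggs":{ "my_date_buckets":{ "date_histogram":{ "field":"finished_time", "fixed_interval":"10m" } }, "my_max_buckets":{ "max_bucket":{ "buckets_path":"my_date_buckets>_count" } } }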
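You can also sort the buckets and keep only the top few. `date_histogram` has no `size` parameter, so the example below uses a `bucket_sort` sub-aggregation to keep the 10 buckets with the most documents, in descending order:
"aggs":{ "my_date_buckets":{ "date_histogram":{ "field":"finished_time", "fixed_interval":"10m" }, "aggs":{ "top_10_by_count":{ "bucket_sort":{ "sort":[{ "_count":{ "order":"desc" } }], "size":10 } } } } }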