ES Storage Usage Keeps Shrinking: Where Does ES Store Its Data?

Reposted

mob6454cc647bdb 2024-07-27 10:03:39

Preface

After collecting a large volume of logs, where should they be stored so that their contents can be searched? MySQL?

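1. Full-text search in MySQL is slow on massive data sets

Suppose MySQL stores 10 million records like this, and the details field of each record holds 128 characters.

A user wants to find the records whose details field contains the keyword "ajax".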

select * from tb_log where log_details like "%ajax%";

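A LIKE pattern with a leading wildcard prevents the index from being used.

Every execution of this SQL statement has to visit each record in the log table one by one. Worse, for every record it then has to scan the full text of the details field.

That scan decides whether the details field of the current record contains "ajax". In the worst case this is on the order of 10 million × 128 = 1.28 billion character comparisons.

So if searching the text of the details field has to be supported, storing massive amounts of log data in MySQL is not a sensible choice.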
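To make the cost concrete, here is a toy Python sketch (not MySQL itself) of what a leading-wildcard LIKE amounts to: every row is visited and its text is scanned. The function name `like_scan` and the sample rows are made up for illustration.

```python
def like_scan(rows, keyword):
    """Emulate LIKE '%keyword%': visit every row and scan its text."""
    matches = []
    rows_examined = 0
    for row_id, details in rows:
        rows_examined += 1      # no index helps here, so no row can be skipped
        if keyword in details:  # substring scan over the whole field
            matches.append(row_id)
    return matches, rows_examined


# Hypothetical log rows: (id, details)
rows = [
    (1, "GET /index.html 200"),
    (2, "POST /api/ajax/login 500"),
    (3, "ajax request timed out"),
]
matches, rows_examined = like_scan(rows, "ajax")
print(matches, rows_examined)
```

The number of rows examined always equals the table size, whether or not anything matches, which is why the approach does not scale to tens of millions of log records.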

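2. MySQL cannot search by tokenized terms

A user may not have a precise goal. They search an e-commerce site for the single keyword '平板电视' (flat-panel TV), but would also like to see 平板电脑 (tablet computers), 液晶电视 (LCD TVs) and similar products.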

select * from tb_good where good_name like "%平板电视%";

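The SQL statement above only returns products whose name contains the four characters "平板电视" in sequence.

It cannot split the keyword into terms the way natural language does, so the user never sees a richer result set, for example products whose name contains either "电视" (TV) or "平板" (flat panel).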
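The idea of tokenized search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini dictionary `VOCAB`, the forward-maximum-matching segmenter and the product list are all hypothetical stand-ins for what a real analyzer and index would do.

```python
VOCAB = {"平板", "电视", "电脑", "液晶"}  # hypothetical mini dictionary


def segment(text, vocab=VOCAB, max_len=2):
    """Forward maximum matching: greedily take the longest dictionary word."""
    terms, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in vocab:  # fall back to a single character
                terms.append(word)
                i += size
                break
    return terms


def search_any(products, query):
    """Return products whose name contains at least one term of the query."""
    terms = segment(query)
    return [p for p in products if any(t in p for t in terms)]


products = ["平板电视", "平板电脑", "液晶电视", "机械键盘"]
print(segment("平板电视"))
print(search_any(products, "平板电视"))
```

Unlike the LIKE query, the search now also returns 平板电脑 and 液晶电视, because each shares one term with the query.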

ES is not a silver bullet. Use it only for search, because an ES deployment is itself complex to set up and maintain.

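I. Introduction to Elasticsearch

A distributed full-text search engine.

Elasticsearch is a distributed, high-performance, scalable search and analytics system built on Lucene. It exposes a RESTful web API.

Elasticsearch differs from MySQL in the following ways:

MySQL data operations are transactional; Elasticsearch has no transactions
MySQL supports foreign keys; Elasticsearch does not
MySQL uses B+ tree indexes; Elasticsearch uses inverted indexes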

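1. Inverted index

A core reason Elasticsearch supports full-text search is that, when storing data, it can tokenize the text and build an inverted index. This is a classic space-for-time trade-off.

An inverted index is a data structure used for full-text search.

Instead of determining attribute values from a record, an inverted index determines the location of records from an attribute value, hence the name inverted index.

An inverted index maps every word in the documents to the list of all documents that contain that word.

Inverted indexes are widely used in text search and information retrieval, for example in search engines, site search and text classification.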
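A minimal Python sketch of the data structure may help. It assumes a plain whitespace tokenizer and three made-up log lines; Lucene's real index is far more elaborate, so treat this as an illustration of the mapping only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # whitespace tokenizer: an assumption
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical documents keyed by id
docs = {
    1: "ajax request failed",
    2: "page loaded",
    3: "retrying ajax request",
}
index = build_inverted_index(docs)
print(index["ajax"])
print(index["page"])
```

Looking up "ajax" is now a single dictionary access instead of a scan over every document, at the price of storing the extra index.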

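2. Full-text search

Full-text search: the keywords entered by the user are tokenized as well, and the inverted index is used to quickly locate the documents in which those terms appear.

Put simply, it looks up the key from the value (from a keyword in the content to the documents containing that keyword) rather than the value from the key.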
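The lookup direction described here can be sketched as follows. The hand-written `index` dictionary and the whitespace tokenization of the query are assumptions for illustration; the `require_all` flag merely shows the difference between matching any term and matching every term.

```python
def search(index, query, require_all=False):
    """Tokenize the query, then combine the posting lists of its terms."""
    terms = query.lower().split()
    postings = [set(index.get(term, ())) for term in terms]
    if not postings:
        return []
    combine = set.intersection if require_all else set.union
    return sorted(combine(*postings))


# Hypothetical inverted index: term -> ids of documents containing it
index = {
    "ajax": [1, 3],
    "request": [1, 3],
    "page": [2],
    "loaded": [2],
}
print(search(index, "ajax page"))
print(search(index, "ajax request", require_all=True))
```

Only the posting lists of the query terms are touched, so the work depends on how many documents contain those terms rather than on the total number of documents.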

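3. Lucene

Lucene is an open-source full-text search engine library (a JAR) from the Apache Software Foundation, originally a Java subproject of the Jakarta project. It implements the requirements above for us.

Once Lucene provides the inverted index, how is massive data stored in a distributed way? How is high availability achieved? How are cluster nodes managed? These are the features Elasticsearch adds.

The commonly mentioned ELK is an acronym formed from the initials of three open-source projects: Elasticsearch (full-text search) + Logstash (log collection) + Kibana (visualization).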

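4. Shards

A shard is the basic unit of data storage in Elasticsearch.

Elasticsearch is a distributed search engine: it allows an index to be split into multiple parts, which can be stored on different nodes.

This distributed storage lets Elasticsearch handle large amounts of data while keeping queries efficient.

A single shard can hold at most about 2 billion documents. This is a hard limit inherited from Lucene rather than a configurable default.

Once an index has been created, the number of primary shards cannot simply be changed in place, because document routing depends on it.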
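Why routing pins the primary shard count can be seen in a small Python sketch. It uses the simplified rule shard = hash(routing value) % number of primary shards, with `zlib.crc32` as a stand-in hash; this is not the hash function Elasticsearch actually uses, and the document ids are made up.

```python
import zlib


def route(doc_id, num_primary_shards):
    """Pick a shard for a document id (simplified routing rule)."""
    # zlib.crc32 is a stand-in; it is not the hash Elasticsearch uses
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]  # hypothetical document ids
moved = sum(1 for d in doc_ids if route(d, 5) != route(d, 6))
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

If the shard count went from 5 to 6, most documents would be looked up on a shard that does not hold them, which is why the count is fixed when the index is created.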

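5. Replica shards

A replica is a copy of a shard; it provides high availability and failure recovery.

Each primary shard can have one or more replica shards.

If the node holding a primary shard fails, a replica shard can take over, which keeps the data highly reliable.

Replicas also let query requests be processed in parallel on multiple nodes, improving query performance.

By default, Elasticsearch 7.0 and later creates 1 primary shard per index with 1 replica for each primary; versions before 7.0 defaulted to 5 primary shards.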

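6. Summary of shards and replicas

Shards and replicas play different roles in Elasticsearch:

Shards distribute the stored data
Replicas copy the data to provide high availability and query performance

Users can adjust the number of shards and replicas to their needs in order to optimize storage and query performance (the replica count can be changed at any time; the primary shard count is chosen when the index is created).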
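The relationship between primaries, replicas and nodes can be sketched with a toy placement function in Python. The round-robin rule and the node names are assumptions; the real Elasticsearch allocator is more sophisticated, but it shares the one property shown here: a replica is never placed on the same node as its primary.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy round-robin placement; a copy never shares a node with its primary."""
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas per shard")
    layout = {node: [] for node in nodes}
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):  # copy 0 is the primary
            node = nodes[(shard + copy) % len(nodes)]
            role = "P" if copy == 0 else "R"
            layout[node].append(f"{role}{shard}")
    return layout


layout = allocate(num_primaries=3, num_replicas=1,
                  nodes=["node-1", "node-2", "node-3"])
for node, shards in layout.items():
    print(node, shards)
```

With 3 primaries and 1 replica each there are 3 × (1 + 1) = 6 shard copies in total, and losing any single node still leaves one copy of every shard.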

7. ES low-level write flow

[Figure: diagram of the Elasticsearch write flow]

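II. Installing Elasticsearch

Install Elasticsearch and the Kibana client with Docker.

1. Installing Elasticsearch

1.1. Pull the Elasticsearch 7.10.1 image

Note that the ES version must match the version of the client dependencies.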

docker pull elasticsearch:7.10.1

1.2. Create the Elasticsearch configuration and data directories

mkdir -p /mydata/elasticsearch/{config,data,plugins}

1.3. Set the bind address in the configuration file

echo "http.host: 0.0.0.0" > /mydata/elasticsearch/config/elasticsearch.yml

1.4. Set directory permissions

chmod -R 775 /mydata/elasticsearch/

1.5. Start the container with port and directory mappings

docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms256m -Xmx1024m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.10.1

1.6. Verify that it started successfully

[Figure: screenshot of the Elasticsearch response confirming a successful start]
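One way to check the node from a script is to request its root endpoint and read the version from the JSON reply. The Python sketch below is an assumption-laden illustration: `fetch_root` and `summarize` are hypothetical helper names, and `sample` is a hand-written, abridged payload shaped like the root endpoint's reply, not output captured from the container above.

```python
import json
import urllib.request


def fetch_root(base_url, timeout=5):
    """GET the root endpoint of an Elasticsearch node and return the raw body."""
    with urllib.request.urlopen(base_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def summarize(body):
    """Extract the cluster name and version number from the root reply."""
    info = json.loads(body)
    return f'{info["cluster_name"]} is running Elasticsearch {info["version"]["number"]}'


# Hand-written sample payload (abridged); real replies contain more fields
sample = (
    '{"name": "node-1", "cluster_name": "docker-cluster", '
    '"version": {"number": "7.10.1"}, "tagline": "You Know, for Search"}'
)
print(summarize(sample))
```

Against a running node the call would be `summarize(fetch_root("http://<host>:9200"))`, with the host replaced by the address of your own server.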

2. Installing Kibana

Kibana is a client for Elasticsearch that lets you perform all kinds of Elasticsearch operations through a visual interface.

2.1. Pull the Kibana 7.10.1 image

docker pull kibana:7.10.1

2.2. Create the configuration directory

mkdir -p /mydata/kibana/config
cd /mydata/kibana/config
touch kibana.yml

2.3. Create and edit the configuration file

## ** THIS IS AN AUTO-GENERATED FILE **
## Default Kibana configuration for docker target
## Change the elasticsearch address to your actual IP
server.name: kibana
server.host: "0"
elasticsearch.hosts: [ "http://192.168.56.18:9200" ]
xpack.monitoring.ui.container.elasticsearch.enabled: true
# Enable the Chinese locale
i18n.locale: "zh-CN"

2.4. Start the container

docker run -d \
  --name=kibana \
  --restart=always \
  -p 5601:5601 \
  -v /mydata/kibana/config/kibana.yml:/usr/share/kibana/config/kibana.yml \
  kibana:7.10.1

2.5. The result looks like this

[Figure: screenshot of Kibana in the browser]

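3. Installing Portainer for Docker

A Docker container can easily fail to start after its configuration has been modified.

In that case a graphical management tool is helpful: it lets you operate containers visually and view their logs to locate the problem quickly.

3.1. Pull the portainer image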

docker pull portainer/portainer

3.2. Start the portainer container

Portainer is itself a container, but one used to manage other containers.

# Create a volume for Portainer's data
docker volume create portainer_data
# Create and start the container; --restart=always starts it automatically on boot
docker run -d -p 9000:9000 --name=portainer --restart=always \
 -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer

3.3. Managing containers

Finally, make sure the following three containers are all running normally:

[Figure: container list showing the three running containers]

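III. Using Elasticsearch

Working with ES involves five core concepts:

Index
Mapping
Field
Document
Inverted index

Older versions of Elasticsearch had one more concept, Type, used to categorize data. Types have been deprecated since ES 7 and were removed in ES 8.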

Elasticsearch	MySQL
Index	Table
Mapping	Table schema
Field	Column
Document	Row

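1. Index

An index is comparable to a table in a relational database; one index contains a number of documents.

1.1. Creating an index

Create the index first, then specify its mapping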

# Create the index first
PUT student
# Then add the mapping to the index
PUT student/_mapping
{
  "properties":{
      "username":{
        "type":"text"
      },
    "age":{"type":"integer"},
    "birthday":{"type":"date","format":"yyyy-MM-dd" }
  }
  
}

Create the index and specify the mapping in one request

PUT student
 {
  "mappings":{
  "properties":{
    "username":{"type":"text"},
    "age":{"type":"integer"},
    "birthday":{"type":"date","format":"yyyy-MM-dd" }
               }
          }
   
 }

1.2. Querying an index

GET person

1.3. Querying multiple indexes

PUT person1
GET person,person1

1.4. Querying information about all indexes

GET _all

1.5. Deleting an index

DELETE person1

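2. Aliases

Because of the inverted index, a field cannot be deleted from an ES mapping, nor can its type be changed.

So we normally expose an alias of the index to the outside rather than the real index name.

If a wrong mapping was specified when creating the index or when adding a field to the mapping, there are two solutions.

2.1. The index is not yet in use

If no data has been imported into the index, delete the index and specify the mapping again.

2.2. The index is already in use

Create a new index with the correct mapping
Import the data from the old, incorrect index into the new one
Point the alias at the new, correct index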

# Simulate the mistake: create an index student1 with a wrong mapping
PUT student1
PUT student1/_mapping
{
  "properties":{
    "birthday":{"type":"text"}
  }
  
}

# Create the alias student -> student1
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "student1",
        "alias": "student"
      }
    }
  ]
  
}

# The mapping turns out to be wrong: the birthday field should be of type date
GET student1

# Fix it: create a new index student2 with the correct mapping, with birthday as type date
PUT student2
PUT student2/_mapping
{
  "properties":{
    "birthday":{"type":"date","format":"yyyy-MM-dd" }
  }
  
}

# Remove the old alias student -> student1
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "student1",
        "alias": "student"
      }
    }
  ]
}

# Create the new alias student -> student2
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "student2",
        "alias": "student"
      }
    }
  ]
  
}

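# Inspect the new index student2 to confirm its mapping is correct
GET student2
# Copy the existing data into student2, for example with a Java program, before deleting student1
# Delete student1
DELETE student1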
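The remove and add actions can also be sent in a single `_aliases` request, which Elasticsearch applies atomically, so there is no moment at which the alias points at no index. The Python sketch below only builds that request body; the helper name `swap_alias_actions` is made up, and sending the body (for example as `POST _aliases` from Kibana) is left to the reader.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Body for POST _aliases that moves an alias in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = swap_alias_actions("student", "student1", "student2")
print(json.dumps(body, indent=2))
```

Clients that query the alias `student` keep working during the switch and never need to know the real index name.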

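3. Field

A field is comparable to a column in a database table.

After creating an index you need to define its fields, such as each field's name and data type. This process is called mapping.

Fields in ES support the following data types

3.1. Strings

text: analyzed into terms. For example, given a suitable Chinese analyzer, 华为手机 ("Huawei phone") is split into 华为 and 手机. Each resulting word is called a term
keyword: not analyzed. For example 华为手机 yields a single term, 华为手机.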
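The practical difference shows up when matching exact terms. The Python sketch below is a toy model, not Elasticsearch's analysis chain: `index_terms` is a hypothetical function and the two-word `dictionary` stands in for a real analyzer.

```python
def index_terms(value, field_type, dictionary=("华为", "手机")):
    """Return the terms that would be indexed for a value (toy model)."""
    if field_type == "keyword":
        return [value]  # stored as one exact term
    terms, i = [], 0
    while i < len(value):
        # take a dictionary word starting at position i, else a single character
        word = next((w for w in dictionary if value.startswith(w, i)), value[i])
        terms.append(word)
        i += len(word)
    return terms


text_terms = index_terms("华为手机", "text")
keyword_terms = index_terms("华为手机", "keyword")
print(text_terms, "华为" in text_terms)
print(keyword_terms, "华为" in keyword_terms)
```

A search for the term 华为 finds the document through the text field but not through the keyword field, which only matches the complete value.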

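3.2. Numbers

long: signed 64-bit integer
integer: signed 32-bit integer
short: signed 16-bit integer
byte: signed 8-bit integer
double: double-precision 64-bit floating point
float: single-precision 32-bit floating point
half_float: half-precision 16-bit floating point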
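The value range of each integer type follows directly from its bit width. A short Python sketch of that arithmetic, with `signed_range` as an illustrative helper name:

```python
def signed_range(bits):
    """Smallest and largest value of a two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, bits in [("byte", 8), ("short", 16), ("integer", 32), ("long", 64)]:
    low, high = signed_range(bits)
    print(f"{name}: {low} .. {high}")
```

Choosing the narrowest type that covers the expected values, such as `short` or `byte` for an age, keeps the index smaller.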

3.3. Boolean

boolean

3.4. Binary

binary

3.5. Date

date

3.6. Range types

integer_range
float_range
long_range
double_range
date_range

3.7. Arrays

3.8. Objects

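4. Document

The smallest unit of data in ES. A document represents one record in an index and is usually expressed as JSON.

4.1. Adding documents

4.1.1. Add a document with a manually set id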

POST person/_doc/1
{
  "name":"张三",
  "age":18,
  "address":"北京"
}

4.1.2. Add a document with an auto-generated id

POST person/_doc
{
  "name":"李四",
  "age":20,
  "address":"北京"
}

4.2. Querying documents

4.2.1. Get a document by id

GET person/_doc/1

4.2.2. Query all documents

GET person/_search

4.2.3. Query with conditions

GET logstash-2022.12.12/_search
{
  "_source": { 
    "includes":["@timestamp","message","stream"]
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "from": 1,
  "size": 3
}

4.3. Updating a document

PUT person/_doc/1
{
  "name": "张三丰",
  "age": 180,
  "address": "武当山"
}

4.4. Deleting a document

DELETE person/_doc/1

IV. Operating Elasticsearch from Spring Boot

The workflow for operating ES with RestHighLevelClient is as follows:

1. Build the query request

[Figure: building the query request]

2. Process the result set returned by ES

[Figure: processing the query response]

1. Integrating Elasticsearch with Spring Boot

1.1. Create a Maven project and add the dependencies

<dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
        </dependency>
        <!-- Elasticsearch dependencies -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>7.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>7.10.1</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

pom.xml

1.2.application.yml

elasticsearch:
  host: 192.168.56.18
  port: 9200

1.3. Create the startup class EsApplication

package com.zhanggen.es.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class EsApplication {
    public static void main(String[] args) {
        SpringApplication.run(EsApplication.class,args);
    }
}

EsApplication.java

1.4. Create the ES configuration class

package com.zhanggen.es.demo.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@ConfigurationProperties(prefix="elasticsearch")
public class ElasticSearchConfig {

    private String host;

    private int port;


    public String getHost() {
        return host;
    }

    public void setHost(String host) {
        this.host = host;
    }

    public int getPort() {
        return port;
    }

    public void setPort(int port) {
        this.port = port;
    }

    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port, "http"));
        builder.setRequestConfigCallback(requestConfigBuilder ->{
            requestConfigBuilder.setConnectionRequestTimeout(500000);
            requestConfigBuilder.setSocketTimeout(500000);
            requestConfigBuilder.setConnectTimeout(500000);
            return requestConfigBuilder;
        });
        return new RestHighLevelClient(builder);
    }
}

ElasticSearchConfig.java

1.5. Create the test class ESTest

package com.zhanggen.es.demo;

import org.elasticsearch.client.RestHighLevelClient;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@SpringBootTest
@RunWith(SpringRunner.class)
public class ESTest {
    @Autowired
    private RestHighLevelClient restHighLevelClient;


}

ESTest.java

2. Index operations

Create, delete and query indexes in ES

package com.itheima.es.demo;

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.IndicesClient;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.cluster.metadata.MappingMetadata;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.IOException;
import java.util.Map;

@SpringBootTest
@RunWith(SpringRunner.class)
public class ESTest {

    // Inject the client used to operate Elasticsearch
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Test creating an index
    @Test
    public void createIndexTest() throws IOException {
        IndicesClient indicesClient = restHighLevelClient.indices();
        // Set the index name
        CreateIndexRequest createIndexRequest=new CreateIndexRequest("student");
        // Define the index mapping (structure)
        String mappingInfo = "{\n" +
                "  \"properties\":{\n" +
                "    \"name\":{\n" +
                "      \"type\":\"keyword\"\n" +
                "    },\n" +
                "    \"age\":{\n" +
                "      \"type\":\"integer\"\n" +
                "    },\n" +
                "    \"address\":{\n" +
                "      \"type\":\"text\"\n" +
                "    }\n" +
                "  }\n" +
                "}";
        createIndexRequest.mapping(mappingInfo, XContentType.JSON);
        CreateIndexResponse response = indicesClient.create(createIndexRequest, RequestOptions.DEFAULT);
        System.out.println(response.isAcknowledged());

    }

    // Test querying an index
    @Test
    public  void findIndexTest() throws IOException {
        IndicesClient indicesClient = restHighLevelClient.indices();
        GetIndexRequest getIndexRequest = new GetIndexRequest("student");
        GetIndexResponse response = indicesClient.get(getIndexRequest, RequestOptions.DEFAULT);
        Map<String, MappingMetadata> mappings = response.getMappings();
        for (String key : mappings.keySet()) {
            System.out.println(key+"==="+mappings.get(key).getSourceAsMap());
        }
    }
    // Delete the index
    @Test
    public void delIndex() throws IOException {
        IndicesClient indicesClient = restHighLevelClient.indices();
        DeleteIndexRequest delIndexRequest = new DeleteIndexRequest("student");
        AcknowledgedResponse response = indicesClient.delete(delIndexRequest, RequestOptions.DEFAULT);
        System.out.println(response.isAcknowledged());
    }

}

ESTest.java

3. Document operations

package com.itheima.es.demo;

import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.IndicesClient;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.cluster.metadata.MappingMetadata;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

@SpringBootTest
@RunWith(SpringRunner.class)
public class ESTestDocument {

    // Inject the client used to operate Elasticsearch
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Test creating a document in the student index
    @Test
    public void addDoc() throws IOException {
        IndexRequest indexRequest = new IndexRequest("student").id("1");
        indexRequest.source("{\n" +
                "  \"name\":\"张根\",\n" +
                "  \"age\":18,\n" +
                "  \"address\":\"河北\"\n" +
                "}", XContentType.JSON);
        IndexResponse response = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        System.out.println(response.status());
    }
    // Test creating a document in the student index, variant 2
    @Test
    public void addDoc1() throws IOException {
        IndexRequest indexRequest = new IndexRequest("student").id("1");
        // Essentially the request body just carries a JSON document
        HashMap<String, Object> map = new HashMap<>();
        map.put("name","张根");
        map.put("age",18);
        map.put("address","河北");
        indexRequest.source(JSON.toJSONString(map),XContentType.JSON);
        IndexResponse response = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        System.out.println(response.status());
    }

    // Test getting a document
    @Test
    public void findDocTest() throws IOException {
        GetRequest getRequest = new GetRequest("student","1");
        GetResponse response = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
        String str = response.getSourceAsString();
        Map<String, Object> map = response.getSourceAsMap();
        System.out.println(map);

    }

    // Test deleting a document
    @Test
    public void delDocTest() throws IOException {
        DeleteRequest deleteRequest = new DeleteRequest("student", "1");
        DeleteResponse response = restHighLevelClient.delete(deleteRequest,RequestOptions.DEFAULT);
        System.out.println(response.status());

    }

}

ESTestDocument.java

4. Bulk-importing data from MySQL into ES

package com.zhanggen.es.service.impl;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import com.itheima.es.entity.HotelEntity;
import com.itheima.es.mapper.HotelMapper;
import com.itheima.es.service.HotelService;
import org.apache.lucene.search.TotalHits;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.*;

@Service
public class HotelServiceImpl implements HotelService {

    @Autowired
    private HotelMapper hotelMapper;

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Bulk import
    @Override
    public int addDocToES() {
        int esTotal = 0;
        Long currentPage = 1L;
        Page<HotelEntity> page = new Page<>(currentPage, 200);
        LambdaQueryWrapper<HotelEntity> queryWrapper = new LambdaQueryWrapper<>();
        Integer integer = hotelMapper.selectCount(queryWrapper);
        Page<HotelEntity> hotelEntityPage = hotelMapper.selectPage(page, queryWrapper);
        // First work out how many pages the database holds
        long totalpage = hotelEntityPage.getPages();
        for (currentPage = 1L; currentPage <= totalpage; currentPage++) {
            // Bulk import one page
            queryWrapper = new LambdaQueryWrapper<>();
            hotelEntityPage = hotelMapper.selectPage(page.setCurrent(currentPage), queryWrapper);
            // The ES bulk API: a collection of requests
            BulkRequest bulkRequest = new BulkRequest();
            for (HotelEntity hotelEntity : hotelEntityPage.getRecords()) {
                String data = JSON.toJSONStringWithDateFormat(hotelEntity, "yyyy-MM-dd", SerializerFeature.WriteDateUseDateFormat);
                IndexRequest indexRequest = new IndexRequest("hotel").source(data, XContentType.JSON);
                bulkRequest.add(indexRequest);
            }
            try {
                BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            esTotal += hotelEntityPage.getRecords().size();
        }
        return esTotal;
    }

    // Query all documents
    @Override
    public Map<String, Object> matchAllQuery() {
        // 1. Build the query
        SearchRequest hotelSearch = new SearchRequest("hotel");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
        searchSourceBuilder.query(queryBuilder);
        hotelSearch.source(searchSourceBuilder);
        // The result to return
        Map<String, Object> map = new HashMap<>();
        try {
            SearchResponse searchResponse = restHighLevelClient.search(hotelSearch, RequestOptions.DEFAULT);
            SearchHits searchResponseHits = searchResponse.getHits();
            // Total number of hits
            long totalHits = searchResponseHits.getTotalHits().value;

            List<HotelEntity> list = new ArrayList<>();
            SearchHit[] searchHits = searchResponseHits.getHits();
            // Both conditions must hold; with || a null array would throw on .length
            if (searchHits != null && searchHits.length > 0) {
                for (SearchHit searchHit : searchHits) {
                    String sourceAsString = searchHit.getSourceAsString();
                    list.add(JSON.parseObject(sourceAsString, HotelEntity.class));
                }
            }
            map.put("list", list);
            map.put("totalResultSize", totalHits);
        } catch (IOException e) {
            e.printStackTrace();
        }


        return map;
    }

    // Paged query
    @Override
    public Map<String, Object> pageQuery(int current, int size) {
        SearchRequest searchRequest = new SearchRequest("hotel");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
        searchSourceBuilder.query(queryBuilder);
        // Set pagination
        searchSourceBuilder.from((current - 1) * size);
        searchSourceBuilder.size(size);
        searchRequest.source(searchSourceBuilder);
        Map<String, Object> resultMap = new HashMap<>();
        try {
            SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

            SearchHits hits = searchResponse.getHits();

            long totalHits = hits.getTotalHits().value;

            SearchHit[] searchHits = hits.getHits();

            List<HotelEntity> list = new ArrayList<>();

            for (SearchHit searchHit : searchHits) {
                String sourceAsString = searchHit.getSourceAsString();
                list.add(JSON.parseObject(sourceAsString, HotelEntity.class));
            }

            resultMap.put("list", list);
            resultMap.put("totalResultSize", totalHits);
            resultMap.put("current", current);
            // Compute the total number of pages
            resultMap.put("totalPage", (totalHits + size - 1) / size);

        } catch (IOException e) {
            e.printStackTrace();
        }

        return resultMap;
    }
}

HotelServiceImpl.java
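The paging arithmetic used in pageQuery can be checked in isolation. This Python sketch mirrors the two formulas from the Java code, `(current - 1) * size` for the offset and a ceiling division for the page count; the helper name `page_window` is made up.

```python
def page_window(current, size, total_hits):
    """Return (offset, total_pages) for a 1-based page number."""
    if current < 1 or size < 1:
        raise ValueError("current and size must be at least 1")
    offset = (current - 1) * size                   # value passed to from(...)
    total_pages = (total_hits + size - 1) // size   # ceiling division
    return offset, total_pages


print(page_window(current=3, size=10, total_hits=95))
print(page_window(current=1, size=10, total_hits=0))
```

Keep in mind that by default Elasticsearch rejects searches where from + size exceeds 10,000 (the `index.max_result_window` setting), so very deep pages need a different approach.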

5. Migration failures

If the data migration fails, the cause is most likely a problem with the mapping defined in ES. You can print the error messages from the bulk response.

package com.hmall.search.feign;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;
import com.hmall.client.ItemClient;
import com.hmall.common.dto.Item;
import com.hmall.common.dto.PageDTO;
import com.hmall.search.domain.ItemDoc;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.BeanUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.IOException;


@RunWith(SpringRunner.class)
@SpringBootTest
public class FeignTest {
    @Autowired
    private RestHighLevelClient restHighLevelClient;
    @Autowired
    private ItemClient itemClient;

    @Test
    public void testFindItem() {
        // Number of documents per batch
        Integer size = 1000;
        PageDTO<Item> itemPageDTO = itemClient.queryItemByPage(1, 0);
        Long total = itemPageDTO.getTotal();
        Long totalPage = total % size == 0 ? total / size : total / size + 1;
        for (Long currentPage = 1L; currentPage <= totalPage; currentPage++) {
            System.out.println("第" + currentPage + "页");
            itemPageDTO = itemClient.queryItemByPage(currentPage.intValue(), size);
            // Bulk import
            // The ES bulk API: a collection of requests
            BulkRequest bulkRequest = new BulkRequest();
            for (Item item : itemPageDTO.getList()) {
                ItemDoc itemDoc = new ItemDoc();
                BeanUtils.copyProperties(item, itemDoc);
                String data = JSON.toJSONStringWithDateFormat(itemDoc, "yyyy-MM-dd", SerializerFeature.WriteDateUseDateFormat);
                IndexRequest indexRequest = new IndexRequest("item").source(data, XContentType.JSON);
                bulkRequest.add(indexRequest);
            }
            try {
                BulkResponse bulkItemResponses = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
                if (bulkItemResponses.hasFailures()) {
                    BulkItemResponse[] itemResponse = bulkItemResponses.getItems();
                    for (BulkItemResponse response : itemResponse) {
                        if (response.isFailed()) {
                            System.out.println("=======" + response.getFailureMessage() + "=============");
                        }
                    }

                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        System.out.println(itemPageDTO.getList());
    }

    @Test
    public void testFindItem1() {
    }


}

FeignTest.java
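The Java test reads failures through BulkItemResponse. The same information is present in the JSON reply of the `_bulk` REST endpoint, so a script can extract it too. The Python sketch below assumes that reply has already been parsed into a dict; `bulk_failures` is a hypothetical helper and `sample` is a hand-written reply, not real output.

```python
def bulk_failures(response):
    """Collect (position, status, reason) for every failed item in a bulk reply."""
    if not response.get("errors"):
        return []
    failures = []
    for position, item in enumerate(response["items"]):
        # each item has one key naming its action: index, create, update or delete
        result = next(iter(item.values()))
        if "error" in result:
            failures.append((position, result["status"], result["error"]["reason"]))
    return failures


# Hand-written sample shaped like a bulk reply with one rejected document
sample = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse field [price]"}}},
    ],
}
print(bulk_failures(sample))
```

Because the position of each item matches the order of the requests in the batch, a failure can be traced back to the source record that caused it.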

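V. Operating Elasticsearch from Go

Go can connect to and operate ES with the third-party library https://github.com/olivere/elastic. Python uses the elasticsearch package instead, as in the Python example further down.

Note that the version of the API client must match your ES version.

For example, the ES used here is version 7.8.0, so the client to download is the corresponding github.com/olivere/elastic/v7.

1. Adding the dependency

Use go.mod to manage dependencies and download the specified version of the third-party library: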

module go相关模块/elasticsearch

go 1.13

require github.com/olivere/elastic/v7 v7.0.4

2. Code

package main


import (
"context"
"fmt"

"github.com/olivere/elastic/v7"
)

// Elasticsearch demo

type Person struct {
	Name    string `json:"name"`
	Age     int    `json:"age"`
	Married bool   `json:"married"`
}

func main() {
	client, err := elastic.NewClient(elastic.SetURL("http://192.168.56.135:9200/"))
	if err != nil {
		// Handle error
		panic(err)
	}

	fmt.Println("connect to es success")
	p1 := Person{Name: "曹操", Age: 155, Married: true}
	put1, err := client.Index().
		Index("students").Type("go").
		BodyJson(p1).
		Do(context.Background())
	if err != nil {
		// Handle error
		panic(err)
	}
	fmt.Printf("Indexed user %s to index %s, type %s\n", put1.Id, put1.Index, put1.Type)
}

Python

"""

pip install elasticsearch==7.8.0
http://10.110.158.162:10072/ ES address
http://10.110.158.162:10937/ Kibana address
"""

from elasticsearch import Elasticsearch
client = Elasticsearch(hosts="http://10.110.158.162:10072")
query = {
    "_source": {
        "includes": ["@timestamp", "message", "stream"]
    },
    "sort": [
        {
            "@timestamp": {
                "order": "desc"
            }
        }
    ],
    "from": 1,
    "size": 3
}
allDoc = client.search(index="k8s-2022.12.26", body=query)
for row in allDoc["hits"]["hits"]:
    print(row["_source"])

es.py

VI. Log queries

# List all indexes in the current ES cluster
GET _cat/indices

# Show the field types defined in the k8s-2022.12.26 index
GET k8s-2022.12.26/_mapping

# Get a single document by ID
GET k8s-2022.12.26/_doc/Q3YKTYUBy4Ru3dTnzJs0

# Query up to 300 log entries from the nginx container in the k8s-2022.12.26 index, newest first
GET k8s-2022.12.26/_search 
{
  "query":{
      "match": {"kubernetes.container_name":"nginx"}
    },
   "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 300
}



# Match query on a text field: logs with tag=linux-messages
GET k8s-2022.12.26/_search
{
   "query": {
    "match": {"tag":"linux-messages"}
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 300
}


# Query logs with tag=linux-messages and host_ip=10.110.158.162, returning only selected _source fields
GET k8s-2022.12.26/_search
{
   "_source": { 
    "includes":["@timestamp","host_ip"]
  },
  "query": {
    "bool":{
       "must":[
        {"match":{"tag":"linux-messages"}},
        {"match":{"host_ip":"10.110.158.162"}}
             ]
    }},
  "from": 0,
  "size": 20
}
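When such queries are issued from a script, the request body can be assembled from a dictionary of conditions. The Python sketch below builds a `bool` query whose `must` list holds one `match` clause per condition; `must_match` is a hypothetical helper, and the resulting dict could be passed as the `body` argument of the `search` call shown in the Python section.

```python
import json


def must_match(conditions, source_fields=None, offset=0, size=20):
    """Build a search body that requires every field to match its value."""
    body = {
        "query": {
            "bool": {
                "must": [{"match": {field: value}}
                         for field, value in conditions.items()]
            }
        },
        "from": offset,
        "size": size,
    }
    if source_fields:
        body["_source"] = {"includes": list(source_fields)}
    return body


body = must_match(
    {"tag": "linux-messages", "host_ip": "10.110.158.162"},
    source_fields=["@timestamp", "host_ip"],
)
print(json.dumps(body, indent=2))
```

Every clause in `must` has to match, so adding a condition narrows the result set.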
