Using Logstash to export Elasticsearch data to CSV and to sync data between two ES clusters


大数据工匠 2022-06-14 22:53:51 ©Copyright


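©Copyright belongs to the author: this is an original work by 51CTO blogger 大数据工匠. Please contact the author for permission to repost; unauthorized reposting will be pursued legally.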

1. Installation

First make sure Java 1.8 or later is installed on the system, then unpack Logstash:

tar -zxvf /home/logstash-7.7.0.tar.gz -C /usr/local

Change into the /usr/local/logstash-7.7.0/bin directory.

2. Configuration file

Create a file named convert_csv.conf:

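input {
    elasticsearch {
        hosts => ["127.0.0.1:9200"]    # address of the ES server to export from
        index => "your_index"          # name of the index to export
    }
}

output {
    csv {
        # fields to export: pick them from _source, as shown in the figure below;
        # underscore-prefixed metadata fields (_id, _index, ...) need not be listed
        fields => ["", "", ""]
        # path of the output file
        path => "/home/csv.csv"
    }
}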
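To make the fields option concrete, here is a small Python sketch of the transformation this pipeline performs: each Elasticsearch hit's _source becomes one CSV row, with columns in the order given by fields and no header line. It is an illustration under assumptions, not Logstash's own implementation. The sample hits and the field names name and age are hypothetical. Turning a missing field into an empty cell is a choice made for this sketch.

```python
import csv
import io


def hits_to_csv(hits, fields):
    """Turn Elasticsearch hits into CSV text, one row per hit.

    Only the listed _source fields are written, in the given order; metadata
    such as _id or _index is ignored. A field missing from a document becomes
    an empty cell (a choice made for this sketch). No header row is written.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for hit in hits:
        source = hit.get("_source", {})
        writer.writerow([source.get(name, "") for name in fields])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical hits in the shape returned by an Elasticsearch search
    sample = [
        {"_id": "1", "_index": "demo", "_source": {"name": "a", "age": 30}},
        {"_id": "2", "_index": "demo", "_source": {"name": "b"}},
    ]
    print(hits_to_csv(sample, ["name", "age"]), end="")
```

The csv module takes care of quoting, so a value that contains a comma is wrapped in double quotes and does not shift the columns.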

3. Run

./logstash -f convert_csv.conf

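Syncing data between two ES clusters

Simply replace the output section above with an elasticsearch output that points at the target cluster: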

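output {

    elasticsearch {
        # target ES (full-text search) service
        hosts => "localhost:9200"
        # index (comparable to a database)
        index => "zl_dev"
        # type (comparable to a database table); deprecated in Logstash 7.x because Elasticsearch removed mapping types
        document_type => "%{type}"
        # primary key (prevents duplicates); assumes every source document has an id field
        document_id => "%{id}"
    }
}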
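The role of document_id can be shown with a hypothetical Python sketch. The %{id} reference is filled in from each event, and the target index is modeled as a dict keyed by that id. Writing the same document twice therefore overwrites it instead of creating a duplicate. The sprintf helper below is a simplified stand-in for Logstash field references and handles top-level fields only. It is not Logstash's real implementation.

```python
import re

# Matches simple top-level references such as %{id} or %{type}
_REF = re.compile(r"%\{(\w+)\}")


def sprintf(template, event):
    """Simplified stand-in for Logstash field references (top-level fields only).

    A reference whose field is missing is left unchanged in this sketch.
    """
    return _REF.sub(
        lambda m: str(event[m.group(1)]) if m.group(1) in event else m.group(0),
        template,
    )


def index_events(events, id_template="%{id}"):
    """Model the target index as a dict keyed by the rendered document_id."""
    target = {}
    for event in events:
        doc_id = sprintf(id_template, event)
        target[doc_id] = event  # same id: overwrite instead of adding a duplicate
    return target


if __name__ == "__main__":
    # Hypothetical events: the first document arrives twice (e.g. the sync ran twice)
    events = [{"id": 1, "v": "a"}, {"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    print(len(index_events(events)))
```

In this sketch, a document without an id field keeps the literal %{id} as its key, so all such documents collapse into one. Before relying on this for deduplication, check that the source documents really carry the field.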

Near-real-time ES data sync

Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html

This requires the Elasticsearch input plugin.

input {
    # Read all documents from Elasticsearch matching the given query
    elasticsearch {
        hosts => "localhost"
        query => '{ "query": { "match": { "statuscode": 200 } }, "sort": [ "_doc" ] }'
    }
}

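For how to write the query, see the Query DSL reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

Based on it, data can be collected one day at a time by matching on a date field. For example, the following query returns the documents whose fileddate field is 2020-06-22: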

{
    "query": {
        "match": {
            "fileddate": "2020-06-22"
        }
    }
}
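The query string in a Logstash config is fixed. A literal date such as 2020-06-22 would therefore be re-read on every scheduled run. One option, sketched here as an assumption rather than the author's method, is a range query that uses Elasticsearch date math. With it, each run selects the previous calendar day. The Python below only builds that JSON so it can be pasted into query => '...'. It reuses the fileddate field name from the example above.

```python
import json


def previous_day_query(field="fileddate"):
    """Return a query body (JSON string) matching documents from the previous day.

    Elasticsearch date math: "now-1d/d" rounds down to 00:00 of yesterday and
    "now/d" to 00:00 of today, so the range covers one calendar day (UTC unless
    a time_zone is given).
    """
    body = {
        "query": {
            "range": {
                field: {"gte": "now-1d/d", "lt": "now/d"}
            }
        },
        "sort": ["_doc"],
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Paste the printed string into the elasticsearch input's query option
    print(previous_day_query())
```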

Next, add the schedule and index options to the input:

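input {
    # Read all documents from Elasticsearch matching the given query
    elasticsearch {
        hosts => "localhost"
        # pass in the time-based index name
        index => "your-time-based-index"
        query => '{ "query": { "match": { "statuscode": 200 } }, "sort": [ "_doc" ] }'
        # task schedule in cron syntax: minute, hour, day of month, month, day of week
        # (all * means run every minute); "0 1 * * *" runs every day at 01:00
        schedule => "0 1 * * *"
    }
}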
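Cron fields are read in this order: minute, hour, day of month, month, day of week. The minimal Python matcher below shows that "* * 1 * *" fires on every minute of the 1st of a month and never on other days. By contrast, "0 1 * * *" fires once a day at 01:00. It is a simplified sketch under stated assumptions. Only * and single integers are supported. It is not the scheduler Logstash actually uses.

```python
import datetime


def _field_ok(field, value):
    # Only "*" or a single integer is supported in this sketch
    return field == "*" or int(field) == value


def cron_matches(expr, when):
    """Check a datetime against a 5-field cron expression.

    Fields: minute, hour, day of month, month, day of week (0 = Sunday).
    Simplification: all fields must match; ranges, lists, steps and the
    classic cron OR-rule between the two day fields are not handled.
    """
    minute, hour, dom, month, dow = expr.split()
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0, cron: Sunday=0
    return (_field_ok(minute, when.minute) and _field_ok(hour, when.hour)
            and _field_ok(dom, when.day) and _field_ok(month, when.month)
            and _field_ok(dow, cron_weekday))


def runs_on(expr, day):
    """List every minute of `day` at which `expr` would fire."""
    start = datetime.datetime.combine(day, datetime.time())
    minutes = (start + datetime.timedelta(minutes=m) for m in range(24 * 60))
    return [t for t in minutes if cron_matches(expr, t)]


if __name__ == "__main__":
    for expr in ("* * 1 * *", "0 1 * * *"):
        for day in (datetime.date(2020, 6, 1), datetime.date(2020, 6, 22)):
            print(f"{expr!r} on {day}: {len(runs_on(expr, day))} runs")
```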

The output section is the same as the elasticsearch output shown above.

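Author: 少帅

Your support is the greatest encouragement for the blogger. Thank you for reading carefully.

The copyright of this article belongs to the author. Reposting is welcome, but please keep this statement.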