Previous post: Real-time big data project, Day 7: ES installation notes


1. Basic operations in Kibana

As shown in the figure:

(1) Create the structure



Code: create the structure by indexing a first document (ES creates the index and its mapping automatically)

PUT gmall0315_test/_doc/1
{
  "name":"zhangsan",
  "age":23,
  "amout":250.1
}
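Kibana Dev Tools sends the command above as an ordinary HTTP PUT to the Elasticsearch REST port. The sketch below, using only the Scala standard library, shows the URL and JSON body that correspond to it. The helper names `toJson` and `docUrl` are hypothetical, the host `http://flink102:9200` is taken from MyEsUtil later in this post, and the JSON builder deliberately skips string escaping, so it only suits simple values like these.

```scala
object ConsoleRequestSketch {
  // Hypothetical helper: renders flat (field, value) pairs as a JSON object.
  // Strings are quoted, everything else uses toString; no escaping is done.
  def toJson(fields: Seq[(String, Any)]): String =
    fields.map {
      case (k, v: String) => s""""$k":"$v""""
      case (k, v)         => s""""$k":$v"""
    }.mkString("{", ",", "}")

  // URL that "PUT <index>/_doc/<id>" in Kibana Dev Tools is sent to.
  def docUrl(host: String, port: Int, index: String, id: String): String =
    s"$host:$port/$index/_doc/$id"

  def main(args: Array[String]): Unit = {
    println(docUrl("http://flink102", 9200, "gmall0315_test", "1"))
    println(toJson(Seq("name" -> "zhangsan", "age" -> 23, "amout" -> 250.1)))
  }
}
```

In a real project a JSON library would replace `toJson`; the point here is only that the console command is a plain REST call.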
(2) Query the mapping

As shown in the figure:



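Where:

text: the value is analyzed (split into terms)

  Use:
    full-text matching; uses more space (disk and memory)

keyword: the value is not analyzed

  Use:
    exact matching and aggregation (grouping) fields; uses less space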
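To make the text/keyword difference concrete, here is a rough Scala model of which terms end up in the inverted index for each type. It is a deliberate simplification: the real standard analyzer follows Unicode word-break rules (and, for example, splits Chinese into single characters), while this sketch just lowercases and splits on anything that is not a letter or digit. `AnalyzeSketch` and its methods are hypothetical names.

```scala
object AnalyzeSketch {
  // Rough stand-in for the standard analyzer used by "text" fields:
  // lowercase the value, then split on runs of non-letter, non-digit characters.
  def textTerms(value: String): List[String] =
    value.toLowerCase.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toList

  // A "keyword" field keeps the whole value as a single, unchanged term.
  def keywordTerms(value: String): List[String] = List(value)

  // A term query is not analyzed, so it matches only if that exact term was indexed.
  def termMatches(indexedTerms: List[String], queryValue: String): Boolean =
    indexedTerms.contains(queryValue)

  def main(args: Array[String]): Unit = {
    println(textTerms("Zhang San"))                              // List(zhang, san)
    println(keywordTerms("Zhang San"))                           // List(Zhang San)
    println(termMatches(textTerms("Zhang San"), "Zhang San"))    // false
    println(termMatches(keywordTerms("Zhang San"), "Zhang San")) // true
  }
}
```

This is why exact filters and aggregations in this post target `name.keyword` rather than `name`.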

Code: view the mapping

GET gmall0315_test/_mapping
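The mapping returned here was created by dynamic mapping when the first document was indexed. As far as we know, the ES 5.x/6.x defaults are: a JSON string becomes `text` with a `keyword` sub-field (which is where `name.keyword` comes from), a whole number becomes `long`, and a decimal becomes `float`. The sketch below models only those rules for the value types in this post; date detection, numeric detection and other settings are left out, and `inferMapping` is a hypothetical helper, not an Elasticsearch API.

```scala
object DynamicMappingSketch {
  // Simplified model of Elasticsearch's default dynamic mapping for the
  // value types used in this post. Date detection etc. is not modelled.
  def inferMapping(value: Any): String = value match {
    case _: String =>
      """{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}"""
    case _: Int | _: Long     => """{"type":"long"}"""
    case _: Float | _: Double => """{"type":"float"}"""
    case _: Boolean           => """{"type":"boolean"}"""
    case other                => throw new IllegalArgumentException(s"not modelled: $other")
  }

  def main(args: Array[String]): Unit = {
    // Fields of the first test document
    val doc = Seq("name" -> "zhangsan", "age" -> 23, "amout" -> 250.1)
    for ((field, value) <- doc) println(s"$field -> ${inferMapping(value)}")
  }
}
```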
(3) Filtered query (term filter)



Code: exact-match filter on name.keyword

GET gmall0315_test/_search
{
  "query":{
    "bool":{
      "filter":{
        "term": {
          "name.keyword": "zhangsan"
        }
      }
    }
  }
}
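The `bool` + `filter` + `term` combination above keeps only documents whose `name.keyword` equals "zhangsan" exactly; in filter context no relevance score is computed. A minimal in-memory model of that behaviour, with a hypothetical `Doc` case class standing in for indexed documents:

```scala
object TermFilterSketch {
  // Hypothetical stand-in for a document in gmall0315_test.
  final case class Doc(name: String, age: Int, amout: Double)

  // Models a term filter on name.keyword: exact, case-sensitive equality, no scoring.
  def termFilter(docs: Seq[Doc], value: String): Seq[Doc] =
    docs.filter(_.name == value)

  def main(args: Array[String]): Unit = {
    val docs = Seq(Doc("zhangsan", 23, 250.1), Doc("Zhangsan", 23, 250.1), Doc("zhang4", 23, 250.1))
    println(termFilter(docs, "zhangsan").size) // 1: "Zhangsan" differs in case
  }
}
```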
(4) Filter plus grouping (terms aggregation)

As shown in the figure:



Code: filter, then group by name.keyword

GET gmall0315_test/_search
{
  "query":{
    "bool":{
      "filter":{
        "term": {
          "name.keyword": "zhangsan"
        }
      }
    }
  },
  "aggs": {
    "groupby_name": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      }
    }
  }
}
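The `terms` aggregation groups the matching documents by the exact value of `name.keyword` and returns up to `size` buckets with their document counts, largest count first. The sketch below reproduces that on a plain Scala collection; ordering ties by key is an assumption made here to get deterministic output. On a multi-shard index Elasticsearch's bucket counts can also be approximate, which this single-collection model does not capture.

```scala
object TermsAggSketch {
  // Counts documents per exact value and keeps the `size` largest buckets,
  // ordered by count descending (ties ordered by key, an assumption of this sketch).
  def termsAgg(values: Seq[String], size: Int): List[(String, Int)] =
    values
      .groupBy(identity)
      .map { case (key, group) => (key, group.size) }
      .toList
      .sortBy { case (key, count) => (-count, key) }
      .take(size)

  def main(args: Array[String]): Unit = {
    val names = Seq("zhangsan", "zhang4", "zhangsan")
    println(termsAgg(names, 10)) // List((zhangsan,2), (zhang4,1))
  }
}
```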
(5) Index creation statement (explicit mapping)



Write the index creation statement:

PUT gmall0315_test2
{
  "mappings": {
    "_doc": {
      "properties": {
        "age": {
          "type": "long"
        },
        "amout": {
          "type": "float"
        },
        "name": {
          "type": "keyword"
        },
        "phone_num": {
          "type": "keyword",
          "index": false
        }
      }
    }
  }
}

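Where:
Whether a field is indexed (searchable) is set with index: true or false; the default is true.
Whether a field is analyzed depends on its type: text (analyzed) or keyword (not analyzed).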


2. Design the ES index structure

(1) Index creation statement for the project



Code:

PUT gmall0315_dau
{
  "mappings": {
    "_doc":{
      "properties":{
         "mid":{
           "type":"keyword" 
         },
         "uid":{
           "type":"keyword"
         },
         "area":{
           "type":"keyword"
         },
         "os":{
           "type":"keyword"
         },
         "ch":{
           "type":"keyword"
         },
         "vs":{
           "type":"keyword"
         },
         "logDate":{
           "type":"keyword"
         },
         "logHour":{
           "type":"keyword"
         },
         "logHourMinute":{
           "type":"keyword"
         },
         "ts":{
           "type":"long"
         } 
      }
    }
  }
}
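Every field in gmall0315_dau except `ts` is a `keyword`. If the field list changes, keeping a hand-written mapping in sync is error-prone, so one option is to generate the request body in Scala. A sketch of that idea: `DauMappingSketch` and `mappingBody` are hypothetical names, the field list is copied from the mapping above, and the resulting string would be the body of `PUT gmall0315_dau` (ES 6.x style with a `_doc` type).

```scala
object DauMappingSketch {
  // Field names copied from the gmall0315_dau mapping; all are keyword except ts.
  val keywordFields: Seq[String] =
    Seq("mid", "uid", "area", "os", "ch", "vs", "logDate", "logHour", "logHourMinute")

  // Builds the JSON body for "PUT gmall0315_dau".
  def mappingBody: String = {
    val props = keywordFields.map(f => s""""$f":{"type":"keyword"}""") :+ """"ts":{"type":"long"}"""
    val joined = props.mkString(",")
    s"""{"mappings":{"_doc":{"properties":{$joined}}}}"""
  }

  def main(args: Array[String]): Unit = println(mappingBody)
}
```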

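(2) Distinguish the field types

Indexed and analyzed: titles, product names, category names
Mapping: type: "text"

Indexed but not analyzed: type ids, dates, quantities, ages, all kinds of ids
Mapping: type: "keyword" (numeric fields may instead use a numeric type, as age uses long in gmall0315_test2)

Neither indexed nor analyzed: fields never used for filtering, e.g. masked values such as 138****0101
Mapping: index: false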
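The three cases can also be written down as a small Scala ADT that maps each kind of field to its mapping fragment, which can help when reviewing a new index design. The type, case and field names (including `sku_name`) are hypothetical. A field without an explicit `index` setting stays searchable, because `index` defaults to `true`.

```scala
object FieldKindSketch {
  sealed trait FieldKind
  case object FullText   extends FieldKind // indexed and analyzed: titles, product and category names
  case object ExactValue extends FieldKind // indexed, not analyzed: ids, dates, codes
  case object StoredOnly extends FieldKind // kept in _source but not searchable: masked phone numbers

  // Mapping fragment for each kind of field.
  def mappingFor(kind: FieldKind): String = kind match {
    case FullText   => """{"type":"text"}"""
    case ExactValue => """{"type":"keyword"}"""
    case StoredOnly => """{"type":"keyword","index":false}"""
  }

  def main(args: Array[String]): Unit = {
    // "sku_name" is a hypothetical field used only for illustration
    val design = Seq("sku_name" -> FullText, "mid" -> ExactValue, "phone_num" -> StoredOnly)
    for ((field, kind) <- design) println(s""""$field":${mappingFor(kind)}""")
  }
}
```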

(3) Save to ES

  1. Code in the common submodule
    (1) Add the dependencies to the pom file:
<dependency>
    <groupId>io.searchbox</groupId>
    <artifactId>jest</artifactId>
    <version>5.3.3</version>
</dependency>

<dependency>
    <groupId>net.java.dev.jna</groupId>
    <artifactId>jna</artifactId>
    <version>4.5.2</version>
</dependency>

<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>2.7.8</version>
</dependency>

(2) Code


MyEsUtil.scala

package com.study.gamll0315.common.util
import java.util.Objects

import io.searchbox.client.config.HttpClientConfig
import io.searchbox.client.{JestClient, JestClientFactory}
import io.searchbox.core.Index

object MyEsUtil {
  private val ES_HOST = "http://flink102"
  private val ES_HTTP_PORT = 9200
  private var factory: JestClientFactory = null


  /**
   * Get a client
   *
   * @return jestclient
   */
  def getClient: JestClient = {
    if (factory == null) build()
    factory.getObject
  }

  /**
   * Close a client
   */
  def close(client: JestClient): Unit = {
    if (!Objects.isNull(client)) try
      client.shutdownClient()
    catch {
      case e: Exception =>
        e.printStackTrace()
    }
  }

  /**
   * Build the client factory (connection settings)
   */
  private def build(): Unit = {
    factory = new JestClientFactory
    factory.setHttpClientConfig(new HttpClientConfig.Builder(ES_HOST + ":" + ES_HTTP_PORT).multiThreaded(true)
      .maxTotalConnection(20) // maximum total connections
      .connTimeout(10000).readTimeout(10000).build) // timeouts in milliseconds

  }

  def main(args: Array[String]): Unit = {
    val jest: JestClient = getClient
    // test document; phone_num is already masked
    val source =
      """{
        |  "name": "zhang4",
        |  "age": 23,
        |  "amout": 250.1,
        |  "phone_num": "138*****6541"
        |}""".stripMargin
    val index: Index = new Index.Builder(source).index("gmall0315_test").`type`("_doc").build()
    jest.execute(index)
    // release the client
    close(jest)
  }
}
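`getClient` above uses a check-then-act pattern (`if (factory == null) build()`) without synchronization. If several Spark tasks in the same executor JVM call it at the same time, more than one factory may be built. One alternative, sketched below with a counter and a placeholder string instead of a real `JestClientFactory`, is a `lazy val`, which Scala initializes at most once even under concurrent first access. In MyEsUtil that would mean building the `JestClientFactory` inside the `lazy val` initializer instead of in `build()`; marking `getClient` as `synchronized` is another option. `LazyFactorySketch` is a hypothetical name.

```scala
import java.util.concurrent.atomic.AtomicInteger

object LazyFactorySketch {
  // Counts how many times the "factory" is built.
  private val builds = new AtomicInteger(0)

  // Initialised on first access; Scala makes lazy val initialisation thread-safe.
  private lazy val factory: String = {
    builds.incrementAndGet()
    "factory for http://flink102:9200" // placeholder for a JestClientFactory
  }

  def getFactory: String = factory
  def buildCount: Int = builds.get()

  def main(args: Array[String]): Unit = {
    // Eight threads race to use the factory for the first time.
    val threads = (1 to 8).map(_ => new Thread(() => { getFactory; () }))
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(buildCount) // 1
  }
}
```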

Run the program:


(3) Check the result in Kibana

As shown in the figure, run:

GET gmall0315_test/_search


(4) Basic bulk insert into ES

  /**
   * Bulk-insert documents into ES
   * (requires: import java.util; import io.searchbox.core.{Bulk, BulkResult, Index})
   * @param indexName target index
   * @param list      documents to insert
   */
  def indexBulk(indexName: String, list: List[Any]): Unit = {
    val jest: JestClient = getClient
    val bulkBuilder = new Bulk.Builder().defaultIndex(indexName).defaultType("_doc")
    for (doc <- list) {
      val index: Index = new Index.Builder(doc).build()
      bulkBuilder.addAction(index)
    }
    // the result contains one item per document sent
    val items: util.List[BulkResult#BulkResultItem] = jest.execute(bulkBuilder.build()).getItems
    println(s"saved=${items.size()}")
    // release the client
    close(jest)
  }
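A bulk request is sent to Elasticsearch's `_bulk` endpoint as newline-delimited JSON: an action line such as `{"index":{...}}` followed by the document itself, one pair per document, with a trailing newline at the end. The sketch below builds such a body by hand only to show the format; it does not reproduce Jest's own serialization. The `batches` helper is a hypothetical addition for splitting a very large partition into several smaller bulk requests, and the batch size is a tuning choice, not a value from this post.

```scala
object BulkBodySketch {
  // Builds a _bulk request body: one action line plus one source line per document,
  // each terminated by a newline (so the body also ends with a newline).
  def bulkBody(index: String, docsJson: Seq[String]): String =
    docsJson.map { doc =>
      s"""{"index":{"_index":"$index","_type":"_doc"}}""" + "\n" + doc + "\n"
    }.mkString

  // Splits the documents into groups of at most batchSize, one bulk request per group.
  def batches[T](docs: Seq[T], batchSize: Int): List[Seq[T]] =
    docs.grouped(batchSize).toList

  def main(args: Array[String]): Unit = {
    val docs = Seq("""{"mid":"mid_1"}""", """{"mid":"mid_2"}""")
    print(bulkBody("gmall0315_dau", docs))
    println(batches(1 to 5, 2).map(_.size)) // List(2, 2, 1)
  }
}
```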

Complete code

MyEsUtil.scala

package com.study.gamll0315.common.util
import java.util
import java.util.Objects

import io.searchbox.client.config.HttpClientConfig
import io.searchbox.client.{JestClient, JestClientFactory}
import io.searchbox.core.{Bulk, BulkResult, Index}

object MyEsUtil {
  private val ES_HOST = "http://flink102"
  private val ES_HTTP_PORT = 9200
  private var factory: JestClientFactory = null


  /**
   * Get a client. Synchronized so that concurrent tasks in the same JVM
   * do not build the factory more than once.
   *
   * @return jestclient
   */
  def getClient: JestClient = synchronized {
    if (factory == null) build()
    factory.getObject
  }

  /**
   * Close a client
   */
  def close(client: JestClient): Unit = {
    if (!Objects.isNull(client)) try
      client.shutdownClient()
    catch {
      case e: Exception =>
        e.printStackTrace()
    }
  }

  /**
   * Build the client factory (connection settings)
   */
  private def build(): Unit = {
    factory = new JestClientFactory
    factory.setHttpClientConfig(new HttpClientConfig.Builder(ES_HOST + ":" + ES_HTTP_PORT).multiThreaded(true)
      .maxTotalConnection(20) // maximum total connections
      .connTimeout(10000).readTimeout(10000).build) // timeouts in milliseconds

  }

  def main(args: Array[String]): Unit = {
    val jest: JestClient = getClient
    // test document; phone_num is already masked
    val source =
      """{
        |  "name": "zhang4",
        |  "age": 23,
        |  "amout": 250.1,
        |  "phone_num": "138*****6541"
        |}""".stripMargin
    val index: Index = new Index.Builder(source).index("gmall0315_test").`type`("_doc").build()
    jest.execute(index)
    // release the client
    close(jest)
  }

  /**
   * Bulk-insert documents into ES
   * @param indexName target index
   * @param list      documents to insert
   */
  def indexBulk(indexName: String, list: List[Any]): Unit = {
    // an empty partition has nothing to send
    if (list.nonEmpty) {
      val jest: JestClient = getClient
      try {
        val bulkBuilder = new Bulk.Builder().defaultIndex(indexName).defaultType("_doc")
        for (doc <- list) {
          val index: Index = new Index.Builder(doc).build()
          bulkBuilder.addAction(index)
        }
        // the result contains one item per document sent
        val items: util.List[BulkResult#BulkResultItem] = jest.execute(bulkBuilder.build()).getItems
        println(s"saved=${items.size()}")
      } finally {
        // release the client even if execute throws
        close(jest)
      }
    }
  }
}

In addition, add the index name as a constant in the GmallConstant class:


public static final String ES_INDEX_DAU="gmall0315_dau";


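        // DauApp: record each mid in Redis (de-duplication list), then bulk-index the partition into ES
        val list: List[Startuplog] = startuplogItr.toList
        for (startuplog <- list) {
          val key = "dau:" + startuplog.logDate
          val value = startuplog.mid
          jedis.sadd(key, value)
          println(startuplog)
        }
        // pass `list`: startuplogItr is an Iterator and was already consumed by toList above
        MyEsUtil.indexBulk(GmallConstant.ES_INDEX_DAU, list)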
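Converting `startuplogItr` to a list twice does not work. Assuming it is the partition's `Iterator` (as the name suggests), a Scala `Iterator` can be traversed only once: the first `toList` consumes it, so a second `toList` returns an empty list and nothing would reach ES. Reusing `list`, the result of the first conversion, is what gets the records to `indexBulk`. A minimal demonstration (the mid values are made up, `twoPasses` is a hypothetical helper):

```scala
object IteratorOnceSketch {
  // Returns the sizes produced by two consecutive toList calls on one Iterator.
  def twoPasses(it: Iterator[String]): (Int, Int) = {
    val first = it.toList  // consumes every element
    val second = it.toList // the iterator is already exhausted
    (first.size, second.size)
  }

  def main(args: Array[String]): Unit =
    println(twoPasses(Iterator("mid_1", "mid_2", "mid_3"))) // (3,0)
}
```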

Next, start JsonMocker first and then DauApp to send the simulated data.

Note that the following processes must be running:

[root@flink102 ~]# jps -l
1745 org.apache.zookeeper.server.quorum.QuorumPeerMain
9236 sun.tools.jps.Jps
8582 kafka.Kafka
7064 org.elasticsearch.bootstrap.Elasticsearch
8907 kafka.tools.ConsoleConsumer
9181 gamll0315-logger-0.0.1-SNAPSHOT.jar

We also need to change one more place, as shown in the figure:



We also need to delete the de-duplication list that Redis uses for filtering:

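// list the keys in Redis
127.0.0.1:6379> keys *
1) "dau:2020-03-17"
// delete the data (FLUSHALL removes every key in every database;
// DEL dau:2020-03-17 would remove only this day's list)
127.0.0.1:6379> flushall
OK
// list again: nothing is left
127.0.0.1:6379> keys *
(empty list or set)
127.0.0.1:6379>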
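Why clearing Redis matters: each day's key (`dau:<logDate>`) holds the mids already recorded. Assuming DauApp drops records whose mid is already in that set (the filtering code is not shown in this post), a rerun with mock data whose mids repeat would be filtered out before reaching ES. The in-memory model below imitates that behaviour with a mutable map of sets instead of a real Redis client; `DauDedupSketch` and `filterNew` are hypothetical names, and this is an approximation, not the project's code.

```scala
import scala.collection.mutable

object DauDedupSketch {
  // In-memory stand-in for Redis: key "dau:<logDate>" -> set of mids already recorded.
  val sets: mutable.Map[String, mutable.Set[String]] = mutable.Map.empty

  // Keeps (logDate, mid) pairs whose mid is new for that day and records them,
  // similar to checking the Redis set and then calling SADD.
  def filterNew(logs: Seq[(String, String)]): Seq[(String, String)] =
    logs.filter { case (logDate, mid) =>
      // Set.add returns false when the mid was already present
      sets.getOrElseUpdate("dau:" + logDate, mutable.Set.empty[String]).add(mid)
    }

  def main(args: Array[String]): Unit = {
    val batch = Seq("2020-03-17" -> "mid_1", "2020-03-17" -> "mid_2")
    println(filterNew(batch).size) // 2: first run, both mids are new
    println(filterNew(batch).size) // 0: a rerun is filtered out completely
    sets.clear()                   // like FLUSHALL
    println(filterNew(batch).size) // 2: data flows again
  }
}
```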

Start the programs again:

Start JsonMocker first, then DauApp, to send the simulated data.


Finally, check the result in Kibana by running:

GET gmall0315_dau/_search
