Elasticsearch from Beginner to Expert: A Complete Guide to the Search Engine Every Backend Developer Should Know

我爱哇哈哈 · 2025-10-13 13:52:11 · Blog category: Java Development in Practice

© Copyright belongs to the author. This is an original work by 我爱哇哈哈 on the 51CTO blog; contact the author for permission before republishing.

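Today we're talking about Elasticsearch. It is powerful (search, analytics, and aggregations all in one), but it comes with a lot of concepts and a lot of configuration, so it is easy to step into a pitfall. This post walks through the core of ES in plain language.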

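Preface: Why Learn Elasticsearch?

When it comes to search, most backend developers have been through something like this:

A MySQL LIKE query slows to a crawl once the table grows
The boss wants full-text search, and the SQL ends up more tangled than a tongue twister
You want product recommendations, and a traditional database just can't keep up
Log analysis and data statistics push a relational database past what it does well

What is the root cause? The wrong tool for the job.

Elasticsearch is built specifically for search and analytics:

Fast search: full-text queries typically come back in milliseconds
Highly scalable: clusters scale out horizontally, up to petabytes of data
Feature-rich: search, aggregations, and analytics in one place
Mature ecosystem: the ELK Stack (Elasticsearch + Logstash + Kibana)

Below, the core ES concepts are explained in the most down-to-earth way I can manage.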

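Part 1: Core Concepts - Learning the "Rules of the Road" in ES

1.1 ES vs. Traditional Databases

First, get the mapping between ES and MySQL straight. This matters:

Concept	MySQL	Elasticsearch	Notes
Database	Database	Index	The top-level unit of data storage
Table	Table	Type*	A category of data; deprecated in 7.x
Row	Row	Document	One complete data record
Column	Column	Field	A specific attribute of the data
Primary key	Primary Key	_id	The unique identifier of a document

Note: mapping types were deprecated in ES 7.x and removed in 8.x. In practice, one Index now plays the role of one MySQL table.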

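1.2 The Inverted Index - The Core Secret of ES

What is an inverted index?

Think of the index pages of a book:

Forward index: go from a document (page) to its content. Without a full-text index, a MySQL LIKE '%keyword%' query has to scan rows this way
Inverted index: go from a term to the documents (pages) that contain it. This is the core of ES

Documents:
Doc 1: "Java is a programming language"
Doc 2: "Python is also a programming language"
Doc 3: "Programming requires learning Java"

Inverted index (terms lowercased):
"java" -> [Doc 1, Doc 3]
"programming" -> [Doc 1, Doc 2, Doc 3]
"language" -> [Doc 1, Doc 2]

This is a big part of why ES search is fast: a query looks up its terms and gets the matching document list directly, instead of scanning every document.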
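To make the term-to-documents mapping concrete, here is a minimal in-memory sketch. It assumes the documents are already tokenized (the token lists are hand-picked and lowercased, which is what an analyzer would normally do), and the class and method names are made up for illustration. It is not how Lucene stores its index on disk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndexSketch {

    // Builds term -> list of document IDs, in the order the documents are visited.
    static Map<String, List<Integer>> build(Map<Integer, List<String>> docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> doc : docs.entrySet()) {
            for (String term : doc.getValue()) {
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Skip the ID if this document was already recorded for the term.
                if (postings.isEmpty() || !postings.get(postings.size() - 1).equals(doc.getKey())) {
                    postings.add(doc.getKey());
                }
            }
        }
        return index;
    }

    // Hand-tokenized versions of the three example documents (an assumption, not analyzer output).
    static Map<Integer, List<String>> sampleDocs() {
        Map<Integer, List<String>> docs = new LinkedHashMap<>();
        docs.put(1, List.of("java", "is", "a", "programming", "language"));
        docs.put(2, List.of("python", "is", "also", "a", "programming", "language"));
        docs.put(3, List.of("programming", "requires", "learning", "java"));
        return docs;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = build(sampleDocs());
        System.out.println(index.get("java"));        // [1, 3]
        System.out.println(index.get("programming")); // [1, 2, 3]
    }
}
```

A lookup is a single map access per query term, which is the property the section above is describing.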

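1.3 Cluster Architecture - The ES "Org Chart"

Cluster: a group of ES nodes working together
Node: one running ES instance
Shard: a horizontal slice of an index's data
Replica: a copy of a primary shard

Example cluster layout (3 primary shards, 1 replica each):
Node1: [Primary Shard 0] [Replica Shard 1]
Node2: [Primary Shard 1] [Replica Shard 2]
Node3: [Primary Shard 2] [Replica Shard 0]

A replica is never allocated to the same node as its primary, so losing one node never loses both copies of a shard.

Why shard and replicate?

A single machine cannot hold all the data
Work is spread across nodes and runs in parallel
If one node goes down, the replicas on the other nodes keep serving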
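How does a document end up on a particular shard? In simplified form, ES computes hash(routing) % number_of_primary_shards, where the routing value defaults to the document's _id. The sketch below illustrates the idea only: it uses String.hashCode as a stand-in hash (ES itself uses a Murmur3 hash), and the class and method names are made up.

```java
class ShardRoutingSketch {

    // Simplified routing: stand-in hash of the routing value, modulo the primary shard count.
    static int shardFor(String routing, int numPrimaryShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"1", "2", "3", "42"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, 3));
        }
    }
}
```

Because the shard number depends on the primary shard count, number_of_shards cannot simply be changed on an existing index; that normally means reindexing (or using the split/shrink APIs).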

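Part 2: Basic Operations - Starting with CRUD

2.1 Index Management

Create an index (the ik_max_word analyzer requires the IK analysis plugin to be installed):

PUT /my_index
{
  "settings": {
    "number_of_shards": 3,      // number of primary shards
    "number_of_replicas": 1     // number of replicas per primary
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"  // Chinese analyzer (IK plugin)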
      },
      "price": {
        "type": "double"
      },
      "create_time": {
        "type": "date"
      }
    }
  }
}

View an index:

GET /my_index
// List all indices
GET /_cat/indices?v

Delete an index:

DELETE /my_index

2.2 Document Operations

Add a document:

// With an explicit ID
PUT /my_index/_doc/1
{
  "title": "Elasticsearch in Action",
  "price": 99.9,
  "create_time": "2024-03-15"
}

// With an auto-generated ID
POST /my_index/_doc
{
  "title": "Java Programming Guide",
  "price": 89.0,
  "create_time": "2024-03-16"
}

Retrieve documents:

// Get by ID
GET /my_index/_doc/1

// Search all documents
GET /my_index/_search
{
  "query": {
    "match_all": {}
  }
}

Update a document:

// Full replacement (the whole document is re-indexed)
PUT /my_index/_doc/1
{
  "title": "Advanced ES Tutorial",
  "price": 199.9,
  "create_time": "2024-03-15"
}

// Partial update
POST /my_index/_update/1
{
  "doc": {
    "price": 159.9
  }
}

Delete a document:

DELETE /my_index/_doc/1

2.3 Bulk Operations

POST /_bulk
{"index": {"_index": "my_index", "_id": "1"}}
{"title": "Product 1", "price": 100}
{"index": {"_index": "my_index", "_id": "2"}}
{"title": "Product 2", "price": 200}
{"update": {"_index": "my_index", "_id": "1"}}
{"doc": {"price": 150}}
{"delete": {"_index": "my_index", "_id": "2"}}
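One detail that trips people up with _bulk: the body is newline-delimited JSON (NDJSON), one action or source object per line, and the last line must also end with a newline. Here is a small sketch of assembling such a body by hand; the class and method names are made up for illustration, and in a real project the client library builds this for you.

```java
import java.util.List;

class BulkBodySketch {

    // Joins action/source lines into NDJSON; every line, including the last, ends with '\n'.
    static String toNdjson(List<String> lines) {
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String body = toNdjson(List.of(
            "{\"index\":{\"_index\":\"my_index\",\"_id\":\"1\"}}",
            "{\"title\":\"Product 1\",\"price\":100}",
            "{\"delete\":{\"_index\":\"my_index\",\"_id\":\"2\"}}"));
        // Send this as the request body with Content-Type: application/x-ndjson.
        System.out.print(body);
    }
}
```

Note that delete takes no source line, while index and update are each followed by one.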

Part 3: Query Syntax - The Full ES Query Toolkit

3.1 Basic Queries

Match query (the most common):

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "Elasticsearch tutorial"
    }
  }
}

Exact-match query (term; use it on keyword, numeric, and date fields rather than on analyzed text):

GET /my_index/_search
{
  "query": {
    "term": {
      "price": 99.9
    }
  }
}

Range query:

GET /my_index/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 50,    // greater than or equal to
        "lte": 200    // less than or equal to
      }
    }
  }
}

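Fuzzy query:

GET /my_index/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "elastcsearch",  // deliberately misspelled; lowercase because fuzzy does not analyze its input
        "fuzziness": 2            // allow an edit distance of up to 2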
      }
    }
  }
}
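The fuzziness value is an edit distance: how many single-character changes turn one term into another. Below is a plain Levenshtein sketch to show the counting. Treat it as an approximation of what ES does: by default ES also counts swapping two adjacent characters as one edit, which this version does not, and the class name is made up.

```java
class EditDistanceSketch {

    // Classic Levenshtein distance (insert, delete, substitute) using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("elastcsearch", "elasticsearch")); // 1 (one missing 'i')
        System.out.println(levenshtein("Elastcsearch", "elasticsearch")); // 2 (case counts as an edit)
    }
}
```

The second call shows why casing matters: a capital letter already uses up one of the two allowed edits.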

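3.2 Compound Queries

Bool query (the most powerful):

GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [        // must match (AND); contributes to the score
        {"match": {"title": "Elasticsearch"}}
      ],
      "should": [      // optional here because must is present; a match only raises the score
        {"match": {"title": "tutorial"}},
        {"match": {"title": "practical"}}
      ],
      "must_not": [    // must not match (NOT)
        {"term": {"price": 0}}
      ],
      "filter": [      // must match, but does not affect the score
        {"range": {"price": {"gte": 50}}}
      ]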
    }
  }
}

Multi-field query:

GET /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "Java programming",
      "fields": ["title^2", "description"],  // title is boosted x2
      "type": "best_fields"
    }
  }
}

3.3 Advanced Query Features

Highlighting:

GET /my_index/_search
{
  "query": {
    "match": {"title": "Elasticsearch"}
  },
  "highlight": {
    "fields": {
      "title": {
        "pre_tags": ["<mark>"],
        "post_tags": ["</mark>"]
      }
    }
  }
}

Pagination:

GET /my_index/_search
{
  "query": {"match_all": {}},
  "from": 0,      // offset
  "size": 10      // number of hits to return
}

Sorting:

GET /my_index/_search
{
  "query": {"match_all": {}},
  "sort": [
    {"price": {"order": "desc"}},
    {"_score": {"order": "desc"}}
  ]
}

Part 4: Aggregations - ES as a Data Analysis Tool

4.1 Metric Aggregations

GET /my_index/_search
{
  "size": 0,  // return no documents, only the aggregation results
  "aggs": {
    "avg_price": {
      "avg": {"field": "price"}
    },
    "max_price": {
      "max": {"field": "price"}
    },
    "min_price": {
      "min": {"field": "price"}
    },
    "total_sales": {
      "sum": {"field": "sales"}
    },
    "price_stats": {
      "stats": {"field": "price"}  // count, min, max, avg, and sum in one go
    }
  }
}
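If the stats aggregation feels abstract, it computes the same five numbers that Java's DoubleSummaryStatistics gives you for an in-memory list. The sketch below uses made-up prices and a made-up class name; ES of course does this across shards rather than over a local array.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

class StatsAggSketch {

    // In-memory analogue of the stats aggregation: count, min, max, avg, sum.
    static DoubleSummaryStatistics stats(double... prices) {
        return DoubleStream.of(prices).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(100, 200, 300);
        System.out.println(s.getCount());   // 3
        System.out.println(s.getMin());     // 100.0
        System.out.println(s.getMax());     // 300.0
        System.out.println(s.getAverage()); // 200.0
        System.out.println(s.getSum());     // 600.0
    }
}
```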

4.2 Bucket Aggregations

Group-by statistics:

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_category": {
      "terms": {
        "field": "category.keyword",
        "size": 10
      },
      "aggs": {
        "avg_price_per_category": {
          "avg": {"field": "price"}
        }
      }
    }
  }
}
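A terms aggregation with a nested avg is essentially a GROUP BY with an average per group. Here is an in-memory sketch of that shape using Java streams. The Product class and sample data are made up for illustration, and the buckets here are sorted by key, whereas ES orders terms buckets by document count by default.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TermsAggSketch {

    // Hypothetical product with just the two fields the aggregation touches.
    static final class Product {
        final String category;
        final double price;

        Product(String category, double price) {
            this.category = category;
            this.price = price;
        }
    }

    // Bucket by category (like "terms"), then average the price per bucket (like the nested "avg").
    static Map<String, Double> avgPriceByCategory(List<Product> products) {
        return products.stream().collect(Collectors.groupingBy(
            p -> p.category,
            TreeMap::new,
            Collectors.averagingDouble(p -> p.price)));
    }

    static List<Product> sample() {
        return List.of(
            new Product("book", 100),
            new Product("book", 200),
            new Product("phone", 999));
    }

    public static void main(String[] args) {
        System.out.println(avgPriceByCategory(sample())); // {book=150.0, phone=999.0}
    }
}
```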

Date histogram:

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "create_time",
        "calendar_interval": "1M",  // one bucket per calendar month
        "format": "yyyy-MM"
      },
      "aggs": {
        "monthly_sales": {
          "sum": {"field": "sales"}
        }
      }
    }
  }
}

Range aggregation:

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          {"to": 100},
          {"from": 100, "to": 500},
          {"from": 500}
        ]
      }
    }
  }
}
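One thing worth knowing about range buckets: from is inclusive and to is exclusive, so a price of exactly 100 lands in the 100 to 500 bucket, not the first one. A tiny sketch of that boundary rule follows; the class name is made up, and the returned strings only imitate the bucket keys ES generates.

```java
class RangeBucketSketch {

    // Mirrors the three ranges above: [-inf, 100), [100, 500), [500, +inf).
    static String bucketFor(double price) {
        if (price < 100) {
            return "*-100.0";
        }
        if (price < 500) {
            return "100.0-500.0";
        }
        return "500.0-*";
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(99.99)); // *-100.0
        System.out.println(bucketFor(100));   // 100.0-500.0
        System.out.println(bucketFor(500));   // 500.0-*
    }
}
```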

Part 5: Mappings - Telling ES How to Interpret Your Data

5.1 Data Types

Core data types:

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {"type": "text"},           // full-text search
      "keyword": {"type": "keyword"},      // exact match
      "price": {"type": "double"},         // floating point
      "count": {"type": "integer"},        // integer
      "is_active": {"type": "boolean"},    // boolean
      "create_time": {"type": "date"},     // date
      "location": {"type": "geo_point"}    // geographic location
    }
  }
}

Complex data types:

PUT /my_index
{
  "mappings": {
    "properties": {
      "user": {
        "type": "object",        // object type
        "properties": {
          "name": {"type": "text"},
          "age": {"type": "integer"}
        }
      },
      "tags": {"type": "keyword"},  // arrays need no special type; any field can hold multiple values
      "content": {
        "type": "text",
        "fields": {                 // multi-fields: index the same value a second way
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

5.2 Analyzer Configuration

Chinese analyzer (IK):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

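Part 6: Performance Tuning - Making ES Run Faster

6.1 Indexing Optimization

Bulk operations:

// Bulk insert with the Java High Level REST Client (client is a RestHighLevelClient)
BulkRequest bulkRequest = new BulkRequest();
for (int i = 0; i < 1000; i++) {
    IndexRequest indexRequest = new IndexRequest("my_index")
        .id(String.valueOf(i))
        .source("field1", "value" + i);
    bulkRequest.add(indexRequest);
}
BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
// A bulk call can partially fail, so check the per-item results
if (bulkResponse.hasFailures()) {
    System.err.println(bulkResponse.buildFailureMessage());
}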

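Index settings for heavy writes:

PUT /my_index
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,        // 0 during the initial bulk load; raise it again afterwards
      "refresh_interval": "30s",      // refresh less often (the default is 1s)
      "translog": {
        "durability": "async",        // fsync the translog in the background; recent writes can be lost on a crash
        "sync_interval": "30s"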
      }
    }
  }
}

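6.2 Query Optimization

Use filters instead of queries where scoring is not needed:

// ❌ Slower: query context (computes a relevance score)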
{
  "query": {
    "bool": {
      "must": [
        {"range": {"price": {"gte": 100}}}
      ]
    }
  }
}

// ✅ Faster: filter context (no scoring, and results can be cached)
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"price": {"gte": 100}}}
      ]
    }
  }
}

Paginate sensibly:

// ❌ Deep pagination performs poorly
SearchRequest request = new SearchRequest("my_index");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// Skips 10,000 hits; from + size also exceeds the default index.max_result_window (10000),
// so ES rejects this request unless that limit is raised
sourceBuilder.from(10000).size(10);
request.source(sourceBuilder);

// ✅ Use scroll or search_after instead
SearchRequest scrollRequest = new SearchRequest("my_index");
scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
SearchSourceBuilder scrollSource = new SearchSourceBuilder();
scrollSource.size(1000);
scrollRequest.source(scrollSource);
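Why is deep from/size paging slow? Every shard has to collect its own top from + size hits, and the coordinating node then merges all of them just to return size results. The back-of-the-envelope sketch below puts numbers on that; the class and method names are made up, and it is a simplified cost model rather than a measurement.

```java
class DeepPagingCostSketch {

    // Hits the coordinating node has to merge for one from/size page.
    static long hitsCollected(int from, int size, int shards) {
        return (long) shards * ((long) from + size);
    }

    public static void main(String[] args) {
        System.out.println(hitsCollected(0, 10, 3));    // 30: first page
        System.out.println(hitsCollected(9990, 10, 3)); // 30000: page 1000, still only 10 results returned
    }
}
```

With search_after, each shard only needs the next size hits after the sort cursor, so the cost stays close to the first-page case no matter how deep you go.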

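6.3 Hardware and OS Tuning

Memory configuration:

# jvm.options (heap size is not set in elasticsearch.yml)
# Set the JVM heap to about half of physical RAM, and keep it below roughly 32GB
# so the JVM can keep using compressed object pointers
# -Xms16g -Xmx16g

# OS kernel settings (sysctl)
vm.max_map_count=262144
fs.file-max=65536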

Disk optimization:

Use SSDs
Store data and logs on separate disks
Regularly clean up indices you no longer need

Part 7: A Practical Case - An E-commerce Search System

7.1 Product Search

Index design:

PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["lowercase", "synonym_filter"]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "手机,mobile,phone",
            "电脑,computer,PC"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "product_analyzer",
        "fields": {
          "keyword": {"type": "keyword"}
        }
      },
      "category": {"type": "keyword"},
      "brand": {"type": "keyword"},
      "price": {"type": "double"},
      "sales": {"type": "integer"},
      "rating": {"type": "float"},
      "tags": {"type": "keyword"},
      "description": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "create_time": {"type": "date"}
    }
  }
}

Search endpoint implementation (the code appears to target Spring Data Elasticsearch 4.x; Product, ProductVO, and SearchResult are application classes not shown here):

@RestController
public class ProductSearchController {

    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;

    @GetMapping("/search")
    public SearchResult searchProducts(@RequestParam String keyword,
                                     @RequestParam(defaultValue = "0") int page,
                                     @RequestParam(defaultValue = "20") int size,
                                     @RequestParam(required = false) String category,
                                     @RequestParam(required = false) String priceRange) {

        // Build the query
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();

        // Keyword search
        if (StringUtils.hasText(keyword)) {
            MultiMatchQueryBuilder multiMatch = QueryBuilders.multiMatchQuery(keyword)
                .field("name", 3.0f)        // product name, boost 3
                .field("description", 1.0f)  // description, boost 1
                .field("tags", 2.0f)        // tags, boost 2
                .type(MultiMatchQueryBuilder.Type.BEST_FIELDS)
                .operator(Operator.AND);

            boolQuery.must(multiMatch);
        }

        // Category filter
        if (StringUtils.hasText(category)) {
            boolQuery.filter(QueryBuilders.termQuery("category", category));
        }

        // Price range filter, expected as "min-max" (for example "100-500")
        if (StringUtils.hasText(priceRange)) {
            String[] range = priceRange.split("-");
            if (range.length == 2) {
                RangeQueryBuilder priceQuery = QueryBuilders.rangeQuery("price")
                    .gte(Double.parseDouble(range[0]))
                    .lte(Double.parseDouble(range[1]));
                // Only add the filter when both bounds were supplied
                boolQuery.filter(priceQuery);
            }
        }

        // Build the search request
        NativeSearchQueryBuilder searchQuery = new NativeSearchQueryBuilder()
            .withQuery(boolQuery)
            .withPageable(PageRequest.of(page, size))
            .withSort(SortBuilders.scoreSort().order(SortOrder.DESC))  // sort by relevance first
            .withSort(SortBuilders.fieldSort("sales").order(SortOrder.DESC)) // then by sales
            .withHighlightFields(
                new HighlightBuilder.Field("name").preTags("<mark>").postTags("</mark>")
            );

        // Add aggregations for the facets
        searchQuery.addAggregation(
            AggregationBuilders.terms("category_agg").field("category").size(10)
        );
        searchQuery.addAggregation(
            AggregationBuilders.terms("brand_agg").field("brand").size(10)
        );
        
        // Run the search
        SearchHits<Product> searchHits = elasticsearchTemplate.search(
            searchQuery.build(), Product.class);

        // Convert the hits
        List<ProductVO> products = searchHits.stream()
            .map(this::convertToVO)
            .collect(Collectors.toList());

        // Convert the aggregation results (extractFacets is a helper that is not shown here)
        Map<String, List<String>> facets = extractFacets(searchHits.getAggregations());

        return SearchResult.builder()
            .products(products)
            .total(searchHits.getTotalHits())
            .facets(facets)
            .build();
    }
    
    private ProductVO convertToVO(SearchHit<Product> hit) {
        Product product = hit.getContent();
        ProductVO vo = new ProductVO();

        // Basic fields
        BeanUtils.copyProperties(product, vo);

        // Highlighted name, if any
        Map<String, List<String>> highlights = hit.getHighlightFields();
        if (highlights.containsKey("name")) {
            vo.setHighlightName(highlights.get("name").get(0));
        }

        // Relevance score
        vo.setScore(hit.getScore());

        return vo;
    }
}

7.2 Search Suggestions

Autocomplete:

PUT /suggestions
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "analyzer": "ik_max_word"
      }
    }
  }
}

// Request suggestions
POST /suggestions/_search
{
  "suggest": {
    "product_suggest": {
      "prefix": "iphone",
      "completion": {
        "field": "suggest",
        "size": 10
      }
    }
  }
}
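Conceptually, the completion suggester answers "which stored inputs start with this prefix?". The sketch below shows that lookup with a sorted set. It is a deliberately simple stand-in: ES keeps completion data in an in-memory FST and ranks suggestions by weight, while this version just returns matches in alphabetical order, and the class name and sample terms are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PrefixSuggestSketch {

    // Returns up to `size` terms that start with `prefix`, in lexicographic order.
    static List<String> suggest(NavigableSet<String> terms, String prefix, int size) {
        List<String> out = new ArrayList<>();
        // tailSet jumps straight to the first term >= prefix, so earlier terms are never visited.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || out.size() >= size) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    static NavigableSet<String> sampleTerms() {
        return new TreeSet<>(List.of("ipad", "iphone 15", "iphone 15 pro", "macbook"));
    }

    public static void main(String[] args) {
        System.out.println(suggest(sampleTerms(), "iphone", 10)); // [iphone 15, iphone 15 pro]
    }
}
```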

Part 8: Operations and Monitoring - Keeping ES Stable

8.1 Cluster Monitoring

Key metrics:

# Cluster health
GET /_cluster/health

# Node statistics
GET /_nodes/stats

# Index statistics
GET /_stats

# Hot threads
GET /_nodes/hot_threads

Monitoring from Java:

@Component
public class ElasticsearchMonitor {

    private static final Logger log = LoggerFactory.getLogger(ElasticsearchMonitor.class);

    @Autowired
    private RestHighLevelClient client;

    // AlertService and MonitoringService are assumed application-specific beans (not part of ES or Spring)
    @Autowired
    private AlertService alertService;

    @Autowired
    private MonitoringService monitoringService;

    @Scheduled(fixedRate = 30000)  // check every 30 seconds
    public void monitorClusterHealth() {
        try {
            ClusterHealthRequest request = new ClusterHealthRequest();
            ClusterHealthResponse response = client.cluster().health(request, RequestOptions.DEFAULT);

            ClusterHealthStatus status = response.getStatus();
            if (status == ClusterHealthStatus.RED) {
                // Send an alert
                alertService.sendAlert("ES cluster status is abnormal: " + status);
            }

            // Record monitoring metrics
            monitoringService.recordMetric("es.cluster.nodes", response.getNumberOfNodes());
            monitoringService.recordMetric("es.cluster.active_shards", response.getActiveShards());
            monitoringService.recordMetric("es.cluster.relocating_shards", response.getRelocatingShards());

        } catch (Exception e) {
            log.error("ES cluster monitoring failed", e);
        }
    }
}

8.2 Performance Tuning

GC tuning:

# jvm.options
-Xms16g
-Xmx16g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=30
-XX:G1MaxNewSizePercent=40

Index lifecycle management:

PUT /_ilm/policy/log_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}