Elasticsearch (ES) is an open-source search engine built on Lucene, and the keywords of its query syntax largely mirror Lucene's:

Pagination: from/size, fields: fields, sorting: sort, querying: query

Filtering: filter, highlighting: highlight, statistics: facet

ES has four search types (the notes below are based on Elasticsearch 2.3):

query and fetch (fastest) (returns N times the requested amount of data); available before 5.3, protected afterwards

query then fetch (the default search type)

DFS query and fetch    removed in newer versions

DFS query then fetch (allows finer control over scoring and ranking)

DFS: the D likely stands for Distributed, the F for Frequency, and the S for Scatter; taken together, the phrase roughly means "distributed scattering of term and document frequencies".

Initial scatter phase: according to the official ES documentation, this phase collects the term and document frequencies from every shard before the actual query runs, so that during the search each shard scores and ranks against the global frequencies. This makes DFS_QUERY_THEN_FETCH the least efficient search type, since a single search may require three round trips to the shards, but it should also yield the most accurate scoring.
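Why pre-collecting frequencies changes scoring can be illustrated with a toy IDF calculation. This is only a sketch: `idf()` below is the classic ln(N/df) approximation, not Lucene's exact formula, and the document counts are made up.

```java
// Sketch: why DFS_QUERY_THEN_FETCH affects ranking.
// With shard-local statistics, the same term gets a different IDF on each
// shard; after the DFS phase merges statistics, both shards use one value.
public class DfsDemo {
    // classic IDF approximation: ln(totalDocs / docFreq)
    static double idf(long totalDocs, long docFreq) {
        return Math.log((double) totalDocs / docFreq);
    }

    public static void main(String[] args) {
        double shard1 = idf(100, 1);   // term is rare on shard 1 -> high IDF
        double shard2 = idf(100, 50);  // term is common on shard 2 -> low IDF
        double global = idf(200, 51);  // merged statistics -> one consistent IDF
        System.out.printf("shard1=%.3f shard2=%.3f global=%.3f%n",
                shard1, shard2, global);
    }
}
```

Without DFS, a hit from shard 1 would be scored far higher than an equally good hit from shard 2 simply because of where it is stored.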

In summary, in terms of performance:

QUERY_AND_FETCH is the fastest and DFS_QUERY_THEN_FETCH is the slowest.

In terms of search accuracy:

DFS is more accurate than non-DFS.


Elasticsearch queries:

For each query clause, we can combine QueryBuilder instances with the must, should, and mustNot methods to form multi-condition queries (must => AND, should => OR).
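The must/should/mustNot semantics behave like plain boolean predicate composition. A minimal plain-Java sketch (the helper names `must`, `should`, `mustNot` here are hypothetical, not the ES API):

```java
import java.util.function.Predicate;

// Sketch of bool-query combinator semantics using java.util.function.Predicate:
// must -> AND, should -> OR, mustNot -> AND NOT (simplified: real bool queries
// also influence scoring, which predicates cannot model).
public class BoolDemo {
    static <T> Predicate<T> must(Predicate<T> a, Predicate<T> b)    { return a.and(b); }
    static <T> Predicate<T> should(Predicate<T> a, Predicate<T> b)  { return a.or(b); }
    static <T> Predicate<T> mustNot(Predicate<T> a, Predicate<T> b) { return a.and(b.negate()); }

    public static void main(String[] args) {
        Predicate<Integer> adult  = age -> age >= 18;
        Predicate<Integer> senior = age -> age >= 65;
        System.out.println(must(adult, senior).test(70));    // matches both clauses
        System.out.println(should(adult, senior).test(30));  // matches at least one
        System.out.println(mustNot(adult, senior).test(30)); // adult but not senior
    }
}
```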

Lucene supports the term-based TermQuery as well as RangeQuery, PrefixQuery, BooleanQuery, PhraseQuery, WildcardQuery, and FuzzyQuery.

  • TermQuery and QueryParser

        A single word used as a query expression corresponds to a single term: when the expression consists of one word, QueryParser's parse() method returns a TermQuery object.

For example, for the expression content:hello, QueryParser returns a TermQuery with field content and value hello.

Query query = new TermQuery(new Term("content", "hello"));

  • RangeQuery and QueryParser

QueryParser builds a RangeQuery from expressions of the form [start TO end] (bounds inclusive) or {start TO end} (bounds exclusive).

For example, for the expression time:[20181010 TO 20181210], QueryParser returns a RangeQuery on field time with lower bound 20181010 and upper bound 20181210.

Term t1 = new Term("time", "20181010");

Term t2 = new Term("time", "20181210");

Query query = new RangeQuery(t1, t2, true); // true => bounds are inclusive

  • PrefixQuery and QueryParser

When a term in the query expression ends with an asterisk (*), QueryParser creates a PrefixQuery object.

For example, for the expression content:luc*, QueryParser returns a PrefixQuery with field content and prefix luc.

Query query = new PrefixQuery(new Term("content", "luc"));

  • BooleanQuery and QueryParser

When the query expression contains multiple clauses, QueryParser conveniently builds a BooleanQuery. It uses parentheses for grouping, and -, +, AND, OR, and NOT to shape the resulting BooleanQuery.

  • PhraseQuery and QueryParser

In a QueryParser expression, terms enclosed in double quotes are converted into a PhraseQuery object. The slop factor defaults to 0 and can be set with ~n in the expression.

For example, for the expression content:"hello world"~3, QueryParser returns a phrase query on field content for "hello world" with a slop of 3.

Query query = new PhraseQuery();

query.setSlop(3);

query.add(new Term("content", "hello"));

query.add(new Term("content", "world"));

  • WildcardQuery and QueryParser

Lucene uses two standard wildcard characters: * matches zero or more characters, and ? matches zero or one character. When the query expression contains * or ?, QueryParser returns a WildcardQuery object. Note, however, that a * at the end of the expression is optimized into a PrefixQuery, and that the first character of the expression must not be a wildcard: this prevents a leading * from forcing Lucene to enumerate every term in the index at enormous cost.
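The wildcard semantics described above can be mimicked with java.util.regex. This is only an illustration of the matching behavior (Lucene itself expands matching terms from the index rather than running a regex); `toRegex` and `matches` are hypothetical helpers, and the `?` mapping follows the "zero or one character" description above.

```java
import java.util.regex.Pattern;

// Sketch: emulate Lucene-style wildcard matching with a regex translation.
public class WildcardDemo {
    static String toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*')      sb.append(".*"); // zero or more characters
            else if (c == '?') sb.append(".?"); // zero or one character (per text above)
            else               sb.append(Pattern.quote(String.valueOf(c)));
        }
        return sb.toString();
    }

    static boolean matches(String wildcard, String term) {
        return Pattern.matches(toRegex(wildcard), term);
    }

    public static void main(String[] args) {
        System.out.println(matches("luc*", "lucene"));   // true
        System.out.println(matches("l?cene", "lucene")); // true
        System.out.println(matches("luc*", "solr"));     // false
    }
}
```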

  • FuzzyQuery and QueryParser

QueryParser supports FuzzyQuery fuzzy matching by appending "~" to a term.
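Fuzzy matching is based on edit distance: a term matches if it is within a small number of single-character edits of the query term. A classic Levenshtein-distance sketch (Lucene uses Levenshtein automata internally, not this table-based form):

```java
// Sketch: Levenshtein edit distance, the measure behind FuzzyQuery matching.
public class FuzzyDemo {
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // delete
                                            d[i][j - 1] + 1),  // insert
                                   d[i - 1][j - 1] + cost);    // substitute
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // "hadop" is one insertion away from "hadoop", so a query like
        // hadop~ can still match documents containing "hadoop"
        System.out.println(editDistance("hadop", "hadoop")); // 1
    }
}
```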

Code examples:

  • Simple query, printing every hit:

    @Test
    public void testQuery1(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                /**
                 * Search types:
                 *  QUERY_AND_FETCH:      available before 5.3, protected afterwards
                 *  QUERY_THEN_FETCH:     the default
                 *  DFS_QUERY_AND_FETCH:  removed in newer versions
                 *  DFS_QUERY_THEN_FETCH:
                 */
                .setSearchType(SearchType.DEFAULT)
                /**
                 * Set what to search for.
                 * Whether a given search finds the data you want is what
                 * gave rise to the SEO (search engine optimization) profession.
                 */
//                .setQuery(QueryBuilders.matchPhrasePrefixQuery("firstname", "V*")) // firstname values starting with V
//                .setQuery(QueryBuilders.matchQuery("state", "NM"))
                .setQuery(QueryBuilders.termQuery("age", 40))
                // pagination: to show page N with M hits per page, use setFrom((N - 1) * M).setSize(M)
                .setFrom(1)// offset of the first hit to return
                .setSize(5)// number of hits per page
                .get();
        // the results come back wrapped in a SearchHits object
        SearchHits searchHits = response.getHits();
        long totalHits = searchHits.totalHits;
        System.out.println("Found " + totalHits + " results");

        /**
         * "hits": [
         * {
         * "_index": "product",
         * "_type": "bigdata",
         * "_id": "5",
         * "_score": 1,
         * "_source": {
         * "name": "redis",
         * "author": "redis",
         * "version": "5.0.0"
         * }
         * }
         */
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits){
            System.out.println("--------------------------------------------");
            String index = hit.getIndex();
            String type = hit.getType();
            String id = hit.getId();
            float score = hit.getScore();
            System.out.println("index: " + index);
            System.out.println("type: " + type);
            System.out.println("id: " + id);
            System.out.println("score: " + score);
            Map<String, Object> source = hit.getSourceAsMap();
            source.forEach((field, value) ->{
                System.out.println(field + "--->" + value);
            });

        }
    }
  • Query with part of a field highlighted:

    @Test
    public void testHightLight(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                .setSearchType(SearchType.DEFAULT)
                .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                .highlighter(// configure highlighting
                        SearchSourceBuilder.highlight()
                                .field("address")
                                .preTags("<font color='red' size='16px'>")
                                .postTags("</font>")
                )
                .setFrom(0)// offset of the first hit to return
                .setSize(5)// number of hits per page
                .get();
        // the results come back wrapped in a SearchHits object
        SearchHits searchHits = response.getHits();
        long totalHits = searchHits.totalHits;
        System.out.println("Found " + totalHits + " results");
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits){// read the highlighted content
            System.out.println("-------------------------------------------");
            // highlighted field fragments
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            highlightFields.forEach((key,highlightField) -> {
                System.out.println("key: " + key);
                String address = "";
                Text[] fragments = highlightField.fragments();
                for (Text fragment : fragments){
                    address += fragment.toString();
                }
                System.out.println("address: " + address);

            });


        }

    }
  • Sorting results by a field:

    @Test
    public void testSort(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                .setSearchType(SearchType.DEFAULT)
                .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                .highlighter(// configure highlighting
                        SearchSourceBuilder.highlight()
                                .field("address")
                                .preTags("<font color='red' size='16px'>")
                                .postTags("</font>")
                )
                .addSort("age", SortOrder.ASC)
//                .addSort("age", SortOrder.DESC)
                .setFrom(0)// offset of the first hit to return
                .setSize(5)// number of hits per page
                .get();
        // the results come back wrapped in a SearchHits object
        SearchHits searchHits = response.getHits();
        long totalHits = searchHits.totalHits;
        System.out.println("Found " + totalHits + " results");
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits){// read the highlighted content
            System.out.println("-------------------------------------------");
            Map<String, Object> source = hit.getSourceAsMap();
            Object firstname = source.get("firstname");
            Object age = source.get("age");
            System.out.println("firstname: " + firstname);
            System.out.println("age: " + age);
            // highlighted field fragments
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            highlightFields.forEach((key,highlightField) -> {
                System.out.println("key: " + key);
                String address = "";
                Text[] fragments = highlightField.fragments();
                for (Text fragment : fragments){
                    address += fragment.toString();
                }
                System.out.println("address: " + address);
            });
        }

    }
  • Aggregation test:

    @Test
    public void testAggr(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                .setSearchType(SearchType.DEFAULT)
                .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                .addAggregation(
                        AggregationBuilders
                                .avg("avg_age")// as in: select avg(age) avg_age -> the name is the alias the result is reported under
                                .field("age")// the field to aggregate, i.e. a field in the index
                )
                .get();
        Aggregations aggrs = response.getAggregations();// a collection of aggregations
//        System.out.println(aggrs);
        for (Aggregation aggr : aggrs){
//            System.out.println(aggr);
//            System.out.println(aggr.getName());
//            System.out.println(aggr.getType());
            InternalAvg avg = (InternalAvg) aggr;
            double value = avg.getValue();
            System.out.println(avg.getName() + "-->" + value);
        }
    }
  • Filtering a field by range:

    @Test
    public void testFilter(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                .setSearchType(SearchType.DEFAULT)
                .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                .highlighter(// configure highlighting
                        SearchSourceBuilder.highlight()
                                .field("address")
                                .preTags("<font color='red' size='16px'>")
                                .postTags("</font>")
                )
                // keep only hits with age between 30 and 35
                .setPostFilter(
                        QueryBuilders.rangeQuery("age").gte(30).lte(35)
                )
                .addSort("age", SortOrder.ASC)
                .setFrom(0)// offset of the first hit to return
                .setSize(5)// number of hits per page
                .get();
        // the results come back wrapped in a SearchHits object
        SearchHits searchHits = response.getHits();
        long totalHits = searchHits.totalHits;
        System.out.println("Found " + totalHits + " results");
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits){// read the highlighted content
            System.out.println("-------------------------------------------");
            Map<String, Object> source = hit.getSourceAsMap();
            Object firstname = source.get("firstname");
            Object age = source.get("age");
            System.out.println("firstname: " + firstname);
            System.out.println("age: " + age);
            // highlighted field fragments
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            highlightFields.forEach((key,highlightField) -> {
                System.out.println("key: " + key);
                String address = "";
                Text[] fragments = highlightField.fragments();
                for (Text fragment : fragments){
                    address += fragment.toString();
                }
                System.out.println("address: " + address);
            });
        }
    }

 

Full code:

elasticsearch.conf:

cluster.name=rk-ES
cluster.host.port=hadoop01:9300,hadoop02:9300,hadoop03:9300

 

package rk.constants;

/**
 * @Author rk
 * @Date 2018/12/10 15:14
 * @Description:
 **/
public interface Constants {
    String CLUSTER_NAME = "cluster.name";
    String CLUSTER_HOST_PORT = "cluster.host.port";

}
package rk.elastic;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.metrics.avg.InternalAvg;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import rk.constants.Constants;

import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.Properties;

/**
 * @Author rk
 * @Date 2018/12/10 15:06
 * @Description:
 **/
public class ElasticSearchTest2 {

    private TransportClient client;
    @Before
    public void setUp() throws IOException {
        Properties properties = new Properties();
        InputStream in = ElasticSearchTest2.class.getClassLoader().getResourceAsStream("elasticsearch.conf");
        properties.load(in);
        Settings setting = Settings.builder()
                .put(Constants.CLUSTER_NAME,properties.getProperty(Constants.CLUSTER_NAME))
                .build();
        client = new PreBuiltTransportClient(setting);
        String hostAndPorts = properties.getProperty(Constants.CLUSTER_HOST_PORT);
        for (String hostAndPort : hostAndPorts.split(",")){
            String[] fields = hostAndPort.split(":");
            String host = fields[0];
            int port = Integer.valueOf(fields[1]);
            TransportAddress ts = new TransportAddress(new InetSocketAddress(host, port));
            client.addTransportAddresses(ts);
        }
        System.out.println("cluster.name = " + client.settings().get("cluster.name"));
    }

    String[] indices = {"product","test"};

    @Test
    public void testQuery1(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                /**
                 * Search types:
                 *  QUERY_AND_FETCH:      available before 5.3, protected afterwards
                 *  QUERY_THEN_FETCH:     the default
                 *  DFS_QUERY_AND_FETCH:  removed in newer versions
                 *  DFS_QUERY_THEN_FETCH:
                 */
                .setSearchType(SearchType.DEFAULT)
                /**
                 * Set what to search for.
                 * Whether a given search finds the data you want is what
                 * gave rise to the SEO (search engine optimization) profession.
                 */
//                .setQuery(QueryBuilders.matchPhrasePrefixQuery("firstname", "V*")) // firstname values starting with V
//                .setQuery(QueryBuilders.matchQuery("state", "NM"))
                .setQuery(QueryBuilders.termQuery("age", 40))
                // pagination: to show page N with M hits per page, use setFrom((N - 1) * M).setSize(M)
                .setFrom(1)// offset of the first hit to return
                .setSize(5)// number of hits per page
                .get();
        // the results come back wrapped in a SearchHits object
        SearchHits searchHits = response.getHits();
        long totalHits = searchHits.totalHits;
        System.out.println("Found " + totalHits + " results");

        /**
         * "hits": [
         * {
         * "_index": "product",
         * "_type": "bigdata",
         * "_id": "5",
         * "_score": 1,
         * "_source": {
         * "name": "redis",
         * "author": "redis",
         * "version": "5.0.0"
         * }
         * }
         */
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits){
            System.out.println("--------------------------------------------");
            String index = hit.getIndex();
            String type = hit.getType();
            String id = hit.getId();
            float score = hit.getScore();
            System.out.println("index: " + index);
            System.out.println("type: " + type);
            System.out.println("id: " + id);
            System.out.println("score: " + score);
            Map<String, Object> source = hit.getSourceAsMap();
            source.forEach((field, value) ->{
                System.out.println(field + "--->" + value);
            });

        }
    }

    @Test
    public void testHightLight(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                .setSearchType(SearchType.DEFAULT)
                .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                .highlighter(// configure highlighting
                        SearchSourceBuilder.highlight()
                                .field("address")
                                .preTags("<font color='red' size='16px'>")
                                .postTags("</font>")
                )
                .setFrom(0)// offset of the first hit to return
                .setSize(5)// number of hits per page
                .get();
        // the results come back wrapped in a SearchHits object
        SearchHits searchHits = response.getHits();
        long totalHits = searchHits.totalHits;
        System.out.println("Found " + totalHits + " results");
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits){// read the highlighted content
            System.out.println("-------------------------------------------");
            // highlighted field fragments
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            highlightFields.forEach((key,highlightField) -> {
                System.out.println("key: " + key);
                String address = "";
                Text[] fragments = highlightField.fragments();
                for (Text fragment : fragments){
                    address += fragment.toString();
                }
                System.out.println("address: " + address);

            });


        }

    }

    @Test
    public void testSort(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                .setSearchType(SearchType.DEFAULT)
                .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                .highlighter(// configure highlighting
                        SearchSourceBuilder.highlight()
                                .field("address")
                                .preTags("<font color='red' size='16px'>")
                                .postTags("</font>")
                )
                .addSort("age", SortOrder.ASC)
//                .addSort("age", SortOrder.DESC)
                .setFrom(0)// offset of the first hit to return
                .setSize(5)// number of hits per page
                .get();
        // the results come back wrapped in a SearchHits object
        SearchHits searchHits = response.getHits();
        long totalHits = searchHits.totalHits;
        System.out.println("Found " + totalHits + " results");
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits){// read the highlighted content
            System.out.println("-------------------------------------------");
            Map<String, Object> source = hit.getSourceAsMap();
            Object firstname = source.get("firstname");
            Object age = source.get("age");
            System.out.println("firstname: " + firstname);
            System.out.println("age: " + age);
            // highlighted field fragments
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            highlightFields.forEach((key,highlightField) -> {
                System.out.println("key: " + key);
                String address = "";
                Text[] fragments = highlightField.fragments();
                for (Text fragment : fragments){
                    address += fragment.toString();
                }
                System.out.println("address: " + address);
            });
        }

    }


    @Test
    public void testAggr(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                .setSearchType(SearchType.DEFAULT)
                .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                .addAggregation(
                        AggregationBuilders
                                .avg("avg_age")// as in: select avg(age) avg_age -> the name is the alias the result is reported under
                                .field("age")// the field to aggregate, i.e. a field in the index
                )
                .get();
        Aggregations aggrs = response.getAggregations();// a collection of aggregations
//        System.out.println(aggrs);
        for (Aggregation aggr : aggrs){
//            System.out.println(aggr);
//            System.out.println(aggr.getName());
//            System.out.println(aggr.getType());
            InternalAvg avg = (InternalAvg) aggr;
            double value = avg.getValue();
            System.out.println(avg.getName() + "-->" + value);
        }
    }

    @Test
    public void testFilter(){
        SearchResponse response = client
                .prepareSearch(indices) // indices to search
                .setSearchType(SearchType.DEFAULT)
                .setQuery(QueryBuilders.matchQuery("address", "Avenue"))
                .highlighter(// configure highlighting
                        SearchSourceBuilder.highlight()
                                .field("address")
                                .preTags("<font color='red' size='16px'>")
                                .postTags("</font>")
                )
                // keep only hits with age between 30 and 35
                .setPostFilter(
                        QueryBuilders.rangeQuery("age").gte(30).lte(35)
                )
                .addSort("age", SortOrder.ASC)
                .setFrom(0)// offset of the first hit to return
                .setSize(5)// number of hits per page
                .get();
        // the results come back wrapped in a SearchHits object
        SearchHits searchHits = response.getHits();
        long totalHits = searchHits.totalHits;
        System.out.println("Found " + totalHits + " results");
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits){// read the highlighted content
            System.out.println("-------------------------------------------");
            Map<String, Object> source = hit.getSourceAsMap();
            Object firstname = source.get("firstname");
            Object age = source.get("age");
            System.out.println("firstname: " + firstname);
            System.out.println("age: " + age);
            // highlighted field fragments
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            highlightFields.forEach((key,highlightField) -> {
                System.out.println("key: " + key);
                String address = "";
                Text[] fragments = highlightField.fragments();
                for (Text fragment : fragments){
                    address += fragment.toString();
                }
                System.out.println("address: " + address);
            });
        } 
    }
   
    
    @After
    public void cleanUp(){
        client.close();
    }

}

Code for the Elasticsearch 2.3-era API:

package rk.elastic;


import com.fasterxml.jackson.databind.ObjectMapper;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.Before;
import org.junit.Test;
import rk.constants.Constants;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.util.*;

/**
 * @Author rk
 * @Date 2018/12/10 15:06
 * @Description:
 *
 *      Sample input data:
 *      <doc>
 *         <url>http://gongyi.sohu.com/20120730/n349358066.shtml</url>
 *         <docno>fdaa73d52fd2f0ea-34913306c0bb3300</docno>
 *          <contenttitle>失独父母中年遇独子夭折 称不怕死亡怕养老生病</contenttitle>
 *          <content></content>
 *      </doc>
 *
 **/

class Article{
    private String url;
    private String docno;
    private String content;
    private String contenttitle;

    public Article() {
    }

    public Article(String url, String docno, String content, String contenttitle) {
        this.url = url;
        this.docno = docno;
        this.content = content;
        this.contenttitle = contenttitle;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public String getDocno() {
        return docno;
    }

    public void setDocno(String docno) {
        this.docno = docno;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    public String getContenttitle() {
        return contenttitle;
    }

    public void setContenttitle(String contenttitle) {
        this.contenttitle = contenttitle;
    }

    @Override
    public String toString() {
        return "Article{" +
                "url='" + url + '\'' +
                ", docno='" + docno + '\'' +
                ", content='" + content + '\'' +
                ", contenttitle='" + contenttitle + '\'' +
                '}';
    }
}

//  parser: read the XML file and keep only the first 20 articles
class XmlParser {
    public static List<Article> getArticle() {
        List<Article> list = new ArrayList<Article>();
        SAXReader reader = new SAXReader();
        Document document;
        try {
            document = reader.read(new File("news_sohusite_xml"));
            Element root = document.getRootElement();
            Iterator<Element> iterator = root.elementIterator("doc");
            Article article = null;
            int count = 0;
            while(iterator.hasNext()) {
                Element doc = iterator.next();
                String url = doc.elementTextTrim("url");
                String docno = doc.elementTextTrim("docno");
                String content = doc.elementTextTrim("content");
                String contenttitle = doc.elementTextTrim("contenttitle");
                article = new Article();
                article.setContent(content);
                article.setDocno(docno);
                article.setContenttitle(contenttitle);
                article.setUrl(url);
                if(++count > 20) {
                    break;
                }
                list.add(article);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return list;
    }

}





public class ElasticSearchTest2_3 {

    private TransportClient client;
    @Before
    public void setUp() throws IOException {
        Properties properties = new Properties();
        InputStream in = ElasticSearchTest2_3.class.getClassLoader().getResourceAsStream("elasticsearch.conf");
        properties.load(in);
        Settings setting = Settings.builder()
                .put(Constants.CLUSTER_NAME,properties.getProperty(Constants.CLUSTER_NAME))
                .build();
        client = new PreBuiltTransportClient(setting);
        String hostAndPorts = properties.getProperty(Constants.CLUSTER_HOST_PORT);
        System.out.println(hostAndPorts);
        for (String hostAndPort : hostAndPorts.split(",")){
            String[] fields = hostAndPort.split(":");
            String host = fields[0];
            int port = Integer.valueOf(fields[1]);
            TransportAddress ts = new TransportAddress(new InetSocketAddress(host, port));
            client.addTransportAddresses(ts);
        }

        System.out.println("cluster.name = " + client.settings().get("cluster.name"));

    }

    String index = "search";
    //  bulk-load the parsed articles into ES
    @Test
    public void bulkInsert() throws Exception {
        List<Article> list = XmlParser.getArticle();
        ObjectMapper oMapper = new ObjectMapper();
        BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
        for (int i = 0; i < list.size(); i++) {
            Article article = list.get(i);
            String val = oMapper.writeValueAsString(article);
            bulkRequestBuilder.add(new IndexRequest(index, "news",
                    article.getDocno()).source(val));
        }
        BulkResponse response = bulkRequestBuilder.get();
    }

    // search
    @Test
    public void testSearch() {
        String indices = "bigdata";// the index to search
        SearchRequestBuilder builder = client.prepareSearch(indices)
                .setSearchType(SearchType.DEFAULT)
                .setFrom(0)
                .setSize(5)// pagination
                /**
                 * The newer API equivalent:
                 * .highlighter(
                 *          SearchSourceBuilder.highlight()
                 *                      .field("address")
                 *                      .preTags("<font color='red' size='16px'>")
                 *                      .postTags("</font>")
                 *      )
                 */
                .addHighlightedField("name")// field to highlight
                .setHighlighterPreTags("<font color='blue'>")
                .setHighlighterPostTags("</font>");// highlight style
        builder.setQuery( QueryBuilders.fuzzyQuery("name", "hadoop"));
        SearchResponse searchResponse = builder.get();
        SearchHits searchHits = searchResponse.getHits();
        SearchHit[] hits = searchHits.getHits();
        long total = searchHits.getTotalHits();
        System.out.println("total hits: " + total);// number of matching documents
        for (SearchHit searchHit : hits) {

            Map<String, Object> source = searchHit.getSource();// newer API: searchHit.getSourceAsMap()
            Map<String, HighlightField> highlightFields = searchHit.getHighlightFields();
            System.out.println("---------------------------");
            String name = source.get("name").toString();
            String author = source.get("author").toString();
            System.out.println("name=" + name);
            System.out.println("author=" + author);
            HighlightField highlightField = highlightFields.get("name");
            if(highlightField != null) {
                Text[] fragments = highlightField.fragments();
                name = "";
                for (Text text : fragments) {
                    name += text.toString();
                }
            }
            System.out.println("name: " + name);
            System.out.println("author: " + author);
        }
    }

}

Similar to how SQL uses LIMIT to control the size of a single "page" of results, Elasticsearch uses the two parameters from and size:

     from: the offset of the first result to return, default 0

     size: the number of results to return, default 10

Assuming 5 results per page, the requests for pages 1 through 3 are:

    GET /_search?size=5

    GET /_search?size=5&from=5

    GET /_search?size=5&from=10

Note: avoid requesting too many results at once or paging too deep, as both put heavy pressure on the server. Results must be sorted before they are returned: a request fans out to multiple shards, each shard produces its own sorted result set, and those sets are then merged centrally to guarantee a correct final ordering.
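The from values in the three requests above follow the rule from = (page - 1) * size. A minimal sketch (`fromFor` is a hypothetical helper, not an ES API):

```java
// Sketch: computing the from offset for a given page number and page size,
// matching the GET /_search?size=..&from=.. examples above.
public class PagingDemo {
    static int fromFor(int page, int size) {
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        int size = 5;
        for (int page = 1; page <= 3; page++) {
            System.out.println("GET /_search?size=" + size + "&from=" + fromFor(page, size));
        }
    }
}
```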