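6.1 Highlighting Overview
Many applications highlight the search keywords in each result, for example by bolding them or changing their color, so that users can see why a document matched the query. Retrieving highlighted fragments in Elasticsearch is straightforward.
Highlighting needs the actual content of a field. If the field is not stored (the mapping does not set store to true), Elasticsearch loads the _source and extracts the field from it.
Take a Baidu search for "java" as an example: the matched terms are shown in red, as in the figure below.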
Run the earlier query again, this time adding a new highlight parameter:
GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}
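The query returns the same hits as before, but each hit now carries an extra section called highlight. It contains the text fragments of the about field that matched, with each matched term wrapped in the HTML tags <em></em>: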
{
    ...
    "hits": {
        "total": 1,
        "max_score": 0.23013961,
        "hits": [
            {
                ...
                "_score": 0.23013961,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                },
                "highlight": {
                    "about": [
                        "I love to go <em>rock</em> <em>climbing</em>"
                    ]
                }
            }
        ]
    }
}
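As you can see, besides the standard information returned by ES, each hit has a new section named highlight. By default ES uses the HTML tag <em> to mark the start of a highlighted term and </em> to mark its end.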
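Because the matched terms are delimited by known tags, client code can recover them from a fragment. The following is a minimal sketch that assumes the default <em></em> tags; the class and method names (HighlightTerms, extract) are hypothetical and not part of the Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightTerms {
    // Matches the text between the default <em> and </em> highlight tags.
    private static final Pattern EM = Pattern.compile("<em>(.*?)</em>");

    // Returns the highlighted terms of one fragment, in order of appearance.
    static List<String> extract(String fragment) {
        List<String> terms = new ArrayList<>();
        Matcher m = EM.matcher(fragment);
        while (m.find()) {
            terms.add(m.group(1));
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [rock, climbing]
        System.out.println(extract("I love to go <em>rock</em> <em>climbing</em>"));
    }
}
```

If custom pre_tags and post_tags are configured, the pattern has to be adjusted to match them.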
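6.2 Using Highlighting
highlight parameters:
The following parameters change the returned result. They can be set on an individual field, or defined once as properties of highlight so that they apply to all fields.
number_of_fragments
A fragment is a contiguous piece of text. This parameter sets the maximum number of fragments returned for a field. The default is 5.
fragment_size
A field value may be, say, 10,000 characters long, which is far more than a page would normally display. fragment_size sets the length, in characters, of each highlighted fragment. The default is 100.
no_match_size (noMatchSize in the Java API)
When a document matches in one highlighted field, the other highlighted fields that contain no match return nothing by default. Setting this parameter returns the given number of characters from the start of such a field instead. The default is 0.
pre_tags
The tag that opens a highlighted term, for example the <em> shown above.
post_tags
The tag that closes a highlighted term, for example the </em> shown above.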
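To see how these parameters fit together in the highlight section of a request body, here is a minimal sketch that renders that section as a JSON string. The JSON keys are the parameter names listed above; the class HighlightBody and its methods are hypothetical helpers written for illustration, and the values passed in main are arbitrary.

```java
class HighlightBody {
    // Escapes backslashes and double quotes so the value is a valid JSON string.
    // Control characters are not handled; this is only a sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the value of the "highlight" key with top-level settings
    // that apply to every field listed under "fields".
    static String build(String field, String preTag, String postTag,
                        int fragmentSize, int numberOfFragments, int noMatchSize) {
        return "{"
            + "\"pre_tags\":[" + quote(preTag) + "],"
            + "\"post_tags\":[" + quote(postTag) + "],"
            + "\"fragment_size\":" + fragmentSize + ","
            + "\"number_of_fragments\":" + numberOfFragments + ","
            + "\"no_match_size\":" + noMatchSize + ","
            + "\"fields\":{" + quote(field) + ":{}}"
            + "}";
    }

    public static void main(String[] args) {
        System.out.println(build("smsContent", "<b>", "</b>", 20, 2, 150));
    }
}
```

To override a setting for a single field, the same keys can be placed inside that field's object under fields instead of at the top level.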
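Example: search the content of sent SMS messages for the keyword "中国苹果" and highlight the matches by wrapping them in <b><font color=red> and </font></b>.
Java code example: com.javablog.elasticsearch.query.impl.HightLightQueryImpl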
@Override
public void hightLightQuery(String indexName, String type, String field, String keyword) throws IOException {
    // Note: types(...) and the public totalHits field match the 6.x high-level REST client.
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(type);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(5);
    // Query condition
    MatchQueryBuilder queryBuilder = new MatchQueryBuilder(field, keyword);
    // Highlight settings: wrap each matched term in bold red markup
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.requireFieldMatch(false).field(field)
            .preTags("<b><font color=red>").postTags("</font></b>");
    searchSourceBuilder.highlighter(highlightBuilder);
    searchSourceBuilder.query(queryBuilder);
    searchRequest.source(searchSourceBuilder);
    log.info("source string:" + searchRequest.source());
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:" + hits.totalHits);
    SearchHit[] h = hits.getHits();
    for (SearchHit hit : h) {
        // Highlighted fragments of this hit, keyed by field name
        Map<String, HighlightField> map = hit.getHighlightFields();
        HighlightField highlightField = map.get(field);
        // The entry is absent when the field produced no highlight
        if (highlightField != null) {
            System.out.println(highlightField.getName());
            Text[] texts = highlightField.getFragments();
            System.out.println("highlighted result: " + texts[0]);
        }
        System.out.println("source fields: " + hit.getSourceAsMap());
    }
}
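The loop above prints the highlighted fragment only when the field has one. A page usually needs a single display value per field, so a common pattern is to fall back to the raw _source value when no highlight exists. The sketch below shows that fallback using plain Java collections as stand-ins for the client's types; the class HighlightFallback and the method displayValue are hypothetical.

```java
import java.util.List;
import java.util.Map;

class HighlightFallback {
    // highlights: field name -> highlighted fragments (stand-in for getHighlightFields())
    // source:     field name -> raw value            (stand-in for getSourceAsMap())
    static String displayValue(Map<String, List<String>> highlights,
                               Map<String, Object> source, String field) {
        List<String> fragments = highlights.get(field);
        if (fragments != null && !fragments.isEmpty()) {
            // Prefer the first highlighted fragment when one exists.
            return fragments.get(0);
        }
        // Otherwise fall back to the unhighlighted source value.
        Object raw = source.get(field);
        return raw == null ? "" : raw.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> highlights =
                Map.of("smsContent", List.of("<b>apple</b> pie"));
        Map<String, Object> source =
                Map.of("smsContent", "apple pie", "corpName", "China Mobile");
        System.out.println(displayValue(highlights, source, "smsContent"));
        System.out.println(displayValue(highlights, source, "corpName"));
    }
}
```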
Demo test case: com.javablog.elasticsearch.test.document.HightLightQueryTest
@Test
public void testhightLightQuery() throws IOException {
    hightLightQuery.hightLightQuery(indexName, type, "smsContent", "中国苹果");
}
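To view highlighted fragments in a browser, they need to be wrapped in an HTML page. The following is a minimal sketch of one way to do that with the standard library; the class HighlightHtmlWriter, its methods, and the sample fragment are hypothetical and not part of the original project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class HighlightHtmlWriter {
    // Builds a small HTML page with one paragraph per fragment.
    // Fragments are inserted as-is because they already contain the highlight tags.
    static String toHtml(List<String> fragments) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"></head><body>\n");
        for (String fragment : fragments) {
            sb.append("<p>").append(fragment).append("</p>\n");
        }
        sb.append("</body></html>\n");
        return sb.toString();
    }

    // Writes the page to the given file as UTF-8 and returns the path.
    static Path write(Path file, List<String> fragments) throws IOException {
        return Files.write(file, toHtml(fragments).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("highlight", ".html");
        write(out, List.of("<b><font color=red>apple</font></b> pie"));
        System.out.println("wrote " + out);
    }
}
```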
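Save the query output as an HTML file and open it in a browser; the result looks like this:
Case 2: limit the fragment length with fragmentSize (the test below passes 20)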
@Override
public void hightLightQueryByFragment(String indexName, String type, int fragmentSize) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(type);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(5);
    // Query condition
    MatchQueryBuilder queryBuilder = new MatchQueryBuilder("smsContent", "企业");
    // Highlight settings
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.requireFieldMatch(false).field("smsContent")
            .preTags("<b><em style='color:red;'>").postTags("</em></b>");
    // Length, in characters, of each highlighted fragment
    highlightBuilder.fragmentSize(fragmentSize);
    searchSourceBuilder.highlighter(highlightBuilder);
    searchSourceBuilder.query(queryBuilder);
    searchRequest.source(searchSourceBuilder);
    log.info("source string:" + searchRequest.source());
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:" + hits.totalHits);
    SearchHit[] h = hits.getHits();
    for (SearchHit hit : h) {
        // Highlighted fragments of this hit, keyed by field name
        Map<String, HighlightField> map = hit.getHighlightFields();
        HighlightField highlightField = map.get("smsContent");
        if (highlightField != null) {
            // A long field can yield several fragments; print each one
            Text[] fragments3 = highlightField.getFragments();
            for (Text text : fragments3) {
                System.out.println("result:" + text);
            }
        }
    }
}
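When a field yields several fragments, as in the loop above, a page often shows them as one snippet. One possible presentation, sketched here under the assumption that an ellipsis separator is acceptable, is to join the fragments into a single string; the class FragmentJoiner and the method join are hypothetical.

```java
import java.util.List;

class FragmentJoiner {
    // Joins the fragments of one field into a single snippet,
    // separating non-adjacent pieces of text with an ellipsis.
    static String join(List<String> fragments) {
        return String.join(" ... ", fragments);
    }

    public static void main(String[] args) {
        // prints first <em>hit</em> ... second <em>hit</em>
        System.out.println(join(List.of("first <em>hit</em>", "second <em>hit</em>")));
    }
}
```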
Demo test case: com.javablog.elasticsearch.test.document.HightLightQueryTest
@Test
public void testHightLightQueryByFragment() throws IOException, InterruptedException {
    // fragment_size sets the length, in characters, of each highlighted fragment
    SmsSendLog smsSendLog5 = new SmsSendLog();
    smsSendLog5.setMobile("13600000088");
    smsSendLog5.setCorpName("中国移动");
    smsSendLog5.setCreateDate(new Date());
    smsSendLog5.setSendDate(new Date());
    smsSendLog5.setIpAddr("10.126.2.8");
    smsSendLog5.setLongCode("10690000998");
    smsSendLog5.setReplyTotal(60);
    smsSendLog5.setProvince("湖北省");
    smsSendLog5.setOperatorId(1);
smsSendLog5.setSmsContent("新京报快讯 7月16日,国家发改委网站发布《关于印发的通知》(下称《通知》),该《通知》表示,要完善国有企业退出机制。推动国有“僵尸企业”破产退出。对符合破产等退出条件的国有企业,各相关方不得以任何方式阻碍其退出,防止形成“僵尸企业”。不得通过违规提供政府补贴、贷款等方式维系“僵尸企业”生存,有效解决国有“僵尸企业”不愿退出的问题。国有企业退出时,金融机构等债权人不得要求政府承担超出出资额之外的债务清偿责任。《通知》还称,完善特殊类型国有企业退出制度。针对全民所有制企业、厂办集体企业存在的出资人已注销、工商登记出资人与实际控制人不符、账务账册资料严重缺失等问题,明确市场退出相关规定,加快推动符合条件企业退出市场,必要时通过强制清算等方式实行强制退出。\n" +
"\n" +
"《通知》表示,要建立市场主体退出预警机制。强化企业信息披露义务。提高企业财务和经营信息透明度,强化信息披露义务主体对信息披露真实性、准确性、完整性的责任要求。公众公司应依法向公众披露财务和经营信息。非公众公司应及时向股东和债权人披露财务和经营信息。鼓励非公众公司特别是大型企业集团、国有企业参照公众公司要求公开相关信息。强化企业在陷入财务困境时及时向股东、债权人等利益相关方的信息披露义务。");
    docService.add(indexName, type, JSON.toJSONString(smsSendLog5), "220");
    // Give the newly indexed document time to become searchable
    Thread.sleep(2000);
    hightLightQuery.hightLightQueryByFragment(indexName, type, 20);
}
Search result:
Case 3: limit the number of fragments with numOfFragments (the test below passes 2)
@Override
public void hightLightQueryByNumOfFragments(String indexName, String type, int fragmentSize, int numOfFragments) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(type);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(5);
    // Query condition
    MatchQueryBuilder queryBuilder = new MatchQueryBuilder("smsContent", "企业");
    // Highlight settings
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.requireFieldMatch(false).field("smsContent")
            .preTags("<b><em style='color:red;'>").postTags("</em></b>");
    highlightBuilder.fragmentSize(fragmentSize);
    // Maximum number of fragments returned per field
    highlightBuilder.numOfFragments(numOfFragments);
    searchSourceBuilder.highlighter(highlightBuilder);
    searchSourceBuilder.query(queryBuilder);
    searchRequest.source(searchSourceBuilder);
    log.info("source string:" + searchRequest.source());
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:" + hits.totalHits);
    SearchHit[] h = hits.getHits();
    for (SearchHit hit : h) {
        // Highlighted fragments of this hit, keyed by field name
        Map<String, HighlightField> map = hit.getHighlightFields();
        HighlightField highlightField = map.get("smsContent");
        if (highlightField != null) {
            Text[] fragments3 = highlightField.getFragments();
            for (Text text : fragments3) {
                System.out.println("highlighted result: " + text);
            }
        }
    }
}
Demo test case: com.javablog.elasticsearch.test.document.HightLightQueryTest
@Test
public void testHightLightQueryByNumOfFragments() throws IOException {
    hightLightQuery.hightLightQueryByNumOfFragments(indexName, type, 20, 2);
}
Search result:
Case 4: add noMatchSize
@Override
public void hightLightNoMatchSize(String indexName, String type, int fragmentSize, int numOfFragments, int noMatchSize) throws IOException {
    SearchRequest searchRequest = new SearchRequest(indexName);
    searchRequest.types(type);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(5);
    // Query condition: match the keyword in either smsContent or corpName
    BoolQueryBuilder builder = QueryBuilders.boolQuery();
    MatchQueryBuilder querySmsContentBuilder = new MatchQueryBuilder("smsContent", "企业");
    MatchQueryBuilder queryCorpNameBuilder = new MatchQueryBuilder("corpName", "企业");
    builder.should(querySmsContentBuilder);
    builder.should(queryCorpNameBuilder);
    // Highlight settings for both fields
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.requireFieldMatch(false).field("smsContent").field("corpName")
            .preTags("<b><em style='color:red;'>").postTags("</em></b>");
    highlightBuilder.fragmentSize(fragmentSize);
    highlightBuilder.numOfFragments(numOfFragments);
    // Characters to return from the start of a field that has no match
    highlightBuilder.noMatchSize(noMatchSize);
    searchSourceBuilder.highlighter(highlightBuilder);
    searchSourceBuilder.query(builder);
    searchRequest.source(searchSourceBuilder);
    log.info("source string:" + searchRequest.source());
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("count:" + hits.totalHits);
    SearchHit[] h = hits.getHits();
    for (SearchHit hit : h) {
        // Highlighted fragments of this hit, keyed by field name
        Map<String, HighlightField> map = hit.getHighlightFields();
        HighlightField highlightField1 = map.get("corpName");
        if (highlightField1 != null) {
            Text[] fragments1 = highlightField1.getFragments();
            for (Text text : fragments1) {
                System.out.println("1:" + text);
            }
        }
        HighlightField highlightField2 = map.get("smsContent");
        if (highlightField2 != null) {
            Text[] fragments3 = highlightField2.getFragments();
            for (Text text : fragments3) {
                System.out.println("2:" + text);
            }
        }
    }
}
Demo test case: com.javablog.elasticsearch.test.document.HightLightQueryTest
@Test
public void testHightLightNoMatchSize() throws IOException {
    hightLightQuery.hightLightNoMatchSize(indexName, type, 30, 2, 150);
}
Search result: