How Spring Boot matches routing rules

Reposted

mob64ca140530fb 2024-09-10 14:40:23


Learning Lucene, Part 4: Getting-Started Code

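Requirement:

Search files by keyword: every file whose name or content contains the keyword must be found. The figure below shows the set of files to search.

[Figure: list of files to be searched]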
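To pin down what "name or content contains the keyword" means before any index is involved, the requirement can be written as a brute-force scan. This is only a sketch and not the approach taken in this article: the class `NaiveFileSearch` and its `search` method are hypothetical names, and the files are assumed to be UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original article:
// a brute-force version of the requirement, with no index involved.
class NaiveFileSearch {

    // Returns the files directly under dir whose name or content contains keyword.
    static List<Path> search(Path dir, String keyword) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip sub-directories
                }
                String name = file.getFileName().toString();
                // Assumption of this sketch: every file is UTF-8 text.
                String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                if (name.contains(keyword) || content.contains(keyword)) {
                    hits.add(file);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("naive-search");
        Files.write(dir.resolve("report.txt"), "nothing special".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("notes.txt"), "what is full-text search".getBytes(StandardCharsets.UTF_8));
        System.out.println(search(dir, "report")); // matched by file name
        System.out.println(search(dir, "what"));   // matched by file content
    }
}
```

Every call re-reads every file, so the cost grows with the total amount of text. The Lucene code in the rest of the article reads the files once, when the index is built, and answers queries from the index.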

Versions and environment I used:

Lucene 4.10.2

JDK: 1.8 (JDK 1.7 or later is required)

Spring Boot: 2.1.3

IDE: IntelliJ IDEA

pom.xml

<dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>4.10.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>4.10.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>4.10.2</version>
        </dependency>
        <dependency>
            <groupId>com.janeluo</groupId>
            <artifactId>ikanalyzer</artifactId>
            <version>2012_u6</version>
        </dependency>

        <!-- Chinese analyzer; keep its version in line with lucene-core -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-smartcn</artifactId>
            <version>4.10.2</version>
        </dependency>
        <!-- File I/O utilities -->
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.6</version>
        </dependency>

Code

package com.example.test;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

import java.io.File;

public class FileTest {
    /**
     * Build the index.
     * @throws Exception if a source file cannot be read or the index cannot be written
     */
    @Test
    public void createIndex() throws Exception {
        // Where the index is stored. FSDirectory keeps it on disk;
        // RAMDirectory could be used instead to keep it in memory.
        Directory directory = FSDirectory.open(new File("./index"));
        // Standard analyzer
        Analyzer analyzer = new StandardAnalyzer();
        // Configure and create the IndexWriter that writes documents into the index
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_2, analyzer);

        // try-with-resources closes the writer even if indexing fails
        try (IndexWriter indexWriter = new IndexWriter(directory, config)) {
            // Directory that holds the files to be indexed
            File f = new File("F:\\a");
            // All files to be searched; listFiles() returns null if f is not a readable directory
            File[] listFiles = f.listFiles();
            if (listFiles == null) {
                throw new IllegalStateException("Not a readable directory: " + f);
            }
            for (File file : listFiles) {
                if (!file.isFile()) {
                    continue; // sub-directories cannot be read as text
                }
                // Create the Document object
                Document document = new Document();
                // File name: analyzed, indexed and stored
                String fileName = file.getName();
                Field fileNameField = new TextField("fileName", fileName, Field.Store.YES);
                // File size
                long fileSize = FileUtils.sizeOf(file);
                Field fileSizeField = new LongField("fileSize", fileSize, Field.Store.YES);
                // File path: stored only, not searchable
                String filePath = file.getPath();
                Field filePathField = new StoredField("filePath", filePath);
                // File content: analyzed, indexed and stored
                String fileContent = FileUtils.readFileToString(file, "UTF-8");
                Field fileContentField = new TextField("fileContent", fileContent, Field.Store.YES);

                // Add the fields to the Document
                document.add(fileNameField);
                document.add(fileSizeField);
                document.add(filePathField);
                document.add(fileContentField);

                // Write the Document into the index
                indexWriter.addDocument(document);
            }
        }
    }

    /**
     * Query the index.
     * @throws Exception if the index cannot be opened or read
     */
    @Test
    public void searchIndex() throws Exception {
        // Step 1: prepare for the query by opening the Directory
        Directory dir = FSDirectory.open(new File("./index"));
        // Create the IndexReader
        IndexReader reader = DirectoryReader.open(dir);
        // Create the IndexSearcher
        IndexSearcher search = new IndexSearcher(reader);

        // Step 2: create the query object. A TermQuery is not analyzed, so the term
        // must equal an indexed token exactly (StandardAnalyzer lowercases tokens).
        TermQuery query = new TermQuery(new Term("fileContent", "what"));
        // Step 3: run the query (arguments: the query object, the maximum number of hits to return)
        TopDocs topDocs = search.search(query, 10);
        // Step 4: process the results
        // Print the total number of hits
        System.out.println("Number of hits: " + topDocs.totalHits);
        // Get the result set
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            System.out.println("Score of this doc: " + scoreDoc.score);
            // Load the Document by its document ID
            Document doc = search.doc(scoreDoc.doc);
            System.out.println("File name: " + doc.get("fileName"));
            System.out.println("File path: " + doc.get("filePath"));
            System.out.println("File size: " + doc.get("fileSize"));
            System.out.println("=======================================");
        }
        // Close the IndexReader
        reader.close();
    }
}

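If an index directory like the following appears after createIndex() runs, index creation succeeded.

[Figure: the generated index directory]

After searchIndex() runs with the search condition above:

[Figure: console output of the search]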

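With the two pieces of code above we have built an index and queried it.
The first piece of code does the following:
It builds a Document object for each file we want to search. The Document holds that file's information (name, size, content, path, and so on).
It adds each Document to the index (the index entries are created automatically).
The index is stored under the ./index directory.

The second piece of code:
It looks in the index directory for documents whose fileContent contains the term "what" and prints their information.
Try changing the query, for example searching for documents named aaabbb.txt, aaabbb, or 汪浩斌.txt, and then revisit the question raised at the start of the previous article.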
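What "adding a document creates the index automatically" amounts to can be illustrated with a toy inverted index: a map from each term to the documents that contain it. The sketch below is a deliberate simplification and not how Lucene stores its index. The class `TinyInvertedIndex` and its methods are hypothetical names, and the tokenizer (lowercase, split on anything that is not a letter or digit) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified illustration only; Lucene's real index structures are far more elaborate.
class TinyInvertedIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> file name (the only "stored field" in this sketch)
    private final List<String> fileNames = new ArrayList<>();

    // Adds one document and returns its id. Tokenization is a crude
    // lowercase split on non-letter, non-digit characters.
    int addDocument(String fileName, String content) {
        int docId = fileNames.size();
        fileNames.add(fileName);
        for (String token : content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Exact-term lookup: the query string is not tokenized or lowercased,
    // which is the same reason the TermQuery above uses the lowercase "what".
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, Collections.<Integer>emptySet())) {
            hits.add(fileNames.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument("a.txt", "What is Lucene?");
        index.addDocument("b.txt", "Lucene is a search library.");
        System.out.println(index.search("what"));   // [a.txt]
        System.out.println(index.search("lucene")); // [a.txt, b.txt]
        System.out.println(index.search("What"));   // [] because indexed terms were lowercased
    }
}
```

The lookup succeeds only when the query term is identical to a token produced at indexing time. That is worth keeping in mind when experimenting with other query terms: whether a term matches depends on how the analyzer split the text.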

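Chinese analyzers:
One problem remains:
How can a search for "全文" find the document "全文检索.txt"?
Inspecting the index with lukeall showed the cause: the text was not tokenized correctly. The code uses the officially recommended standard analyzer, which is designed for Western languages and cannot segment Chinese into words, so a Chinese analyzer is needed. Lucene itself currently offers two: the CJK analyzer and the SmartChinese analyzer.
CJK analyzer: uses bigrams. For example, 我爱写代码 is split into 我爱, 爱写, 写代, 代码.
SmartChinese: not very extensible.
Third-party analyzers in common use include Paoding (庖丁解牛) and mmseg4j, but neither has been updated by its author for years. This article focuses on the IK analyzer.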
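The difference between these tokenization strategies is easier to see on a concrete string. The sketch below is plain Java, not Lucene code, and only mimics the shape of the tokens. It relies on my understanding that the standard analyzer in Lucene 4.x emits one token per Chinese character, and it follows the bigram behaviour described above for CJK. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the output shape of two tokenization strategies
// for a run of Chinese characters. Assumes every character fits in one Java
// char (no surrogate pairs).
class ChineseTokenDemo {

    // One token per character, as assumed here for the standard analyzer.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // Overlapping two-character tokens, the bigram scheme described for CJK.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("我爱写代码"));                // [我爱, 爱写, 写代, 代码]
        System.out.println(unigrams("全文检索"));                 // [全, 文, 检, 索]
        System.out.println(unigrams("全文检索").contains("全文")); // false
        System.out.println(bigrams("全文检索").contains("全文"));  // true
    }
}
```

With single-character tokens the index never contains the term 全文, so an exact-term query for it finds nothing. Bigrams happen to produce 全文 here, but they also produce meaningless pairs such as 文检, which is one reason to prefer a dictionary-based analyzer.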
Only the usage of the IK analyzer is covered here:

<dependency>
            <groupId>com.janeluo</groupId>
            <artifactId>ikanalyzer</artifactId>
            <version>2012_u6</version>
 </dependency>

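The earlier code used the standard analyzer, which does not support Chinese word segmentation. Replace it with the IK analyzer:

//Standard analyzer
        //Analyzer analyzer =new StandardAnalyzer();
        //Replaced by the IK analyzer (requires: import org.wltea.analyzer.lucene.IKAnalyzer;)
        Analyzer analyzer =new IKAnalyzer();

After rebuilding the index with the new analyzer, run the query method again: a Chinese search term now returns results as well.

[Figure: search results for a Chinese query]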

