Java: converting docx to doc, converting docx to HTML

Repost

mob64ca140530fb 2023-11-10 20:58:49


Converting a doc document to HTML in Java with POI

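Prerequisites
Classes used

Understanding a few methods
converter.setPicturesManager(xxxx)
converter.processDocument(hwpfDocument);
Facade pattern

Implementation
Results

Test
Generated output

Word content
Generated directory and files
HTML

References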

Prerequisites

For the dependencies and related setup, see the previous article: docx to HTML.

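Classes used

HWPFDocument: represents a doc file.
WordToHtmlConverter: as the name suggests, the class that converts Word to HTML.
Document: represents a complete HTML or XML document (org.w3c.dom.Document).
DOMSource: the source tree.
StreamResult: the holder of the transformation result.
Transformer: the transformer that turns the source tree into the result tree.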
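The last four classes belong to the JDK rather than to POI, so their interplay can be tried without any Word file. The following is a minimal sketch, not part of the original article: the class name DomToHtmlSketch and the method render are made up for illustration, and the DOM tree is built by hand instead of by WordToHtmlConverter.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: serializes a hand-built DOM tree as HTML.
class DomToHtmlSketch {

    static String render(String text) {
        try {
            // An empty DOM document, the same kind of object the converter is constructed with.
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();

            // Build <html><body><p>text</p></body></html> by hand.
            Element html = document.createElement("html");
            Element body = document.createElement("body");
            Element paragraph = document.createElement("p");
            paragraph.setTextContent(text);
            body.appendChild(paragraph);
            html.appendChild(body);
            document.appendChild(html);

            // Identity transformer: copies the source tree to the result unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");

            // DOMSource wraps the source tree; StreamResult receives the serialized output.
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("HTML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("hello"));
    }
}
```

In the real conversion the only difference is where the Document comes from: WordToHtmlConverter fills it, and the Transformer step stays the same.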

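A small gripe about POI: its classes have very few comments. Fortunately the naming is consistent, so the source code is still reasonably easy to follow.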

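Understanding a few methods

converter.setPicturesManager(xxxx)

The argument passed here is an anonymous inner class that implements the PicturesManager interface.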
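The callback idea can be shown without POI on the classpath. In the sketch below, PictureSaver and renderImageTag are hypothetical stand-ins (a two-parameter reduction of PicturesManager.savePicture and of the converter that calls it), so this illustrates the pattern only, not POI's actual code. It also assumes a relative return value such as image/0.png, which differs from the absolute path used in this article's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PictureCallbackSketch {

    // Hypothetical stand-in for POI's PicturesManager, reduced to two parameters.
    interface PictureSaver {
        String savePicture(byte[] content, String suggestedName);
    }

    // Hypothetical stand-in for the converter: it calls the saver for each picture
    // and uses the returned string as the src attribute of the <img> tag.
    static String renderImageTag(PictureSaver saver, byte[] content, String suggestedName) {
        String src = saver.savePicture(content, suggestedName);
        return "<img src=\"" + src + "\">";
    }

    // Anonymous inner class that writes the bytes into imageDir and returns
    // a path relative to the directory that contains imageDir.
    static PictureSaver savingInto(Path imageDir) {
        return new PictureSaver() {
            @Override
            public String savePicture(byte[] content, String suggestedName) {
                try {
                    Files.createDirectories(imageDir);
                    Files.write(imageDir.resolve(suggestedName), content);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return imageDir.getFileName() + "/" + suggestedName;
            }
        };
    }

    // Runs the callback once against a temporary directory.
    static String demo() {
        try {
            Path root = Files.createTempDirectory("doc-html");
            return renderImageTag(savingInto(root.resolve("image")), new byte[] {1, 2, 3}, "0.png");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Returning a relative path keeps the generated HTML usable after the output folder is moved; returning an absolute path, as the article's code does, works as long as the files stay where they were written.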

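converter.processDocument(hwpfDocument);

This call simply feeds data in: it takes the content and properties of hwpfDocument and puts them into the HtmlDocumentFacade held inside the converter. The savePicture method of the PicturesManager is invoked during this call.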

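Facade pattern

The facade pattern is hard to explain in one sentence, so a link is given at the bottom of the article. The HtmlDocumentFacade mentioned above uses this pattern.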
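As a rough illustration of the idea, a facade puts a small, purpose-specific API in front of a more general one. The sketch below is an assumption-laden toy: SimpleHtmlFacade is a made-up class that hides the DOM calls behind one addParagraph method, and it does not reproduce the real HtmlDocumentFacade in POI.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FacadeSketch {

    // Hypothetical facade: callers add paragraphs and never touch the DOM API directly.
    static class SimpleHtmlFacade {
        private final Document document;
        private final Element body;

        SimpleHtmlFacade(Document document) {
            this.document = document;
            // Set up the <html><body> skeleton once, inside the facade.
            Element html = document.createElement("html");
            this.body = document.createElement("body");
            html.appendChild(body);
            document.appendChild(html);
        }

        void addParagraph(String text) {
            Element paragraph = document.createElement("p");
            paragraph.setTextContent(text);
            body.appendChild(paragraph);
        }

        Document getDocument() {
            return document;
        }
    }

    // Client code: only facade methods are used to build the page.
    static Document build(String... paragraphs) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            SimpleHtmlFacade facade = new SimpleHtmlFacade(document);
            for (String paragraph : paragraphs) {
                facade.addParagraph(paragraph);
            }
            return facade.getDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DOM document", e);
        }
    }

    public static void main(String[] args) {
        Document document = build("first", "second");
        System.out.println(document.getElementsByTagName("p").getLength());
    }
}
```

The converter plays the role of the client here: it describes what the page should contain, and the facade decides which DOM nodes that requires.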

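Implementation

This method works much like the one in the previous article, so the utility methods it calls (getFileNameInfo and getDataTime) can be taken from there: docx to HTML.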

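import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.w3c.dom.Document;

/*
     * @description Converts a doc document to HTML
     * @author 三文鱼
     * @date 9:16 2022/4/29
     * @param filePath path of the source doc file
     * @param htmlPath root directory for the generated output
     * @return void
     **/
    public static void docToHtml(String filePath, String htmlPath) throws Exception {
        // Get the file name
        String myFileName = getFileNameInfo(filePath, 0);

        // Directory that holds every file generated from this doc file
        String docRootPath = htmlPath + File.separator + myFileName + getDataTime() + File.separator;
        String imagePath = docRootPath + "image" + File.separator;
        // Path of the generated HTML file; the image directory sits next to it
        String fileOutName = docRootPath + myFileName + ".html";

        // Create the image directory (and any missing parent directories)
        new File(imagePath).mkdirs();

        // Build the converter on top of an empty DOM document
        WordToHtmlConverter converter = new WordToHtmlConverter(DocumentBuilderFactory
                                                                    .newInstance()
                                                                    .newDocumentBuilder()
                                                                    .newDocument());

        // Picture manager, written as an anonymous inner class that implements
        // the savePicture method of the PicturesManager interface
        converter.setPicturesManager(new PicturesManager() {
            // Called from inside processDocument below to store each picture found in the Word file
            @Override
            public String savePicture(byte[] bytes, PictureType pictureType, String name, float width, float height) {
                // Save a single picture; try-with-resources closes the stream
                try (OutputStream out = new FileOutputStream(imagePath + name)) {
                    out.write(bytes);
                } catch (IOException exception) {
                    exception.printStackTrace();
                }
                // Return the stored path to the caller (HtmlDocumentFacade) so the
                // generated HTML can locate the picture
                return imagePath + name;
            }
        });

        // Read the doc file; the input stream is closed once the document has been processed
        try (InputStream in = new FileInputStream(filePath)) {
            // POI class that represents a doc document
            HWPFDocument hwpfDocument = new HWPFDocument(in);
            // Feed the content of hwpfDocument into the converter's HtmlDocumentFacade (facade pattern)
            converter.processDocument(hwpfDocument);
        }

        // Get the DOM document held by the converter
        Document htmlDocument = converter.getDocument();
        // Holder of the transformation source in DOM tree form (the source tree)
        DOMSource domSource = new DOMSource(htmlDocument);

        // Transformer that turns the source tree into the result tree
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Output method, i.e. the type of the result: html, xml, text, or an extension of these
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        // Write the output as UTF-8
        transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");

        // Transform the source tree and write the result to the output file
        transformer.transform(domSource, new StreamResult(new File(fileOutName)));
    }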
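The two helpers getFileNameInfo and getDataTime come from the previous article and are not shown here. For readers who do not have that article at hand, the following stand-ins are guesses based only on how the method above calls them: it is an assumption that mode 0 means "file name without extension", that any other mode returns the extension, and that getDataTime returns a compact timestamp. The originals may differ.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-ins for the helpers from the previous article.
class FileNameHelpersSketch {

    // Assumed contract: mode 0 returns the file name without its extension,
    // any other mode returns the extension without the dot.
    static String getFileNameInfo(String filePath, int mode) {
        // Accept both Windows and Unix separators, independent of the current OS.
        int separator = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        String fileName = filePath.substring(separator + 1);
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            // No extension (or a name such as ".gitignore").
            return mode == 0 ? fileName : "";
        }
        return mode == 0 ? fileName.substring(0, dot) : fileName.substring(dot + 1);
    }

    // Assumed format: a 14-digit timestamp that is safe to use in a directory name.
    static String getDataTime() {
        return LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMddHHmmss"));
    }

    public static void main(String[] args) {
        System.out.println(getFileNameInfo("F:\\data\\test.doc", 0) + getDataTime());
    }
}
```

With these stand-ins, each run of docToHtml writes into a fresh directory named after the source file plus the timestamp, so earlier results are not overwritten.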

Results

Test

The test is the same as the code in the previous article.

public class DocTest {
    public static void main(String[] args) {
        String filePath = "F:\\学习记录\\测试数据\\word\\doc\\test.doc";
        String htmlPath = "F:\\学习记录\\测试数据\\word\\html";
        try {
            MyDocUtil.docToHtml(filePath, htmlPath);
        }catch (Exception exception) {
            exception.printStackTrace();
        }
    }
}

Generated output

Word content

[Image: Word content]

Generated directory and files

[Image: generated directory]

[Image: generated files]

HTML

[Image: HTML]
