How to Read a Plain (Unencrypted) Word File in Java


mob649e8166179a 2024-09-17 06:35:32 © Copyright


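© Copyright belongs to the author: this is an original work by 51CTO blog author mob649e8166179a. Contact the author for permission to reprint; unauthorized reproduction may incur legal liability.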

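To read an unencrypted Word file in Java (typically a .doc or .docx file), we can use Apache POI, a widely used Java library for reading and writing Microsoft Office file formats, including Word documents. This article shows how to read the text content of a Word file with Apache POI and walks through the steps with code examples.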

1. Environment Setup

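Before starting, make sure the project's development environment is set up and that Apache POI is on the classpath. In a Maven project, add the following dependencies: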

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>
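<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.3</version>
</dependency>
<!-- Provides the HWPF classes (HWPFDocument, WordExtractor) used for .doc files in section 3 -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>5.2.3</version>
</dependency>

Keep all POI artifacts on the same version, and use the latest release where you can.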

2. Reading the Content of a Word Document

The code below shows how to read a Word document in .docx format. We use the XWPFDocument class to handle .docx files:

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;

public class WordReader {

    public static void main(String[] args) {
        String filePath = "path/to/your/document.docx"; // Replace with the path to your Word document

        // try-with-resources closes the document and the stream even if reading fails
        try (FileInputStream fis = new FileInputStream(filePath);
             XWPFDocument document = new XWPFDocument(fis)) {
            List<XWPFParagraph> paragraphs = document.getParagraphs();

            for (XWPFParagraph paragraph : paragraphs) {
                System.out.println(paragraph.getText());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

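In this code, we first define the path of the Word file to read. After opening the file with FileInputStream, we load it into an XWPFDocument. We then call getParagraphs() to get the paragraphs of the document body and print the text of each one. Note that getParagraphs() does not return text that sits inside tables; section 4 covers that case.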
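To make it clearer what XWPFDocument is loading, here is a minimal sketch that uses only the JDK. It relies on the fact that a .docx file is a ZIP archive whose main text lives in word/document.xml, with paragraphs stored as w:p elements and text runs as w:t elements. The class DocxPeek and its method names are hypothetical and are not part of Apache POI. The sketch ignores styles, headers, footnotes, and nested structures, so treat it as an illustration, not a replacement for POI.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Apache POI.
final class DocxPeek {

    // Namespace of the WordprocessingML elements (w:p, w:r, w:t).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per w:p element, built by joining its w:t text runs.
    // Unlike getParagraphs(), this also picks up paragraphs inside tables.
    static List<String> paragraphsFromXml(InputStream xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Reject DOCTYPE declarations so untrusted files cannot trigger XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document dom = factory.newDocumentBuilder().parse(xml);

            List<String> result = new ArrayList<>();
            NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element paragraph = (Element) paragraphs.item(i);
                NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
                StringBuilder text = new StringBuilder();
                for (int j = 0; j < runs.getLength(); j++) {
                    text.append(runs.item(j).getTextContent());
                }
                result.add(text.toString());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document XML", e);
        }
    }

    // Convenience overload for XML that is already in memory.
    static List<String> paragraphsFromXml(String xml) {
        return paragraphsFromXml(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Opens the .docx as a ZIP archive and parses word/document.xml.
    static List<String> readDocx(String path) throws IOException {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found; is this a .docx file?");
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return paragraphsFromXml(in);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DocxPeek <file.docx>");
            return;
        }
        readDocx(args[0]).forEach(System.out::println);
    }
}
```

In practice XWPFDocument does far more than this (relationships, styles, numbering, embedded parts), which is why the POI version above remains the recommended way to read real documents.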

3. Handling .doc Files

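If you need to read the legacy .doc format, use the HWPFDocument class, which ships in the poi-scratchpad module. The following example shows how to read a .doc file: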

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

import java.io.FileInputStream;
import java.io.IOException;

public class WordReaderDoc {

    public static void main(String[] args) {
        String filePath = "path/to/your/document.doc"; // Replace with the path to your Word document

        // try-with-resources closes the extractor, the document, and the stream in reverse order
        try (FileInputStream fis = new FileInputStream(filePath);
             HWPFDocument document = new HWPFDocument(fis);
             WordExtractor extractor = new WordExtractor(document)) {
            String[] paragraphs = extractor.getParagraphText();

            for (String paragraph : paragraphs) {
                System.out.println(paragraph);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

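Here we use HWPFDocument to read the .doc file and the WordExtractor class to extract the paragraph text, then print each paragraph to the console. The extracted strings may still carry their trailing line breaks, so trim them if the extra blank lines are unwanted.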
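Because .docx and .doc files need different classes, a program that accepts both has to decide which reader to use, and file extensions are not always reliable. The sketch below guesses the container format from the first bytes of the file: legacy .doc files are OLE2 compound documents, and .docx files are ZIP archives. The class WordFormatSniffer and its names are hypothetical and not part of Apache POI; POI ships its own FileMagic utility for the same purpose. Note that the check only identifies the container, so an .xls file would also report OLE2 and any ZIP file would report ZIP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Apache POI.
final class WordFormatSniffer {

    enum Format { DOC_OLE2, DOCX_ZIP, UNKNOWN }

    // Signature of an OLE2 compound document (used by legacy .doc files).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Signature of a ZIP local file header (used by .docx files).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a file by the bytes at its start.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.DOC_OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.DOCX_ZIP;
        }
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the file and classifies them (requires Java 9+).
    static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[8];
            int read = in.readNBytes(header, 0, header.length);
            return detect(Arrays.copyOf(header, read));
        }
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: WordFormatSniffer <file>");
            return;
        }
        System.out.println(detect(Paths.get(args[0])));
    }
}
```

A caller could then route DOC_OLE2 files to the HWPFDocument code from this section and DOCX_ZIP files to the XWPFDocument code from section 2.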

4. Handling Tables

If the Word document contains tables, Apache POI can read their contents as well. The following example reads the tables in a .docx file:

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;

public class WordTableReader {

    public static void main(String[] args) {
        String filePath = "path/to/your/document_with_table.docx"; // Replace with the path to your Word document

        // try-with-resources closes the document and the stream even if reading fails
        try (FileInputStream fis = new FileInputStream(filePath);
             XWPFDocument document = new XWPFDocument(fis)) {
            List<XWPFTable> tables = document.getTables();

            for (XWPFTable table : tables) {
                int rowCount = table.getNumberOfRows();
                for (int i = 0; i < rowCount; i++) {
                    // Collect the text of every cell in this row, separated by " | "
                    StringBuilder rowText = new StringBuilder();
                    table.getRow(i).getTableCells().forEach(cell ->
                            rowText.append(cell.getText()).append(" | ")
                    );
                    System.out.println(rowText.toString());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

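In this code, we get all the tables in the document, iterate over the rows and cells of each table, and print the contents of each row's cells joined into a single line.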
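The loop above appends " | " after every cell, so each printed row ends with a dangling separator. One possible way to avoid that is to collect the cell texts into a list first and join them. The sketch below uses plain strings so it runs without a document; the class TableRowFormatter and its method are hypothetical names, and the trimming of cell text is an assumption about the desired output, not something POI does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Apache POI.
final class TableRowFormatter {

    // Joins cell texts with " | " between cells and no trailing separator.
    // Null cells are treated as empty, and surrounding whitespace is trimmed.
    static String formatRow(List<String> cellTexts) {
        return cellTexts.stream()
                .map(text -> text == null ? "" : text.trim())
                .collect(Collectors.joining(" | "));
    }

    public static void main(String[] args) {
        // Prints: Name | Age | City
        System.out.println(formatRow(Arrays.asList("Name", " Age ", "City")));
    }
}
```

With POI, the list for each row could be built from the getText() value of every cell returned by getTableCells(), as in the example above.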