Java Text Compression

Introduction

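Text compression is a common requirement in everyday development. It reduces file size, which saves storage space and network bandwidth. Java offers several ways to implement it. This article covers two widely used algorithms, Huffman coding and LZW coding, with code examples for each.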

Huffman Coding

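Huffman coding is a lossless compression algorithm based on character frequencies. It compresses data by building a Huffman tree, a binary tree in which more frequent characters generally sit closer to the root and therefore receive shorter codes. Because characters appear only at the leaves, no code is a prefix of another, so an encoded bit stream can be decoded unambiguously.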
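
To make the link between frequency and code length concrete, here is a minimal sketch that computes only the code lengths (leaf depths) for a small frequency table. The class name CodeLengthSketch, its Node type, and the sample frequencies are illustrative assumptions and are not part of the article's code.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical helper for illustration only: reports how long each symbol's code would be.
class CodeLengthSketch {
    // Leaves carry a symbol; internal nodes only carry the combined weight.
    static final class Node {
        final char symbol;
        final int weight;
        final Node left;
        final Node right;

        Node(char symbol, int weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
    }

    static Map<Character, Integer> codeLengths(Map<Character, Integer> frequencies) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        for (Map.Entry<Character, Integer> e : frequencies.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        // Repeatedly merge the two lightest subtrees until one tree remains.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node('\0', a.weight + b.weight, a, b));
        }
        Map<Character, Integer> lengths = new TreeMap<>();
        if (!queue.isEmpty()) {
            collect(queue.poll(), 0, lengths);
        }
        return lengths;
    }

    private static void collect(Node node, int depth, Map<Character, Integer> lengths) {
        if (node.left == null) {
            // A leaf's depth is its code length; a lone symbol still needs one bit.
            lengths.put(node.symbol, Math.max(depth, 1));
            return;
        }
        collect(node.left, depth + 1, lengths);
        collect(node.right, depth + 1, lengths);
    }

    public static void main(String[] args) {
        Map<Character, Integer> frequencies = new TreeMap<>();
        frequencies.put('a', 45);
        frequencies.put('b', 13);
        frequencies.put('c', 12);
        frequencies.put('d', 16);
        frequencies.put('e', 9);
        frequencies.put('f', 5);
        // prints {a=1, b=3, c=3, d=3, e=4, f=4}
        System.out.println(codeLengths(frequencies));
    }
}
```

With these sample counts, the most frequent symbol 'a' gets a 1-bit code while the rarest symbols 'e' and 'f' get 4-bit codes.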

Compression

The following Java example compresses text with Huffman coding:

import java.util.PriorityQueue;

class HuffmanNode implements Comparable<HuffmanNode> {
    char character;
    int frequency;
    HuffmanNode left;
    HuffmanNode right;

    public HuffmanNode(char character, int frequency) {
        this.character = character;
        this.frequency = frequency;
    }

    public boolean isLeaf() {
        return left == null && right == null;
    }

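    // Order nodes by ascending frequency so the priority queue yields the rarest first.
    @Override
    public int compareTo(HuffmanNode node) {
        return Integer.compare(this.frequency, node.frequency);
    }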
}

public class HuffmanCompression {
    // Table size for char values 0-255 (7-bit ASCII plus the Latin-1 range).
    private static final int ASCII_SIZE = 256;

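    // Prints the Huffman-coded input as a string of '0' and '1' characters.
    public static void compress(String input) {
        if (input.isEmpty()) {
            // No characters means no tree: buildHuffmanTree would return null.
            System.out.println("Compressed text: ");
            return;
        }

        int[] frequencies = buildFrequencyTable(input);
        HuffmanNode root = buildHuffmanTree(frequencies);
        String[] codes = generateCodes(root);
        StringBuilder compressedText = new StringBuilder();

        // Replace each character with its variable-length code.
        for (char c : input.toCharArray()) {
            compressedText.append(codes[c]);
        }

        System.out.println("Compressed text: " + compressedText);
    }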

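    private static int[] buildFrequencyTable(String input) {
        int[] frequencies = new int[ASCII_SIZE];

        for (char c : input.toCharArray()) {
            // The table only has slots for char values 0-255.
            if (c >= ASCII_SIZE) {
                throw new IllegalArgumentException("Unsupported character: " + c);
            }
            frequencies[c]++;
        }

        return frequencies;
    }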

    private static HuffmanNode buildHuffmanTree(int[] frequencies) {
        PriorityQueue<HuffmanNode> queue = new PriorityQueue<>();

        for (char c = 0; c < ASCII_SIZE; c++) {
            if (frequencies[c] > 0) {
                queue.offer(new HuffmanNode(c, frequencies[c]));
            }
        }

        while (queue.size() > 1) {
            HuffmanNode left = queue.poll();
            HuffmanNode right = queue.poll();
            HuffmanNode parent = new HuffmanNode('\0', left.frequency + right.frequency);
            parent.left = left;
            parent.right = right;
            queue.offer(parent);
        }

        return queue.poll();
    }

    private static String[] generateCodes(HuffmanNode root) {
        String[] codes = new String[ASCII_SIZE];
        generateCodes(root, "", codes);
        return codes;
    }

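    private static void generateCodes(HuffmanNode node, String code, String[] codes) {
        if (node.isLeaf()) {
            // With only one distinct character the root is itself a leaf;
            // give it a 1-bit code so the output is not empty.
            codes[node.character] = code.isEmpty() ? "0" : code;
            return;
        }

        // Left edges append '0', right edges append '1'.
        generateCodes(node.left, code + '0', codes);
        generateCodes(node.right, code + '1', codes);
    }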
}

public class Main {
    public static void main(String[] args) {
        String input = "This is a sample text for compression using Huffman coding.";
        HuffmanCompression.compress(input);
    }
}

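In the code above, the HuffmanCompression class holds the compression logic. The compress method takes a string, calls buildFrequencyTable to count character frequencies, and then calls buildHuffmanTree to build the Huffman tree. It then derives a code table from the tree and uses that table to encode the input. The result is a string of '0' and '1' characters rather than packed bits, so it demonstrates the encoding but is not itself smaller than the input.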
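
Since the encoder produces one character per bit, an actual size reduction requires packing those bits into bytes. The following is a minimal sketch of one way to do that; the class BitPackingSketch and its pack and unpack methods are hypothetical names introduced here, and the bit count is assumed to be stored separately so that padding bits in the last byte can be ignored.

```java
// Hypothetical helper for illustration only: converts a '0'/'1' string to bytes and back.
class BitPackingSketch {
    // Packs bits most-significant-bit first; the last byte is zero-padded.
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            char bit = bits.charAt(i);
            if (bit == '1') {
                out[i / 8] |= 0x80 >>> (i % 8);
            } else if (bit != '0') {
                throw new IllegalArgumentException("Not a bit: " + bit);
            }
        }
        return out;
    }

    // Restores the first bitCount bits; the caller must remember bitCount.
    static String unpack(byte[] packed, int bitCount) {
        StringBuilder bits = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            int mask = 0x80 >>> (i % 8);
            bits.append((packed[i / 8] & mask) != 0 ? '1' : '0');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String bits = "1011001110";
        byte[] packed = pack(bits);
        // prints "10 bits -> 2 bytes"
        System.out.println(bits.length() + " bits -> " + packed.length + " bytes");
        // prints "true"
        System.out.println(unpack(packed, bits.length()).equals(bits));
    }
}
```

A real file format would also need to store the code table or the frequencies alongside the packed bytes so that the decoder can rebuild the tree.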

Decompression

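A complete Huffman codec also needs a decompression step. Decoding requires the same Huffman tree that was used for compression, so the decompress method takes its root as a parameter. The following Java example decompresses Huffman-coded text: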

public class HuffmanDecompression {
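    public static void decompress(String compressedText, HuffmanNode root) {
        StringBuilder decompressedText = new StringBuilder();
        HuffmanNode current = root;

        for (char bit : compressedText.toCharArray()) {
            // A root that is itself a leaf (one distinct character) has no children
            // to walk: each bit then stands for one occurrence of that character.
            if (!root.isLeaf()) {
                current = (bit == '0') ? current.left : current.right;
            }

            // Reaching a leaf completes one code: emit it and restart at the root.
            if (current.isLeaf()) {
                decompressedText.append(current.character);
                current = root;
            }
        }

        System.out.println("Decompressed text: " + decompressedText);
    }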
}
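
The tree walk is easiest to follow on a small hand-built tree. The sketch below is illustrative only: the class DecodeWalkSketch, its Node type, and the codes a=0, b=10, c=11 are assumptions made for this example. It uses the same walk as decompress, but returns the text instead of printing it and assumes the root has two children.

```java
// Hypothetical helper for illustration only: decodes bits against a hand-built tree.
class DecodeWalkSketch {
    static final class Node {
        final char symbol;
        final Node left;
        final Node right;

        // Leaf node holding a symbol.
        Node(char symbol) {
            this(symbol, null, null);
        }

        // Internal node with two children.
        Node(Node left, Node right) {
            this('\0', left, right);
        }

        private Node(char symbol, Node left, Node right) {
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    static String decode(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node current = root;
        for (int i = 0; i < bits.length(); i++) {
            // '0' goes left, anything else goes right, as in decompress.
            current = bits.charAt(i) == '0' ? current.left : current.right;
            if (current.isLeaf()) {
                out.append(current.symbol);
                current = root;
            }
        }
        if (current != root) {
            throw new IllegalArgumentException("Bit string ends in the middle of a code");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Tree for the codes a=0, b=10, c=11.
        Node root = new Node(new Node('a'), new Node(new Node('b'), new Node('c')));
        // prints "abca": 0 -> a, 10 -> b, 11 -> c, 0 -> a
        System.out.println(decode("010110", root));
    }
}
```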

public class Main {
    public static void main(String[] args) {
        String input = "This is a sample text for compression using Huffman coding.";
        HuffmanNode