Java Text Compression
Introduction
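Text compression is a common need in day-to-day development: it reduces file size, which saves storage space and network bandwidth. Java offers several ways to compress text. This article covers two widely used lossless algorithms, Huffman coding and LZW coding, with code examples for each.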
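Huffman Coding
Huffman coding is a lossless compression algorithm driven by character frequency. It builds a binary tree, the Huffman tree, in which every character is a leaf and more frequent characters sit closer to the root. The path from the root to a leaf spells out that character's code, so frequent characters get shorter codes.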
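To make the frequency idea concrete, here is a minimal sketch that computes how many bits a Huffman code needs for a given text without building an explicit tree. It relies on the fact that the total encoded length equals the sum of the weights of all merged (internal) nodes. The class name `HuffmanCost` and the method `encodedBits` are hypothetical names introduced for this illustration, and the convention of one bit per character for a text with a single distinct symbol is an assumption.

```java
import java.util.PriorityQueue;

// Hypothetical helper, not part of the article's code.
class HuffmanCost {
    // Returns the number of bits a Huffman code needs to encode `text`.
    static long encodedBits(String text) {
        int[] freq = new int[Character.MAX_VALUE + 1];
        for (char c : text.toCharArray()) {
            freq[c]++;
        }
        PriorityQueue<Long> weights = new PriorityQueue<>();
        for (int f : freq) {
            if (f > 0) {
                weights.offer((long) f);
            }
        }
        // Assumption: a text with one distinct symbol still spends one bit per character.
        if (weights.size() == 1) {
            return weights.peek();
        }
        long total = 0;
        // Merge the two smallest weights, exactly as the tree construction does.
        // Each merge adds one bit to every character below the new node.
        while (weights.size() > 1) {
            long merged = weights.poll() + weights.poll();
            total += merged;
            weights.offer(merged);
        }
        return total;
    }

    public static void main(String[] args) {
        String text = "aaaabbc";
        System.out.println(encodedBits(text) + " bits with Huffman coding, "
                + text.length() * 8 + " bits at 8 bits per character");
    }
}
```

For "aaaabbc" the merges are 1 + 2 = 3 and 3 + 4 = 7, so the encoded text takes 10 bits: the frequent 'a' gets a 1-bit code while 'b' and 'c' get 2-bit codes.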
Compression
The following Java example compresses text with Huffman coding:
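// HuffmanCompression.java
import java.util.PriorityQueue;

// A node in the Huffman tree: leaves hold a character, internal nodes only a combined frequency.
class HuffmanNode implements Comparable<HuffmanNode> {
    char character;
    int frequency;
    HuffmanNode left;
    HuffmanNode right;

    public HuffmanNode(char character, int frequency) {
        this.character = character;
        this.frequency = frequency;
    }

    public boolean isLeaf() {
        return left == null && right == null;
    }

    // Orders nodes by ascending frequency so the priority queue yields the rarest first.
    @Override
    public int compareTo(HuffmanNode node) {
        return Integer.compare(this.frequency, node.frequency);
    }
}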
public class HuffmanCompression {
    // Size of the lookup tables; only characters with code points 0-255 are supported.
    private static final int ASCII_SIZE = 256;

    public static void compress(String input) {
        if (input.isEmpty()) {
            // Nothing to encode, and no tree can be built from an empty table.
            System.out.println("Compressed text: ");
            return;
        }
        int[] frequencies = buildFrequencyTable(input);
        HuffmanNode root = buildHuffmanTree(frequencies);
        String[] codes = generateCodes(root);
        // Concatenate each character's code into one string of '0' and '1' characters.
        StringBuilder compressedText = new StringBuilder();
        for (char c : input.toCharArray()) {
            compressedText.append(codes[c]);
        }
        System.out.println("Compressed text: " + compressedText);
    }

    private static int[] buildFrequencyTable(String input) {
        int[] frequencies = new int[ASCII_SIZE];
        for (char c : input.toCharArray()) {
            if (c >= ASCII_SIZE) {
                throw new IllegalArgumentException("Unsupported character: " + c);
            }
            frequencies[c]++;
        }
        return frequencies;
    }

    private static HuffmanNode buildHuffmanTree(int[] frequencies) {
        PriorityQueue<HuffmanNode> queue = new PriorityQueue<>();
        for (char c = 0; c < ASCII_SIZE; c++) {
            if (frequencies[c] > 0) {
                queue.offer(new HuffmanNode(c, frequencies[c]));
            }
        }
        // Repeatedly merge the two least frequent nodes until one root remains.
        while (queue.size() > 1) {
            HuffmanNode left = queue.poll();
            HuffmanNode right = queue.poll();
            HuffmanNode parent = new HuffmanNode('\0', left.frequency + right.frequency);
            parent.left = left;
            parent.right = right;
            queue.offer(parent);
        }
        return queue.poll();
    }

    private static String[] generateCodes(HuffmanNode root) {
        String[] codes = new String[ASCII_SIZE];
        generateCodes(root, "", codes);
        return codes;
    }

    // Walks the tree, appending '0' for a left branch and '1' for a right branch.
    private static void generateCodes(HuffmanNode node, String code, String[] codes) {
        if (node.isLeaf()) {
            // A lone symbol sits at the root with an empty path; give it a one-bit code.
            codes[node.character] = code.isEmpty() ? "0" : code;
            return;
        }
        generateCodes(node.left, code + '0', codes);
        generateCodes(node.right, code + '1', codes);
    }
}
// Main.java (a public class must live in a file named after it)
public class Main {
    public static void main(String[] args) {
        String input = "This is a sample text for compression using Huffman coding.";
        HuffmanCompression.compress(input);
    }
}
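In the code above, the HuffmanCompression class holds the compression logic. The compress method takes a string, calls buildFrequencyTable to count how often each character occurs, and then calls buildHuffmanTree to build the Huffman tree. It then derives a code table from the tree with generateCodes and uses that table to encode the input as a string of '0' and '1' characters. Note that this string only represents the bit sequence: each bit still occupies a full Java char, so real space savings require packing the bits into bytes.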
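Since the encoder produces a string of '0' and '1' characters rather than actual bits, a packing step is needed before anything is saved. The following is a minimal sketch of one way to do that, assuming most-significant-bit-first order and zero padding in the last byte. The class `BitPacking` and its methods `pack` and `unpack` are hypothetical names introduced here, not part of the article's code.

```java
// Hypothetical helper, not part of the article's code.
class BitPacking {
    // Packs a string of '0'/'1' characters into bytes, most significant bit first.
    // The last byte is padded with zero bits if the length is not a multiple of 8.
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= (byte) (0x80 >> (i % 8));
            }
        }
        return out;
    }

    // Reverses pack(). bitCount says how many bits are real, because the padding
    // in the last byte cannot be told apart from encoded zero bits.
    static String unpack(byte[] data, int bitCount) {
        StringBuilder bits = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            int bit = (data[i / 8] >> (7 - i % 8)) & 1;
            bits.append(bit == 1 ? '1' : '0');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String bits = "101100011";
        byte[] packed = pack(bits);
        System.out.println(bits.length() + " bits packed into " + packed.length + " bytes");
        System.out.println("Restored: " + unpack(packed, bits.length()));
    }
}
```

The bit count has to be stored alongside the packed bytes; otherwise the decoder may read the padding as extra characters.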
Decompression
Huffman coding is only useful if the text can be recovered, so we also need a decompression step. The following Java example decompresses Huffman-encoded text:
public class HuffmanDecompression {
    // Decodes a string of '0'/'1' characters using the same tree that produced it.
    public static void decompress(String compressedText, HuffmanNode root) {
        StringBuilder decompressedText = new StringBuilder();
        HuffmanNode current = root;
        for (char bit : compressedText.toCharArray()) {
            // A single-leaf tree has no branches to follow; every bit maps to that leaf.
            if (!root.isLeaf()) {
                current = (bit == '0') ? current.left : current.right;
            }
            if (current.isLeaf()) {
                decompressedText.append(current.character);
                current = root; // restart at the root for the next code
            }
        }
        System.out.println("Decompressed text: " + decompressedText);
    }
}
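The decoder needs exactly the tree the encoder used, so the two sides have to share something. One option, sketched here as an assumption rather than as this article's method, is to share the frequency table and rebuild the tree on both sides. The class `HuffmanRoundTrip`, its `Node` type, and the `order` tie-breaker (which keeps the rebuilt tree identical when frequencies are equal) are all hypothetical names introduced for this illustration.

```java
import java.util.PriorityQueue;

// Hypothetical self-contained round trip, not part of the article's code.
class HuffmanRoundTrip {
    static final class Node implements Comparable<Node> {
        final char ch;
        final int freq;
        final int order; // creation order, used only to break frequency ties
        final Node left, right;

        Node(char ch, int freq, int order, Node left, Node right) {
            this.ch = ch; this.freq = freq; this.order = order;
            this.left = left; this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        @Override
        public int compareTo(Node o) {
            int byFreq = Integer.compare(freq, o.freq);
            return byFreq != 0 ? byFreq : Integer.compare(order, o.order);
        }
    }

    static int[] countFrequencies(String text) {
        int[] freq = new int[256];
        for (char c : text.toCharArray()) {
            freq[c]++; // throws for characters >= 256; acceptable for this ASCII-only sketch
        }
        return freq;
    }

    static Node buildTree(int[] freq) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        int order = 0;
        for (int c = 0; c < freq.length; c++) {
            if (freq[c] > 0) {
                queue.offer(new Node((char) c, freq[c], order++, null, null));
            }
        }
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.offer(new Node('\0', a.freq + b.freq, order++, a, b));
        }
        return queue.poll(); // null when the table is empty
    }

    static void fillCodes(Node node, String prefix, String[] codes) {
        if (node.isLeaf()) {
            codes[node.ch] = prefix.isEmpty() ? "0" : prefix; // a lone symbol still needs one bit
            return;
        }
        fillCodes(node.left, prefix + '0', codes);
        fillCodes(node.right, prefix + '1', codes);
    }

    static String encode(String text, int[] freq) {
        Node root = buildTree(freq);
        if (root == null) return "";
        String[] codes = new String[256];
        fillCodes(root, "", codes);
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) bits.append(codes[c]);
        return bits.toString();
    }

    static String decode(String bits, int[] freq) {
        Node root = buildTree(freq);
        if (root == null) return "";
        StringBuilder text = new StringBuilder();
        Node current = root;
        for (char bit : bits.toCharArray()) {
            if (!root.isLeaf()) current = (bit == '0') ? current.left : current.right;
            if (current.isLeaf()) { text.append(current.ch); current = root; }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String input = "This is a sample text for compression using Huffman coding.";
        int[] freq = countFrequencies(input);
        String bits = encode(input, freq);
        System.out.println(bits.length() + " bits vs " + input.length() * 8 + " bits uncompressed");
        System.out.println("Round trip OK: " + input.equals(decode(bits, freq)));
    }
}
```

Sharing the frequency table costs extra space, which matters for short inputs; other designs serialize the tree itself or only the code lengths.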
public class Main {
    public static void main(String[] args) {
        String input = "This is a sample text for compression using Huffman coding.";
        HuffmanNode