Compressing String Length in Java

Introduction

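In everyday development we often need to make strings shorter. Compressing a string can save storage space and network bandwidth, and it can improve performance when I/O is the bottleneck. This article shows how to compress strings in Java and provides code examples.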

String Compression Algorithms

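The core idea of string compression is to use an algorithm that turns the original string into a shorter representation, while guaranteeing that this representation can be restored exactly to the original string (lossless compression). Common string compression algorithms include: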

1. Run-length Encoding (RLE)

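Run-length encoding (RLE) is a simple and effective string compression algorithm. It replaces each run of consecutive identical characters with a count followed by the character. For example, the string "AAABBBCCC" compresses to "3A3B3C". RLE only helps when the input contains long runs: for a string such as "ABC" the output "1A1B1C" is twice as long as the input.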

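The following Java example implements RLE string compression. It assumes the input contains no digit characters, because digits in the data could not be told apart from the counts: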

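public class RleCompression {
    // Encodes each run of identical characters as <count><char>, e.g. "AAABBBCCC" -> "3A3B3C".
    public static String compress(String input) {
        // Guard against null/empty input: without this, "" fails at input.charAt(-1)
        if (input == null || input.isEmpty()) {
            return input;
        }
        StringBuilder output = new StringBuilder();
        int count = 1;
        for (int i = 1; i < input.length(); i++) {
            if (input.charAt(i) == input.charAt(i - 1)) {
                count++; // still inside the current run
            } else {
                // The run has ended: emit its length followed by the character
                output.append(count).append(input.charAt(i - 1));
                count = 1;
            }
        }
        // The loop never closes the last run, so emit it here
        output.append(count).append(input.charAt(input.length() - 1));
        return output.toString();
    }

    public static void main(String[] args) {
        String input = "AAABBBCCC";
        String compressed = compress(input);
        System.out.println("Compressed string: " + compressed); // Compressed string: 3A3B3C
    }
}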
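Restoring the original string is the other half of lossless compression, but only compress is shown above. The following is a minimal decompression sketch. It assumes the <count><char> format produced by RleCompression.compress and an original string without digit characters. The class name RleDecompression is hypothetical.

```java
public class RleDecompression {
    // Reverses the <count><char> format, e.g. "3A3B3C" -> "AAABBBCCC".
    // Assumes the original text contained no digits; otherwise the format is ambiguous.
    public static String decompress(String encoded) {
        StringBuilder output = new StringBuilder();
        int count = 0;
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c >= '0' && c <= '9') {
                // Accumulate multi-digit counts such as "12"
                count = count * 10 + (c - '0');
            } else {
                // A non-digit ends the count: repeat this character count times
                for (int k = 0; k < count; k++) {
                    output.append(c);
                }
                count = 0;
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(decompress("3A3B3C")); // AAABBBCCC
    }
}
```

Because counts are parsed digit by digit, runs longer than 9 (for example "12A") decode correctly, which a single-digit parser would get wrong.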

2. Huffman Coding

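Huffman coding is a widely used lossless data compression algorithm. It builds a binary tree from the frequency of each character and uses the path from the root to each leaf as that character's code. Frequent characters receive shorter codes and rare characters receive longer ones.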
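To make "shorter codes for frequent characters" concrete, here is the arithmetic for the input "AAABBBCCC" used in the example program. Each of the three characters occurs 3 times. The first step merges two leaves (3 + 3 = 6) and the second merges the remaining leaf with that node (3 + 6 = 9). As a result, one character gets a 1-bit code and the other two get 2-bit codes. Which character gets the short code depends on how ties are broken. With f(c) the frequency of character c and ℓ(c) its code length, the total encoded length is:

$$
L = \sum_{c} f(c)\,\ell(c) = 3 \cdot 1 + 3 \cdot 2 + 3 \cdot 2 = 15 \text{ bits}
$$

Stored as 8-bit ASCII, the same string takes 9 × 8 = 72 bits. The packed Huffman code therefore needs about a fifth of that space, before counting the code table the decoder also needs.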

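The following Java example implements Huffman coding for strings. Note that compress returns the code as a String of '0' and '1' characters. Each of those characters occupies 16 bits in Java, so the bits still have to be packed into bytes to actually save space. Decoding also requires the code table or the tree, which this example does not keep: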

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanCompression {
    // Tree node: leaves hold a character, internal nodes only hold a combined frequency
    private static class TreeNode implements Comparable<TreeNode> {
        char ch;
        int freq;
        TreeNode left;
        TreeNode right;

        public TreeNode(char ch, int freq) {
            this.ch = ch;
            this.freq = freq;
        }

        @Override
        public int compareTo(TreeNode other) {
            // Lower frequency = higher priority in the min-heap
            return Integer.compare(this.freq, other.freq);
        }
    }

    // Returns the Huffman code of the input as a string of '0' and '1' characters
    public static String compress(String input) {
        // 1. Count how often each character occurs
        Map<Character, Integer> frequencyMap = new HashMap<>();
        for (char ch : input.toCharArray()) {
            frequencyMap.put(ch, frequencyMap.getOrDefault(ch, 0) + 1);
        }

        // 2. Create one leaf per distinct character, ordered by frequency
        PriorityQueue<TreeNode> queue = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> entry : frequencyMap.entrySet()) {
            queue.offer(new TreeNode(entry.getKey(), entry.getValue()));
        }

        // 3. Repeatedly merge the two least frequent nodes until only the root remains
        while (queue.size() > 1) {
            TreeNode left = queue.poll();
            TreeNode right = queue.poll();
            TreeNode parent = new TreeNode('\0', left.freq + right.freq);
            parent.left = left;
            parent.right = right;
            queue.offer(parent);
        }

        // 4. Derive each character's code from its root-to-leaf path
        TreeNode root = queue.poll();
        Map<Character, String> encodingMap = new HashMap<>();
        buildEncodingMap(root, "", encodingMap);

        // 5. Replace every character with its code
        StringBuilder compressed = new StringBuilder();
        for (char ch : input.toCharArray()) {
            compressed.append(encodingMap.get(ch));
        }

        return compressed.toString();
    }

    // Left edge appends '0', right edge appends '1'
    private static void buildEncodingMap(TreeNode node, String code, Map<Character, String> encodingMap) {
        if (node == null) {
            return;
        }
        if (node.left == null && node.right == null) {
            // With only one distinct character the root is itself a leaf and the path is empty;
            // use "0" so that the output is not an empty string
            encodingMap.put(node.ch, code.isEmpty() ? "0" : code);
            return;
        }
        buildEncodingMap(node.left, code + "0", encodingMap);
        buildEncodingMap(node.right, code + "1", encodingMap);
    }

    public static void main(String[] args) {
        String input = "AAABBBCCC";
        String compressed = compress(input);
        System.out.println("Compressed string: " + compressed);
    }
}
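The compress method above discards the tree, so its output cannot be decoded, and the '0'/'1' string is larger than the input. The following is a minimal round-trip sketch that keeps the tree, decodes the bits back and packs them into a byte[] so the size saving is real. All names (HuffmanRoundTrip, Node, buildTree, buildCodes, encode, decode, pack) are hypothetical and do not come from the original article.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanRoundTrip {
    // Leaves carry a symbol; internal nodes only carry the summed frequency.
    static final class Node {
        final char ch;
        final int freq;
        final Node left, right;

        Node(char ch, int freq, Node left, Node right) {
            this.ch = ch;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Builds the Huffman tree; assumes a non-empty input.
    static Node buildTree(String input) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char ch : input.toCharArray()) {
            freq.merge(ch, 1, Integer::sum);
        }
        PriorityQueue<Node> queue = new PriorityQueue<>((a, b) -> Integer.compare(a.freq, b.freq));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.offer(new Node(e.getKey(), e.getValue(), null, null));
        }
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.offer(new Node('\0', a.freq + b.freq, a, b));
        }
        return queue.poll();
    }

    // Left edge = '0', right edge = '1'; a lone root leaf gets the code "0".
    static void buildCodes(Node node, String code, Map<Character, String> codes) {
        if (node.isLeaf()) {
            codes.put(node.ch, code.isEmpty() ? "0" : code);
            return;
        }
        buildCodes(node.left, code + "0", codes);
        buildCodes(node.right, code + "1", codes);
    }

    static String encode(String input, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char ch : input.toCharArray()) {
            bits.append(codes.get(ch));
        }
        return bits.toString();
    }

    // Walks the tree bit by bit; each leaf reached emits one symbol and restarts at the root.
    // If the root is a leaf (one distinct symbol), every bit is one occurrence of it.
    static String decode(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (int i = 0; i < bits.length(); i++) {
            if (!root.isLeaf()) {
                node = bits.charAt(i) == '0' ? node.left : node.right;
            }
            if (node.isLeaf()) {
                out.append(node.ch);
                node = root;
            }
        }
        return out.toString();
    }

    // Packs the '0'/'1' characters into real bits, 8 per byte (last byte zero-padded).
    static byte[] pack(String bits) {
        byte[] packed = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                packed[i / 8] |= (byte) (0x80 >>> (i % 8));
            }
        }
        return packed;
    }

    public static void main(String[] args) {
        String input = "AAABBBCCC";
        Node root = buildTree(input);
        Map<Character, String> codes = new HashMap<>();
        buildCodes(root, "", codes);
        String bits = encode(input, codes);
        System.out.println("bits: " + bits.length());                          // bits: 15
        System.out.println("packed bytes: " + pack(bits).length);              // packed bytes: 2
        System.out.println("round trip: " + decode(bits, root).equals(input)); // round trip: true
    }
}
```

The 15 bits fit in 2 bytes instead of the 18 bytes that 9 Java chars occupy. A real storage format would also have to record the code table (or the frequencies) and the number of valid bits, because the zero padding in the last byte would otherwise be decoded as extra symbols.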

Sequence Diagram

The following sequence diagram, written in Mermaid syntax, shows the RLE string compression flow:

sequenceDiagram
    participant Client
    participant RleCompression
    Client->>RleCompression: compress