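LZMA (Lempel-Ziv-Markov chain Algorithm) is a compression algorithm that improves on and optimizes Deflate and LZ77. It uses a dictionary coding scheme similar to LZ77, generally achieves a higher compression ratio than bzip2, and supports a variable dictionary size of up to 4 GB.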

  The principles behind LZMA are fairly involved; interested readers can look up the details on their own.

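  This post demonstrates compression and decompression in two ways: on disk and in memory. The demos only cover a single level of directory structure; for nested directories, apply the same operations recursively.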
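  To make the remark about recursion concrete, here is a minimal sketch of walking a directory tree and handing every regular file to a per-file action. It uses only the JDK. `DirectoryWalkSketch`, `FileAction`, `forEachFile`, and the `.lzma` suffix convention are illustrative names chosen for this sketch, not part of lzma-java; in practice the action would call a per-file compressor such as the `LzmaDiskUtil.lzmaCompress` method from the disk section of this post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DirectoryWalkSketch {

    /** Hypothetical callback: what to do with one (source, target) pair. */
    interface FileAction {
        void apply(Path source, Path target) throws IOException;
    }

    /**
     * Visits every regular file under root, at any depth, and calls the action
     * with the file and a sibling target path that has ".lzma" appended.
     */
    static void forEachFile(Path root, FileAction action) throws IOException {
        List<Path> files;
        // Collect first, so files created by the action are not visited again.
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path source : files) {
            Path target = source.resolveSibling(source.getFileName() + ".lzma");
            action.apply(source, target);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("lzma-walk");
        Files.createDirectories(root.resolve("a/b"));
        Files.write(root.resolve("top.xml"), "<x/>".getBytes(StandardCharsets.UTF_8));
        Files.write(root.resolve("a/b/deep.xml"), "<y/>".getBytes(StandardCharsets.UTF_8));

        // Stand-in action: only prints the pair. A real one would compress source into target.
        forEachFile(root, (source, target) -> System.out.println(source + " -> " + target));
    }
}
```

  The file list is collected before any action runs, so output files written next to their sources are not picked up and compressed a second time.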

  · Maven dependency

<dependency>
    <groupId>com.github.jponge</groupId>
    <artifactId>lzma-java</artifactId>
    <version>1.3</version>
</dependency>

  · Compressing and decompressing on disk

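  Unless there is a special reason not to, the work is done on disk: store the files in a directory, then compress them. The utility class below handles one file per call: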

package com.arhorchin.securitit.compress.lzma;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import lzma.sdk.lzma.Decoder;
import lzma.sdk.lzma.Encoder;

public class LzmaDiskUtil {

    /**
     * LZMA compression.
     * @param srcFilePath path of the file to compress.
     * @param tarFilePath path of the compressed output file.
     * @throws Exception if reading or writing fails.
     */
    public static void lzmaCompress(String srcFilePath, String tarFilePath) throws Exception {
        Encoder encoder = null;
        FileInputStream srcFis = null;
        FileOutputStream tarFos = null;

        try {
            encoder = new Encoder();
            srcFis = new FileInputStream(new File(srcFilePath));
            tarFos = new FileOutputStream(new File(tarFilePath));

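            // The size is written explicitly below, so no end-of-stream marker is needed.
            encoder.setEndMarkerMode(false);
            // Header part 1: 5 bytes of coder properties.
            encoder.writeCoderProperties(tarFos);
            // Header part 2: uncompressed size as 8 bytes, least significant byte first.
            // File.length() is used because available() only returns an int estimate.
            long fileSize = new File(srcFilePath).length();
            for (int i = 0; i < 8; i++) {
                tarFos.write((int) (fileSize >>> (8 * i)) & 0xFF);
            }
            encoder.code(srcFis, tarFos, -1, -1, null);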
        } finally {
            if (null != srcFis) {
                srcFis.close();
            }
            if (null != tarFos) {
                tarFos.close();
            }
        }
    }

    /**
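     * LZMA decompression.
     * @param srcFilePath path of the file to decompress.
     * @param tarFilePath path of the decompressed output file.
     * @throws Exception if reading or writing fails, or the stream is invalid.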
     */
    public static void lzmaDecompress(String srcFilePath, String tarFilePath) throws Exception {
        Decoder decoder = null;
        FileInputStream srcFis = null;
        FileOutputStream tarFos = null;

        try {
            decoder = new Decoder();
            srcFis = new FileInputStream(new File(srcFilePath));
            tarFos = new FileOutputStream(new File(tarFilePath));
            int propertiesSize = 5;
            byte[] properties = new byte[propertiesSize];
            if (srcFis.read(properties, 0, propertiesSize) != propertiesSize) {
                throw new IOException("input .lzma file is too short");
            }
            if (!decoder.setDecoderProperties(properties)) {
                throw new IOException("Incorrect stream properties");
            }
            long outSize = 0;
            for (int i = 0; i < 8; i++) {
                int v = srcFis.read();
                if (v < 0) {
                    throw new IOException("Can't read stream size");
                }
                outSize |= ((long) v) << (8 * i);
            }
            if (!decoder.code(srcFis, tarFos, outSize)) {
                throw new IOException("Error in data stream");
            }
        } finally {
            if (null != srcFis) {
                srcFis.close();
            }
            if (null != tarFos) {
                tarFos.close();
            }
        }
    }

}

  The test code is as follows:

package com.arhorchin.securitit.com.compress;

import com.arhorchin.securitit.compress.lzma.LzmaDiskUtil;

public class LzmaDiskUtilTester {

    public static void main(String[] args) throws Exception {
        String srcFilePath = "C:/Users/Administrator/Downloads/个人文件/test.xml";
        String tarFilePath = "C:/Users/Administrator/Downloads/个人文件/test-lzma.xml";
        
        LzmaDiskUtil.lzmaCompress(srcFilePath, tarFilePath);
        
        String vTarFilePath = "C:/Users/Administrator/Downloads/个人文件/test-unlzma.xml";
        LzmaDiskUtil.lzmaDecompress(tarFilePath, vTarFilePath);
    }
    
}
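  Both methods of `LzmaDiskUtil` agree on a 13-byte header in front of the compressed data: 5 bytes of coder properties, followed by the uncompressed size as 8 bytes with the least significant byte first. The sketch below isolates just the size field so the byte order is easy to check. It uses only the JDK; `LzmaHeaderSketch`, `writeSize`, and `readSize` are illustrative names for this sketch and are not part of lzma-java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class LzmaHeaderSketch {

    /** Writes size as 8 bytes, least significant byte first. */
    static void writeSize(OutputStream out, long size) throws IOException {
        for (int i = 0; i < 8; i++) {
            out.write((int) (size >>> (8 * i)) & 0xFF);
        }
    }

    /** Reads 8 bytes, least significant byte first, back into a long. */
    static long readSize(InputStream in) throws IOException {
        long size = 0;
        for (int i = 0; i < 8; i++) {
            int v = in.read();
            if (v < 0) {
                throw new IOException("Can't read stream size");
            }
            // Widen to long before shifting, otherwise bytes 4..7 would be lost.
            size |= ((long) v) << (8 * i);
        }
        return size;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeSize(out, 5_000_000_000L); // larger than Integer.MAX_VALUE on purpose
        long back = readSize(new ByteArrayInputStream(out.toByteArray()));
        System.out.println(out.size() + " bytes, size field = " + back);
    }
}
```

  Because the field is a full 64-bit value, the size written by the compressor should come from a `long` source rather than from an `int` such as the result of `InputStream.available()`.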

  · Compressing and decompressing in memory

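  In real applications, different requirements may call for generating a number of files and then compressing them. In some applications the files are small, few in number, and fairly fixed, so frequent disk access adds avoidable overhead. In that case the data can be compressed in memory instead. Note that the result is a raw LZMA stream (the same layout as a .lzma file), not a .7z archive. The utility class is as follows: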

package com.arhorchin.securitit.compress.lzma;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import lzma.sdk.lzma.Decoder;
import lzma.sdk.lzma.Encoder;

public class LzmaMemoryUtil {

    /**
     * LZMA compression.
     * @param fileBytes content to compress.
     * @return compressed content.
     * @throws Exception if encoding fails.
     */
    public static byte[] lzmaCompress(byte[] fileBytes) throws Exception {
        Encoder encoder = null;
        ByteArrayInputStream bais = null;
        ByteArrayOutputStream baos = null;

        try {
            encoder = new Encoder();
            baos = new ByteArrayOutputStream();
            bais = new ByteArrayInputStream(fileBytes);

            encoder.setEndMarkerMode(false);
            encoder.writeCoderProperties(baos);
            long fileSize = fileBytes.length;
            for (int i = 0; i < 8; i++) {
                baos.write((int) (fileSize >>> (8 * i)) & 0xFF);
            }
            encoder.code(bais, baos, -1, -1, null);

            return baos.toByteArray();
        } finally {
            if (null != bais) {
                bais.close();
            }
            if (null != baos) {
                baos.close();
            }
        }
    }

    /**
     * LZMA decompression.
     * @param fileBytes content to decompress.
     * @return decompressed content.
     * @throws Exception if the stream is too short or invalid.
     */
    public static byte[] lzmaDecompress(byte[] fileBytes) throws Exception {
        Decoder decoder = null;
        ByteArrayInputStream bais = null;
        ByteArrayOutputStream baos = null;

        decoder = new Decoder();
        baos = new ByteArrayOutputStream();
        bais = new ByteArrayInputStream(fileBytes);

        try {
            int propertiesSize = 5;
            byte[] properties = new byte[propertiesSize];
            if (bais.read(properties, 0, propertiesSize) != propertiesSize) {
                throw new IOException("input .lzma file is too short");
            }
            if (!decoder.setDecoderProperties(properties)) {
                throw new IOException("Incorrect stream properties");
            }
            long outSize = 0;
            for (int i = 0; i < 8; i++) {
                int v = bais.read();
                if (v < 0) {
                    throw new IOException("Can't read stream size");
                }
                outSize |= ((long) v) << (8 * i);
            }
            if (!decoder.code(bais, baos, outSize)) {
                throw new IOException("Error in data stream");
            }
            return baos.toByteArray();
        } finally {
            if (null != bais) {
                bais.close();
            }
            if (null != baos) {
                baos.close();
            }
        }
    }

}

  The test code is as follows:

package com.arhorchin.securitit.com.compress;

import java.io.File;

import org.apache.commons.io.FileUtils;

import com.arhorchin.securitit.compress.lzma.LzmaMemoryUtil;

public class LzmaMemoryUtilTester {

    public static void main(String[] args) throws Exception {
        // Read with an explicit charset so the result does not depend on the platform default.
        String txt = FileUtils.readFileToString(new File("C:/Users/Administrator/Downloads/个人文件/test-002.xml"), "UTF-8");

        byte[] bts = txt.getBytes("UTF-8");
        System.out.println("====Length before compression:====" + bts.length);
        bts = LzmaMemoryUtil.lzmaCompress(bts);
        System.out.println("====Length after compression:====" + bts.length);
        // System.out.println("====Compressed data, Base64-encoded:====" + Base64.encodeBase64String(bts));

        System.out.println("====Length before decompression:====" + bts.length);
        bts = LzmaMemoryUtil.lzmaDecompress(bts);
        System.out.println("====Length after decompression:====" + bts.length);
        txt = new String(bts, "UTF-8");
    }

}

  · Summary

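  LZMA is one of the compression algorithms used by 7z, so the conclusion here is similar to that of this blog's earlier post on 7z: the LZMA format can achieve a higher compression ratio. That comes with conditions, though. The ratio varies with the type and content of the files being compressed and does not stay at one fixed level. In short, talking about performance or efficiency without stating the conditions is meaningless. LZMA is a reasonable choice when transfer or storage puts limits on file size. At the same time, keep in mind the downsides that come with 7z's high compression ratio, so that the risks can be anticipated and mitigated early in system or feature design.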
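  The point that the ratio depends on content is easy to see with a small experiment. The sketch below is only an illustration under a stated substitution: it uses the JDK's `java.util.zip.Deflater` (Deflate, not LZMA) so that it runs without any dependency, and it compares highly repetitive XML-like bytes against pseudo-random bytes. `CompressionRatioSketch` and its helper methods are names made up for this sketch. The absolute numbers will differ for LZMA, but the same comparison can be repeated with `LzmaMemoryUtil.lzmaCompress` by replacing the body of the ratio method.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

class CompressionRatioSketch {

    /** Returns compressed length / original length, using Deflate as a stand-in for LZMA. */
    static double deflateRatio(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[8192];
            long compressed = 0;
            // Drain the compressor; only the byte count is needed, not the data.
            while (!deflater.finished()) {
                compressed += deflater.deflate(buffer);
            }
            return (double) compressed / input.length;
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Builds length bytes by repeating one short XML-like fragment. */
    static byte[] repetitive(int length) {
        byte[] unit = "<item>value</item>".getBytes(StandardCharsets.UTF_8);
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = unit[i % unit.length];
        }
        return data;
    }

    /** Builds length pseudo-random bytes from a fixed seed, so runs are repeatable. */
    static byte[] random(int length) {
        byte[] data = new byte[length];
        new Random(42).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        System.out.println("repetitive: " + deflateRatio(repetitive(100_000)));
        System.out.println("random:     " + deflateRatio(random(100_000)));
    }
}
```

  Repetitive input shrinks to a small fraction of its size while random input barely shrinks at all, which is why a ratio measured on one kind of file should not be assumed for another.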