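LZMA (Lempel-Ziv-Markov chain Algorithm) is a compression algorithm that improves on and optimizes Deflate and LZ77. It uses a dictionary coding scheme similar to LZ77, generally achieves a higher compression ratio than bzip2, and supports a variable dictionary size of up to 4 GB.
The principles behind LZMA are fairly complex; interested readers can look up the details on their own.
This post demonstrates compression and decompression in two ways: on disk and in memory. The demos cover only a single-level directory structure; nested directories just need the same operations applied recursively.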
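For nested directories, the recursion mentioned above could be sketched with the JDK's `Files.walk`. This is a minimal sketch and not part of lzma-java: `DirectoryWalker` and `listFilesRecursively` are hypothetical names introduced here, and the method only collects the regular files, leaving the per-file compression to whatever routine is in use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not from lzma-java): lists every regular file under a directory tree. */
final class DirectoryWalker {

    private DirectoryWalker() {
    }

    /**
     * Returns all regular files below root, in sorted order, descending into subdirectories.
     */
    static List<Path> listFilesRecursively(Path root) throws IOException {
        // Files.walk visits root and everything beneath it; the stream must be closed after use.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile) // skip the directories themselves
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Each returned path could then be handed to a per-file routine such as `LzmaDiskUtil.lzmaCompress(path.toString(), path + ".lzma")`; the `.lzma` suffix is only an illustrative choice.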
· Maven dependency
<dependency>
    <groupId>com.github.jponge</groupId>
    <artifactId>lzma-java</artifactId>
    <version>1.3</version>
</dependency>
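· Compression and decompression on disk
Unless there is a special requirement, everything happens on disk: the files to be compressed are stored in a directory and compressed from there. The utility class below compresses and decompresses a single file given its path: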
package com.arhorchin.securitit.compress.lzma;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import lzma.sdk.lzma.Decoder;
import lzma.sdk.lzma.Encoder;

public class LzmaDiskUtil {

    /**
     * Compresses a file with LZMA.
     * @param srcFilePath path of the file to compress.
     * @param tarFilePath path of the compressed output file.
     * @throws Exception if reading, writing, or encoding fails.
     */
    public static void lzmaCompress(String srcFilePath, String tarFilePath) throws Exception {
        File srcFile = new File(srcFilePath);
        // try-with-resources closes both streams even if encoding fails.
        try (FileInputStream srcFis = new FileInputStream(srcFile);
                FileOutputStream tarFos = new FileOutputStream(new File(tarFilePath))) {
            Encoder encoder = new Encoder();
            // The size is stored in the header, so no end-of-stream marker is written.
            encoder.setEndMarkerMode(false);
            // Header, part 1: the coder properties.
            encoder.writeCoderProperties(tarFos);
            // Header, part 2: the uncompressed size as 8 little-endian bytes.
            // File.length() is used because available() is only an int estimate, not the file size.
            long fileSize = srcFile.length();
            for (int i = 0; i < 8; i++) {
                tarFos.write((int) (fileSize >>> (8 * i)) & 0xFF);
            }
            encoder.code(srcFis, tarFos, -1, -1, null);
        }
    }

    /**
     * Decompresses a file with LZMA.
     * @param srcFilePath path of the file to decompress.
     * @param tarFilePath path of the decompressed output file.
     * @throws Exception if reading, writing, or decoding fails.
     */
    public static void lzmaDecompress(String srcFilePath, String tarFilePath) throws Exception {
        try (FileInputStream srcFis = new FileInputStream(new File(srcFilePath));
                FileOutputStream tarFos = new FileOutputStream(new File(tarFilePath))) {
            Decoder decoder = new Decoder();
            // Read back the 5-byte coder properties written by the encoder.
            int propertiesSize = 5;
            byte[] properties = new byte[propertiesSize];
            if (srcFis.read(properties, 0, propertiesSize) != propertiesSize) {
                throw new IOException("input .lzma file is too short");
            }
            if (!decoder.setDecoderProperties(properties)) {
                throw new IOException("Incorrect stream properties");
            }
            // Read back the uncompressed size (8 little-endian bytes).
            long outSize = 0;
            for (int i = 0; i < 8; i++) {
                int v = srcFis.read();
                if (v < 0) {
                    throw new IOException("Can't read stream size");
                }
                outSize |= ((long) v) << (8 * i);
            }
            if (!decoder.code(srcFis, tarFos, outSize)) {
                throw new IOException("Error in data stream");
            }
        }
    }
}
The test code is as follows:
package com.arhorchin.securitit.com.compress;

import com.arhorchin.securitit.compress.lzma.LzmaDiskUtil;

public class LzmaDiskUtilTester {

    public static void main(String[] args) throws Exception {
        String srcFilePath = "C:/Users/Administrator/Downloads/个人文件/test.xml";
        String tarFilePath = "C:/Users/Administrator/Downloads/个人文件/test-lzma.xml";
        LzmaDiskUtil.lzmaCompress(srcFilePath, tarFilePath);
        String vTarFilePath = "C:/Users/Administrator/Downloads/个人文件/test-unlzma.xml";
        LzmaDiskUtil.lzmaDecompress(tarFilePath, vTarFilePath);
    }
}
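The `LzmaDiskUtil` methods write and read a 13-byte header in front of the compressed data: 5 bytes of coder properties followed by the uncompressed size as 8 little-endian bytes. As a sketch of what those bytes hold, the following decodes such a header using only the JDK. It assumes the conventional LZMA properties layout, in which the first byte equals (pb * 5 + lp) * 9 + lc and the next four bytes are the little-endian dictionary size. `LzmaHeader` is a hypothetical helper name, not a class from lzma-java.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not from lzma-java): decodes the 13-byte header of a .lzma stream. */
final class LzmaHeader {

    final int lc;                // literal context bits
    final int lp;                // literal position bits
    final int pb;                // position bits
    final long dictionarySize;   // dictionary size in bytes (unsigned 32-bit value)
    final long uncompressedSize; // size of the original data in bytes

    private LzmaHeader(int lc, int lp, int pb, long dictionarySize, long uncompressedSize) {
        this.lc = lc;
        this.lp = lp;
        this.pb = pb;
        this.dictionarySize = dictionarySize;
        this.uncompressedSize = uncompressedSize;
    }

    /** Parses the first 13 bytes of data; assumes the layout described in the lead-in. */
    static LzmaHeader parse(byte[] data) {
        if (data.length < 13) {
            throw new IllegalArgumentException("an LZMA header needs at least 13 bytes");
        }
        int props = data[0] & 0xFF;
        if (props >= 9 * 5 * 5) {
            throw new IllegalArgumentException("invalid properties byte: " + props);
        }
        // Undo props = (pb * 5 + lp) * 9 + lc.
        int lc = props % 9;
        int rest = props / 9;
        int lp = rest % 5;
        int pb = rest / 5;
        // Bytes 1..4: dictionary size, bytes 5..12: uncompressed size, both little-endian.
        ByteBuffer buf = ByteBuffer.wrap(data, 1, 12).order(ByteOrder.LITTLE_ENDIAN);
        long dictionarySize = buf.getInt() & 0xFFFFFFFFL;
        long uncompressedSize = buf.getLong();
        return new LzmaHeader(lc, lp, pb, dictionarySize, uncompressedSize);
    }
}
```

Reading the first 13 bytes of a compressed file and passing them to `LzmaHeader.parse` is one way to check that the size stored in the header matches the original file before decompressing.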
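· Compression and decompression in memory
In practice, depending on the requirement, an application may need to generate several files and then compress them. When the files are small, few in number, and fairly fixed, going through the disk repeatedly adds avoidable overhead. In that case the data can be compressed in memory instead. Note that the utility class below produces a raw LZMA stream as a byte array, not a complete .7z archive: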
package com.arhorchin.securitit.compress.lzma;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import lzma.sdk.lzma.Decoder;
import lzma.sdk.lzma.Encoder;

public class LzmaMemoryUtil {

    /**
     * Compresses a byte array with LZMA.
     * @param fileBytes data to compress.
     * @return the compressed data.
     * @throws Exception if encoding fails.
     */
    public static byte[] lzmaCompress(byte[] fileBytes) throws Exception {
        try (ByteArrayInputStream bais = new ByteArrayInputStream(fileBytes);
                ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            Encoder encoder = new Encoder();
            // The size is stored in the header, so no end-of-stream marker is written.
            encoder.setEndMarkerMode(false);
            // Header, part 1: the coder properties.
            encoder.writeCoderProperties(baos);
            // Header, part 2: the uncompressed size as 8 little-endian bytes.
            long fileSize = fileBytes.length;
            for (int i = 0; i < 8; i++) {
                baos.write((int) (fileSize >>> (8 * i)) & 0xFF);
            }
            encoder.code(bais, baos, -1, -1, null);
            return baos.toByteArray();
        }
    }

    /**
     * Decompresses a byte array with LZMA.
     * @param fileBytes data to decompress.
     * @return the decompressed data.
     * @throws Exception if the header is invalid or decoding fails.
     */
    public static byte[] lzmaDecompress(byte[] fileBytes) throws Exception {
        try (ByteArrayInputStream bais = new ByteArrayInputStream(fileBytes);
                ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            Decoder decoder = new Decoder();
            // Read back the 5-byte coder properties written by the encoder.
            int propertiesSize = 5;
            byte[] properties = new byte[propertiesSize];
            if (bais.read(properties, 0, propertiesSize) != propertiesSize) {
                throw new IOException("input .lzma file is too short");
            }
            if (!decoder.setDecoderProperties(properties)) {
                throw new IOException("Incorrect stream properties");
            }
            // Read back the uncompressed size (8 little-endian bytes).
            long outSize = 0;
            for (int i = 0; i < 8; i++) {
                int v = bais.read();
                if (v < 0) {
                    throw new IOException("Can't read stream size");
                }
                outSize |= ((long) v) << (8 * i);
            }
            if (!decoder.code(bais, baos, outSize)) {
                throw new IOException("Error in data stream");
            }
            return baos.toByteArray();
        }
    }
}
The test code is as follows:
package com.arhorchin.securitit.com.compress;

import java.io.File;

import org.apache.commons.io.FileUtils;

import com.arhorchin.securitit.compress.lzma.LzmaMemoryUtil;

public class LzmaMemoryUtilTester {

    public static void main(String[] args) throws Exception {
        // Read the file with an explicit charset instead of the platform default.
        String txt = FileUtils.readFileToString(new File("C:/Users/Administrator/Downloads/个人文件/test-002.xml"), "UTF-8");
        byte[] bts = txt.getBytes("UTF-8");
        System.out.println("====Data length before compression:====" + bts.length);
        bts = LzmaMemoryUtil.lzmaCompress(bts);
        System.out.println("====Data length after compression:====" + bts.length);
        // System.out.println("====Compressed data, Base64-encoded:====" + Base64.encodeBase64String(bts));
        System.out.println("====Data length before decompression:====" + bts.length);
        bts = LzmaMemoryUtil.lzmaDecompress(bts);
        System.out.println("====Data length after decompression:====" + bts.length);
        txt = new String(bts, "UTF-8");
    }
}
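The commented-out line in the test hints at Base64-encoding the compressed bytes, for example to carry them inside a text-based message. That line appears to rely on Apache Commons Codec; the sketch below swaps in the JDK's own `java.util.Base64` instead, so it needs no extra dependency. `CompressedPayloadCodec` is a hypothetical name introduced for illustration.

```java
import java.util.Base64;

/** Hypothetical helper: converts compressed bytes to and from Base64 text with the JDK only. */
final class CompressedPayloadCodec {

    private CompressedPayloadCodec() {
    }

    /** Encodes compressed bytes as a Base64 string (standard alphabet, with padding). */
    static String toBase64(byte[] compressed) {
        return Base64.getEncoder().encodeToString(compressed);
    }

    /** Decodes a Base64 string back into the compressed bytes. */
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

In the tester, the output of `LzmaMemoryUtil.lzmaCompress` could be passed to `toBase64`, and the result of `fromBase64` passed to `LzmaMemoryUtil.lzmaDecompress` on the receiving side. Base64 text is roughly a third larger than the bytes it encodes, which offsets part of the compression gain.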
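· Summary
LZMA is one of the compression algorithms used by 7z, so the conclusion here matches the one in this blog's earlier post on 7z: LZMA can achieve a higher compression ratio. That comes with preconditions, though. The ratio varies with the type and content of the files being compressed and does not stay at one fixed level. In short, talking about performance or efficiency without stating the conditions is meaningless. LZMA is a reasonable choice when file size matters for transfer or storage. At the same time, keep in mind the costs that come with a high compression ratio (typically more CPU time and memory while compressing), so that the risks can be anticipated and mitigated early in system or feature design.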
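Because the ratio and the cost both depend on the input, it may help to measure them on representative data before committing to a format. The following is a minimal sketch under that assumption: `CompressionProbe`, `Compressor`, and `Result` are hypothetical names, and the `Compressor` interface is only shaped so that a method reference such as `LzmaMemoryUtil::lzmaCompress` fits it.

```java
/** Hypothetical helper: measures compressed size and elapsed time for one compression call. */
final class CompressionProbe {

    /** Matches the shape of a byte[] -> byte[] compression method that may throw. */
    @FunctionalInterface
    interface Compressor {
        byte[] compress(byte[] input) throws Exception;
    }

    /** Outcome of one measurement. */
    static final class Result {
        final int originalSize;
        final int compressedSize;
        final long elapsedNanos;

        Result(int originalSize, int compressedSize, long elapsedNanos) {
            this.originalSize = originalSize;
            this.compressedSize = compressedSize;
            this.elapsedNanos = elapsedNanos;
        }

        /** Compressed size divided by original size; smaller means better compression. */
        double ratio() {
            if (originalSize == 0) {
                throw new IllegalStateException("ratio is undefined for empty input");
            }
            return (double) compressedSize / originalSize;
        }
    }

    private CompressionProbe() {
    }

    /** Runs the compressor once and records sizes and wall-clock time. */
    static Result measure(byte[] input, Compressor compressor) throws Exception {
        long start = System.nanoTime();
        byte[] output = compressor.compress(input);
        long elapsed = System.nanoTime() - start;
        return new Result(input.length, output.length, elapsed);
    }
}
```

A call such as `CompressionProbe.measure(bts, LzmaMemoryUtil::lzmaCompress)` would report the figures for one input. A single run is only a rough indication; timings in particular fluctuate with JVM warm-up, so several runs over several sample files give a fairer picture.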