The Road to Mathematics: Practical Computing with Python (4) - Lempel-Ziv Compression (2)

Reposted

mob604756fca9f3 2017-05-31 11:19:00


Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The ‘Standard size’ column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.

All content on this blog is original; if you repost it, please credit the source.

Format	C Type	Python type	Standard size	Notes
x	pad byte	no value
c	char	string of length 1	1
b	signed char	integer	1	(3)
B	unsigned char	integer	1	(3)
?	_Bool	bool	1	(1)
h	short	integer	2	(3)
H	unsigned short	integer	2	(3)
i	int	integer	4	(3)
I	unsigned int	integer	4	(3)
l	long	integer	4	(3)
L	unsigned long	integer	4	(3)
q	long long	integer	8	(2), (3)
Q	unsigned long long	integer	8	(2), (3)
f	float	float	4	(4)
d	double	float	8	(4)
s	char[]	string
p	char[]	string
P	void *	integer		(5), (3)
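
To make the "Standard size" column concrete, here is a minimal sketch that asks `struct.calcsize` for the size of a few format strings. It is written for Python 3, which is an assumption on my part, since the program in this post targets Python 2. With a '<' prefix the sizes should follow the table; without a prefix, native size and alignment apply, so the result depends on the platform.

```python
import struct

# Standard sizes: the '<' prefix selects little-endian byte order,
# standard sizes, and no alignment padding.
standard_sizes = {code: struct.calcsize("<" + code) for code in "bBhHiIlLqQfd"}
print(standard_sizes)

# Native size: no prefix, so the size of 'l' depends on the platform.
native_long = struct.calcsize("l")
print(native_long)

# In standard mode no padding is inserted between fields: 1 + 4 bytes.
mixed_size = struct.calcsize("<BI")
print(mixed_size)
```

Using an explicit prefix such as '<' is the usual choice for a file format, because the packed size then no longer depends on the machine that wrote the file.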

struct.pack(fmt, v1, v2, ...)

Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.

struct.unpack(fmt, string)

Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
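
As a small illustration of `pack` and `unpack`, the sketch below round-trips one (index, character) pair. The format `'<HB'` (a 2-byte unsigned index followed by one byte) and the name `RECORD_FMT` are assumptions made purely for illustration; the post does not show the layout its compressed file actually uses. The sketch is written for Python 3, where `pack` returns `bytes` rather than a `str`.

```python
import struct

# Hypothetical record layout (not taken from the post):
# 2-byte unsigned index + 1 byte, little-endian, standard size.
RECORD_FMT = "<HB"

packed = struct.pack(RECORD_FMT, 9938, ord("P"))
print(len(packed), struct.calcsize(RECORD_FMT))

# The buffer passed to unpack must be exactly calcsize(fmt) bytes long.
index, char_code = struct.unpack(RECORD_FMT, packed)
print(index, chr(char_code))

# unpack always returns a tuple, even when the format holds a single value.
(single,) = struct.unpack("<H", struct.pack("<H", 7))
print(single)
```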

The program reads a text file, compresses it, and then decompresses it. Part of the code is shown below:

# -*- coding: utf-8 -*-
# Lempel-Ziv algorithm
#code:myhaspl@myhaspl.com
import struct
mystr=""
print "\nReading source file".decode("utf8")
mytextfile= open('test2.txt','r')
try:
    mystr=mytextfile.read( )
finally:
    mytextfile.close()
my_str=mystr
# Code table: maps each code word to its index
codeword_dictionary={}
# Length of the text to compress
str_len=len(my_str)
# Maximum code word length seen so far
dict_maxlen=1
# Position of the text segment to parse next (start of the next parse)
now_index=0
# Largest index in the code table
max_index=0

# Compressed data
print "\nGenerating compressed data".decode("utf8")
compresseddata=[]
while (now_index<str_len):
    # Step by which to move forward
    mystep=0
    # Current match length; start from the longest possible code word
    now_len=dict_maxlen
    if now_len>str_len-now_index:
        now_len=str_len-now_index
    # Index found in the code table; 0 means no match
    cw_addr=0
    while (now_len>0):
        cw_index=codeword_dictionary.get(my_str[now_index:now_index+now_len])
        if cw_index is not None:
            # Code word found
            cw_addr=cw_index
            mystep=now_len
            break
        now_len-=1
    if cw_addr==0:
        # No code word found: add the single character as a new code word
        max_index+=1
        mystep=1
        codeword_dictionary[my_str[now_index:now_index+mystep]]=max_index
        print "don't find the Code word,add Code word:%s index:%d"%(my_str[now_index:now_index+mystep],max_index)
    else:
        # Code word found: add the match plus the next character as a new code word
        max_index+=1
        if now_index+mystep+1<=str_len:
            codeword_dictionary[my_str[now_index:now_index+mystep+1]]=max_index
            if mystep+1>dict_maxlen:
                dict_maxlen=mystep+1
        print "find the Code word:%s  add Code word:%s index:%d"%(my_str[now_index:now_index+now_len],my_str[now_index:now_index+mystep+1],max_index)
.......
......
        my_codeword_dictionary[my_maxindex]=my_codeword_dictionary[cwkey]+cwlaster        
        uncompressdata.append(my_codeword_dictionary[cwkey])
        uncompressdata.append(cwlaster)     
    print ".",
uncompress_str=uncompress_str.join(uncompressdata)
uncompressstr=uncompress_str
print "\nWriting decompressed result to file..\n".decode("utf8")
uncompress_file= open('uncompress.txt','w')
try:
    uncompress_file.write(uncompressstr)
    print "\nDecompression succeeded; output written to uncompress.txt!\n".decode("utf8")
finally:
    uncompress_file.close()
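
Because the listing above is only an excerpt, the following self-contained sketch may help show the overall idea. It is a textbook LZ78-style round trip written for Python 3, not the author's elided code: the function names `lz78_compress` and `lz78_decompress` are hypothetical, and it keeps the (index, character) pairs in memory instead of writing them to a file. It grows the match one character at a time rather than searching downward from the longest code word, but the decompression step mirrors the visible line that rebuilds each new code word as "existing code word + last character".

```python
def lz78_compress(text):
    """Return a list of (index, char) pairs; index 0 means 'no prefix'."""
    dictionary = {}   # code word -> index, indices start at 1
    pairs = []
    phrase = ""       # longest match found so far
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the match while it is still a known code word.
            phrase = candidate
        else:
            # Emit (index of the match, next character) and learn the new code word.
            pairs.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # The input ended inside a known code word: emit it with an empty suffix.
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_decompress(pairs):
    """Rebuild the text from (index, char) pairs produced by lz78_compress."""
    phrases = {0: ""}  # index -> code word, rebuilt in the same order
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases[len(phrases)] = phrase
        out.append(phrase)
    return "".join(out)


sample = "abababa"
pairs = lz78_compress(sample)
print(pairs)
print(lz78_decompress(pairs) == sample)
```

The decompressor never needs the code table to be stored in the file: it rebuilds the same table, in the same order, from the pairs themselves.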

Below, the text of the Chinese Wikipedia article on Python is compressed:


Run the program: it first compresses the input into a compressed file, then opens that compressed file and decompresses it.

$ pypy lempel-ziv-compress.py python.txt python.lzv

………………..

find the Code word: C add Code word: CP index:9938

find the Code word:ython add Code word:ython\r index:9939

find the Code word:

^ add Code word:

^ h index:9940

find the Code word:ttp add Code word:ttp: index:9941

find the Code word:// add Code word://e index:9942

find the Code word:dit add Code word:ditr index:9943

find the Code word:a. add Code word:a.o index:9944

Generating compressed data header

Writing compressed data to the compressed file

…………….

. . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . .

Writing decompressed result to file..

Decompression succeeded; output written to uncompress.txt!

Check the compression result:

$ ls -l -h

…………….

-rw-rw-r-- 1 deep deep 5.0K Jul 1 20:55 lempel-ziv-compress.py

-rw-rw-r-- 1 deep deep 30K Jul 1 20:55 python.lzv

-rw-rw-r-- 1 deep deep 36K Jul 1 20:57 python.txt

-rw-rw-r-- 1 deep deep 36K Jul 1 20:55 uncompress.txt

As the listing shows, the file is 36K before compression and 30K after.

Compressing the entire source code of SQLite 3.8.5:

$ pypy lempel-ziv-compress.py sqlitesrc.txt sqlitesrc.lzv

Check the compression result:

$ ls -l -h

…………….

-rw-rw-r-- 1 deep deep 3.2M Jul 1 21:18 sqlitesrc.lzv

-rw-rw-r-- 1 deep deep 5.2M Jul 1 21:16 sqlitesrc.txt

-rw-rw-r-- 1 deep deep 5.2M Jul 1 21:18 uncompress.txt

The file is 5.2M before compression and 3.2M after.
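
To put the two results side by side, the short sketch below computes the compressed size as a fraction of the original. The helper name `compression_ratio` is hypothetical, and the inputs are the rounded sizes printed by `ls -l -h` in the listings, so the percentages are only approximate.

```python
def compression_ratio(original_size, compressed_size):
    """Compressed size as a fraction of the original size."""
    return compressed_size / original_size


# Rounded sizes as reported by `ls -l -h`, not exact byte counts.
python_ratio = compression_ratio(36, 30)      # python.txt, sizes in K
sqlite_ratio = compression_ratio(5.2, 3.2)    # sqlitesrc.txt, sizes in M

print("python.txt:    about %.0f%% of the original size" % (python_ratio * 100))
print("sqlitesrc.txt: about %.0f%% of the original size" % (sqlite_ratio * 100))
```

On these figures the larger input compresses noticeably better, which fits the way the code table works: a longer text gives it more opportunity to accumulate long code words.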
