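I. Introduction
The ziplist is a specially encoded doubly linked list designed for memory efficiency. Rather than linking separately allocated nodes with pointers, it packs all entries into one contiguous block of memory. A ziplist can store strings and integers; integers are encoded in their actual binary representation rather than as a sequence of characters. Push and pop at either end of the list take O(1) time, although every such operation reallocates the block, so the real cost grows with the amount of memory the ziplist uses. This article analyzes the ziplist layout and the source code of the macros and functions that support it.
II. Structure analysis
The overall layout of a ziplist is: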
<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
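- zlbytes: the total number of bytes the ziplist occupies, including the 4 bytes of this field itself; used when reallocating memory
- zltail: the offset of the last entry from the start of the ziplist, which makes a pop at the tail possible without traversing the list
- zllen: the number of entries. The field is 16 bits wide and can count at most 2^16-2 entries; once it holds 2^16-1 (UINT16_MAX), the real count can only be obtained by scanning the whole list
- entry: an entry (node)
- zlend: marks the end of the ziplist; always the special value 0xff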
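The header fields listed above can be read back from a raw buffer. The following is a minimal sketch, not code from the ziplist source: the names `zl_header`, `read_le` and `zl_read_header` are hypothetical, and it assumes the header is stored in little-endian byte order, which is what the `intrev32ifbe` conversions in the source suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: decoded view of the 10-byte ziplist header. */
typedef struct zl_header {
    uint32_t zlbytes; /* total bytes, header and end marker included */
    uint32_t zltail;  /* offset of the last entry from the start */
    uint16_t zllen;   /* number of entries, or 65535 if a scan is needed */
} zl_header;

/* Read an n-byte little-endian unsigned integer (n <= 4). */
static uint32_t read_le(const unsigned char *p, size_t n) {
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++) v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Decode the three header fields from the start of a ziplist buffer. */
static zl_header zl_read_header(const unsigned char *zl) {
    assert(zl != NULL);
    zl_header h;
    h.zlbytes = read_le(zl, 4);
    h.zltail = read_le(zl + 4, 4);
    h.zllen = (uint16_t)read_le(zl + 8, 2);
    return h;
}
```

For an empty ziplist the buffer is 11 bytes long: the 10-byte header followed directly by the 0xff end marker, so zlbytes is 11, zltail is 10 and zllen is 0.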
Each entry is laid out as:
<prevlen> <encoding> <entry-data>
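- prevlen: the length of the previous entry, used to step backward from an entry to the one before it
- encoding: the type and length of the data held by the entry
- entry-data: the entry's data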
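The prevlen field is itself variable-length. According to the ZIP_BIG_PREVLEN comment in the source, a previous-entry length below 254 fits in one byte, while a larger one is stored as a marker byte followed by a 4-byte unsigned integer. The sketch below illustrates that rule under two assumptions: the marker byte is 0xFE (the value of ZIP_BIG_PREVLEN) and the 4-byte integer is little-endian. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIG_PREVLEN 254 /* same value as ZIP_BIG_PREVLEN in the source */

/* Hypothetical helper: write prevlen at p, return bytes used (1 or 5). */
static size_t prevlen_encode(unsigned char *p, uint32_t len) {
    assert(p != NULL);
    if (len < BIG_PREVLEN) {
        p[0] = (unsigned char)len;
        return 1;
    }
    p[0] = BIG_PREVLEN; /* marker byte 0xFE */
    for (int i = 0; i < 4; i++) p[1 + i] = (unsigned char)(len >> (8 * i));
    return 5;
}

/* Hypothetical helper: read prevlen from p, return bytes consumed. */
static size_t prevlen_decode(const unsigned char *p, uint32_t *len) {
    assert(p != NULL && len != NULL);
    if (p[0] < BIG_PREVLEN) {
        *len = p[0];
        return 1;
    }
    *len = 0;
    for (int i = 0; i < 4; i++) *len |= (uint32_t)p[1 + i] << (8 * i);
    return 5;
}
```

Keeping the common case at one byte is what makes the field cheap: most entries are far shorter than 254 bytes, so the 5-byte form is only paid for after an unusually large entry.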
Sometimes the encoding byte carries the value itself, as with very small integers. In that case entry-data is omitted and the layout reduces to:
<prevlen> <encoding>
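The encoding field depends on the type of content:
- Strings: the top two bits of the first byte select how the length is stored, and the remaining bits hold the length.
  - 00: 1-byte encoding with a 6-bit length, for strings of up to 63 bytes
  - 01: 2-byte encoding with a 14-bit length, for strings of up to 16383 bytes
  - 10: 5-byte encoding with the length in the 4 bytes after the first, for strings of 16384 bytes or more
- Integers: the top two bits are 11 and the encoding is always a single byte. The byte counts below include that encoding byte.
  - |11000000|: int16_t, 3 bytes
  - |11010000|: int32_t, 5 bytes
  - |11100000|: int64_t, 9 bytes
  - |11110000|: 24-bit signed integer, 4 bytes
  - |11111110|: 8-bit signed integer, 2 bytes
  - |1111xxxx| with xxxx from 0001 to 1101: an integer from 0 to 12 stored in the encoding byte itself, 1 byte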
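The classification above can be expressed as a few small lookups on the first encoding byte. This is an illustrative sketch only: the function names are hypothetical, and the byte values are the ones given by the ZIP_STR_* and ZIP_INT_* macros in the source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes occupied by the encoding field itself. */
static unsigned int encoding_size(unsigned char first) {
    switch (first & 0xc0) {
    case 0x00: return 1; /* 00pppppp: 6-bit string length */
    case 0x40: return 2; /* 01pppppp + 1 byte: 14-bit string length */
    case 0x80: return 5; /* 10______ + 4 bytes: 32-bit string length */
    default:   return 1; /* 11xxxxxx: integer, always one byte */
    }
}

/* Hypothetical helper: payload bytes of an integer entry,
 * or -1 if enc is a string encoding or not a valid integer encoding. */
static int int_payload_size(unsigned char enc) {
    if ((enc & 0xc0) != 0xc0) return -1;
    switch (enc) {
    case 0xc0: return 2; /* int16_t */
    case 0xd0: return 4; /* int32_t */
    case 0xe0: return 8; /* int64_t */
    case 0xf0: return 3; /* 24-bit integer */
    case 0xfe: return 1; /* 8-bit integer */
    default:   return (enc >= 0xf1 && enc <= 0xfd) ? 0 : -1;
    }
}

/* Hypothetical helper: value of a 4-bit immediate integer (0 to 12). */
static int imm_value(unsigned char enc) {
    assert(enc >= 0xf1 && enc <= 0xfd);
    return (enc & 0x0f) - 1;
}
```

The immediate form shows why the smallest integers are so cheap: together with a 1-byte prevlen, such an entry takes only 2 bytes in total.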
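III. Source code analysis
The most interesting thing about the ziplist source is that, unlike other modules, it does not first define a ziplist struct and then a zlentry struct for its nodes. Instead, a set of macros and a simple allocation call operate on a plain byte pointer, so the ziplist layout is defined implicitly and never declared explicitly.
We start with the macro definitions and the zlentry struct: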
#define ZIP_END 255 /* Special "end of ziplist" entry. */
#define ZIP_BIG_PREVLEN 254 /* Previous entry lengths below this value are
                               stored in the "prevlen" field prefixing each
                               entry as a single byte. Otherwise the field
                               is represented as FE AA BB CC DD, where
                               AA BB CC DD are a 4 bytes unsigned integer
                               representing the previous entry len. */
/* Different encoding/length possibilities: strings of different
 * lengths, and integers. */
#define ZIP_STR_MASK 0xc0
#define ZIP_INT_MASK 0x30
#define ZIP_STR_06B (0 << 6)
#define ZIP_STR_14B (1 << 6)
#define ZIP_STR_32B (2 << 6)
#define ZIP_INT_16B (0xc0 | 0<<4)
#define ZIP_INT_32B (0xc0 | 1<<4)
#define ZIP_INT_64B (0xc0 | 2<<4)
#define ZIP_INT_24B (0xc0 | 3<<4)
#define ZIP_INT_8B 0xfe
/* 4 bit integer immediate encoding |1111xxxx| with xxxx between
 * 0001 and 1101, representing the values 0 to 12. */
#define ZIP_INT_IMM_MASK 0x0f /* Mask to extract the 4 bits value. Subtract
                                 one to reconstruct the value. */
#define ZIP_INT_IMM_MIN 0xf1 /* Minimum: 11110001 */
#define ZIP_INT_IMM_MAX 0xfd /* Maximum: 11111101, not 0xff, since 0xfe is
                                ZIP_INT_8B and 0xff is ZIP_END */
#define INT24_MAX 0x7fffff
#define INT24_MIN (-INT24_MAX - 1)
/* Macro to determine if the entry is a string. String entries never start
* with "11" as most significant bits of the first byte. */
#define ZIP_IS_STR(enc) (((enc) & ZIP_STR_MASK) < ZIP_STR_MASK)
/* Utility macros. */
/* Return total bytes a ziplist is composed of. */
#define ZIPLIST_BYTES(zl) (*((uint32_t*)(zl)))
/* Return the offset of the last item inside the ziplist. */
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl) + sizeof(uint32_t))))
/* Return the length of a ziplist, or UINT16_MAX if the length cannot be
 * determined without scanning the whole ziplist. */
#define ZIPLIST_LENGTH(zl) (*((uint16_t*)((zl) + sizeof(uint32_t) * 2)))
/* The size of a ziplist header: two 32 bit integers for the total
 * bytes count and last item offset. One 16 bit integer for the number
 * of items field. */
#define ZIPLIST_HEADER_SIZE (sizeof(uint32_t) * 2 + sizeof(uint16_t))
/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE (sizeof(uint8_t))
/* Return the pointer to the first entry of a ziplist. */
#define ZIPLIST_ENTRY_HEAD(zl) ((zl) + ZIPLIST_HEADER_SIZE)
/* Return the pointer to the last entry of a ziplist, using the
 * last entry offset inside the ziplist header. */
#define ZIPLIST_ENTRY_TAIL(zl) ((zl) + intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))
/* Return the pointer to the last byte of a ziplist, which is, the
 * end of ziplist FF entry. */
#define ZIPLIST_ENTRY_END(zl) ((zl) + intrev32ifbe(ZIPLIST_BYTES(zl)) - 1)
/* Increment the number of items field in the ziplist header. Note that this
 * macro should never overflow the unsigned 16 bit integer, since entries are
 * always pushed one at a time. When UINT16_MAX is reached we want the count
 * to stay there to signal that a full scan is needed to get the number of
 * items inside the ziplist. */
#define ZIPLIST_INCR_LENGTH(zl, incr) { \
    if (ZIPLIST_LENGTH(zl) < UINT16_MAX) \
        ZIPLIST_LENGTH(zl) = intrev16ifbe(intrev16ifbe(ZIPLIST_LENGTH(zl)) + incr); \
}
/* We use this structure to receive information about a ziplist entry.
 * Note that this is not how the data is actually encoded, is just what we
 * get filled by a function in order to operate more easily. */
typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len */
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte. */
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;
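The fields of zlentry are what make traversal possible: headersize + len is the distance to the next entry, and prevrawlen is the distance back to the previous one. The sketch below illustrates this with a reduced, hypothetical counterpart of zlentry. It is deliberately incomplete and is not the decoding routine from the source: it handles only a 1-byte prevlen, strings with a 6-bit length, and 4-bit immediate integers, and it assumes a well-formed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced counterpart of zlentry. */
typedef struct entry_view {
    unsigned int prevrawlensize, prevrawlen, lensize, len, headersize;
    unsigned char encoding;
    const unsigned char *p;
} entry_view;

/* Fill *e from the entry at p. Returns 1 on success, 0 for any
 * encoding this sketch does not handle. */
static int entry_decode(const unsigned char *p, entry_view *e) {
    assert(p != NULL && e != NULL);
    if (p[0] >= 254) return 0;               /* 5-byte prevlen: not handled */
    e->prevrawlensize = 1;
    e->prevrawlen = p[0];
    e->lensize = 1;
    if ((p[1] & 0xc0) == 0x00) {             /* string with 6-bit length */
        e->encoding = 0x00;
        e->len = p[1] & 0x3f;
    } else if (p[1] >= 0xf1 && p[1] <= 0xfd) { /* 4-bit immediate integer */
        e->encoding = p[1];
        e->len = 0;
    } else {
        return 0;
    }
    e->headersize = e->prevrawlensize + e->lensize;
    e->p = p;
    return 1;
}

/* Count entries by walking forward until the 0xff end marker.
 * Returns -1 if an unsupported entry is met. */
static int count_entries(const unsigned char *zl) {
    const unsigned char *p = zl + 10;        /* skip the 10-byte header */
    entry_view e;
    int n = 0;
    while (*p != 0xff) {
        if (!entry_decode(p, &e)) return -1;
        p += e.headersize + e.len;           /* jump to the next entry */
        n++;
    }
    return n;
}
```

A forward walk like this is the full scan that becomes necessary once zllen has saturated at UINT16_MAX.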
Next, we use the function that creates a new ziplist to see how the implicit ziplist structure is realized:
/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
    unsigned int bytes = ZIPLIST_HEADER_SIZE + 1;
    unsigned char *zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    zl[bytes - 1] = ZIP_END;
    return zl;
}
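As the code shows, creating a ziplist simply follows the layout described earlier: allocate the memory first, then assign each field individually through pointer offsets. The layout is just never written out as an explicit struct.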
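To make the resulting bytes concrete, here is a standalone sketch of the same idea that compiles without the rest of the code base. It is not the original function: `sketch_ziplist_new` and `write_le` are hypothetical names, the standard malloc stands in for zmalloc, and the header fields are written byte by byte in little-endian order instead of going through the casting macros and intrev32ifbe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HEADER_SIZE 10u /* 4 (zlbytes) + 4 (zltail) + 2 (zllen) */
#define END_MARK 0xffu  /* same value as ZIP_END */

/* Write the low n bytes of v at p in little-endian order (n <= 4). */
static void write_le(unsigned char *p, uint32_t v, int n) {
    assert(p != NULL && n >= 1 && n <= 4);
    for (int i = 0; i < n; i++) p[i] = (unsigned char)(v >> (8 * i));
}

/* Hypothetical stand-in for ziplistNew(); release the result with free(). */
static unsigned char *sketch_ziplist_new(void) {
    uint32_t bytes = HEADER_SIZE + 1;  /* header plus the end marker */
    unsigned char *zl = malloc(bytes);
    if (zl == NULL) return NULL;
    write_le(zl, bytes, 4);            /* zlbytes = 11 */
    write_le(zl + 4, HEADER_SIZE, 4);  /* zltail = 10: no entries yet */
    write_le(zl + 8, 0, 2);            /* zllen = 0 */
    zl[bytes - 1] = END_MARK;          /* zlend */
    return zl;
}
```

An empty ziplist is therefore exactly 11 bytes, and its tail offset points at the end marker because there is no entry to point at yet.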
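IV. Summary
This article gave a brief introduction to the ziplist layout and the source code that implements it. A follow-up article will examine the source of more ziplist operation functions.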