Changing the encoding of keys stored in Redis to UTF-8: Redis default encoding

Reposted

mob6454cc6172e5 2024-06-06 05:57:52


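Recap

The previous post, on the skiplist underlying Redis data structures, covered the Redis skip list. This post looks at another important Redis data structure: the ziplist (compressed list).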

Version: 3.0

Source: 3.0/src/ziplist.c (this time the description is not in a .h file but in the comments of the .c file).

Purpose

First, what a ziplist is for:

/* The ziplist is a specially encoded dually linked list that is designed
 * to be very memory efficient. It stores both strings and integer values,
 * where integers are encoded as actual integers instead of a series of
 * characters. 
 */

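In short, a ziplist (compressed list) is a specially encoded doubly linked list designed to save memory. It can store both string values and integer values, and integers are stored in a true integer encoding rather than as a series of characters. (Based on the article "ziplist 结构详解".)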

Features

Next, its main feature:

/* It allows push and pop operations on either side of the list
 * in O(1) time.
 */

It supports push and pop operations at either end of the list in O(1) time.

The sections below look at how this is achieved.

Structure

/* ZIPLIST OVERALL LAYOUT:
 * The general layout of the ziplist is as follows:
 * <zlbytes><zltail><zllen><entry><entry><zlend>
 * 
 * <zlbytes> is an unsigned integer to hold the number of bytes that the ziplist occupies. 
 * This value needs to be stored to be able to resize the entire structure without the need to traverse it first.
 *
 * <zltail> is the offset to the last entry in the list. 
 * This allows a pop operation on the far side of the list without the need for full traversal.
 *
 * <zllen> is the number of entries.When this value is larger than 2**16-2, 
 * we need to traverse the entire list to know how many items it holds.
 *
 * <zlend> is a single byte special value, equal to 255, which indicates the end of the list.
 */

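A ziplist consists of <zlbytes><zltail><zllen><entry>...<entry><zlend>. First, the four fields other than <entry>:

zlbytes: 32 bits. Records the number of bytes the ziplist currently occupies (this changes as entries are added or removed), so the structure can be reallocated without first traversing it to compute its size.
zltail: 32 bits. Records the offset of the last entry from the start address of the ziplist, so the address of the tail entry can be found without traversal.
zllen: 16 bits. Records the number of entries in the ziplist. When the count exceeds 2^16 - 2, the field can no longer represent it, and the whole list must be traversed to find out how many entries it holds.
zlend: a single byte that is always 255, marking the end of the ziplist.

The field sizes above follow from the macros at line 141 of 3.0/src/ziplist.c: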

/* Utility macros */
#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))
#define ZIPLIST_ENTRY_HEAD(zl)  ((zl)+ZIPLIST_HEADER_SIZE)
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))
#define ZIPLIST_ENTRY_END(zl)   ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1)
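The header layout described above can be illustrated with a minimal sketch. It assumes the header fields are stored little-endian (which is what the `intrev32ifbe` calls in the macros imply), and the helper names `read_le`, `zl_bytes`, `zl_tail_offset`, `zl_length`, and `zl_end_byte` are hypothetical, not functions from ziplist.c. The sample buffer is an empty ziplist: a 10-byte header followed by the end marker.

```c
#include <assert.h>
#include <stdint.h>

/* Read an n-byte little-endian unsigned integer (1 <= n <= 4). */
static uint32_t read_le(const unsigned char *p, int n) {
    assert(n >= 1 && n <= 4);
    uint32_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical accessors mirroring the header layout; not Redis functions. */
static uint32_t zl_bytes(const unsigned char *zl)       { return read_le(zl, 4); }
static uint32_t zl_tail_offset(const unsigned char *zl) { return read_le(zl + 4, 4); }
static uint32_t zl_length(const unsigned char *zl)      { return read_le(zl + 8, 2); }

/* The last byte of the buffer, located via zlbytes alone. */
static unsigned char zl_end_byte(const unsigned char *zl) {
    return zl[zl_bytes(zl) - 1];
}

/* An empty ziplist: 4 + 4 + 2 header bytes plus zlend, 11 bytes in total. */
static const unsigned char empty_zl[11] = {
    11, 0, 0, 0,  /* zlbytes: total size in bytes */
    10, 0, 0, 0,  /* zltail: offset just past the header, no entries yet */
    0, 0,         /* zllen: number of entries */
    0xFF          /* zlend: always 255 */
};
```

Reading byte by byte avoids the pointer casts used by the macros, so the sketch behaves the same on big-endian and little-endian hosts. Note that `zl_end_byte` and a tail lookup via `zl_tail_offset` both need only header fields, which is why neither requires a traversal.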

Now for the entry in detail:

ZIPLIST ENTRIES:
 * Every entry in the ziplist is prefixed by a header that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the encoding with an
 * optional string length of the entry itself is stored.
 *
 * The length of the previous entry is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte that takes the length as value. When the length is greater than or
 * equal to 254, it will consume 5 bytes. The first byte is set to 254 to
 * indicate a larger value is following. The remaining 4 bytes take the
 * length of the previous entry as value.
 *
 * The other header field of the entry itself depends on the contents of the
 * entry. When the entry is a string, the first 2 bits of this header will hold
 * the type of encoding used to store the length of the string, followed by the
 * actual length of the string. When the entry is an integer the first 2 bits
 * are both set to 1. The following 2 bits are used to specify what kind of
 * integer will be stored after this header. An overview of the different
 * types and encodings is as follows:
 *
 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 * |10______|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 * |11000000| - 1 byte
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 1 byte
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 1 byte
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 1 byte
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 1 byte
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist.

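Every entry is prefixed by a header that contains two pieces of information:

1. The length (size in bytes) of the previous entry, used when traversing from back to front (moving back by this length from the start of an entry lands on the start of the previous one).

If the previous entry occupies fewer than 254 bytes, the length is stored in a single byte whose value is that byte count.
If the previous entry occupies 254 bytes or more, the length takes 5 bytes. To signal this case, the first byte is set to 254, and the following 4 bytes together store the size of the previous entry.

The marker is 254 rather than 255 because 255 is already used to mark the end of the ziplist.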
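The two cases for the previous-entry length can be sketched as follows. This is an illustration under stated assumptions rather than the Redis code itself: the names `decode_prevlen` and `prev_entry` are hypothetical, the 4-byte length is assumed to be stored little-endian, and a previous length of 0 is assumed to mean "this is the first entry".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREVLEN_BIG 254 /* marker byte: a 4-byte length follows */

/* Decode the previous-entry-length field starting at p.
 * Stores the size of the field itself (1 or 5) in *fieldsize
 * and returns the decoded length. */
static uint32_t decode_prevlen(const unsigned char *p, unsigned *fieldsize) {
    assert(p != NULL && fieldsize != NULL);
    assert(p[0] != 255); /* 255 is the end marker, never a length */
    if (p[0] < PREVLEN_BIG) {
        *fieldsize = 1;
        return p[0];
    }
    *fieldsize = 5;
    /* Assumption: the 4 length bytes are little-endian. */
    return (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
           ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
}

/* Step from the start of one entry to the start of the previous one.
 * Returns NULL when the stored length is 0 (assumed: first entry). */
static const unsigned char *prev_entry(const unsigned char *entry) {
    unsigned fieldsize;
    uint32_t prevlen = decode_prevlen(entry, &fieldsize);
    return prevlen == 0 ? NULL : entry - prevlen;
}
```

Starting from the tail entry and calling `prev_entry` repeatedly walks the list from back to front, which is the only purpose of this field.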

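2. The encoding and data length of the current entry itself, whose exact form depends on the value the entry holds.

If the entry holds a string, the first 2 bits indicate which encoding is used for the string length, followed by the actual length of the string: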

1) |00pppppp| - 1 byte: string length less than or equal to 63 bytes ($2^{6}-1$)
2) |01pppppp|qqqqqqqq| - 2 bytes: string length less than or equal to 16383 bytes ($2^{14}-1$)
3) |10______|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes: string length greater than or equal to 16384 bytes ($2^{14}$)
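The three string encodings above can be decoded with a short sketch. The function name `decode_strlen` is hypothetical, and the multi-byte lengths are assumed to be stored most significant byte first, in the order the bit patterns are written (p, then q, r, s, t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the string-length header starting at p.
 * Stores the number of header bytes (1, 2 or 5) in *lensize and
 * returns the string length. For an integer encoding (11xxxxxx)
 * *lensize is set to 0 and 0 is returned. */
static uint32_t decode_strlen(const unsigned char *p, unsigned *lensize) {
    assert(p != NULL && lensize != NULL);
    switch (p[0] & 0xC0) {      /* the top 2 bits select the encoding */
    case 0x00:                  /* |00pppppp|: 6-bit length */
        *lensize = 1;
        return p[0] & 0x3F;
    case 0x40:                  /* |01pppppp|qqqqqqqq|: 14-bit length */
        *lensize = 2;
        return ((uint32_t)(p[0] & 0x3F) << 8) | p[1];
    case 0x80:                  /* |10______| + 4 bytes: 32-bit length */
        *lensize = 5;
        return ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
               ((uint32_t)p[3] << 8) | (uint32_t)p[4];
    default:                    /* 11xxxxxx: not a string */
        *lensize = 0;
        return 0;
    }
}
```

For example, the two bytes `0x41 0x00` decode to a length of 256: the low 6 bits of the first byte contribute `1 << 8` and the second byte contributes 0.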

If the entry holds an integer, the first 2 bits are both set to 1, and the following 2 bits identify the type of integer stored in the entry:

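1) |11000000| - 1 byte: a 2-byte int16_t integer
2) |11010000| - 1 byte: a 4-byte int32_t integer
3) |11100000| - 1 byte: an 8-byte int64_t integer
4) |11110000| - 1 byte: a 3-byte (24-bit signed) integer
5) |11111110| - 1 byte: a 1-byte (8-bit signed) integer
6) |1111xxxx| - immediate 4-bit integer: xxxx takes one of the 13 encoded values from 1 (0001) to 13 (1101), and the actual value, 0 to 12, is the encoded value minus 1. The value is stored in the header byte itself (it is the data, not a data length).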
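The integer encodings listed above map a single header byte to a payload size. A minimal sketch, with the hypothetical names `int_payload_size` and `immediate_value` and a return convention (0 for an immediate value, -1 for anything that is not an integer encoding) chosen purely for illustration:

```c
#include <assert.h>

/* Number of payload bytes that follow an integer encoding byte.
 * Returns 0 for the immediate 4-bit form (the value lives in the
 * encoding byte itself) and -1 if enc is not an integer encoding. */
static int int_payload_size(unsigned char enc) {
    switch (enc) {
    case 0xC0: return 2; /* |11000000| int16_t */
    case 0xD0: return 4; /* |11010000| int32_t */
    case 0xE0: return 8; /* |11100000| int64_t */
    case 0xF0: return 3; /* |11110000| 24-bit signed */
    case 0xFE: return 1; /* |11111110| 8-bit signed */
    default:
        if (enc >= 0xF1 && enc <= 0xFD)
            return 0;    /* |1111xxxx| with xxxx in 0001..1101 */
        return -1;       /* string encoding or 0xFF end marker */
    }
}

/* Extract the value of an immediate 4-bit integer: the encoded
 * nibble runs from 1 to 13, the real value from 0 to 12. */
static int immediate_value(unsigned char enc) {
    assert(enc >= 0xF1 && enc <= 0xFD);
    return (enc & 0x0F) - 1;
}
```

The subtraction in `immediate_value` exists because the nibbles 0000 and 1111 are unavailable: 11110000 is already the 24-bit encoding and 11111111 is the end-of-ziplist marker.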

For reference, see the article "Redis内部数据结构详解" (A Detailed Explanation of Redis Internal Data Structures).

3. Finally, the struct itself:

typedef struct zlentry {
    // Number of bytes used to encode the previous entry's length, and that length
    unsigned int prevrawlensize, prevrawlen;
    // Number of bytes used to encode the current entry's length, and that length
    unsigned int lensize, len;
    // Size of the header: prevrawlensize + lensize
    unsigned int headersize;
    // Encoding of the current entry
    unsigned char encoding;
    // Pointer to the start of the entry, i.e. its prev-entry-len field
    unsigned char *p;
} zlentry;

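I won't draw the detailed diagram here. There are a few points I still haven't sorted out, so for now I'll go with the explanation in the "Redis内部数据结构详解" post. (I searched for material for a long time, and every source explains an entry as three parts. I have almost no C background, so this will have to do for now.)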

Roughly