[size=medium]1. Introduction[/size]
An index file is the index that MQ builds from (topic and uniqKey) or (topic and keys) after a message is sent; the message can then be looked up through queryMsgByKey. Note that a lookup through queryMsgById is not an index query. Index files live under /store/index/ and are named by timestamp, e.g. 20151209213520685. By default each index file stores 20 million entries and is 420,000,040 bytes in size. An index file consists of a header, the hash slots, and the index entries.
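As a quick check on where the 420,000,040-byte default comes from, here is a minimal sketch (not RocketMQ source) that assumes the defaults used throughout this article: 5,000,000 hash slots of 4 bytes each and 20,000,000 index entries of 20 bytes each.
// Illustrative only: reproduce the default index file size.
public class IndexFileSizeSketch {
    public static void main(String[] args) {
        int headerSize = 40;               // header, see section 2
        long slotArea = 5_000_000L * 4;    // 5,000,000 slots, one int each
        long indexArea = 20_000_000L * 20; // 20,000,000 entries, 20 bytes each
        System.out.println(headerSize + slotArea + indexArea); // prints 420000040
    }
}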
[size=medium]2. Layout[/size]
The header contains six fields:
[list]
[*]beginTimestamp: long, store timestamp of the first message.
[*]endTimestamp: long, store timestamp of the last message; used after an abnormal broker shutdown to decide whether the index file should be deleted.
[*]beginPhyOffset: long, commit log offset of the first message.
[*]endPhyOffset: long, commit log offset of the last message.
[*]hashSlotCount: int, slot count; starts at 0 and is incremented by 1 for every message indexed.
[*]indexCount: int, index entry count; starts at 1 and is incremented by 1 for every message indexed.
The header therefore occupies 40 bytes; its byte offsets are sketched right after this list.[/list]
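The offsets below simply follow the field order listed above; this is an illustrative sketch of the header layout, not the actual IndexHeader source.
// Header layout, 40 bytes in total (offsets follow the field order above).
static final int BEGIN_TIMESTAMP_OFFSET = 0;   // long, 8 bytes
static final int END_TIMESTAMP_OFFSET   = 8;   // long, 8 bytes
static final int BEGIN_PHY_OFFSET       = 16;  // long, 8 bytes
static final int END_PHY_OFFSET         = 24;  // long, 8 bytes
static final int HASH_SLOT_COUNT_OFFSET = 32;  // int, 4 bytes
static final int INDEX_COUNT_OFFSET     = 36;  // int, 4 bytes
static final int INDEX_HEADER_SIZE      = 40;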
Each index entry contains four fields:
[list]
[*]keyHash: int, hash of the key, where the key is the combination of topic and uniqueKey, or topic and one of the keys.
[*]phyOffset: long, the commit log offset of the message.
[*]timeDiff: int, the difference between the message's store timestamp and beginTimestamp, in seconds.
[*]nextIndexOffset: int, the position of the next index entry in the same slot's chain when key hashcodes or their modulo values collide.
Each index entry therefore occupies 20 bytes. There are 5,000,000 hash slots (slotNum) by default, each an int; the resulting position arithmetic is sketched after this list.
[/list]
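Putting the three parts together, the absolute file position of a slot and of the n-th index entry can be computed as below; this sketch just restates the layout above and matches the arithmetic that appears later in putKey and selectPhyOffset.
// Sketch of the position arithmetic implied by the layout above.
static final int INDEX_HEADER_SIZE = 40; // header
static final int HASH_SLOT_SIZE    = 4;  // one int per slot
static final int INDEX_SIZE        = 20; // keyHash + phyOffset + timeDiff + nextIndexOffset

// A slot lives right after the header.
static int absSlotPos(int slotPos) {
    return INDEX_HEADER_SIZE + slotPos * HASH_SLOT_SIZE;
}

// The n-th index entry lives after the header and the whole slot area.
static int absIndexPos(int hashSlotNum, int n) {
    return INDEX_HEADER_SIZE + hashSlotNum * HASH_SLOT_SIZE + n * INDEX_SIZE;
}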
Note: if uniqueKey is not null, an index entry is created for topic plus uniqueKey. Then, if keys is not empty, it is split on spaces into individual keys, and an index entry is created for topic plus each key:
// Index by topic + uniqKey first, if the message carries a unique key.
if (req.getUniqKey() != null) {
    indexFile = putKey(indexFile, msg, buildKey(topic, req.getUniqKey()));
    if (indexFile == null) {
        log.error("putKey error commitlog {} uniqkey {}", req.getCommitLogOffset(), req.getUniqKey());
        return;
    }
}
// Then index by topic + key for every space-separated key set by the producer.
if (keys != null && keys.length() > 0) {
    String[] keyset = keys.split(MessageConst.KEY_SEPARATOR);
    for (int i = 0; i < keyset.length; i++) {
        String key = keyset[i];
        if (key.length() > 0) {
            indexFile = putKey(indexFile, msg, buildKey(topic, key));
            if (indexFile == null) {
                log.error("putKey error commitlog {} uniqkey {}", req.getCommitLogOffset(), req.getUniqKey());
                return;
            }
        }
    }
}
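The buildKey call above combines the topic and the key into a single index key; as far as I recall it simply concatenates them with a "#" separator, so the sketch below is an assumption rather than a quote from the source.
// Assumed shape of buildKey: topic and key joined with "#".
private String buildKey(final String topic, final String key) {
    return topic + "#" + key;
}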
[size=medium]3. Creation[/size]
After a producer sends a message to the broker, MQ builds the consume queue and the index asynchronously in the ReputMessageService thread.
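Conceptually the dispatch loop looks roughly like the sketch below. This is a simplified illustration, not actual RocketMQ source: readNextMessage, consumeQueueService and the exact call sequence are hypothetical, and the real code path differs between versions.
// Simplified, hypothetical sketch of the asynchronous dispatch in ReputMessageService.
while (reputFromOffset < commitLog.getMaxOffset()) {
    DispatchRequest req = readNextMessage(reputFromOffset); // hypothetical helper
    consumeQueueService.putMessagePositionInfo(req);        // build the consumequeue entry
    indexService.buildIndex(req);                           // build the index entry (code in section 2)
    reputFromOffset += req.getMsgSize();
}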
[size=medium]4. Insertion[/size]
When a message is to be indexed, the hashcode of its key is computed first, and hashcode % slotNum gives the slot. Because both the hashcode and the modulo can collide, the slot value always points to the most recently inserted index entry, and earlier entries remain reachable through each entry's nextIndexOffset. To save space, what is stored is not the absolute timestamp but the store timestamp minus beginTimestamp, in seconds.
public boolean putKey(final String key, final long phyOffset, final long storeTimestamp) {
    if (this.indexHeader.getIndexCount() < this.indexNum) {
        int keyHash = indexKeyHashMethod(key);
        // slot position: hash modulo the slot count
        int slotPos = keyHash % this.hashSlotNum;
        int absSlotPos = IndexHeader.INDEX_HEADER_SIZE + slotPos * HASH_SLOT_SIZE;
        FileLock fileLock = null;
        try {
            // fileLock = this.fileChannel.lock(absSlotPos, HASH_SLOT_SIZE,
            // false);
            // current head of this slot's collision chain
            int slotValue = this.mappedByteBuffer.getInt(absSlotPos);
            if (slotValue <= INVALID_INDEX || slotValue > this.indexHeader.getIndexCount()) {
                slotValue = INVALID_INDEX;
            }
            // store the timestamp as a delta from beginTimestamp, in seconds
            long timeDiff = storeTimestamp - this.indexHeader.getBeginTimestamp();
            timeDiff = timeDiff / 1000;
            if (this.indexHeader.getBeginTimestamp() <= 0) {
                timeDiff = 0;
            } else if (timeDiff > Integer.MAX_VALUE) {
                timeDiff = Integer.MAX_VALUE;
            } else if (timeDiff < 0) {
                timeDiff = 0;
            }
            // the new 20-byte entry is appended after the header and all slots
            int absIndexPos =
                IndexHeader.INDEX_HEADER_SIZE + this.hashSlotNum * HASH_SLOT_SIZE
                    + this.indexHeader.getIndexCount() * INDEX_SIZE;
            this.mappedByteBuffer.putInt(absIndexPos, keyHash);
            this.mappedByteBuffer.putLong(absIndexPos + 4, phyOffset);
            this.mappedByteBuffer.putInt(absIndexPos + 4 + 8, (int) timeDiff);
            // keep the previous head of the chain as this entry's next pointer
            this.mappedByteBuffer.putInt(absIndexPos + 4 + 8 + 4, slotValue);
            // the slot now points at the newest entry
            this.mappedByteBuffer.putInt(absSlotPos, this.indexHeader.getIndexCount());
            if (this.indexHeader.getIndexCount() <= 1) {
                this.indexHeader.setBeginPhyOffset(phyOffset);
                this.indexHeader.setBeginTimestamp(storeTimestamp);
            }
            this.indexHeader.incHashSlotCount();
            this.indexHeader.incIndexCount();
            this.indexHeader.setEndPhyOffset(phyOffset);
            this.indexHeader.setEndTimestamp(storeTimestamp);
            return true;
        } catch (Exception e) {
            log.error("putKey exception, Key: " + key + " KeyHashCode: " + key.hashCode(), e);
        } finally {
            if (fileLock != null) {
                try {
                    fileLock.release();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    } else {
        log.warn("putKey index count " + this.indexHeader.getIndexCount() + " index max num "
            + this.indexNum);
    }
    return false;
}
[size=medium]5. Query[/size]
The query first computes the corresponding slot. Because different key hashes can land in the same slot after the modulo, the stored key hash is compared during the scan before an offset is added to the result list; at most 32 index entries are returned per query. Note that entries whose hash matches but whose actual key differs are still returned to the client, so the client performs an additional filtering pass.
public void selectPhyOffset(final List<Long> phyOffsets, final String key, final int maxNum,
    final long begin, final long end, boolean lock) {
    if (this.mapedFile.hold()) {
        int keyHash = indexKeyHashMethod(key);
        int slotPos = keyHash % this.hashSlotNum;
        int absSlotPos = IndexHeader.INDEX_HEADER_SIZE + slotPos * HASH_SLOT_SIZE;
        FileLock fileLock = null;
        try {
            if (lock) {
                // fileLock = this.fileChannel.lock(absSlotPos,
                // HASH_SLOT_SIZE, true);
            }
            // head of the collision chain for this slot
            int slotValue = this.mappedByteBuffer.getInt(absSlotPos);
            // if (fileLock != null) {
            // fileLock.release();
            // fileLock = null;
            // }
            if (slotValue <= INVALID_INDEX || slotValue > this.indexHeader.getIndexCount()
                || this.indexHeader.getIndexCount() <= 1) {
                // TODO NOTFOUND
            } else {
                // walk the chain from the newest entry backwards
                for (int nextIndexToRead = slotValue; ; ) {
                    if (phyOffsets.size() >= maxNum) {
                        break;
                    }
                    int absIndexPos =
                        IndexHeader.INDEX_HEADER_SIZE + this.hashSlotNum * HASH_SLOT_SIZE
                            + nextIndexToRead * INDEX_SIZE;
                    int keyHashRead = this.mappedByteBuffer.getInt(absIndexPos);
                    long phyOffsetRead = this.mappedByteBuffer.getLong(absIndexPos + 4);
                    long timeDiff = (long) this.mappedByteBuffer.getInt(absIndexPos + 4 + 8);
                    int prevIndexRead = this.mappedByteBuffer.getInt(absIndexPos + 4 + 8 + 4);
                    if (timeDiff < 0) {
                        break;
                    }
                    // recover the absolute store time from the per-second delta
                    timeDiff *= 1000L;
                    long timeRead = this.indexHeader.getBeginTimestamp() + timeDiff;
                    boolean timeMatched = (timeRead >= begin) && (timeRead <= end);
                    // only the hash is compared here; the client filters out
                    // entries whose actual key differs
                    if (keyHash == keyHashRead && timeMatched) {
                        phyOffsets.add(phyOffsetRead);
                    }
                    if (prevIndexRead <= INVALID_INDEX
                        || prevIndexRead > this.indexHeader.getIndexCount()
                        || prevIndexRead == nextIndexToRead || timeRead < begin) {
                        break;
                    }
                    nextIndexToRead = prevIndexRead;
                }
            }
        } catch (Exception e) {
            log.error("selectPhyOffset exception ", e);
        } finally {
            if (fileLock != null) {
                try {
                    fileLock.release();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            this.mapedFile.release();
        }
    }
}
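A minimal usage sketch that ties putKey and selectPhyOffset together is given below. The IndexFile constructor arguments are assumed from memory and may differ between versions; the two method signatures come from the code above.
// Illustrative usage: write two entries, then look one of them up.
IndexFile indexFile = new IndexFile("/tmp/20151209213520685", 5_000_000, 20_000_000, 0, 0);

long now = System.currentTimeMillis();
indexFile.putKey("TopicTest#AAA", 1024L, now);       // key = topic + "#" + messageKey (see section 2)
indexFile.putKey("TopicTest#BBB", 2048L, now + 10);

List<Long> phyOffsets = new ArrayList<Long>();
// returns at most 32 commit log offsets whose keyHash matches and whose
// store time falls inside [begin, end]
indexFile.selectPhyOffset(phyOffsets, "TopicTest#AAA", 32, 0, Long.MAX_VALUE, false);
// phyOffsets now contains 1024, plus any hash-colliding entries the caller must filter out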
[size=medium]6. Note[/size]
Because building each index entry does not synchronously update the indexMsgTimestamp field of the checkpoint file, after an abnormal broker shutdown the broker deletes the last index file on restart, and messages that were only indexed there can no longer be found through index queries. The author's intent here is not entirely clear; it may simply be a bug where the checkpoint update was left out.