Basic Concepts

Block: a data block, the basic unit in which ClickHouse reads and writes data. Each Block instance contains not only the data itself but also meta information about every column.

Chunk: a data block that holds the actual data; the data portion of a Block points to an instance of this type.

Row: a single record, i.e. one value from each column; a Chunk can be thought of as a collection of Rows.

Column: a single column of data, containing one block size worth of rows of that column.

A Block object can simply be pictured as a table: every column has the same length, and every row has the same length as well:

Block/Chunk | Column1 | Column2 | ... | ColumnM
Row1        | value   | value   | ... | value
Row2        | value   | value   | ... | value
...         | ...     | ...     | ... | ...
RowN        | value   | value   | ... | value
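
To make the Block / Chunk / Column relationship concrete, here is a minimal, self-contained C++ sketch. It is only an illustration of the columnar layout described above, not ClickHouse's real Block / Chunk / IColumn classes: a block stores equally long columns, and a "row" is simply the same index taken across every column.

#include <cassert>
#include <string>
#include <vector>

// Toy column: a name (per-column meta information) plus contiguous values.
struct ToyColumn
{
    std::string name;
    std::vector<double> values;
};

// Toy block: M columns that all have exactly the same number of rows.
struct ToyBlock
{
    std::vector<ToyColumn> columns;

    size_t rows() const { return columns.empty() ? 0 : columns.front().values.size(); }

    // A "row" is just the i-th element of every column.
    std::vector<double> row(size_t i) const
    {
        std::vector<double> result;
        for (const auto & col : columns)
            result.push_back(col.values.at(i));
        return result;
    }
};

int main()
{
    ToyBlock block{{{"A", {1, 2, 3}}, {"B", {10, 20, 30}}}};
    assert(block.rows() == 3);
    assert(block.row(1)[1] == 20);   // Row2 of ColumnB
}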

An aggregate function example

select uniq(B), uniq(A), uniq(C) from test_tbl

The basic steps of aggregate function execution

InputStream: read data from the table and return a number of Block objects so they can be processed in memory.
Insert: on every column of every Block, invoke the aggregate function to add the raw field values into an intermediate data structure, called a State in ClickHouse.
Merge: merge all of the States produced on every column of every Block.
Serialize: serialize a State object.
Deserialize: deserialize a State object.
Final: produce the final result.
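
These steps can be pictured with a toy uniq-style state. The following is only a hedged sketch (not ClickHouse's AggregateFunctionUniq, and the serialize/deserialize steps are omitted): values from each block are inserted into a per-block state, the states are merged, and only the final step produces a number.

#include <cstdint>
#include <iostream>
#include <unordered_set>
#include <vector>

// Toy "State" of a uniq-like aggregate: the set of values seen so far.
struct UniqState
{
    std::unordered_set<int64_t> values;

    void insert(int64_t v) { values.insert(v); }        // the Insert step
    void merge(const UniqState & rhs)                    // the Merge step
    {
        values.insert(rhs.values.begin(), rhs.values.end());
    }
    size_t finalize() const { return values.size(); }    // the Final step
};

int main()
{
    // Two "blocks" of the same column, each aggregated into its own state.
    std::vector<int64_t> block_a{1, 2, 2, 3};
    std::vector<int64_t> block_b{3, 4, 4, 5};

    UniqState state_a, state_b;
    for (auto v : block_a) state_a.insert(v);
    for (auto v : block_b) state_b.insert(v);

    state_a.merge(state_b);                    // merge all partial states
    std::cout << state_a.finalize() << "\n";   // prints 5: {1, 2, 3, 4, 5}
}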

Parallel insert: the number of read threads is controlled by the max_streams setting, which defaults to max_threads and can additionally be adjusted with max_streams_to_max_threads_ratio:
InputStreamA -> BlockA -> Insert -> Serialize -> StateA
InputStreamA -> BlockA2 -> Insert -> Serialize -> StateA2
InputStreamB -> BlockB -> Insert -> Serialize -> StateB
InputStreamB -> BlockB2 -> Insert -> Serialize -> StateB2

Parallel merge: degree of parallelism = min(max_threads, number of parallel blocks)
(StateA -> BlockA, StateB -> BlockB) -> StateC
(StateA2 -> BlockA2, StateB2 -> BlockB2) -> StateC2

Computing the final result: single-threaded
(StateC, StateC2) -> insertResultInto -> Final
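
A reduced sketch of this scheduling, assuming nothing more than one thread per block for the insert phase (std::thread stands in for the real pipeline and thread pool):

#include <cstdint>
#include <iostream>
#include <thread>
#include <unordered_set>
#include <vector>

using State = std::unordered_set<int64_t>;

// Insert phase: build one partial state per block.
static State buildState(const std::vector<int64_t> & block)
{
    State s;
    for (auto v : block)
        s.insert(v);
    return s;
}

int main()
{
    std::vector<std::vector<int64_t>> blocks{{1, 2, 3}, {3, 4}, {4, 5, 6}, {6, 7}};
    std::vector<State> states(blocks.size());

    // Parallel insert: here simply one thread per block.
    std::vector<std::thread> workers;
    for (size_t i = 0; i < blocks.size(); ++i)
        workers.emplace_back([&, i] { states[i] = buildState(blocks[i]); });
    for (auto & t : workers)
        t.join();

    // Merge: combine all partial states (the real engine bounds this by max_threads).
    State merged;
    for (const auto & s : states)
        merged.insert(s.begin(), s.end());

    // Final: single-threaded conversion of the merged state into the result.
    std::cout << "uniq = " << merged.size() << "\n";   // prints 7
}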

Detailed execution of an aggregate function

Reading the files

Assuming test_tbl is a local table of the MergeTree family, an aggregation query first reads data from the part files on disk. The code snippets below show the call chain of a SELECT read, in order:

/// A Storage that allows reading from a single MergeTree data part.
/// While interpreting the SQL, once execution reaches the FetchColumns stage,
/// InterpreterSelectQuery::executeFetchColumns(...) is called, which triggers the read() of this Storage instance.
class StorageFromMergeTreeDataPart final : public ext::shared_ptr_helper<StorageFromMergeTreeDataPart>, public IStorage
{
    friend struct ext::shared_ptr_helper<StorageFromMergeTreeDataPart>;
public:
    String getName() const override { return "FromMergeTreeDataPart"; }

    Pipe read(
        const Names & column_names,
        const StorageMetadataPtr & metadata_snapshot,
        SelectQueryInfo & query_info,
        const Context & context,
        QueryProcessingStage::Enum /*processed_stage*/,
        size_t max_block_size,
        unsigned num_streams) override
    {
        QueryPlan query_plan =
            // Building a new QueryPlan here triggers loading the data from file into memory;
            // a Pipe is then returned, and a Pipe is simply a subset of the operations of the whole pipeline.
            std::move(*MergeTreeDataSelectExecutor(part->storage)
                      .readFromParts({part}, column_names, metadata_snapshot, query_info, context, max_block_size, num_streams));

        return query_plan.convertToPipe();
    }
};

This creates an executor that reads the data files of a MergeTree table; the part files are ultimately read through this method:

QueryPlanPtr MergeTreeDataSelectExecutor::readFromParts(
    MergeTreeData::DataPartsVector parts,
    const Names & column_names_to_return,
    const StorageMetadataPtr & metadata_snapshot,
    const SelectQueryInfo & query_info,
    const Context & context,
    const UInt64 max_block_size,
    const unsigned num_streams,
    const PartitionIdToMaxBlock * max_block_numbers_to_read) const
{
    /// ... (function body omitted here)
}

Inside MergeTreeDataSelectExecutor::readFromParts(...), MergeTreeIndexReader::read() is called to deserialize the data that has just been read.

/// This method only reads the data cached in the MergeTreeReaderStream according to the configured granularity, and deserializes it.
/// "Just been read" because when the MergeTreeIndexReader object is instantiated, a MergeTreeReaderStream instance is constructed with it,
/// and the initialization of the MergeTreeReaderStream triggers the reading of the file.
MergeTreeIndexGranulePtr MergeTreeIndexReader::read()
{
    auto granule = index->createIndexGranule();
    granule->deserializeBinary(*stream.data_buffer);
    return granule;
}

Creating an instance of the MergeTreeReaderStream class triggers the read of the file on disk; once the instance has been fully initialized, the data of the current part file has been loaded into memory.

/// Class for reading a single column (or index).
/// When this class is constructed, one column of data is read from file into memory through a CompressedReadBufferFromFile object.
class MergeTreeReaderStream
{
public:
    MergeTreeReaderStream(
        DiskPtr disk_,
        const String & path_prefix_,
        const String & data_file_extension_,
        size_t marks_count_,
        const MarkRanges & all_mark_ranges,
        const MergeTreeReaderSettings & settings_,
        MarkCache * mark_cache,
        UncompressedCache * uncompressed_cache,
        size_t file_size,
        const MergeTreeIndexGranularityInfo * index_granularity_info_,
        const ReadBufferFromFileBase::ProfileCallback & profile_callback,
        clockid_t clock_type);

    void seekToMark(size_t index);

    void seekToStart();
	
    ReadBuffer * data_buffer;

private:
    DiskPtr disk;
    std::string path_prefix;
    std::string data_file_extension;

    size_t marks_count;

    MarkCache * mark_cache;
    bool save_marks_in_cache;

    const MergeTreeIndexGranularityInfo * index_granularity_info;

    std::unique_ptr<CachedCompressedReadBuffer> cached_buffer;
    std::unique_ptr<CompressedReadBufferFromFile> non_cached_buffer;

    MergeTreeMarksLoader marks_loader;
};
/// Unlike CompressedReadBuffer, it can do seek.
class CompressedReadBufferFromFile : public CompressedReadBufferBase, public BufferWithOwnMemory<ReadBuffer>
{
private:
      /** At any time, one of two things is true:
      * a) size_compressed = 0
      * b)
      *  - `working_buffer` contains the entire block.
      *  - `file_in` points to the end of this block.
      *  - `size_compressed` contains the compressed size of this block.
      */
    std::unique_ptr<ReadBufferFromFileBase> p_file_in;
    ReadBufferFromFileBase & file_in;
    size_t size_compressed = 0;

    bool nextImpl() override;

public:
    CompressedReadBufferFromFile(std::unique_ptr<ReadBufferFromFileBase> buf, bool allow_different_codecs_ = false);

    CompressedReadBufferFromFile(
        const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold,
        size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, bool allow_different_codecs_ = false);

    void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block);

    size_t readBig(char * to, size_t n) override;

    void setProfileCallback(const ReadBufferFromFileBase::ProfileCallback & profile_callback_, clockid_t clock_type_ = CLOCK_MONOTONIC_COARSE)
    {
        file_in.setProfileCallback(profile_callback_, clock_type_);
    }
};
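
The seek(offset_in_compressed_file, offset_in_decompressed_block) signature above reflects how a ClickHouse mark addresses data: an offset to a compressed block in the file plus an offset inside the decompressed block. The sketch below imitates that contract on an in-memory "file" of length-prefixed blocks; it only illustrates the two-level offset idea and, unlike the real code, performs no compression at all.

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

// Toy "file": concatenated blocks, each stored as [4-byte length][payload].
struct ToyBlockFile
{
    std::vector<char> bytes;

    void appendBlock(const std::string & payload)
    {
        uint32_t len = static_cast<uint32_t>(payload.size());
        const char * p = reinterpret_cast<const char *>(&len);
        bytes.insert(bytes.end(), p, p + sizeof(len));
        bytes.insert(bytes.end(), payload.begin(), payload.end());
    }

    // "seek + read": jump to the block that starts at offset_in_compressed_file,
    // "decompress" it (a no-op here), then skip offset_in_decompressed_block bytes inside it.
    std::string readFrom(size_t offset_in_compressed_file, size_t offset_in_decompressed_block) const
    {
        uint32_t len = 0;
        std::memcpy(&len, bytes.data() + offset_in_compressed_file, sizeof(len));
        const char * payload = bytes.data() + offset_in_compressed_file + sizeof(len);
        return std::string(payload + offset_in_decompressed_block, len - offset_in_decompressed_block);
    }
};

int main()
{
    ToyBlockFile file;
    file.appendBlock("hello world");   // block 0 starts at file offset 0
    file.appendBlock("clickhouse");    // block 1 starts at file offset 4 + 11 = 15

    std::cout << file.readFrom(0, 6) << "\n";    // prints "world"
    std::cout << file.readFrom(15, 0) << "\n";   // prints "clickhouse"
}
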
Generating a Chunk

The generate() method implemented in MergeTreeSequentialSource.h calls the readRows() method implemented in MergeTreeReaderCompact.h to move the data from file into memory, and finally returns a Chunk object.

// MergeTreeSequentialSource is an implementation of IProcessor, i.e. the very first operator of a physical pipeline.
// When a query reads from local files, this operator, namely its generate() method, always runs first to produce the data.
Chunk MergeTreeSequentialSource::generate()
try
{
    const auto & header = getPort().getHeader();

    if (!isCancelled() && current_row < data_part->rows_count)
    {
        // Number of rows covered by the current mark, i.e. how much to read
        size_t rows_to_read = data_part->index_granularity.getMarkRows(current_mark);
        bool continue_reading = (current_mark != 0);
        // If the part is in Compact format, the MergeTreeReaderCompact instance is used here; getColumns() tells us which columns, and therefore how many, are to be read
        const auto & sample = reader->getColumns();
        // Create an array of column pointers to hold the address of each column's data; every column is stored contiguously
        Columns columns(sample.size());
        size_t rows_read = reader->readRows(current_mark, continue_reading, rows_to_read, columns);
        // If any rows were read
        if (rows_read)
        {
            current_row += rows_read;
            current_mark += (rows_to_read == rows_read);

            bool should_evaluate_missing_defaults = false;
            reader->fillMissingColumns(columns, should_evaluate_missing_defaults, rows_read);

            if (should_evaluate_missing_defaults)
            {
                reader->evaluateMissingDefaults({}, columns);
            }

            reader->performRequiredConversions(columns);

            /// Reorder columns and fill result block.
            size_t num_columns = sample.size();
            Columns res_columns;
            res_columns.reserve(num_columns);
            // Keep only the columns whose names were requested in the SQL statement, dropping the ones that are not needed
            auto it = sample.begin();
            for (size_t i = 0; i < num_columns; ++i)
            {
                if (header.has(it->name))
                    res_columns.emplace_back(std::move(columns[i]));

                ++it;
            }
            // Create a Chunk instance holding the N rows and M columns, and return it to the caller
            return Chunk(std::move(res_columns), rows_read);
        }
    }
    else
    {
        finish();
    }

    return {};
}
catch (...)
{
    /// The catch handler is omitted in this excerpt.
    throw;
}
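
The loop structure of generate(), i.e. read one mark's worth of rows, advance current_row, and advance current_mark only if the whole granule was delivered, can be reduced to a few lines. This is a sketch with a hypothetical fixed granularity; real parts may use adaptive granularity, where getMarkRows() can differ per mark.

#include <algorithm>
#include <cstddef>
#include <iostream>

int main()
{
    const size_t rows_count = 10;    // rows in the "part"
    const size_t granularity = 4;    // hypothetical fixed mark size (index_granularity)

    size_t current_row = 0;
    size_t current_mark = 0;

    while (current_row < rows_count)
    {
        size_t rows_to_read = granularity;   // stands in for getMarkRows(current_mark)
        // The reader may deliver fewer rows at the end of the part.
        size_t rows_read = std::min(rows_to_read, rows_count - current_row);

        current_row += rows_read;
        current_mark += (rows_to_read == rows_read);   // same rule as in generate()

        std::cout << "rows read so far: " << current_row
                  << ", current mark: " << current_mark << "\n";
    }
}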

Initializing the aggregation environment

For a query that defines aggregation, here with the three aggregate expressions uniq(B), uniq(A) and uniq(C), ClickHouse creates an operator called AggregatingTransform when building the physical plan tree. Its constructors are declared as follows:

class AggregatingTransform : public IProcessor
{
public:
    AggregatingTransform(Block header, AggregatingTransformParamsPtr params_);

    /// For Parallel aggregating.
    AggregatingTransform(Block header, AggregatingTransformParamsPtr params_,
                         ManyAggregatedDataPtr many_data, size_t current_variant,
                         size_t max_threads, size_t temporary_data_merge_threads);
    ~AggregatingTransform() override;
};

As shown, this operator receives the header of the data to be aggregated, all of the aggregation parameters in params_, the maximum aggregation parallelism max_threads, and so on.

Here params_ is a pointer to an AggregatingTransformParams struct, and AggregatingTransformParams contains one of the key objects in aggregation, the Aggregator. The struct is defined as follows:

struct AggregatingTransformParams
{
    Aggregator::Params params;
    // All the details of the aggregation are defined in this class, so the later insert and merge steps are described starting from it
    Aggregator aggregator;
    bool final;

    AggregatingTransformParams(const Aggregator::Params & params_, bool final_)
        : params(params_), aggregator(params), final(final_) {}

    Block getHeader() const { return aggregator.getHeader(final); }

    Block getCustomHeader(bool final_) const { return aggregator.getHeader(final_); }
};

Construction of the Aggregator class:

Aggregator::Aggregator(const Params & params_)
    : params(params_),
    isCancelled([]() { return false; })
{
    /// Use query-level memory tracker
    if (auto * memory_tracker_child = CurrentThread::getMemoryTracker())
        if (auto * memory_tracker = memory_tracker_child->getParent())
            memory_usage_before_aggregation = memory_tracker->get();
    /// aggregate_functions: an array holding pointers to all aggregate functions
    aggregate_functions.resize(params.aggregates_size);
    for (size_t i = 0; i < params.aggregates_size; ++i)
        aggregate_functions[i] = params.aggregates[i].function.get();

    /// Initialize sizes of aggregation states and its offsets.
    /// Every aggregate function has a corresponding aggregate state instance. For more efficient storage and lookup,
    /// these state instances are serialized and laid out one after another in an Arena memory pool, effectively one contiguous byte array.
    offsets_of_aggregate_states.resize(params.aggregates_size);
    total_size_of_aggregate_states = 0;
    all_aggregates_has_trivial_destructor = true;

    // aggregate_states will be aligned as below:
    // |<-- state_1 -->|<-- pad_1 -->|<-- state_2 -->|<-- pad_2 -->| .....
    //
    // pad_N will be used to match alignment requirement for each next state.
    // The address of state_1 is aligned based on maximum alignment requirements in states
    for (size_t i = 0; i < params.aggregates_size; ++i)
    {
        offsets_of_aggregate_states[i] = total_size_of_aggregate_states;
        // Accumulate the sizes of all aggregate state objects; this only covers their POD members
        total_size_of_aggregate_states += params.aggregates[i].function->sizeOfData();

        // aggregate states are aligned based on maximum requirement
        // Track the maximum alignment requirement among all states
        align_aggregate_states = std::max(align_aggregate_states, params.aggregates[i].function->alignOfData());

        // If not the last aggregate_state, we need pad it so that next aggregate_state will be aligned.
        if (i + 1 < params.aggregates_size)
        {
            size_t alignment_of_next_state = params.aggregates[i + 1].function->alignOfData();
            if ((alignment_of_next_state & (alignment_of_next_state - 1)) != 0)
                throw Exception("Logical error: alignOfData is not 2^N", ErrorCodes::LOGICAL_ERROR);

            /// Extend total_size to next alignment requirement
            /// Add padding by rounding up 'total_size_of_aggregate_states' to be a multiplier of alignment_of_next_state.
            total_size_of_aggregate_states = (total_size_of_aggregate_states + alignment_of_next_state - 1) / alignment_of_next_state * alignment_of_next_state;
        }

        if (!params.aggregates[i].function->hasTrivialDestructor())
            all_aggregates_has_trivial_destructor = false;
    }

    method_chosen = chooseAggregationMethod();
    HashMethodContext::Settings cache_settings;
    cache_settings.max_threads = params.max_threads;
    /// Choose a suitable aggregation method according to the number and types of the grouping keys,
    /// e.g. two-level (two-phase) aggregation or single-level (one-phase) aggregation.
    aggregation_state_cache = AggregatedDataVariants::createCache(method_chosen, cache_settings);
}
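
The padding rule used above to lay the states out back to back in the Arena can be checked with a small standalone program. The sizes and alignments below are arbitrary placeholders, not those of any real aggregate function:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

int main()
{
    // (sizeOfData, alignOfData) for three hypothetical aggregate states.
    std::vector<std::pair<size_t, size_t>> states{{24, 8}, {1, 1}, {16, 16}};

    std::vector<size_t> offsets;
    size_t total_size = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < states.size(); ++i)
    {
        offsets.push_back(total_size);
        total_size += states[i].first;
        max_align = std::max(max_align, states[i].second);

        // Same rounding as in the Aggregator: pad so that the next state starts aligned.
        if (i + 1 < states.size())
        {
            size_t next_align = states[i + 1].second;
            total_size = (total_size + next_align - 1) / next_align * next_align;
        }
    }

    for (size_t off : offsets)
        std::cout << "offset " << off << "\n";                                   // 0, 24, 32
    std::cout << "total " << total_size << ", max align " << max_align << "\n";  // 48, 16
}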

Generating intermediate states from the source data

After the data has been read from the source files, Aggregator::execute(...) is invoked to aggregate every column of every source Block; within each stream this process runs single-threaded.

void Aggregator::execute(const BlockInputStreamPtr & stream, AggregatedDataVariants & result)
{
    if (isCancelled())
        return;

    ColumnRawPtrs key_columns(params.keys_size);
    AggregateColumns aggregate_columns(params.aggregates_size);

    /** Used if there is a limit on the maximum number of rows in the aggregation,
      *  and if group_by_overflow_mode == ANY.
      * In this case, new keys are not added to the set, but aggregation is performed only by
      *  keys that have already managed to get into the set.
      */
    bool no_more_keys = false;

    LOG_TRACE(log, "Aggregating");

    Stopwatch watch;

    size_t src_rows = 0;
    size_t src_bytes = 0;

    /// Read all the data
    while (Block block = stream->read())
    {
        if (isCancelled())
            return;

        src_rows += block.rows();
        src_bytes += block.bytes();
        // Walk through all rows of the Block and invoke the aggregation on every column, inserting the data into the state objects
        if (!executeOnBlock(block, result, key_columns, aggregate_columns, no_more_keys))
            break;
    }

    /// If there was no data, and we aggregate without keys, and we must return single row with the result of empty aggregation.
    /// To do this, we pass a block with zero rows to aggregate.
    if (result.empty() && params.keys_size == 0 && !params.empty_result_for_aggregation_by_empty_set)
        executeOnBlock(stream->getHeader(), result, key_columns, aggregate_columns, no_more_keys);

    double elapsed_seconds = watch.elapsedSeconds();
    size_t rows = result.sizeWithoutOverflowRow();

    LOG_TRACE(log, "Aggregated. {} to {} rows (from {}) in {} sec. ({} rows/sec., {}/sec.)",
        src_rows, rows, ReadableSize(src_bytes),
        elapsed_seconds, src_rows / elapsed_seconds,
        ReadableSize(src_bytes / elapsed_seconds));
}
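
What executeOnBlock() does for each row can be pictured with a simplified loop. This is a hedged sketch with toy containers instead of IAggregateFunction, Arena and the real state layout: for every row of the block, every aggregate function adds the value of its argument column into its own state.

#include <cstdint>
#include <iostream>
#include <unordered_set>
#include <vector>

int main()
{
    // One argument column per aggregate expression: uniq(B), uniq(A), uniq(C).
    std::vector<std::vector<int64_t>> columns{{1, 1, 2}, {7, 8, 8}, {5, 5, 5}};

    // One state per aggregate function (the real states live side by side in an Arena).
    std::vector<std::unordered_set<int64_t>> states(columns.size());

    // Simplified executeOnBlock(): for every row, let every aggregate function
    // consume the value of its argument column.
    size_t rows = columns.front().size();
    for (size_t row = 0; row < rows; ++row)
        for (size_t func = 0; func < states.size(); ++func)
            states[func].insert(columns[func][row]);

    for (size_t func = 0; func < states.size(); ++func)
        std::cout << "state " << func << " holds " << states[func].size() << " distinct values\n";
}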

Merging intermediate states

The aggregation performed by Aggregator::execute(...) is referred to as Inserting: every Block read from every stream produces a Block of intermediate states. To obtain the final result, all intermediate-state Blocks from all streams still have to be merged on some node so that only a single Block remains; this step is called Partially Merging, and its entry point is Aggregator::mergeStream(...), shown below:

void Aggregator::mergeStream(const BlockInputStreamPtr & stream, AggregatedDataVariants & result, size_t max_threads)
{
    if (isCancelled())
        return;

    /** If the remote servers used a two-level aggregation method,
      *  then blocks will contain information about the number of the bucket.
      * Then the calculations can be parallelized by buckets.
      * We decompose the blocks to the bucket numbers indicated in them.
      */
    BucketToBlocks bucket_to_blocks;

    /// Read all the data.
    LOG_TRACE(log, "Reading blocks of partially aggregated data.");

    size_t total_input_rows = 0;
    size_t total_input_blocks = 0;
    while (Block block = stream->read())
    {
        if (isCancelled())
            return;

        total_input_rows += block.rows();
        ++total_input_blocks;
        bucket_to_blocks[block.info.bucket_num].emplace_back(std::move(block));
    }

    LOG_TRACE(log, "Read {} blocks of partially aggregated data, total {} rows.", total_input_blocks, total_input_rows);
    /// Merge all intermediate states produced by this stream; this can run in parallel, since the operations on each column are independent
    mergeBlocks(bucket_to_blocks, result, max_threads);
}
/// bucket_to_blocks is a map. If the remote servers used the two-level aggregation method, some blocks can be processed in parallel,
/// and the work can be parallelized according to the configured max_threads; blocks that cannot be processed in parallel
/// are stored under the key -1 and can only be handled by a single thread.
void Aggregator::mergeBlocks(BucketToBlocks bucket_to_blocks, AggregatedDataVariants & result, size_t max_threads)
{
    if (bucket_to_blocks.empty())
        return;

    UInt64 total_input_rows = 0;
    for (auto & bucket : bucket_to_blocks)
        for (auto & block : bucket.second)
            total_input_rows += block.rows();

    /** `minus one` means the absence of information about the bucket
      * - in the case of single-level aggregation, as well as for blocks with "overflowing" values.
      * If there is at least one block with a bucket number greater or equal than zero, then there was a two-level aggregation.
      */
    auto max_bucket = bucket_to_blocks.rbegin()->first;
    bool has_two_level = max_bucket >= 0;

    if (has_two_level)
    {
    #define M(NAME) \
        if (method_chosen == AggregatedDataVariants::Type::NAME) \
            method_chosen = AggregatedDataVariants::Type::NAME ## _two_level;

        APPLY_FOR_VARIANTS_CONVERTIBLE_TO_TWO_LEVEL(M)

    #undef M
    }

    if (isCancelled())
        return;

    /// result will destroy the states of aggregate functions in the destructor
    result.aggregator = this;

    result.init(method_chosen);
    result.keys_size = params.keys_size;
    result.key_sizes = key_sizes;

    bool has_blocks_with_unknown_bucket = bucket_to_blocks.count(-1);

    /// First, parallel the merge for the individual buckets. Then we continue merge the data not allocated to the buckets.
    if (has_two_level)
    {
        /** In this case, no_more_keys is not supported due to the fact that
          *  from different threads it is difficult to update the general state for "other" keys (overflows).
          * That is, the keys in the end can be significantly larger than max_rows_to_group_by.
          */

        LOG_TRACE(log, "Merging partially aggregated two-level data.");

        auto merge_bucket = [&bucket_to_blocks, &result, this](Int32 bucket, Arena * aggregates_pool, ThreadGroupStatusPtr thread_group)
        {
            if (thread_group)
                CurrentThread::attachToIfDetached(thread_group);

            for (Block & block : bucket_to_blocks[bucket])
            {
                if (isCancelled())
                    return;

            #define M(NAME) \
                else if (result.type == AggregatedDataVariants::Type::NAME) \
                    mergeStreamsImpl(block, aggregates_pool, *result.NAME, result.NAME->data.impls[bucket], nullptr, false);

                if (false) {} // NOLINT
                    APPLY_FOR_VARIANTS_TWO_LEVEL(M)
            #undef M
                else
                    throw Exception("Unknown aggregated data variant.", ErrorCodes::UNKNOWN_AGGREGATED_DATA_VARIANT);
            }
        };

        std::unique_ptr<ThreadPool> thread_pool;
        if (max_threads > 1 && total_input_rows > 100000)    /// TODO Make a custom threshold.
            thread_pool = std::make_unique<ThreadPool>(max_threads);

        for (const auto & bucket_blocks : bucket_to_blocks)
        {
            const auto bucket = bucket_blocks.first;

            if (bucket == -1)
                continue;
            // Process the buckets in parallel where possible; each bucket gets its own Arena memory pool to hold its merged results
            result.aggregates_pools.push_back(std::make_shared<Arena>());
            Arena * aggregates_pool = result.aggregates_pools.back().get();

            auto task = [group = CurrentThread::getGroup(), bucket, &merge_bucket, aggregates_pool]{ return merge_bucket(bucket, aggregates_pool, group); };

            if (thread_pool)
                thread_pool->scheduleOrThrowOnError(task);
            else
                task();
        }

        if (thread_pool)
            thread_pool->wait();

        LOG_TRACE(log, "Merged partially aggregated two-level data.");
    }

    if (isCancelled())
    {
        result.invalidate();
        return;
    }
    // Blocks without a bucket number cannot be processed in parallel; they are merged single-threaded via the single-level path.
    if (has_blocks_with_unknown_bucket)
    {
        LOG_TRACE(log, "Merging partially aggregated single-level data.");

        bool no_more_keys = false;

        BlocksList & blocks = bucket_to_blocks[-1];
        for (Block & block : blocks)
        {
            if (isCancelled())
            {
                result.invalidate();
                return;
            }

            if (!checkLimits(result.sizeWithoutOverflowRow(), no_more_keys))
                break;

            if (result.type == AggregatedDataVariants::Type::without_key || block.info.is_overflows)
                mergeWithoutKeyStreamsImpl(block, result);

        #define M(NAME, IS_TWO_LEVEL) \
            else if (result.type == AggregatedDataVariants::Type::NAME) \
                mergeStreamsImpl(block, result.aggregates_pool, *result.NAME, result.NAME->data, result.without_key, no_more_keys);

            APPLY_FOR_AGGREGATED_VARIANTS(M)
        #undef M
            else if (result.type != AggregatedDataVariants::Type::without_key)
                throw Exception("Unknown aggregated data variant.", ErrorCodes::UNKNOWN_AGGREGATED_DATA_VARIANT);
        }

        LOG_TRACE(log, "Merged partially aggregated single-level data.");
    }
}
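
Stripped of the variant dispatch, the two-level branch above does the following: group the partial blocks by bucket, merge each bucket independently (possibly on its own thread, with its own Arena), and merge the bucket -1 leftovers on a single thread. A reduced sketch of that scheduling, with a plain map and std::thread standing in for BucketToBlocks and the ThreadPool:

#include <cstdint>
#include <iostream>
#include <map>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <vector>

using State = std::unordered_set<int64_t>;

int main()
{
    // bucket -> partial states produced by different streams; -1 means "no bucket information".
    std::map<int32_t, std::vector<State>> bucket_to_states{
        {-1, {{1, 2}, {2, 3}}},
        {0,  {{10, 11}, {11, 12}}},
        {1,  {{20}, {21}}},
    };

    std::map<int32_t, State> merged;
    std::mutex mutex;

    // Merge the numbered buckets in parallel: the buckets are disjoint, so the only
    // shared data is the result map, protected here by a mutex.
    std::vector<std::thread> workers;
    for (auto & entry : bucket_to_states)
    {
        const int32_t bucket = entry.first;
        const std::vector<State> & partial_states = entry.second;
        if (bucket == -1)
            continue;

        workers.emplace_back([&merged, &mutex, &partial_states, bucket]
        {
            State result;
            for (const auto & s : partial_states)
                result.insert(s.begin(), s.end());
            std::lock_guard<std::mutex> lock(mutex);
            merged[bucket] = std::move(result);
        });
    }
    for (auto & t : workers)
        t.join();

    // States without a bucket number are merged on a single thread.
    State unknown;
    for (const auto & s : bucket_to_states[-1])
        unknown.insert(s.begin(), s.end());

    std::cout << "buckets merged in parallel: " << merged.size()
              << ", distinct values without bucket: " << unknown.size() << "\n";   // 2 and 3
}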

Returning the final result

Finally, after Partially Merging has completed, the following method is called to convert all of the aggregated intermediate data, returning Blocks that contain only the final result values.

/** Convert the aggregation data structure into a block.
  * If overflow_row = true, then aggregates for rows that are not included in max_rows_to_group_by are put in the first block.
  *
  * If final = false, then ColumnAggregateFunction is created as the aggregation columns with the state of the calculations,
  *  which can then be combined with other states (for distributed query processing).
  * If final = true, then columns with ready values are created as aggregate columns.
  */
BlocksList Aggregator::convertToBlocks(AggregatedDataVariants & data_variants, bool final, size_t max_threads) const
{
    if (isCancelled())
        return BlocksList();

    LOG_TRACE(log, "Converting aggregated data to blocks");

    Stopwatch watch;

    BlocksList blocks;

    /// In what data structure is the data aggregated?
    if (data_variants.empty())
        return blocks;

    std::unique_ptr<ThreadPool> thread_pool;
    if (max_threads > 1 && data_variants.sizeWithoutOverflowRow() > 100000  /// TODO Make a custom threshold.
        && data_variants.isTwoLevel())                      /// TODO Use the shared thread pool with the `merge` function.
        thread_pool = std::make_unique<ThreadPool>(max_threads);

    if (isCancelled())
        return BlocksList();

    if (data_variants.without_key)
        blocks.emplace_back(prepareBlockAndFillWithoutKey(
            data_variants, final, data_variants.type != AggregatedDataVariants::Type::without_key));

    if (isCancelled())
        return BlocksList();

    if (data_variants.type != AggregatedDataVariants::Type::without_key)
    {
        if (!data_variants.isTwoLevel())
            blocks.emplace_back(prepareBlockAndFillSingleLevel(data_variants, final));
        else
            blocks.splice(blocks.end(), prepareBlocksAndFillTwoLevel(data_variants, final, thread_pool.get()));
    }

    if (!final)
    {
        /// data_variants will not destroy the states of aggregate functions in the destructor.
        /// Now ColumnAggregateFunction owns the states.
        data_variants.aggregator = nullptr;
    }

    if (isCancelled())
        return BlocksList();

    size_t rows = 0;
    size_t bytes = 0;

    for (const auto & block : blocks)
    {
        rows += block.rows();
        bytes += block.bytes();
    }

    double elapsed_seconds = watch.elapsedSeconds();
    LOG_TRACE(log,
        "Converted aggregated data to blocks. {} rows, {} in {} sec. ({} rows/sec., {}/sec.)",
        rows, ReadableSize(bytes),
        elapsed_seconds, rows / elapsed_seconds,
        ReadableSize(bytes / elapsed_seconds));

    return blocks;
}
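
The final conversion referred to earlier as insertResultInto can be pictured as follows (a toy sketch): every merged state is turned into a single ready value, and those values make up the one-row result Block sent back to the client.

#include <cstdint>
#include <iostream>
#include <unordered_set>
#include <vector>

int main()
{
    // Merged states for uniq(B), uniq(A), uniq(C) after Partially Merging.
    std::vector<std::unordered_set<int64_t>> merged_states{{1, 2, 3}, {7, 8}, {5}};

    // "insertResultInto": turn each state into its final value and append it to a result column.
    std::vector<uint64_t> result_row;
    for (const auto & state : merged_states)
        result_row.push_back(state.size());

    // The result has a single row: uniq(B), uniq(A), uniq(C).
    for (uint64_t v : result_row)
        std::cout << v << " ";
    std::cout << "\n";   // prints: 3 2 1
}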