HBase secondary indexes with Phoenix

Reposted

coolfengsy 2023-07-20 23:13:04


Contents

Secondary index configuration
Global index
Covered index
Local index

Secondary index configuration

Add the following property to hbase-site.xml on each HBase RegionServer node.

<!-- Phoenix RegionServer configuration -->
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
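After editing hbase-site.xml it can help to confirm that the property actually landed in the file. The sketch below is a minimal check using Python's standard `xml.etree.ElementTree`. The helper names `get_property` and `codec_configured` and the inline sample file are hypothetical and are not part of HBase or Phoenix.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real hbase-site.xml sits in the HBase conf directory.
HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
  </property>
</configuration>
"""

CODEC_KEY = "hbase.regionserver.wal.codec"
CODEC_VALUE = "org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec"


def get_property(xml_text, name):
    """Return the <value> of the top-level <property> whose <name> matches, else None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def codec_configured(xml_text):
    """True if the WAL codec from the configuration above is set."""
    return get_property(xml_text, CODEC_KEY) == CODEC_VALUE


print(codec_configured(HBASE_SITE))  # True
```

To check a real file, read its contents into a string and pass that to `codec_configured`.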

Global index

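Global Index is the default index type. Creating a global index creates a new table in HBase, which means index data and table data are stored in separate tables. Global indexes therefore suit read-heavy, write-light workloads.

Writes are expensive, because every write must also update the index table. The index table's regions are spread across different nodes, so the cross-node data transfer adds significant overhead. On reads, Phoenix chooses the index table when it can serve the query, which reduces query time.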
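The following is a minimal in-memory model that makes the write cost concrete, assuming an index on one column. It is not Phoenix code: the class `GlobalIndexedTable` and its write counter are hypothetical. The model follows the idea that a global index row is keyed by the indexed value followed by the data row key. Changing an indexed value therefore means deleting the old index row and writing a new one, on top of the data write.

```python
class GlobalIndexedTable:
    """Hypothetical model: a data table plus a separate global index on one column."""

    def __init__(self, indexed_col):
        self.indexed_col = indexed_col
        self.data = {}      # data table: row key -> {column: value}
        self.index = set()  # index table: (indexed value, row key), a separate table in HBase
        self.writes = 0     # total writes issued to both tables

    def upsert(self, row_key, row):
        old = self.data.get(row_key, {})
        new = {**old, **row}
        self.data[row_key] = new
        self.writes += 1  # write to the data table

        # None is treated as "no value" here for simplicity.
        old_val = old.get(self.indexed_col)
        new_val = new.get(self.indexed_col)
        if old_val != new_val:
            if old_val is not None:
                self.index.remove((old_val, row_key))  # delete the stale index row
                self.writes += 1
            if new_val is not None:
                self.index.add((new_val, row_key))  # write the new index row
                self.writes += 1


t = GlobalIndexedTable("age")
t.upsert("1001", {"name": "zhangsan", "age": 10, "addr": "beijing"})
print(t.writes)  # 2: one data write + one index write
t.upsert("1001", {"age": 11})
print(t.writes)  # 5: data write + index delete + index put
```

In HBase the two tables can have regions on different RegionServers, so each extra write counted here may become an extra remote call.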

Create a global index on a single column:

CREATE INDEX my_index ON my_table (my_col);

-- for example
create index my_index on student1(age);

-- drop an index
DROP INDEX my_index ON my_table;
-- for example
drop index my_index on student1;

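To check whether the secondary index takes effect, look at the execution plan with EXPLAIN. Once the index is used, the query becomes a range scan.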

explain select id,name from student1 where age = 10;
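The range scan in the plan is possible because the index is sorted by the indexed value. The rough Python sketch below shows why an equality lookup only touches a contiguous slice of the index. It assumes the index rows are modeled as sorted `(age, row key)` tuples; Phoenix actually stores encoded byte keys, so this is a simplification. `bisect` finds where the key range starts, and the scan stops at the first non-matching entry. The function `range_scan` is hypothetical.

```python
import bisect


def range_scan(sorted_index, value):
    """Return (matching row keys, entries examined) for 'indexed value == value'.

    sorted_index is a list of (indexed value, row key) tuples kept in sorted order,
    a simplified stand-in for the byte-ordered row keys of an index table.
    """
    # (value,) sorts before every (value, row_key) tuple, so this is the range start.
    start = bisect.bisect_left(sorted_index, (value,))
    matches, examined = [], 0
    for indexed_value, row_key in sorted_index[start:]:
        examined += 1
        if indexed_value != value:
            break  # left the key range: no need to look at the rest of the index
        matches.append(row_key)
    return matches, examined


index_rows = sorted([(10, "1001"), (12, "1002"), (10, "1003"), (30, "1004"), (8, "1005")])
print(range_scan(index_rows, 10))  # (['1001', '1003'], 3) instead of examining all 5 rows
```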


If the query references a column that is not in the index, the index table is not used, so the query gets no speedup.

For example:

explain select id,name,addr from student1 where age = 10;


There are two ways to solve this:
(1) use a covered index
(2) use a local index
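The rule behind this behavior can be sketched as a simple check. A global index only helps when every column the query references is stored in the index table. That means the indexed columns, the data table's primary key columns (which are part of every index row key), and any INCLUDE columns. This is a simplified model, not Phoenix's actual optimizer, and the function name is hypothetical. The usage lines assume, purely for illustration, that the primary key of student1 is (id, name); adjust `pk` to your own table.

```python
def can_use_global_index(query_cols, indexed_cols, pk_cols, included_cols=()):
    """Simplified rule: a global index can answer a query on its own only if every
    column the query references (SELECT list and WHERE clause) is stored in the
    index table: indexed columns, data-table primary key columns, INCLUDE columns."""
    available = set(indexed_cols) | set(pk_cols) | set(included_cols)
    return set(query_cols) <= available


# Assumption for illustration only: student1's primary key is (id, name).
pk = ["id", "name"]
print(can_use_global_index({"id", "name", "age"}, ["age"], pk))          # True
print(can_use_global_index({"id", "name", "addr", "age"}, ["age"], pk))  # False: addr missing
print(can_use_global_index({"id", "name", "addr", "age"}, ["age"], pk,
                           included_cols=["addr"]))                     # True: covered index
```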

Covered index

Create a global index that also carries other columns (underneath, it is still a global index):

CREATE INDEX my_index ON my_table (v1) INCLUDE (v2);
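The difference from a plain global index is what each index row carries. The hypothetical model below uses the illustrative functions `build_covered_index` and `lookup`. Each entry is keyed by `(indexed value, row key)` as before, but it also stores the INCLUDE columns. A query that selects those columns can therefore be answered from the index alone, without reading the data table. The lookup is a linear pass for brevity; a real index would range-scan the sorted keys.

```python
def build_covered_index(rows, indexed_col, include_cols):
    """rows: {row key: {column: value}} -> index entries sorted by key.

    Each entry is ((indexed value, row key), {included column: value}); copying the
    included values into the index is what makes it "covered".
    """
    entries = []
    for row_key, row in rows.items():
        key = (row[indexed_col], row_key)
        covered = {col: row[col] for col in include_cols}
        entries.append((key, covered))
    entries.sort(key=lambda entry: entry[0])
    return entries


def lookup(entries, value):
    """Answer 'WHERE indexed_col = value' from the index entries alone.

    Linear pass for brevity; a real index would range-scan the sorted keys.
    """
    return [(key[1], covered) for key, covered in entries if key[0] == value]


rows_by_key = {
    "1001": {"name": "zhangsan", "age": 10, "addr": "beijing"},
    "1002": {"name": "lisi", "age": 12, "addr": "shanghai"},
}
idx = build_covered_index(rows_by_key, "age", ["addr"])
print(lookup(idx, 10))  # [('1001', {'addr': 'beijing'})]
```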

First drop the previous index:

drop index my_index on student1;

-- create a covered index
create index my_index on student1(age) include (addr);

Then check the effect with the execution plan:

explain select id,name,addr from student1 where age = 10;


Local index

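A local index suits write-heavy workloads.
The index data is stored in the same table as the table data, and in the same region. This avoids the extra cost of writing index entries to an index table on a different server during writes.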

my_column can be one or more columns.

CREATE LOCAL INDEX my_index ON my_table (my_column);

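A local index stores all of its data in a shadow column family. Reads are still range scans, but they are slower than with a global index, because every region of the table has to be checked. The advantage is that writes no longer have to go to multiple tables.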
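A small model helps show both sides of this trade-off. The sketch uses hypothetical classes `Region` and `LocalIndexedTable` and a made-up split key. Each region keeps its own index entries next to its data, so a write touches only one region. A lookup by indexed value, however, has to visit every region, because the value says nothing about where the row lives. Columns that are not in the index can be read from the same region, so no trip to another table is needed.

```python
import bisect


class Region:
    def __init__(self):
        self.data = {}            # row key -> {column: value}
        self.local_index = set()  # (indexed value, row key), stored in the same region


class LocalIndexedTable:
    """Hypothetical model of a table with a local index, split into regions."""

    def __init__(self, indexed_col, split_keys):
        self.indexed_col = indexed_col
        self.split_keys = sorted(split_keys)  # region boundaries by row key
        self.regions = [Region() for _ in range(len(self.split_keys) + 1)]

    def _region_for(self, row_key):
        return self.regions[bisect.bisect_right(self.split_keys, row_key)]

    def upsert(self, row_key, row):
        region = self._region_for(row_key)  # data and index entry land in one region
        old = region.data.get(row_key, {})
        if self.indexed_col in old:
            region.local_index.discard((old[self.indexed_col], row_key))
        new = {**old, **row}
        region.data[row_key] = new
        if self.indexed_col in new:
            region.local_index.add((new[self.indexed_col], row_key))

    def query(self, value):
        """Return (matching rows sorted by row key, number of regions visited)."""
        matches, regions_visited = [], 0
        for region in self.regions:  # the indexed value does not tell us the region
            regions_visited += 1
            for indexed_value, row_key in region.local_index:
                if indexed_value == value:
                    # Any column, indexed or not, is available in the same region.
                    matches.append((row_key, region.data[row_key]))
        matches.sort(key=lambda m: m[0])
        return [row for _, row in matches], regions_visited


t = LocalIndexedTable("age", split_keys=["1003"])
t.upsert("1001", {"name": "zhangsan", "age": 10, "addr": "beijing"})
t.upsert("1002", {"name": "lisi", "age": 12, "addr": "shanghai"})
t.upsert("1004", {"name": "wangwu", "age": 10, "addr": "shenzhen"})
rows, visited = t.query(10)
print([r["addr"] for r in rows], visited)  # ['beijing', 'shenzhen'] 2
```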

-- drop the previous index
drop index my_index on student1;

-- create a local index
CREATE LOCAL INDEX my_index ON student1 (age,addr);

-- check the execution plan
explain select id,name,addr from student1 where age = 10;
