Phoenix(sql on hbase)简单介绍



介绍:

Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.



部署:



1:wget http://phoenix-bin.github.com/client/phoenix-2.2.1-install.tar,将jar包拷贝至HBASE_HOME/lib就可以


2:运行psql.sh localhost ../examples/web_stat.sql ../examples/web_stat.csv ../examples/web_stat_queries.sql,载入演示样例数据


3:sqlline.sh localhost(zookeeper地址)进入命令行client




二级索引相关(索引的使用须要调用Phoenix API):

二级索引(多列时)使用须要在hbase-site.xml中增加例如以下配置



<property>


  <name>hbase.regionserver.wal.codec</name>


  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>


</property>



创建索引样例:



create table usertable (id varchar primary key, firstname varchar, lastname varchar);


create index idx_name on usertable (firstname) include (lastname);


 


可通过例如以下方法查看当前SQL语句索引是否生效:


explain select id, firstname, lastname from usertable where firstname = 'foo';



explain的相关说明:



RANGE SCAN means that only a subset of the rows in your table will be scanned over. This occurs if you use one or more leading columns from your primary key constraint. Query that is not filtering on leading PK columns ex. select * from test where pk2='x' and pk3='y'; will result in full scan whereas the following query will result in range scan select * from test where pk1='x' and pk2='y';. Note that you can add a secondary index on your "pk2" and "pk3" columns and that would cause a range scan to be done for the first query (over the index table).


 


DEGENERATE SCAN means that a query can't possibly return any rows. If we can determine that at compile time, then we don't bother to even run the scan.


 


FULL SCAN means that all rows of the table will be scanned over (potentially with a filter applied if you have a WHERE clause)


 


SKIP SCAN means that either a subset or all rows in your table will be scanned over, however it will skip large groups of rows depending on the conditions in your filter. See this blog for more detail. We don't do a SKIP SCAN if you have no filter on the leading primary key columns, but you can force a SKIP SCAN by using the /*+ SKIP_SCAN */ hint. Under some conditions, namely when the cardinality of your leading primary key columns is low, it will be more efficient than a FULL SCAN.



索引使用介绍:

  • 主键索引:主键索引要按创建时的顺序引用。如primary key(id,name,add),那么会隐式的创建(id),(id,name),(id,name,add)三个索引,假设在where中用这三个条件会用到索引,其它组合则无法使用索引(FULL SCAN)。
  • 二级索引:除了要按创建时的顺序引用外,假设查询的列不全在索引或者覆盖索引中则无法使用索引。

    举例:
    DDL:create table usertable (id varchar primary key, firstname varchar, lastname varchar);
         create index idx_name on usertable (firstname);
    DML:select id, firstname, lastname from usertable where firstname = 'foo';
    此查询不会使用到索引,由于lastname不再索引中。
     
    运行DDL:create idx_name on usertable (firstname) include (lastname)后该查询语句才干使用索引。
     
    遗留问题:include和on在Phoenix中详细有什么差别?
  • 查询条件中主键索引+二级索引同一时候存在的话,Phoenix会自己选择最优索引。



Phoenix的SQL表结构与Hbase结构的映射实验



>>create table user3table (id varchar, firstname varchar, lastname varchar CONSTRAINT PK PRIMARY KEY (id,firstname));


 


>>!describe user3table


 


+------------+-------------+------------+-------------+-----------+------------+-------------+---------------+----------------+----------------+------+


| TABLE_CAT  | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME  | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULL |


+------------+-------------+------------+-------------+-----------+------------+-------------+---------------+----------------+----------------+------+


| null       | null        | USER3TABLE | ID          | 12        | VARCHAR    | null        | null          | null           | null           | 1    |


| null       | null        | USER3TABLE | FIRSTNAME   | 12        | VARCHAR    | null        | null          | null           | null           | 1    |


| _0         | null        | USER3TABLE | LASTNAME    | 12        | VARCHAR    | null        | null          | null           | null           | 1    |


+------------+-------------+------------+-------------+-----------+------------+-------------+---------------+----------------+----------------+------+


 


>>!index user3table;


+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+


| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | ASC_OR_DESC | CARDINALIT |


+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+


+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+


 


>>select * from user3table;


+------------+------------+------------+


|     ID     | FIRSTNAME  |  LASTNAME  |


+------------+------------+------------+


| hup        | zhan       | feng       |


+------------+------------+------------+


 


>>hbase>>scan 'USER3TABLE'


 


ROW                                    COLUMN+CELL                                                                                                    


 hup\x00zhan                           column=_0:LASTNAME, timestamp=1387875092585, value=feng                                                        


 hup\x00zhan                           column=_0:_0, timestamp=1387875092585, value=


1 row(s) in 0.0110 seconds


 


>>create index idx_test on user3table (firstname) include (lastname);


 


>>!index user3table;


+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+


| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | ASC_OR_DESC | CARDINALIT |


+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+


| null      | null        | USER3TABLE | true       | null            | IDX_TEST   | 3    | 1                | :FIRSTNAME  | A           | null       |


| null      | null        | USER3TABLE | true       | null            | IDX_TEST   | 3    | 2                | :ID         | A           | null       |


| null      | null        | USER3TABLE | true       | null            | IDX_TEST   | 3    | 3                | _0:LASTNAME | null        | null       |


+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+


 


>>select * from user3table;


+------------+------------+------------+


|     ID     | FIRSTNAME  |  LASTNAME  |


+------------+------------+------------+


| hup        | zhan       | feng       |


+------------+------------+------------+


 


>>hbase>>scan 'USER3TABLE'


 


ROW                                    COLUMN+CELL                                                                                                    


 hup\x00zhan                           column=_0:LASTNAME, timestamp=1387875092585, value=feng                                                        


 hup\x00zhan                           column=_0:_0, timestamp=1387875092585, value=


1 row(s) in 0.0110 seconds


 


此外:当表中非主键的列有多个时会统一加后缀:


 


 1:NASalesforce.com\x00Login\x00\x00\x00 column=STATS:ACTIVE_VISITOR, timestamp=1387867968156, value=\x80\x00\x1A"                                      


 \x01C%\x17\xFE0                                                                                                                                      


 2:NASalesforce.com\x00Login\x00\x00\x00 column=USAGE:CORE, timestamp=1387867968156, value=\x80\x00\x00\x00\x00\x00\x00\xC9                             


 \x01C%\x17\xFE0                                                                                                                                      


 3:NASalesforce.com\x00Login\x00\x00\x00 column=USAGE:DB, timestamp=1387867968156, value=\x80\x00\x00\x00\x00\x00\x02\x84                               


 \x01C%\x17\xFE0                                                                                                                                      


 4:NASalesforce.com\x00Login\x00\x00\x00 column=USAGE:_0, timestamp=1387867968156, value=                                                               


\x01C%\x17\xFE0


结论:

1:Phoenix会把“CONSTRAINT PK PRIMARY KEY (id,firstname)”这样定义的列拼起来增加到Hbase主键中(用\x00进行切割),同一时候将联合主键涉及到的列合并默认名为"_0"的一列。其值为空。其它列放入Hbase的同名列中存储

2:Phoenix在Hbase中维护了一张系统表(SYSTEM TABLE)来存储相关Phoenix表的scheme元数据。

3:创建二级索引(create index)操作不会影响表结构

4:假设建表时不指定列族,则列族以_0、_1的方式命名

5:假设有多列时value值通过HBase接口获取的并非直接可用的值(仅仅能通过Phoenix接口获取正常值)



动态scheme相关

1:支持改动列



Example:


 


ALTER TABLE my_schema.my_table ADD d.dept_id char(10) VERSIONS=10


ALTER TABLE my_table ADD dept_name char(50)


ALTER TABLE my_table ADD parent_id char(15) null primary key


ALTER TABLE my_table DROP COLUMN d.dept_id


ALTER TABLE my_table DROP COLUMN dept_name


ALTER TABLE my_table DROP COLUMN parent_id


ALTER TABLE my_table SET IMMUTABLE_ROWS=true



2:支持改动二级索引



Example:


 


CREATE INDEX my_idx ON sales.opportunity(last_updated_date DESC)


CREATE INDEX my_idx ON log.event(created_date DESC) INCLUDE (name, payload) SALT_BUCKETS=10


CREATE INDEX IF NOT EXISTS my_comp_idx ON server_metrics ( gc_time DESC, created_date DESC )


    DATA_BLOCK_ENCODING='NONE',VERSIONS=?,MAX_FILESIZE=2000000 split on (?, ?, ?)


 


ALTER INDEX my_idx ON sales.opportunity DISABLE


ALTER INDEX IF EXISTS my_idx ON server_metrics REBUILD


 


DROP INDEX my_idx ON sales.opportunity


DROP INDEX IF EXISTS my_idx ON server_metrics



3:应该是不支持改动主键索引(没找到相关信息。理论上也不好支持,由于主键索引就是rowkey的值。)



Javaclient演示样例代码(直接面向JDBC接口编程):



import java.sql.Connection;


import java.sql.DriverManager;


import java.sql.ResultSet;


import java.sql.SQLException;


import java.sql.PreparedStatement;


import java.sql.Statement;


 


public class test {


 


    public static void main(String[] args) throws SQLException {


        Statement stmt = null;


        ResultSet rset = null;


 


        Connection con = DriverManager.getConnection("jdbc:phoenix:zookeeper");


        stmt = con.createStatement();


 


        stmt.executeUpdate("create table test (mykey integer not null primary key, mycolumn varchar)");


        stmt.executeUpdate("upsert into test values (1,'Hello')");


        stmt.executeUpdate("upsert into test values (2,'World!')");


        con.commit();


 


        PreparedStatement statement = con.prepareStatement("select * from test");


        rset = statement.executeQuery();


        while (rset.next()) {


            System.out.println(rset.getString("mycolumn"));


        }


        statement.close();


        con.close();


    }


}



单节点測试:



建表:


CREATE TABLE IF NOT EXISTS $table (HOST CHAR(2) NOT NULL,DOMAIN VARCHAR NOT NULL,


FEATURE VARCHAR NOT NULL,DATE DATE NOT NULL,USAGE.CORE BIGINT,USAGE.DB BIGINT,STATS.ACTIVE_VISITOR


INTEGER CONSTRAINT PK PRIMARY KEY (HOST, DOMAIN, FEATURE, DATE))


SPLIT ON ('CSGoogle','CSSalesforce','EUApple','EUGoogle','EUSalesforce','NAApple','NAGoogle','NASalesforce');


 


performance_10000000数据表中是1000W条例如以下数据:


 


+------+------------+------------+---------------------+----------+----------+----------------+


| HOST |   DOMAIN   |  FEATURE   |        DATE         |   CORE   |    DB    | ACTIVE_VISITOR |


+------+------------+------------+---------------------+----------+----------+----------------+


| CS   | Apple.com  | Dashboard  | 2013-12-23          | 363      | 795      | 8390           |


+------+------------+------------+---------------------+----------+----------+----------------+


 


Query # 1 - Count - SELECT COUNT(1) FROM performance_10000000;


 


COUNT(1)


--------


10000000


Time: 66.044 sec(s)


 


Query # 2 - Group By First PK - SELECT HOST FROM performance_10000000 GROUP BY HOST;


 


HOST


----


CS  


EU  


NA  


Time: 51.43 sec(s)


 


Query # 3 - Group By Second PK - SELECT DOMAIN FROM performance_10000000 GROUP BY DOMAIN;


 


DOMAIN    


----------


Apple.com 


Google.com


Salesforce.com


Time: 46.908 sec(s)


 


Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,'DAY') DAY FROM performance_10000000 GROUP BY TRUNC(DATE,'DAY');


 


DAY                


-------------------


2013-12-23 00:00:00


2013-12-24 00:00:00


2013-12-25 00:00:00


......


Time: 48.132 sec(s)


 


Query # 5 - Filter + Count - SELECT COUNT(1) FROM performance_10000000 WHERE CORE<10;


 


COUNT(1)


--------


  198669


Time: 31.301 sec(s)


 



集群(5*RegionServer)測试:



performance_10000000数据表中是1000W条例如以下数据:


 


+------+------------+------------+---------------------+----------+----------+----------------+


| HOST |   DOMAIN   |  FEATURE   |        DATE         |   CORE   |    DB    | ACTIVE_VISITOR |


+------+------------+------------+---------------------+----------+----------+----------------+


| CS   | Apple.com  | Dashboard  | 2013-12-23          | 363      | 795      | 8390           |


+------+------------+------------+---------------------+----------+----------+----------------+


 


Query # 1 - Count - SELECT COUNT(1) FROM performance_10000000;


 


COUNT(1)


--------


19630614


Time: 13.879 sec(s)


 


Query # 2 - Group By First PK - SELECT HOST FROM performance_10000000 GROUP BY HOST;


 


HOST


----


CS  


EU  


NA  


Time: 13.545 sec(s)


 


Query # 3 - Group By Second PK - SELECT DOMAIN FROM performance_10000000 GROUP BY DOMAIN;


 


DOMAIN    


----------


Apple.com 


Google.com


Salesforce.com


Time: 12.907 sec(s)


 


Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,'DAY') DAY FROM performance_10000000 GROUP BY TRUNC(DATE,'DAY');


 


DAY                


-------------------


2013-12-23 00:00:00


2013-12-24 00:00:00


2013-12-25 00:00:00


......


Time: 13.845 sec(s)


 


Query # 5 - Filter + Count - SELECT COUNT(1) FROM performance_10000000 WHERE CORE<10;


 


COUNT(1)


--------


  393154


Time: 8.522 sec(s)


 



长处:

1:命令行和javaclient使用都非常easy。尤其是javaclient直接面向JDBC接口编程,封装且优化了Hbase非常多细节。

2:在单表操作上性能比Hive Handler好非常多(可是handler也有可能会升级增加斜处理器相关聚合等特性)

3:支持多列的二级索引,列数不限。当中可变索引时列数越多写入速度越慢,不可变索引不影响写入速度(​

4:对Top-N查询速度远超Hive​

5:提供对rowkey分桶的特性,能够实现数据在各个region的均匀分布​​​

6:低侵入性,基本对原Hbase的使用没什么影响

7:提供的函数基本都能cover住绝大多数需求了

8:与Hive不同的是,Phoenix的sql语句更接近标准sql规范。



缺点:

1:Phoenix创建的表Hbase能够识别并使用,可是使用Hbase创建的表,Phoenix不能识别,由于Phoenix对每张表都有其对应的元数据信息。

2:硬伤:多表join操作当前不支持(官方文档对当前2.2.3版本号的说法不一致,但3.0应该会支持,有可能会引入Apache Drill把大表join切割成小任务的特性)。

3:眼下仅仅支持hbase0.94系列(这个应该问题不大)



其它

1:Phoenix对全部数据都是基于内存进行统计。因此从sql语句复杂程度以及java各数据结构的性能,基本能对其消耗的时间有一个大概的预计。



功能扩展(impala不太熟,主要针对Hive说了)

假设将Hive和Phoenix集成的话,还是非常不错的,两者刚好互补。Hive并没使用coprocesser,仅仅是通过把数据取出来做MR,而Phoenix刚好是在单表取数据方面做了非常多优化。集成后能够享受到Phoenix的单表操作优势,同一时候能够解决多表join的问题(在Phoenix预计短时间难做出来大表join的方案,说是要模仿Drill,可是如今Drill本身救处于Alpha阶段)。

假设集成的话主要工作是须要在Hive-hbase-handler中适配Hive相关单表操作到Phoenix的javaclient接口。

不太成熟的问题:

1:是把Phoenix的单表操作移植到Hive中还是把Hive的join移植到Phoenix中?

2:是仅仅对外提供Hive的接口还是同一时候对外提供Hive和Phoenix两种接口呢?

3:适配的过程还有非常多细节

尝试调研了下Phoenix二级索引能否够达到像华为一样创建完能够无需改动HBase不论什么代码就享受到二级索引的效果





对照华为HBase二级索引:

缺点:华为二级索引须要在建表时指定列(及不支持动态改动),同一时候华为代码对Hbase本身侵入性太大(比方balancer要用华为的),难以升级维护。

长处:可是索引建好后,在对Hbase的scan、Puts、Deletes操作时使用Hbase原生代码(无需不论什么修改)就可以享受到索引的效果。也不须要指定使用哪个索引,它会自己使用最优索引。


​也就是说假设加上华为索引,Hive-hbase-handler无需修改就可以使用二级索引。可是phoenix眼下仅仅支持通过phoenix sql方式使用二级索引。​


性能对照:暂未測试,预计差不太多

综合看移植phoenix比移植华为更靠谱,phoenix侵入性小,功能更强大,且升级维护方面也比华为要靠谱。可是移植phoenix难度也相对照较大。

可是假设仅仅是想短期起效果,能够尝试下华为索引。



淘宝开源项目Lealone:

是一个可用于HBase的分布式SQL引擎,主要功能就是能用SQL方式(JDBC)查询Hbase,避免了HBase使用的繁琐操作。相对与Phoenix的功能弱多了。

  • 支持高性能的分布式事务,
  • 使用一个很新颖的基于局部时间戳的多版本号冲突与有效性检測的分布式事务模型
  • 是对H2关系数据库SQL引擎的改进和扩展
  • HBase建的表Lealone仅仅能读;Lealone建的表Lealone能够读写。



基于Solr的HBase多条件查询:

介绍:ApacheSolr 是一个开源的搜索server,Solr 使用 Java 语言开发,主要基于 HTTP 和Apache Lucene 实现。

原理:基于Solr的HBase多条件查询原理非常easy,将HBase表中涉及条件过滤的字段和rowkey在Solr中建立索引,通过Solr的多条件查询高速获得符合过滤条件的rowkey值,拿到这些rowkey之后在HBASE中通过指定rowkey进行查询。

缺点:

1:ApacheSolr本身并非专为HBase设计的。须要专门针对ApacheSolr写Hbase的相关应用,比方HBase写数据时同步更新索引的过程须要我们自己写协处理器。

2:ApacheSolr本身是一个WebService服务,须要额外维护一个或多个ApacheSolrserver。