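REST (Representational State Transfer) is a software architectural style proposed by Dr. Roy Fielding in his 2000 doctoral dissertation. It is an approach to designing and developing network applications that reduces development complexity and improves system scalability.
Among the three mainstream Web service approaches, REST is simpler and more efficient than the more complex SOAP and XML-RPC, and a growing number of Web services are designed and implemented in the REST style.
The REST server bundled with HBase can run as a daemon, which starts an embedded Jetty servlet container and deploys the servlet into it. Once configured and running, the bundled REST server exposes HBase tables, rows, cells, and metadata as resources addressed by URL.
HBase REST documentation: https://hbase.apache.org/book.html#_rest
For installing and configuring HBase, see "Springboot Series (24) - Springboot+HBase Big Data Storage (2) | Installing and Configuring Apache HBase and Apache Zookeeper".
For the commands used to operate on HBase tables, see "Springboot Series (25) - Springboot+HBase Big Data Storage (3) | HBase Shell".
This article describes how to use HBase REST.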
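1. System environment
Operating system: Ubuntu 20.04
Java version: openjdk 11.0.18
Hadoop version: 3.2.2
Zookeeper version: 3.6.3
HBase version: 2.4.4
HBase path: ~/apps/hbase-2.4.4
The HBase instance used in this article is deployed on a pseudo-distributed Hadoop setup (hostname: hadoop-master-vm) and runs in HBase + standalone Zookeeper mode, with Zookeeper using port 2182.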
2. Starting the REST server
$ cd ~/apps
# Run in the foreground (the default port is 8080; -p 8888 overrides it here)
$ ./hbase-2.4.4/bin/hbase rest start -p 8888
# Run in the background
$ ./hbase-2.4.4/bin/hbase-daemon.sh start rest -p 8888
# Show the HBase version
$ curl -X GET -H "Accept: text/plain" "http://localhost:8888/version/cluster"
2.4.4
# HBase cluster status
$ curl -X GET -H "Accept: text/plain" "http://localhost:8888/status/cluster"
1 live servers, 0 dead servers, 4.0000 average load
1 live servers
Test-Ubuntu20:16020 1679556060710
requests=16, regions=4
heapSizeMB=41
maxHeapSizeMB=494
...
# Stop the REST server
$ ./hbase-2.4.4/bin/hbase-daemon.sh stop rest
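Note: this article uses curl on Ubuntu as the client for the REST API; wget, or programs such as Postman on Windows, also work. When the client does not run on the same machine as the REST server, replace localhost with the hostname or IP as appropriate.
The examples mostly use JSON. In the curl commands above, for instance, "Accept: text/plain" can be changed to "Accept: application/json" or "Accept: text/xml" to get the response in a different format.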
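To avoid retyping the host, port, and Accept header, the two notes above can be wrapped in a small helper. This is only a sketch: the names HBASE_REST_HOST, HBASE_REST_PORT, rest_url, and rest_get are hypothetical and not part of HBase; the defaults simply mirror the localhost:8888 setup used in this article.

```shell
# Hypothetical helper variables; override them when the server is remote.
HBASE_REST_HOST="${HBASE_REST_HOST:-localhost}"
HBASE_REST_PORT="${HBASE_REST_PORT:-8888}"

# Print the full URL for a REST path such as /version/cluster.
rest_url() {
  printf 'http://%s:%s%s\n' "$HBASE_REST_HOST" "$HBASE_REST_PORT" "$1"
}

# Issue a GET for a path; the second argument is the Accept type
# (application/json when omitted).
rest_get() {
  curl -s -X GET -H "Accept: ${2:-application/json}" "$(rest_url "$1")"
}
```

For example, `rest_get /version/cluster text/plain` sends the same request as the version check above, and `HBASE_REST_HOST=hadoop-master-vm rest_get /status/cluster` would target the host named in section 1.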
3. Table operations
# List all tables
$ curl -X GET -H "Accept: application/json" "http://localhost:8888"
{"table":[{"name":"test"},{"name":"user"}]}
$ curl -X GET -H "Accept: text/plain" "http://localhost:8888"
test
user
# Check whether the demo table exists
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/exists"
Not found
# Create the demo table (one column family, cf1)
$ curl -v -X PUT \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"name":"demo","ColumnSchema":[{"name":"cf1"}]}' \
"http://localhost:8888/demo/schema"
# Add 2 column families, cf2 and cf3
$ curl -v -X POST \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"name":"demo","ColumnSchema":[{"name":"cf2"},{"name":"cf3"}]}' \
"http://localhost:8888/demo/schema"
# Delete the cf3 column family: do not use POST; use PUT to replace the table schema
$ curl -v -X PUT \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"name":"demo","ColumnSchema":[{"name":"cf1"},{"name":"cf2"}]}' \
"http://localhost:8888/demo/schema"
# Change VERSIONS of the cf1 column family to 2; the default is 1 (only the latest version is kept)
$ curl -v -X POST \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"name":"demo","ColumnSchema":[{"name":"cf1","VERSIONS":"2"}]}' \
"http://localhost:8888/demo/schema"
# Show the schema of the demo table
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/schema"
{"name":"demo","ColumnSchema":[{"name":"cf1","BLOOMFILTER":"ROW","IN_MEMORY":"false","VERSIONS":"2","KEEP_DELETED_CELLS":"FALSE","DATA_BLOCK_ENCODING":"NONE","COMPRESSION":"NONE",
"TTL":"2147483647","MIN_VERSIONS":"0","BLOCKCACHE":"true","BLOCKSIZE":"65536","REPLICATION_SCOPE":"0"},
{"name":"cf2","BLOOMFILTER":"ROW","IN_MEMORY":"false","VERSIONS":"1","KEEP_DELETED_CELLS":"FALSE","DATA_BLOCK_ENCODING":"NONE","COMPRESSION":"NONE",
"TTL":"2147483647","MIN_VERSIONS":"0","BLOCKCACHE":"true","BLOCKSIZE":"65536","REPLICATION_SCOPE":"0"}],"IS_META":"false"}
# Show the table's regions
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/regions"
{"name":"demo","Region":[{"id":1680327485917,"startKey":"","endKey":"","location":"Test-Ubuntu20:16020","name":"demo,,1680327485917.c9ca998d01c045d8f87535ba444de7c2."}]}
4. Adding data
Taking the demo table created above as an example, which contains two column families, cf1 and cf2, we will add the following data to it:
id | name | age | job |
row1 | Tom | 12 | Student |
row2 | Jerry | 9 | Engineer |
row3 | Jerry | 10 | Engineer |
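When operating on data, the REST interface Base64-encodes and decodes values such as key, column, and value. You can encode and decode them with the following commands:
$ echo -ne "Tom" | base64 # encode
VG9t
$ echo -ne "VG9t" | base64 -d # decode
Tom
Note: the -n option of echo suppresses the trailing newline, and -e enables interpretation of backslash escapes.
The Base64 encodings used in the data operations below are:
Original value | Encoded |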
cf1:name | Y2YxOm5hbWU= |
cf1:age | Y2YxOmFnZQ== |
cf2:job | Y2YyOmpvYg== |
row1 | cm93MQ== |
row2 | cm93Mg== |
row3 | cm93Mw== |
Tom | VG9t |
Jerry | SmVycnk= |
12 | MTI= |
9 | OQ== |
10 | MTA= |
Student | U3R1ZGVudA== |
Engineer | RW5naW5lZXI= |
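The table above can be regenerated rather than typed by hand. The sketch below wraps the base64 command in two hypothetical helpers, b64enc and b64dec; printf '%s' is used instead of echo -ne because it never appends a newline. It assumes the GNU coreutils base64 found on Ubuntu, which wraps output lines at 76 characters, so very long values would need extra handling.

```shell
# Hypothetical helpers around the base64 command (GNU coreutils assumed).
b64enc() { printf '%s' "$1" | base64; }
b64dec() { printf '%s' "$1" | base64 -d; }

# Print "value | encoded |" rows in the same layout as the table above.
for v in cf1:name cf1:age cf2:job row1 row2 row3 Tom Jerry 12 9 10 Student Engineer; do
  printf '%s | %s |\n' "$v" "$(b64enc "$v")"
done
```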
Examples:
# Add the name and age values for row1
$ curl -v -X PUT \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOm5hbWU=","$":"VG9t"},{"column":"Y2YxOmFnZQ==","$":"MTI="}]}]}' \
"http://localhost:8888/demo/row1"
# Add the job value for row1
$ curl -v -X PUT \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YyOmpvYg==","$":"U3R1ZGVudA=="}]}]}' \
"http://localhost:8888/demo/row1"
# Add the name, age, and job values for row2
$ curl -v -X PUT \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOm5hbWU=","$":"SmVycnk="},{"column":"Y2YxOmFnZQ==","$":"OQ=="},{"column":"Y2YyOmpvYg==","$":"RW5naW5lZXI="}]}]}' \
"http://localhost:8888/demo/row2"
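# Add the age value for row2 again, this time 10 (the cell keeps 2 versions, so the original value 9 is not replaced; a new version is created, distinguished by its timestamp)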
$ curl -v -X PUT \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","$":"MTA="}]}]}' \
"http://localhost:8888/demo/row2"
# Add the name, age, and job values for row3
$ curl -v -X PUT \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"Row":[{"key":"cm93Mw==","Cell":[{"column":"Y2YxOm5hbWU=","$":"SmVycnk="},{"column":"Y2YxOmFnZQ==","$":"MTA="},{"column":"Y2YyOmpvYg==","$":"RW5naW5lZXI="}]}]}' \
"http://localhost:8888/demo/row3"
# View the row3 data
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row3"
{"Row":[{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680424477376,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680424477376,"$":"SmVycnk="},
{"column":"Y2YyOmpvYg==","timestamp":1680424477376,"$":"RW5naW5lZXI="}]}]}
5. Querying data
1) GET operations
# Get all data of row2
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2"
{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="},
{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]}]}
Note: the key, column, and value are Base64-encoded; they can be decoded with a command such as the following.
$ echo -ne "cm93Mg==" | base64 -d
row2
# Get all data in the cf1 column family of row2
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1"
{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="}]}]}
# Get the age value of row2
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age"
{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="}]}]}
Note: MTA= decodes to 10
# Get the age and job values of row2 (multiple columns)
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age,cf2:job"
{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]}]}t
# Get the latest 2 versions of row2's age value using the v parameter (v can also be used on the row and column-family endpoints)
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age?v=2"
{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680416123316,"$":"OQ=="}]}]}
Note: OQ== decodes to 9
# Get the first version of row2's age value by timestamp
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age/1680416123316"
{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416113142,"$":"OQ=="}]}]}
2) Stateless Scanner
A stateless Scanner keeps no state about the query; all query conditions are passed as parameters in a single request.
# Scan the whole table
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/*"
{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413683325,"$":"VG9t"},
{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]},
{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="},
{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]},
{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413772460,"$":"SmVycnk="},
{"column":"Y2YyOmpvYg==","timestamp":1680413772460,"$":"RW5naW5lZXI="}]}]}
# Scan the cf2 column family
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/*/cf2"
{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]},
{"key":"cm93Mg==","Cell":[{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]},
{"key":"cm93Mw==","Cell":[{"column":"Y2YyOmpvYg==","timestamp":1680413772460,"$":"RW5naW5lZXI="}]}]}
# Scan the cf1 and cf2 column families
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/*/cf1,cf2"
{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413683325,"$":"VG9t"},
{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]},
{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="},
{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]},
{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413772460,"$":"SmVycnk="},
{"column":"Y2YyOmpvYg==","timestamp":1680413772460,"$":"RW5naW5lZXI="}]}]}
# Scan the age column of the cf1 column family, showing the latest two versions
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/*/cf1:age?v=2"
{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680413683325,"$":"MTI="}]},
{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680416123316,"$":"OQ=="}]},
{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="}]}]}
# Scan the whole table, limiting the number of rows returned
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/*?limit=1"
{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413683325,"$":"VG9t"},
{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]}]}
# Scan the whole table, starting from the given row (inclusive)
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/*?startrow=row2"
{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="},
{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]},
{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413772460,"$":"SmVycnk="},
{"column":"Y2YyOmpvYg==","timestamp":1680413772460,"$":"RW5naW5lZXI="}]}]}
# Scan the whole table, stopping at the given row (exclusive)
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/*?endrow=row2"
{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413683325,"$":"VG9t"},
{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]}]}
# Scan the whole table with combined conditions
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/*/cf1:age?v=2&limit=1&startrow=row2"
{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680413683325,"$":"MTI="}]},
{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680416123316,"$":"OQ=="}]},
{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="}]}]}
3) Stateful Scanner
Omitted.
6. Deleting data
# Delete the job value in the cf2 column family of row3
$ curl -v -X DELETE -H "Accept: application/json" -H "Content-Type: application/json" "http://localhost:8888/demo/row3/cf2:job/?check=delete"
# View the row3 data
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row3"
{"Row":[{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680423950256,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680423950256,"$":"SmVycnk="}]}]}
# Delete the cf1 column family of row3; the Base64 encoding of "cf1" is "Y2Yx"
$ curl -v -X DELETE -H "Accept: application/json" -H "Content-Type: application/json" "http://localhost:8888/demo/row3/cf1/?check=delete"
# View the row3 data
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row3"
Not found
# After re-adding the row3 data, delete the entire row3
$ curl -v -X DELETE -H "Accept: application/json" -H "Content-Type: application/json" "http://localhost:8888/demo/row3/?check=delete"
# View the row3 data
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row3"
Not found
# View the latest 2 versions of row2's age value
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age?v=2"
{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680416123316,"$":"OQ=="}]}]}
# Delete the first version of row2's age value, whose timestamp is 1680416123316
$ curl -v -X DELETE -H "Accept: application/json" -H "Content-Type: application/json" "http://localhost:8888/demo/row2/cf1:age/1680416123316/?check=delete"
# View the latest 2 versions of row2's age value again
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age?v=2"
{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="}]}]}
# Delete the table
$ curl -v -X DELETE -H "Accept: application/json" "http://localhost:8888/demo/schema"
7. Namespace operations
# Create a Namespace
$ curl -v -X POST -H "Accept: application/json" "http://localhost:8888/namespaces/ns_test"
# List all Namespaces
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/namespaces"
{"Namespace":["default","hbase","ns_test"]}
# Create table tbl_01 under ns_test with one column family, cf1
$ curl -v -X PUT \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{"name":"ns_test:tbl_01","ColumnSchema":[{"name":"cf1"}]}' \
"http://localhost:8888/ns_test:tbl_01/schema"
# List the tables under ns_test
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/namespaces/ns_test/tables"
{"table":[{"name":"tbl_01"}]}
# A Namespace that still contains tables cannot be deleted; delete the table ns_test:tbl_01 first, then delete ns_test
$ curl -v -X DELETE -H "Accept: application/json" "http://localhost:8888/ns_test:tbl_01/schema"
$ curl -v -X DELETE -H "Accept: application/json" "http://localhost:8888/namespaces/ns_test"
# List all Namespaces
$ curl -X GET -H "Accept: application/json" "http://localhost:8888/namespaces"
{"Namespace":["SYSTEM","default","hbase"]}