Table of Contents
- I. Overview of HBase connection methods
- II. Java
- 1. Older HBase versions:
- (1) Creating a table:
- (2) Deleting a table:
- (3) Writing data:
- (4) Querying:
- (5) A consolidated set of common operations via the Java API:
- 2. HBase 2.x:
- (1) Connecting to HBase:
- (2) Creating an HBase table:
- (3) Adding data to an HBase table:
- (4) Deleting an HBase column family or column:
- (5) Updating a column in an HBase table:
- (6) Querying HBase:
- (7) Quick HBase connectivity test demo (list all table names):
- III. Scala
- 1. Reading/writing HBase and querying HBase table data with Spark SQL:
- 2. Reading data from Kafka with Spark Streaming and writing it to HBase:
I. Overview of HBase connection methods
The main approaches are:
- reading and writing HBase directly through the Java API;
- reading and writing HBase from Spark;
- reading and writing HBase from Flink;
- reading and writing HBase through Phoenix.
The first is the fairly low-level but efficient access path provided by HBase itself, the second and third are the Spark and Flink integrations with HBase, and the last is the JDBC interface offered by the third-party Phoenix layer; that Phoenix JDBC approach can also be invoked from Spark and Flink.
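As a quick taste of the Phoenix approach, here is a minimal sketch of querying HBase through the Phoenix JDBC driver; the ZooKeeper address, znode parent and the WEB_STAT table are placeholders, and the phoenix-client jar must be on the classpath:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class PhoenixJdbcDemo {
public static void main(String[] args) throws Exception {
// URL format: jdbc:phoenix:<zookeeper quorum>:<port>:<znode parent> -- adjust to your cluster
String url = "jdbc:phoenix:192.168.8.71:2181:/hbase";
try (Connection conn = DriverManager.getConnection(url);
Statement stmt = conn.createStatement();
// WEB_STAT is a placeholder table created through Phoenix
ResultSet rs = stmt.executeQuery("SELECT * FROM WEB_STAT LIMIT 10")) {
while (rs.next()) {
System.out.println(rs.getString(1));
}
}
}
}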
II. Java
1. Older HBase versions:
(1) Creating a table:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
public class CreateTableTest {
public static void main(String[] args) throws IOException {
//Set the HBase connection configuration parameters
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.8.71"); // Zookeeper的地址
conf.set("hbase.zookeeper.property.clientPort", "2181");
String tableName = "emp";
String[] family = { "basicinfo","deptinfo"};
HBaseAdmin hbaseAdmin = new HBaseAdmin(conf);
//Build the table descriptor
HTableDescriptor hbaseTableDesc = new HTableDescriptor(TableName.valueOf(tableName));
for(int i = 0; i < family.length; i++) {
//Add the column families
hbaseTableDesc.addFamily(new HColumnDescriptor(family[i]));
}
//If the table already exists print a message, otherwise create it
if(hbaseAdmin.tableExists(TableName.valueOf(tableName))) {
System.out.println("TableExists!");
/**
 * System.exit(status) terminates the currently running JVM. A status of 0 means a normal exit; any non-zero status means an abnormal exit.
 * Unlike return, which only goes back to the caller, System.exit(status) shuts down the whole application (the entire JVM) regardless of the status value,
 * releasing all of its memory.
 */
System.exit(0);
} else{
hbaseAdmin.createTable(hbaseTableDesc);
System.out.println("Create table Success!");
}
}
}
(2) Deleting a table:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
public class DeleteMyTable {
public static void main(String[] args) throws IOException {
String tableName = "mytb";
delete(tableName);
}
public static Configuration getConfiguration() {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.rootdir", "hdfs://192.168.8.71:9000/hbase");
conf.set("hbase.zookeeper.quorum", "192.168.8.71");
return conf;
}
public static void delete(String tableName) throws IOException {
HBaseAdmin hAdmin = new HBaseAdmin(getConfiguration());
if(hAdmin.tableExists(tableName)){
try {
hAdmin.disableTable(tableName);
hAdmin.deleteTable(tableName);
System.err.println("Delete table Success");
} catch (IOException e) {
System.err.println("Delete table Failed ");
}
}else{
System.err.println("table not exists");
}
}
}
(3) Writing data:
An e-commerce site keeps a buyer information table in its backend; every time a new user registers, the backend produces a log record and writes it into HBase.
The record format is: user ID (buyer_id), registration date (reg_date), registration IP (reg_ip), buyer status (buyer_status, 0 = frozen, 1 = normal); the fields are comma-separated in the sample below:
buyer_id  reg_date  reg_ip  buyer_status
20385,2010-05-04,124.64.242.30,1
20386,2010-05-05,117.136.0.172,1
20387,2010-05-06,114.94.44.230,1
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
public class PutData {
public static void main(String[] args) throws MasterNotRunningException,
ZooKeeperConnectionException, IOException {
String tableName = "mytb";
String columnFamily = "mycf";
put(tableName, "20385", columnFamily, "2010-05-04:reg_ip", "124.64.242.30");
put(tableName, "20385", columnFamily, "2010-05-04:buyer_status", "1");
put(tableName, "20386", columnFamily, "2010-05-05:reg_ip", "117.136.0.172");
put(tableName, "20386", columnFamily, "2010-05-05:buyer_status", "1");
put(tableName, "20387", columnFamily, "2010-05-06:reg_ip", "114.94.44.230");
put(tableName, "20387", columnFamily, "2010-05-06:buyer_status", "1");
}
public static Configuration getConfiguration() {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.rootdir", "hdfs://192.168.8.71:9000/hbase");
conf.set("hbase.zookeeper.quorum", "192.168.8.71");
return conf;
}
public static void put(String tableName, String row, String columnFamily,
String column, String data) throws IOException {
HTable table = new HTable(getConfiguration(), tableName);
Put put = new Put(Bytes.toBytes(row));
put.add(Bytes.toBytes(columnFamily),
Bytes.toBytes(column),
Bytes.toBytes(data));
table.put(put);
System.err.println("SUCCESS");
}
}
Note: constructing HTable by hand is deprecated. Use a Connection to instantiate tables instead: from a Connection you can call Connection.getTable(TableName).
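For reference, a minimal sketch of the same kind of put written against the connection-based API (HBase 1.0+/2.x), reusing the table mytb and column family mycf from the example above; the ZooKeeper address is a placeholder:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
public class PutDataWithConnection {
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.8.71");
// the Connection is heavyweight and thread-safe: create it once and reuse it
try (Connection conn = ConnectionFactory.createConnection(conf);
Table table = conn.getTable(TableName.valueOf("mytb"))) {
Put put = new Put(Bytes.toBytes("20385"));
put.addColumn(Bytes.toBytes("mycf"), Bytes.toBytes("2010-05-04:reg_ip"), Bytes.toBytes("124.64.242.30"));
table.put(put);
}
}
}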
(4) Querying:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
public class GetData {
public static void main(String[] args) throws IOException {
String tableName = "mytb";
get(tableName, "20386");
}
public static Configuration getConfiguration() {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.rootdir", "hdfs://192.168.8.71:9000/hbase");
conf.set("hbase.zookeeper.quorum", "192.168.8.71");
return conf;
}
public static void get(String tableName, String rowkey) throws IOException {
HTable table = new HTable(getConfiguration(), tableName);
Get get = new Get(Bytes.toBytes(rowkey));
Result result = table.get(get);
byte[] value1 = result.getValue("mycf".getBytes(), "2010-05-05:reg_ip".getBytes());
byte[] value2 = result.getValue("mycf".getBytes(), "2010-05-05:buyer_status".getBytes());
System.err.println("line1:SUCCESS");
System.err.println("line2:"
+ new String(value1) + "\t"
+ new String(value2));
}
}
All of the code above is compiled and run like this:
[hadoop@h71 q1]$ /usr/jdk1.7.0_25/bin/javac GetData.java
[hadoop@h71 q1]$ /usr/jdk1.7.0_25/bin/java GetData
(5) A consolidated set of common operations via the Java API:
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
public class HBaseTest2 {
// Static configuration
static Configuration conf = null;
static {
conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.205.153");
}
/*
* Create a table
* @tableName table name
* @family    list of column families
*/
public static void creatTable(String tableName, String[] family) throws Exception {
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor(tableName);
for (int i = 0; i < family.length; i++) {
desc.addFamily(new HColumnDescriptor(family[i]));
}
if (admin.tableExists(tableName)) {
System.out.println("table Exists!");
System.exit(0);
} else {
admin.createTable(desc);
System.out.println("create table Success!");
}
}
/*
* Add data to a table (for a fixed table whose column families are known)
* @rowKey    row key
* @tableName table name
* @column1   columns of the first column family
* @value1    values for the first column family
* @column2   columns of the second column family
* @value2    values for the second column family
*/
public static void addData(String rowKey, String tableName,
String[] column1, String[] value1, String[] column2, String[] value2)
throws IOException {
Put put = new Put(Bytes.toBytes(rowKey)); // set the row key
HTable table = new HTable(conf, tableName); // get the table
HColumnDescriptor[] columnFamilies = table.getTableDescriptor() // get all column families
.getColumnFamilies();
for (int i = 0; i < columnFamilies.length; i++) {
String familyName = columnFamilies[i].getNameAsString(); // column family name
if (familyName.equals("article")) { // put data into the article column family
for (int j = 0; j < column1.length; j++) {
put.add(Bytes.toBytes(familyName),
Bytes.toBytes(column1[j]), Bytes.toBytes(value1[j]));
}
}
if (familyName.equals("author")) { // author列族put数据
for (int j = 0; j < column2.length; j++) {
put.add(Bytes.toBytes(familyName),
Bytes.toBytes(column2[j]), Bytes.toBytes(value2[j]));
}
}
}
table.put(put);
System.out.println("add data Success!");
}
/*
* Query by row key
* @rowKey    row key
* @tableName table name
*/
public static Result getResult(String tableName, String rowKey) throws IOException {
Get get = new Get(Bytes.toBytes(rowKey));
HTable table = new HTable(conf, tableName);// 获取表
Result result = table.get(get);
for (KeyValue kv : result.list()) {
System.out.println("family:" + Bytes.toString(kv.getFamily()));
System.out
.println("qualifier:" + Bytes.toString(kv.getQualifier()));
System.out.println("value:" + Bytes.toString(kv.getValue()));
System.out.println("Timestamp:" + kv.getTimestamp());
System.out.println("-------------------------------------------");
}
return result;
}
/*
* Scan the whole HBase table
* @tableName table name
*/
public static void getResultScann(String tableName) throws IOException {
Scan scan = new Scan();
ResultScanner rs = null;
HTable table = new HTable(conf, tableName);
try {
rs = table.getScanner(scan);
for (Result r : rs) {
for (KeyValue kv : r.list()) {
System.out.println("family:"
+ Bytes.toString(kv.getFamily()));
System.out.println("qualifier:"
+ Bytes.toString(kv.getQualifier()));
System.out
.println("value:" + Bytes.toString(kv.getValue()));
System.out.println("timestamp:" + kv.getTimestamp());
System.out
.println("-------------------------------------------");
}
}
} finally {
rs.close();
}
}
/*
* Query a single column of a row
* @tableName table name
* @rowKey    row key
*/
public static void getResultByColumn(String tableName, String rowKey,
String familyName, String columnName) throws IOException {
HTable table = new HTable(conf, tableName);
Get get = new Get(Bytes.toBytes(rowKey));
get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName)); // fetch the cell for the given column family and qualifier
Result result = table.get(get);
for (KeyValue kv : result.list()) {
System.out.println("family:" + Bytes.toString(kv.getFamily()));
System.out
.println("qualifier:" + Bytes.toString(kv.getQualifier()));
System.out.println("value:" + Bytes.toString(kv.getValue()));
System.out.println("Timestamp:" + kv.getTimestamp());
System.out.println("-------------------------------------------");
}
}
/*
* Update a single column of a row
* @tableName  table name
* @rowKey     row key
* @familyName column family name
* @columnName column name
* @value      new value
*/
public static void updateTable(String tableName, String rowKey,
String familyName, String columnName, String value)
throws IOException {
HTable table = new HTable(conf, tableName);
Put put = new Put(Bytes.toBytes(rowKey));
put.add(Bytes.toBytes(familyName), Bytes.toBytes(columnName),
Bytes.toBytes(value));
table.put(put);
System.out.println("update table Success!");
}
/*
* Query multiple versions of a column
* @tableName  table name
* @rowKey     row key
* @familyName column family name
* @columnName column name
*/
public static void getResultByVersion(String tableName, String rowKey,
String familyName, String columnName) throws IOException {
HTable table = new HTable(conf, tableName);
Get get = new Get(Bytes.toBytes(rowKey));
get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName));
get.setMaxVersions(5);
Result result = table.get(get);
for (KeyValue kv : result.list()) {
System.out.println("family:" + Bytes.toString(kv.getFamily()));
System.out
.println("qualifier:" + Bytes.toString(kv.getQualifier()));
System.out.println("value:" + Bytes.toString(kv.getValue()));
System.out.println("Timestamp:" + kv.getTimestamp());
System.out.println("-------------------------------------------");
}
List<?> results = table.get(get).list();
Iterator<?> it = results.iterator();
while (it.hasNext()) {
System.out.println(it.next().toString());
}
}
/*
* Delete a specific column
* @tableName  table name
* @rowKey     row key
* @familyName column family name
* @columnName column name
*/
public static void deleteColumn(String tableName, String rowKey,
String familyName, String columnName) throws IOException {
HTable table = new HTable(conf, tableName);
Delete deleteColumn = new Delete(Bytes.toBytes(rowKey));
deleteColumn.deleteColumns(Bytes.toBytes(familyName),
Bytes.toBytes(columnName));
table.delete(deleteColumn);
System.out.println(familyName + ":" + columnName + " is deleted!");
}
/*
* Delete all columns of a row (i.e. delete the whole row)
* @tableName 表名
* @rowKey rowKey
*/
public static void deleteAllColumn(String tableName, String rowKey)
throws IOException {
HTable table = new HTable(conf, tableName);
Delete deleteAll = new Delete(Bytes.toBytes(rowKey));
table.delete(deleteAll);
System.out.println("all columns are deleted!");
}
/*
* Delete a table
* @tableName 表名
*/
public static void deleteTable(String tableName) throws IOException {
HBaseAdmin admin = new HBaseAdmin(conf);
admin.disableTable(tableName);
admin.deleteTable(tableName);
System.out.println(tableName + " is deleted!");
}
public static void main(String[] args) throws Exception {
// Create a table
// String tableName = "blog2"; String[] family = { "article","author" };
// creatTable(tableName,family);
// Add data to the table
// String[] column1 = { "title", "content", "tag" }; String[] value1 = {"Head First HBase",
// "HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data."
// , "Hadoop,HBase,NoSQL" }; String[] column2 = { "name", "nickname" };
// String[] value2 = { "nicholas", "lee" }; addData("rowkey1", "blog2",
// column1, value1, column2, value2);
// Delete a single column
// deleteColumn("blog2", "rowkey1", "author", "nickname");
// Delete all columns of a row
// deleteAllColumn("blog2", "rowkey1");
// Delete the table
// deleteTable("blog2");
// Query a row
// getResult("blog2", "rowkey1");
// Query the value of a single column
// getResultByColumn("blog2", "rowkey1", "author", "name");
// updateTable("blog2", "rowkey1", "author", "name","bin");
// getResultByColumn("blog2", "rowkey1", "author", "name");
// Full table scan
// getResultScann("blog2");
// Query multiple versions of a column
getResultByVersion("blog2", "rowkey1", "author", "name");
}
}
Note: constructing HTable by hand is deprecated. Use a Connection to instantiate tables instead: from a Connection you can call Connection.getTable(TableName), as shown in the HBase 2.x section below.
2. HBase 2.x:
(1) Connecting to HBase:
I am using HBase 2.1.2 here. We connect to HBase from a static initializer; unlike versions before 2.1.2, there is no need to build your own HBase thread pool, because the client that ships with HBase 2.1.2 already wraps this up. Just create the connection and use it:
/**
* Static configuration
*/
static Configuration conf = null;
static Connection conn = null;
static {
conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
conf.set("hbase.zookeeper.property.client", "2181");
try{
conn = ConnectionFactory.createConnection(conf);
}catch (Exception e){
e.printStackTrace();
}
}
(2) Creating an HBase table:
Table creation is performed through Admin; the table and its column families are built with TableDescriptorBuilder and ColumnFamilyDescriptorBuilder respectively:
/**
* Create a table with a single column family
* @throws Exception
*/
public static void createTable() throws Exception{
Admin admin = conn.getAdmin();
if (!admin.tableExists(TableName.valueOf("test"))){
TableName tableName = TableName.valueOf("test");
//table descriptor builder
TableDescriptorBuilder tdb = TableDescriptorBuilder.newBuilder(tableName);
//column family descriptor builder
ColumnFamilyDescriptorBuilder cdb = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("user"));
//build the column family descriptor
ColumnFamilyDescriptor cfd = cdb.build();
//add the column family to the table
tdb.setColumnFamily(cfd);
//build the table descriptor
TableDescriptor td = tdb.build();
//create the table
admin.createTable(td);
}else {
System.out.println("表已存在");
}
//close the connection
conn.close();
}
(3) Adding data to an HBase table:
Data is added through the Put API:
/**
* Add data (multiple row keys, multiple columns)
* @throws Exception
*/
public static void insertMany() throws Exception{
Table table = conn.getTable(TableName.valueOf("test"));
List<Put> puts = new ArrayList<Put>();
Put put1 = new Put(Bytes.toBytes("rowKey1"));
put1.addColumn(Bytes.toBytes("user"), Bytes.toBytes("name"), Bytes.toBytes("wd"));
Put put2 = new Put(Bytes.toBytes("rowKey2"));
put2.addColumn(Bytes.toBytes("user"), Bytes.toBytes("age"), Bytes.toBytes("25"));
Put put3 = new Put(Bytes.toBytes("rowKey3"));
put3.addColumn(Bytes.toBytes("user"), Bytes.toBytes("weight"), Bytes.toBytes("60kg"));
Put put4 = new Put(Bytes.toBytes("rowKey4"));
put4.addColumn(Bytes.toBytes("user"), Bytes.toBytes("sex"), Bytes.toBytes("男"));
puts.add(put1);
puts.add(put2);
puts.add(put3);
puts.add(put4);
table.put(puts);
table.close();
}
(4) Deleting an HBase column family or column:
/**
* Delete a whole row by row key, or one column family of that row, or a single column within a column family
* @param tableName
* @param rowKey
* @throws Exception
*/
public static void deleteData(TableName tableName, String rowKey, String columnFamily, String columnName) throws Exception{
Table table = conn.getTable(tableName);
Delete delete = new Delete(Bytes.toBytes(rowKey));
//(1) with no further qualification, the Delete removes the whole row
//(2) narrow the Delete to one column family of the row
delete.addFamily(Bytes.toBytes(columnFamily));
//(3) or narrow it to a single column of a column family
delete.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnName));
//execute whichever variant you built above
table.delete(delete);
table.close();
}
(5) Updating a column in an HBase table:
Simply overwrite the value with another Put:
/**
* Update a value by row key, column family and column name
* @param tableName
* @param rowKey
* @param columnFamily
* @param columnName
* @param columnValue
* @throws Exception
*/
public static void updateData(TableName tableName, String rowKey, String columnFamily, String columnName, String columnValue) throws Exception{
Table table = conn.getTable(tableName);
Put put1 = new Put(Bytes.toBytes(rowKey));
put1.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnName), Bytes.toBytes(columnValue));
table.put(put1);
table.close();
}
(6) Querying HBase:
HBase queries come in three flavours: Get, Scan, and Scan combined with filters. Common filters include RowFilter (filters on the row key), SingleColumnValueFilter (filters on a column value), and ColumnPrefixFilter (filters on a column name prefix).
/**
* Query data by row key
* @param tableName
* @param rowKey
* @throws Exception
*/
public static void getResult(TableName tableName, String rowKey) throws Exception{
Table table = conn.getTable(tableName);
//fetch a single row
Get get = new Get(Bytes.toBytes(rowKey));
Result set = table.get(get);
Cell[] cells = set.rawCells();
for (Cell cell: cells){
System.out.println(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()) + "::" +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
table.close();
}
//Filter comparison operators: LESS (<), LESS_OR_EQUAL (<=), EQUAL (=), NOT_EQUAL (<>), GREATER_OR_EQUAL (>=), GREATER (>), NO_OP (excludes everything)
/**
* @param tableName
* @throws Exception
*/
public static void scanTable(TableName tableName) throws Exception{
Table table = conn.getTable(tableName);
//(1) full table scan
Scan scan1 = new Scan();
ResultScanner rscan1 = table.getScanner(scan1);
//(2) row key filter
Scan scan2 = new Scan();
//"str$" matches the end of the row key (like SQL '%str'); "^str" matches the beginning (like SQL 'str%')
RowFilter filter = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator("Key1$"));
scan2.setFilter(filter);
ResultScanner rscan2 = table.getScanner(scan2);
//(3) column value filter
Scan scan3 = new Scan();
//arguments: column family, column name, comparison operator, value
SingleColumnValueFilter filter3 = new SingleColumnValueFilter(Bytes.toBytes("author"), Bytes.toBytes("name"),
CompareOperator.EQUAL, Bytes.toBytes("spark"));
scan3.setFilter(filter3);
ResultScanner rscan3 = table.getScanner(scan3);
//(4) column name prefix filter
Scan scan4 = new Scan();
ColumnPrefixFilter filter4 = new ColumnPrefixFilter(Bytes.toBytes("name"));
scan4.setFilter(filter4);
ResultScanner rscan4 = table.getScanner(scan4);
//(5) filter list (combine several filters)
Scan scan5 = new Scan();
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
SingleColumnValueFilter filter51 = new SingleColumnValueFilter(Bytes.toBytes("author"), Bytes.toBytes("name"),
CompareOperator.EQUAL, Bytes.toBytes("spark"));
ColumnPrefixFilter filter52 = new ColumnPrefixFilter(Bytes.toBytes("name"));
list.addFilter(filter51);
list.addFilter(filter52);
scan5.setFilter(list);
ResultScanner rscan5 = table.getScanner(scan5);
for (Result rs : rscan5){ // iterate whichever ResultScanner you want to inspect (rscan1 ... rscan5)
String rowKey = Bytes.toString(rs.getRow());
System.out.println("row key :" + rowKey);
Cell[] cells = rs.rawCells();
for (Cell cell: cells){
System.out.println(Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength()) + "::"
+ Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()) + "::"
+ Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
System.out.println("-------------------------------------------");
}
}
(7) Quick HBase connectivity test demo (list all table names):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class HBaseTableLister {
public static void main(String[] args) throws IOException {
// Create the configuration object
Configuration config = HBaseConfiguration.create();
// Set the HBase cluster connection info
config.set("hbase.zookeeper.quorum", ",,"); // fill in your ZooKeeper quorum hosts here
config.set("hbase.zookeeper.property.clientPort", "2181");
// Create the HBase connection
Connection connection = ConnectionFactory.createConnection(config);
// Get the Admin object
Admin admin = connection.getAdmin();
// List all tables in HBase
TableName[] tableNames = admin.listTableNames();
// Print the table names
for (TableName tableName : tableNames) {
System.out.println(Bytes.toString(tableName.getName()));
}
// Close the connection
admin.close();
connection.close();
}
}
III. Scala
1. Reading/writing HBase and querying HBase table data with Spark SQL:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Put, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableOutputFormat}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}
/**
* @Author: huiq
* @Date: 2021/7/23
* @Description: HBase connection test
*/
object OperateHbaseTest {
def main(args: Array[String]): Unit = {
//initialize Spark
val sparkConf = new SparkConf().setMaster("local[2]").setAppName(this.getClass.getSimpleName)
val spark: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
//initialize the HBase configuration and set the ZooKeeper parameters
val config: Configuration = HBaseConfiguration.create()
config.set("hbase.zookeeper.quorum", "node01,node02,node03") // ZooKeeper quorum of the HBase cluster (any node)
config.set("hbase.zookeeper.property.clientPort", "2181") // ZooKeeper client port
config.set("zookeeper.znode.parent", "/hbase-unsecure")
val sc: SparkContext = spark.sparkContext
// set the table to read
config.set(TableInputFormat.INPUT_TABLE,"test_schema1:t2")
// read the whole table from HBase as an RDD
val hbaseRDD: RDD[(ImmutableBytesWritable, Result)] = sc.newAPIHadoopRDD(config,classOf[TableInputFormat],classOf[ImmutableBytesWritable],classOf[Result])
val count = hbaseRDD.count()
println("Students RDD Count--->" + count)
// iterate and print
hbaseRDD.foreach({ case (_,result) =>
val key = Bytes.toString(result.getRow)
val a = Bytes.toString(result.getValue("F".getBytes,"a".getBytes))
val b = Bytes.toString(result.getValue("F".getBytes,"b".getBytes))
println("Row key:"+key+" a:"+a+" b:"+b)
})
// write to HBase
val tablename = "test_schema1:t2"
config.set(TableOutputFormat.OUTPUT_TABLE, "test_schema1:t2")
val job = Job.getInstance(config)
job.setOutputKeyClass(classOf[ImmutableBytesWritable])
job.setOutputValueClass(classOf[Result])
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
val indataRDD = sc.makeRDD(Array("3,26,M","4,27,M")) // build two records
val rdd = indataRDD.map(_.split(',')).map{arr=>{
val put = new Put(Bytes.toBytes(arr(0))) // row key value
put.addColumn(Bytes.toBytes("F"),Bytes.toBytes("a"),Bytes.toBytes(arr(1)))
put.addColumn(Bytes.toBytes("F"),Bytes.toBytes("b"),Bytes.toBytes(arr(2)))
// put.add(Bytes.toBytes("F"),Bytes.toBytes("a"),Bytes.toBytes(arr(1)) // some articles online write it this way, but it threw an error for me; probably a version difference
(new ImmutableBytesWritable, put)
}}
rdd.saveAsNewAPIHadoopDataset(job.getConfiguration())
// build an RDD of Row objects
val rowRDD = hbaseRDD.map(p => {
val name = Bytes.toString(p._2.getValue(Bytes.toBytes("F"),Bytes.toBytes("a")))
val age = Bytes.toString(p._2.getValue(Bytes.toBytes("F"),Bytes.toBytes("b")))
Row(name,age)
})
// build the DataFrame schema
val schema = StructType(List(
StructField("a",StringType,true),
StructField("b",StringType,true)
))
// build the DataFrame
val dataFrame = spark.createDataFrame(rowRDD,schema)
// register a temporary view for SQL queries
dataFrame.createTempView("t2")
val result: DataFrame = spark.sql("select * from t2")
result.show()
}
}
Note: I am running Ambari 2.7.4 + HDP 3.1.4. With a properly integrated cluster, the three ZooKeeper-related config.set(...) lines are not even required to connect to HBase, but when I first started the program it failed with: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid

Cause: hbase-site.xml contains the following configuration:
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase-unsecure</value>
</property>
Option 1: change the value to <value>/hbase</value> and restart HBase.
Note: the value of zookeeper.znode.parent is the znode path created in ZooKeeper.

Option 2: add config.set("zookeeper.znode.parent", "/hbase-unsecure") in the code.
Additional note: after creating the HBase-mapped table in the Hive ODS layer, I wanted to generate the corresponding table in the DWD layer with a CREATE TABLE AS SELECT ... statement, but it failed with an error:

Fix 1: pass the corresponding parameter at startup: beeline -hiveconf zookeeper.znode.parent=/hbase-unsecure or hive -hiveconf zookeeper.znode.parent=/hbase-unsecure. Fix 2: since I am on Ambari HDP 3.1.4, adding the corresponding configuration and then simply running beeline also works.

2. Reading data from Kafka with Spark Streaming and writing it to HBase:
import java.util
import com.rongrong.bigdata.utils.{KafkaZkUtils, UMSUtils}
import kafka.utils.ZkUtils
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.log4j.Logger
import org.apache.spark.streaming.{Durations, StreamingContext}
import scala.util.Try
object StandardOnlie {
private val logger: Logger = Logger.getLogger(this.getClass)
def main(args: Array[String]): Unit = {
val spark = InitializeSpark.createSparkSession("StandardOnlie", "local")
val streamingContext = new StreamingContext(spark.sparkContext, Durations.seconds(30))
val kafkaParams = Map[String, Object](
ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "node01:6667,node02:6667,node03:6667",
ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
ConsumerConfig.GROUP_ID_CONFIG -> "group-02",
ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "latest",
ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (false: java.lang.Boolean)
)
val topic: String = "djt_db.test_schema1.result"
val zkUrl = "node01:2181,node02:2181,node03:2181"
val sessionTimeout = 1000
val connectionTimeout = 1000
val zkClient = ZkUtils.createZkClient(zkUrl, sessionTimeout, connectionTimeout)
val kafkaStream = KafkaZkUtils.createDirectStream(zkClient, streamingContext, kafkaParams, topic)
// start processing each batch of messages
kafkaStream.foreachRDD(rdd => {
// process the data fetched from Kafka
logger.info("=============== Total " + rdd.count() + " events in this batch ..")
rdd.foreach(x => {
val configuration = HBaseConfiguration.create()
configuration.set("zookeeper.znode.parent", "/hbase-unsecure")
val connection = ConnectionFactory.createConnection(configuration)
// the actual payload from Kafka
var usmString = x.value()
val flag: Boolean = UMSUtils.isHeartbeatUms(usmString)
if (!flag) { // filter out heartbeat records
val usmActiontype = UMSUtils.getActionType(usmString)
println(s"Action type of this record ---> ${usmActiontype}")
println("Read Kafka message, parsing payload: " + x.value())
val data: util.Map[String, String] = UMSUtils.getDataFromUms(usmString)
//get the table
val table = connection.getTable(TableName.valueOf("test_schema1:t2"))
val rowkey: String = 123456 + "_" + data.get("a")
val put = new Put(Bytes.toBytes(rowkey))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("ums_active_"), Bytes.toBytes(data.get("ums_active_")))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("ums_id_"), Bytes.toBytes(data.get("ums_id_")))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("ums_ts_"), Bytes.toBytes(data.get("ums_ts_")))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("a"), Bytes.toBytes(data.get("a")))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("b"), Bytes.toBytes(data.get("b")))
// write the data to HBase (wrapped in Try so a failed put does not kill the job), then close the table
Try(table.put(put))
table.close()
println(s"Parsed data ---> ${data}")
}
connection.close() // close the per-record connection
})
// update the offsets in ZooKeeper
KafkaZkUtils.saveOffsets(zkClient, topic, KafkaZkUtils.getZkPath(kafkaParams, topic), rdd)
})
streamingContext.start()
streamingContext.awaitTermination()
streamingContext.stop()
}
}
import kafka.utils.{ZKGroupTopicDirs, ZkUtils}
import org.I0Itec.zkclient.ZkClient
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.kafka.common.TopicPartition
import org.apache.log4j.Logger
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils}
object KafkaZkUtils {
private val logger: Logger = Logger.getLogger(this.getClass)
/**
* Get the consumer's offset path in ZooKeeper
* @param kafkaParams
* @param topic
* @return
*/
def getZkPath(kafkaParams: Map[String, Object], topic: String): String ={
val topicDirs = new ZKGroupTopicDirs(kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).toString, topic)
s"${topicDirs.consumerOffsetDir}"
}
/**
* Create a DirectStream
* @param zkClient
* @param streamingContext
* @param kafkaParams
* @param topic
* @return
*/
def createDirectStream(zkClient: ZkClient,streamingContext: StreamingContext, kafkaParams: Map[String, Object], topic: String): InputDStream[ConsumerRecord[String, String]] = {
val zkPath = getZkPath(kafkaParams,topic)
//read the stored offsets for the topic
val storedOffsets = readOffsets(zkClient, topic, zkPath)
val kafkaStream: InputDStream[ConsumerRecord[String, String]] = storedOffsets match {
//no offsets saved from a previous run
case None =>
KafkaUtils.createDirectStream[String, String](
streamingContext,
PreferConsistent,
ConsumerStrategies.Subscribe[String, String](Array(topic), kafkaParams)
)
case Some(fromOffsets) => {
KafkaUtils.createDirectStream[String, String](
streamingContext,
PreferConsistent,
// assign specific partitions; cannot detect partition changes dynamically
// ConsumerStrategies.Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets)
ConsumerStrategies.Subscribe[String, String](List(topic), kafkaParams, fromOffsets)
)
}
}
kafkaStream
}
/**
* Save offsets
* @param zkClient
* @param topic
* @param zkPath
* @param rdd
*/
def saveOffsets(zkClient: ZkClient,topic: String, zkPath: String, rdd: RDD[_]): Unit = {
("Saving offsets to zookeeper")
val offsetsRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
offsetsRanges.foreach(offsetRange => logger.debug(s"Using ${offsetRange}"))
val offsetsRangesStr = offsetsRanges.map(offsetRange => s"${offsetRange.partition}:${offsetRange.untilOffset}").mkString(",")
(s"Writing offsets to Zookeeper: ${offsetsRangesStr}")
ZkUtils(zkClient, false).updatePersistentPath(zkPath, offsetsRangesStr)
}
/**
* Read offsets
* @param zkClient
* @param topic
* @param zkPath
* @return
*/
def readOffsets(zkClient: ZkClient, topic: String, zkPath: String): Option[Map[TopicPartition, Long]] = {
("Reading offsets from zookeeper")
val (offsetsRangesStrOpt, _) = ZkUtils(zkClient, false).readDataMaybeNull(zkPath)
offsetsRangesStrOpt match {
case Some(offsetsRangesStr) => {
logger.debug(s"Read offset ranges: ${
offsetsRangesStr
}")
val offsets: Map[TopicPartition, Long] = offsetsRangesStr.split(",").map(s => s.split(":"))
.map({
case Array(partitionStr, offsetStr) =>
(new TopicPartition(topic, partitionStr.toInt) -> offsetStr.toLong)
// you can hard-code the offset to start reading from here; note that you also need to switch to the ConsumerStrategies.Assign line in createDirectStream above
// (new TopicPartition(topic, partitionStr.toInt) -> "20229".toLong)
}).toMap
Some(offsets)
}
case None =>
("No offsets found in Zookeeper")
None
}
}
}
I originally wanted to use foreachPartition, but I could not get it to work. In the article I was following, the author creates the connection inside foreachPartition and notes that "the HBase connection is created once per partition; a partition does not span nodes, so nothing needs to be serialized", but when I wrote it that way it threw an error:

The failing code:
rdd.foreachPartition(partitionRecords => {
val configuration = HBaseConfiguration.create()
configuration.set("zookeeper.znode.parent", "/hbase-unsecure")
val connection = ConnectionFactory.createConnection(configuration) // get the HBase connection, one per partition; a partition does not span nodes, so no serialization is needed
partitionRecords.foreach(x => {
// the actual payload from Kafka
var usmString = x.value()
I looked through a few resources ("Scala: writing to HBase through Spark: Task not serializable", the ZooKeeper error org.I0Itec.zkclient.exception.ZkMarshallingError: java.io.EOFException, "a full rundown of Spark serialization issues", and "HBase connection pools"), but I have not found a solution yet; if anyone knows one, feel free to discuss.
