Table of Contents
- I. Overview of HBase connection methods
- II. Java
- 1. Older HBase versions:
- (1) Creating a table:
- (2) Deleting a table:
- (3) Writing data:
- (4) Querying:
- (5) A consolidated set of common operations via the Java API:
- 2. HBase 2.x:
- (1) Connecting to HBase:
- (2) Creating an HBase table:
- (3) Adding data to an HBase table:
- (4) Deleting an HBase column family or column:
- (5) Updating a column in an HBase table:
- (6) Querying HBase:
- (7) Quick HBase connectivity test demo (list all table names):
- III. Scala
- 1. Reading/writing HBase and querying HBase table data with Spark SQL:
- 2. Reading data from Kafka with Spark Streaming and writing it to HBase:
I. Overview of HBase connection methods
The main approaches are:
- reading and writing HBase directly through the Java API;
- reading and writing HBase from Spark;
- reading and writing HBase from Flink;
- reading and writing HBase through Phoenix.
The first is the fairly low-level but efficient access path provided by HBase itself, the second and third are the Spark and Flink integrations with HBase, and the last is the JDBC interface offered by the third-party Phoenix layer; that Phoenix JDBC approach can also be invoked from Spark and Flink.
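As a quick taste of the Phoenix approach, here is a minimal sketch of querying HBase through the Phoenix JDBC driver; the ZooKeeper address, znode parent and the WEB_STAT table are placeholders, and the phoenix-client jar must be on the classpath:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class PhoenixJdbcDemo {
public static void main(String[] args) throws Exception {
// URL format: jdbc:phoenix:<zookeeper quorum>:<port>:<znode parent> -- adjust to your cluster
String url = "jdbc:phoenix:192.168.8.71:2181:/hbase";
try (Connection conn = DriverManager.getConnection(url);
Statement stmt = conn.createStatement();
// WEB_STAT is a placeholder table created through Phoenix
ResultSet rs = stmt.executeQuery("SELECT * FROM WEB_STAT LIMIT 10")) {
while (rs.next()) {
System.out.println(rs.getString(1));
}
}
}
}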
II. Java
1. Older HBase versions:
(1) Creating a table:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
public class CreateTableTest {
public static void main(String[] args) throws IOException {
//Set the HBase connection configuration parameters
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.8.71"); // Zookeeper的地址
conf.set("hbase.zookeeper.property.clientPort", "2181");
String tableName = "emp";
String[] family = { "basicinfo","deptinfo"};
HBaseAdmin hbaseAdmin = new HBaseAdmin(conf);
//Build the table descriptor
HTableDescriptor hbaseTableDesc = new HTableDescriptor(TableName.valueOf(tableName));
for(int i = 0; i < family.length; i++) {
//Add the column families
hbaseTableDesc.addFamily(new HColumnDescriptor(family[i]));
}
//If the table already exists print a message, otherwise create it
if(hbaseAdmin.tableExists(TableName.valueOf(tableName))) {
System.out.println("TableExists!");
/**
 * System.exit(status) terminates the currently running JVM. A status of 0 means a normal exit; any non-zero status means an abnormal exit.
 * Unlike return, which only goes back to the caller, System.exit(status) shuts down the whole application (the entire JVM) regardless of the status value,
 * releasing all of its memory.
 */
System.exit(0);
} else{
hbaseAdmin.createTable(hbaseTableDesc);
System.out.println("Create table Success!");
}
}
}
(2) Deleting a table:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
public class DeleteMyTable {
public static void main(String[] args) throws IOException {
String tableName = "mytb";
delete(tableName);
}
public static Configuration getConfiguration() {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.rootdir", "hdfs://192.168.8.71:9000/hbase");
conf.set("hbase.zookeeper.quorum", "192.168.8.71");
return conf;
}
public static void delete(String tableName) throws IOException {
HBaseAdmin hAdmin = new HBaseAdmin(getConfiguration());
if(hAdmin.tableExists(tableName)){
try {
hAdmin.disableTable(tableName);
hAdmin.deleteTable(tableName);
System.err.println("Delete table Success");
} catch (IOException e) {
System.err.println("Delete table Failed ");
}
}else{
System.err.println("table not exists");
}
}
}
(3) Writing data:
An e-commerce site keeps a buyer information table in its backend; every time a new user registers, the backend produces a log record and writes it into HBase.
The record format is: user ID (buyer_id), registration date (reg_date), registration IP (reg_ip), buyer status (buyer_status, 0 = frozen, 1 = normal); the fields are comma-separated in the sample below:
buyer_id  reg_date  reg_ip  buyer_status
20385,2010-05-04,124.64.242.30,1
20386,2010-05-05,117.136.0.172,1
20387,2010-05-06,114.94.44.230,1
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
public class PutData {
public static void main(String[] args) throws MasterNotRunningException,
ZooKeeperConnectionException, IOException {
String tableName = "mytb";
String columnFamily = "mycf";
put(tableName, "20385", columnFamily, "2010-05-04:reg_ip", "124.64.242.30");
put(tableName, "20385", columnFamily, "2010-05-04:buyer_status", "1");
put(tableName, "20386", columnFamily, "2010-05-05:reg_ip", "117.136.0.172");
put(tableName, "20386", columnFamily, "2010-05-05:buyer_status", "1");
put(tableName, "20387", columnFamily, "2010-05-06:reg_ip", "114.94.44.230");
put(tableName, "20387", columnFamily, "2010-05-06:buyer_status", "1");
}
public static Configuration getConfiguration() {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.rootdir", "hdfs://192.168.8.71:9000/hbase");
conf.set("hbase.zookeeper.quorum", "192.168.8.71");
return conf;
}
public static void put(String tableName, String row, String columnFamily,
String column, String data) throws IOException {
HTable table = new HTable(getConfiguration(), tableName);
Put put = new Put(Bytes.toBytes(row));
put.add(Bytes.toBytes(columnFamily),
Bytes.toBytes(column),
Bytes.toBytes(data));
table.put(put);
System.err.println("SUCCESS");
}
}
Note: constructing HTable by hand is deprecated. Use a Connection to instantiate tables instead: from a Connection you can call Connection.getTable(TableName).
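For reference, a minimal sketch of the same kind of put written against the connection-based API (HBase 1.0+/2.x), reusing the table mytb and column family mycf from the example above; the ZooKeeper address is a placeholder:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
public class PutDataWithConnection {
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.8.71");
// the Connection is heavyweight and thread-safe: create it once and reuse it
try (Connection conn = ConnectionFactory.createConnection(conf);
Table table = conn.getTable(TableName.valueOf("mytb"))) {
Put put = new Put(Bytes.toBytes("20385"));
put.addColumn(Bytes.toBytes("mycf"), Bytes.toBytes("2010-05-04:reg_ip"), Bytes.toBytes("124.64.242.30"));
table.put(put);
}
}
}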
(4) Querying:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
public class GetData {
public static void main(String[] args) throws IOException {
String tableName = "mytb";
get(tableName, "20386");
}
public static Configuration getConfiguration() {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.rootdir", "hdfs://192.168.8.71:9000/hbase");
conf.set("hbase.zookeeper.quorum", "192.168.8.71");
return conf;
}
public static void get(String tableName, String rowkey) throws IOException {
HTable table = new HTable(getConfiguration(), tableName);
Get get = new Get(Bytes.toBytes(rowkey));
Result result = table.get(get);
byte[] value1 = result.getValue("mycf".getBytes(), "2010-05-05:reg_ip".getBytes());
byte[] value2 = result.getValue("mycf".getBytes(), "2010-05-05:buyer_status".getBytes());
System.err.println("line1:SUCCESS");
System.err.println("line2:"
+ new String(value1) + "\t"
+ new String(value2));
}
}
All of the code above is compiled and run like this:
[hadoop@h71 q1]$ /usr/jdk1.7.0_25/bin/javac GetData.java
[hadoop@h71 q1]$ /usr/jdk1.7.0_25/bin/java GetData
(5) A consolidated set of common operations via the Java API:
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
public class HBaseTest2 {
// Static configuration
static Configuration conf = null;
static {
conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.205.153");
}
/*
* Create a table
* @tableName table name
* @family    list of column families
*/
public static void creatTable(String tableName, String[] family) throws Exception {
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor(tableName);
for (int i = 0; i < family.length; i++) {
desc.addFamily(new HColumnDescriptor(family[i]));
}
if (admin.tableExists(tableName)) {
System.out.println("table Exists!");
System.exit(0);
} else {
admin.createTable(desc);
System.out.println("create table Success!");
}
}
/*
* Add data to a table (for a fixed table whose column families are known)
* @rowKey    row key
* @tableName table name
* @column1   columns of the first column family
* @value1    values for the first column family
* @column2   columns of the second column family
* @value2    values for the second column family
*/
public static void addData(String rowKey, String tableName,
String[] column1, String[] value1, String[] column2, String[] value2)
throws IOException {
Put put = new Put(Bytes.toBytes(rowKey)); // set the row key
HTable table = new HTable(conf, tableName); // get the table
HColumnDescriptor[] columnFamilies = table.getTableDescriptor() // get all column families
.getColumnFamilies();
for (int i = 0; i < columnFamilies.length; i++) {
String familyName = columnFamilies[i].getNameAsString(); // column family name
if (familyName.equals("article")) { // put data into the article column family
for (int j = 0; j < column1.length; j++) {
put.add(Bytes.toBytes(familyName),
Bytes.toBytes(column1[j]), Bytes.toBytes(value1[j]));
}
}
if (familyName.equals("author")) { // author列族put数据
for (int j = 0; j < column2.length; j++) {
put.add(Bytes.toBytes(familyName),
Bytes.toBytes(column2[j]), Bytes.toBytes(value2[j]));
}
}
}
table.put(put);
System.out.println("add data Success!");
}
/*
* Query by row key
* @rowKey    row key
* @tableName table name
*/
public static Result getResult(String tableName, String rowKey) throws IOException {
Get get = new Get(Bytes.toBytes(rowKey));
HTable table = new HTable(conf, tableName);// 获取表
Result result = table.get(get);
for (KeyValue kv : result.list()) {
System.out.println("family:" + Bytes.toString(kv.getFamily()));
System.out
.println("qualifier:" + Bytes.toString(kv.getQualifier()));
System.out.println("value:" + Bytes.toString(kv.getValue()));
System.out.println("Timestamp:" + kv.getTimestamp());
System.out.println("-------------------------------------------");
}
return result;
}
/*
* Scan the whole HBase table
* @tableName table name
*/
public static void getResultScann(String tableName) throws IOException {
Scan scan = new Scan();
ResultScanner rs = null;
HTable table = new HTable(conf, tableName);
try {
rs = table.getScanner(scan);
for (Result r : rs) {
for (KeyValue kv : r.list()) {
System.out.println("family:"
+ Bytes.toString(kv.getFamily()));
System.out.println("qualifier:"
+ Bytes.toString(kv.getQualifier()));
System.out
.println("value:" + Bytes.toString(kv.getValue()));
System.out.println("timestamp:" + kv.getTimestamp());
System.out
.println("-------------------------------------------");
}
}
} finally {
rs.close();
}
}
/*
* Query a single column of a row
* @tableName table name
* @rowKey    row key
*/
public static void getResultByColumn(String tableName, String rowKey,
String familyName, String columnName) throws IOException {
HTable table = new HTable(conf, tableName);
Get get = new Get(Bytes.toBytes(rowKey));
get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName)); // fetch the cell for the given column family and qualifier
Result result = table.get(get);
for (KeyValue kv : result.list()) {
System.out.println("family:" + Bytes.toString(kv.getFamily()));
System.out
.println("qualifier:" + Bytes.toString(kv.getQualifier()));
System.out.println("value:" + Bytes.toString(kv.getValue()));
System.out.println("Timestamp:" + kv.getTimestamp());
System.out.println("-------------------------------------------");
}
}
/*
* Update a single column of a row
* @tableName  table name
* @rowKey     row key
* @familyName column family name
* @columnName column name
* @value      new value
*/
public static void updateTable(String tableName, String rowKey,
String familyName, String columnName, String value)
throws IOException {
HTable table = new HTable(conf, tableName);
Put put = new Put(Bytes.toBytes(rowKey));
put.add(Bytes.toBytes(familyName), Bytes.toBytes(columnName),
Bytes.toBytes(value));
table.put(put);
System.out.println("update table Success!");
}
/*
* Query multiple versions of a column
* @tableName  table name
* @rowKey     row key
* @familyName column family name
* @columnName column name
*/
public static void getResultByVersion(String tableName, String rowKey,
String familyName, String columnName) throws IOException {
HTable table = new HTable(conf, tableName);
Get get = new Get(Bytes.toBytes(rowKey));
get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName));
get.setMaxVersions(5);
Result result = table.get(get);
for (KeyValue kv : result.list()) {
System.out.println("family:" + Bytes.toString(kv.getFamily()));
System.out
.println("qualifier:" + Bytes.toString(kv.getQualifier()));
System.out.println("value:" + Bytes.toString(kv.getValue()));
System.out.println("Timestamp:" + kv.getTimestamp());
System.out.println("-------------------------------------------");
}
List<?> results = table.get(get).list();
Iterator<?> it = results.iterator();
while (it.hasNext()) {
System.out.println(it.next().toString());
}
}
/*
* Delete a specific column
* @tableName  table name
* @rowKey     row key
* @familyName column family name
* @columnName column name
*/
public static void deleteColumn(String tableName, String rowKey,
String familyName, String columnName) throws IOException {
HTable table = new HTable(conf, tableName);
Delete deleteColumn = new Delete(Bytes.toBytes(rowKey));
deleteColumn.deleteColumns(Bytes.toBytes(familyName),
Bytes.toBytes(columnName));
table.delete(deleteColumn);
System.out.println(familyName + ":" + columnName + " is deleted!");
}
/*
* Delete all columns of a row (i.e. delete the whole row)
* @tableName 表名
* @rowKey rowKey
*/
public static void deleteAllColumn(String tableName, String rowKey)
throws IOException {
HTable table = new HTable(conf, tableName);
Delete deleteAll = new Delete(Bytes.toBytes(rowKey));
table.delete(deleteAll);
System.out.println("all columns are deleted!");
}
/*
* Delete a table
* @tableName 表名
*/
public static void deleteTable(String tableName) throws IOException {
HBaseAdmin admin = new HBaseAdmin(conf);
admin.disableTable(tableName);
admin.deleteTable(tableName);
System.out.println(tableName + " is deleted!");
}
public static void main(String[] args) throws Exception {
// Create a table
// String tableName = "blog2"; String[] family = { "article","author" };
// creatTable(tableName,family);
// Add data to the table
// String[] column1 = { "title", "content", "tag" }; String[] value1 = {"Head First HBase",
// "HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data."
// , "Hadoop,HBase,NoSQL" }; String[] column2 = { "name", "nickname" };
// String[] value2 = { "nicholas", "lee" }; addData("rowkey1", "blog2",
// column1, value1, column2, value2);
// Delete a single column
// deleteColumn("blog2", "rowkey1", "author", "nickname");
// Delete all columns of a row
// deleteAllColumn("blog2", "rowkey1");
// Delete the table
// deleteTable("blog2");
// Query a row
// getResult("blog2", "rowkey1");
// Query the value of a single column
// getResultByColumn("blog2", "rowkey1", "author", "name");
// updateTable("blog2", "rowkey1", "author", "name","bin");
// getResultByColumn("blog2", "rowkey1", "author", "name");
// Full table scan
// getResultScann("blog2");
// Query multiple versions of a column
getResultByVersion("blog2", "rowkey1", "author", "name");
}
}
Note: constructing HTable by hand is deprecated. Use a Connection to instantiate tables instead: from a Connection you can call Connection.getTable(TableName), as shown in the HBase 2.x section below.
2. HBase 2.x:
(1) Connecting to HBase:
I am using HBase 2.1.2 here. We connect to HBase from a static initializer; unlike versions before 2.1.2, there is no need to build your own HBase thread pool, because the client that ships with HBase 2.1.2 already wraps this up. Just create the connection and use it:
/**
* Static configuration
*/
static Configuration conf = null;
static Connection conn = null;
static {
conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
conf.set("hbase.zookeeper.property.client", "2181");
try{
conn = ConnectionFactory.createConnection(conf);
}catch (Exception e){
e.printStackTrace();
}
}
(2) Creating an HBase table:
Table creation is performed through Admin; the table and its column families are built with TableDescriptorBuilder and ColumnFamilyDescriptorBuilder respectively:
/**
* Create a table with a single column family
* @throws Exception
*/
public static void createTable() throws Exception{
Admin admin = conn.getAdmin();
if (!admin.tableExists(TableName.valueOf("test"))){
TableName tableName = TableName.valueOf("test");
//table descriptor builder
TableDescriptorBuilder tdb = TableDescriptorBuilder.newBuilder(tableName);
//column family descriptor builder
ColumnFamilyDescriptorBuilder cdb = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("user"));
//build the column family descriptor
ColumnFamilyDescriptor cfd = cdb.build();
//add the column family to the table
tdb.setColumnFamily(cfd);
//build the table descriptor
TableDescriptor td = tdb.build();
//create the table
admin.createTable(td);
}else {
System.out.println("表已存在");
}
//close the connection
conn.close();
}
(3) Adding data to an HBase table:
Data is added through the Put API:
/**
* Add data (multiple row keys, multiple columns)
* @throws Exception
*/
public static void insertMany() throws Exception{
Table table = conn.getTable(TableName.valueOf("test"));
List<Put> puts = new ArrayList<Put>();
Put put1 = new Put(Bytes.toBytes("rowKey1"));
put1.addColumn(Bytes.toBytes("user"), Bytes.toBytes("name"), Bytes.toBytes("wd"));
Put put2 = new Put(Bytes.toBytes("rowKey2"));
put2.addColumn(Bytes.toBytes("user"), Bytes.toBytes("age"), Bytes.toBytes("25"));
Put put3 = new Put(Bytes.toBytes("rowKey3"));
put3.addColumn(Bytes.toBytes("user"), Bytes.toBytes("weight"), Bytes.toBytes("60kg"));
Put put4 = new Put(Bytes.toBytes("rowKey4"));
put4.addColumn(Bytes.toBytes("user"), Bytes.toBytes("sex"), Bytes.toBytes("男"));
puts.add(put1);
puts.add(put2);
puts.add(put3);
puts.add(put4);
table.put(puts);
table.close();
}
(4) Deleting an HBase column family or column:
/**
* Delete a whole row by row key, or one column family of that row, or a single column within a column family
* @param tableName
* @param rowKey
* @throws Exception
*/
public static void deleteData(TableName tableName, String rowKey, String columnFamily, String columnName) throws Exception{
Table table = conn.getTable(tableName);
Delete delete = new Delete(Bytes.toBytes(rowKey));
//(1) with no further qualification, the Delete removes the whole row
//(2) narrow the Delete to one column family of the row
delete.addFamily(Bytes.toBytes(columnFamily));
//(3) or narrow it to a single column of a column family
delete.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnName));
//execute whichever variant you built above
table.delete(delete);
table.close();
}
(5) Updating a column in an HBase table:
Simply overwrite the value with another Put:
/**
* Update a value by row key, column family and column name
* @param tableName
* @param rowKey
* @param columnFamily
* @param columnName
* @param columnValue
* @throws Exception
*/
public static void updateData(TableName tableName, String rowKey, String columnFamily, String columnName, String columnValue) throws Exception{
Table table = conn.getTable(tableName);
Put put1 = new Put(Bytes.toBytes(rowKey));
put1.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnName), Bytes.toBytes(columnValue));
table.put(put1);
table.close();
}
(6) Querying HBase:
HBase queries come in three flavours: Get, Scan, and Scan combined with filters. Common filters include RowFilter (filters on the row key), SingleColumnValueFilter (filters on a column value), and ColumnPrefixFilter (filters on a column name prefix).
/**
* Query data by row key
* @param tableName
* @param rowKey
* @throws Exception
*/
public static void getResult(TableName tableName, String rowKey) throws Exception{
Table table = conn.getTable(tableName);
//fetch a single row
Get get = new Get(Bytes.toBytes(rowKey));
Result set = table.get(get);
Cell[] cells = set.rawCells();
for (Cell cell: cells){
System.out.println(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()) + "::" +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
table.close();
}
//Filter comparison operators: LESS (<), LESS_OR_EQUAL (<=), EQUAL (=), NOT_EQUAL (<>), GREATER_OR_EQUAL (>=), GREATER (>), NO_OP (excludes everything)
/**
* @param tableName
* @throws Exception
*/
public static void scanTable(TableName tableName) throws Exception{
Table table = conn.getTable(tableName);
//(1) full table scan
Scan scan1 = new Scan();
ResultScanner rscan1 = table.getScanner(scan1);
//(2) row key filter
Scan scan2 = new Scan();
//"str$" matches the end of the row key (like SQL '%str'); "^str" matches the beginning (like SQL 'str%')
RowFilter filter = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator("Key1$"));
scan2.setFilter(filter);
ResultScanner rscan2 = table.getScanner(scan2);
//(3) column value filter
Scan scan3 = new Scan();
//arguments: column family, column name, comparison operator, value
SingleColumnValueFilter filter3 = new SingleColumnValueFilter(Bytes.toBytes("author"), Bytes.toBytes("name"),
CompareOperator.EQUAL, Bytes.toBytes("spark"));
scan3.setFilter(filter3);
ResultScanner rscan3 = table.getScanner(scan3);
//(4) column name prefix filter
Scan scan4 = new Scan();
ColumnPrefixFilter filter4 = new ColumnPrefixFilter(Bytes.toBytes("name"));
scan4.setFilter(filter4);
ResultScanner rscan4 = table.getScanner(scan4);
//(5) filter list (combine several filters)
Scan scan5 = new Scan();
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
SingleColumnValueFilter filter51 = new SingleColumnValueFilter(Bytes.toBytes("author"), Bytes.toBytes("name"),
CompareOperator.EQUAL, Bytes.toBytes("spark"));
ColumnPrefixFilter filter52 = new ColumnPrefixFilter(Bytes.toBytes("name"));
list.addFilter(filter51);
list.addFilter(filter52);
scan5.setFilter(list);
ResultScanner rscan5 = table.getScanner(scan5);
for (Result rs : rscan5){ // iterate whichever ResultScanner you want to inspect (rscan1 ... rscan5)
String rowKey = Bytes.toString(rs.getRow());
System.out.println("row key :" + rowKey);
Cell[] cells = rs.rawCells();
for (Cell cell: cells){
System.out.println(Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength()) + "::"
+ Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()) + "::"
+ Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
System.out.println("-------------------------------------------");
}
}
(7) Quick HBase connectivity test demo (list all table names):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class HBaseTableLister {
public static void main(String[] args) throws IOException {
// Create the configuration object
Configuration config = HBaseConfiguration.create();
// Set the HBase cluster connection info
config.set("hbase.zookeeper.quorum", ",,"); // fill in your ZooKeeper quorum hosts here
config.set("hbase.zookeeper.property.clientPort", "2181");
// Create the HBase connection
Connection connection = ConnectionFactory.createConnection(config);
// Get the Admin object
Admin admin = connection.getAdmin();
// List all tables in HBase
TableName[] tableNames = admin.listTableNames();
// Print the table names
for (TableName tableName : tableNames) {
System.out.println(Bytes.toString(tableName.getName()));
}
// Close the connection
admin.close();
connection.close();
}
}
III. Scala
1. Reading/writing HBase and querying HBase table data with Spark SQL:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Put, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableOutputFormat}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}
/**
* @Author: huiq
* @Date: 2021/7/23
* @Description: HBase connection test
*/
object OperateHbaseTest {
def main(args: Array[String]): Unit = {
//initialize Spark
val sparkConf = new SparkConf().setMaster("local[2]").setAppName(this.getClass.getSimpleName)
val spark: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
//initialize the HBase configuration and set the ZooKeeper parameters
val config: Configuration = HBaseConfiguration.create()
config.set("hbase.zookeeper.quorum", "node01,node02,node03") // ZooKeeper quorum of the HBase cluster (any node)
config.set("hbase.zookeeper.property.clientPort", "2181") // ZooKeeper client port
config.set("zookeeper.znode.parent", "/hbase-unsecure")
val sc: SparkContext = spark.sparkContext
// set the table to read
config.set(TableInputFormat.INPUT_TABLE,"test_schema1:t2")
// read the whole table from HBase as an RDD
val hbaseRDD: RDD[(ImmutableBytesWritable, Result)] = sc.newAPIHadoopRDD(config,classOf[TableInputFormat],classOf[ImmutableBytesWritable],classOf[Result])
val count = hbaseRDD.count()
println("Students RDD Count--->" + count)
// iterate and print
hbaseRDD.foreach({ case (_,result) =>
val key = Bytes.toString(result.getRow)
val a = Bytes.toString(result.getValue("F".getBytes,"a".getBytes))
val b = Bytes.toString(result.getValue("F".getBytes,"b".getBytes))
println("Row key:"+key+" a:"+a+" b:"+b)
})
// write to HBase
val tablename = "test_schema1:t2"
config.set(TableOutputFormat.OUTPUT_TABLE, "test_schema1:t2")
val job = Job.getInstance(config)
job.setOutputKeyClass(classOf[ImmutableBytesWritable])
job.setOutputValueClass(classOf[Result])
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
val indataRDD = sc.makeRDD(Array("3,26,M","4,27,M")) // build two records
val rdd = indataRDD.map(_.split(',')).map{arr=>{
val put = new Put(Bytes.toBytes(arr(0))) // row key value
put.addColumn(Bytes.toBytes("F"),Bytes.toBytes("a"),Bytes.toBytes(arr(1)))
put.addColumn(Bytes.toBytes("F"),Bytes.toBytes("b"),Bytes.toBytes(arr(2)))
// put.add(Bytes.toBytes("F"),Bytes.toBytes("a"),Bytes.toBytes(arr(1)) // some articles online write it this way, but it threw an error for me; probably a version difference
(new ImmutableBytesWritable, put)
}}
rdd.saveAsNewAPIHadoopDataset(job.getConfiguration())
// build an RDD of Row objects
val rowRDD = hbaseRDD.map(p => {
val name = Bytes.toString(p._2.getValue(Bytes.toBytes("F"),Bytes.toBytes("a")))
val age = Bytes.toString(p._2.getValue(Bytes.toBytes("F"),Bytes.toBytes("b")))
Row(name,age)
})
// build the DataFrame schema
val schema = StructType(List(
StructField("a",StringType,true),
StructField("b",StringType,true)
))
// build the DataFrame
val dataFrame = spark.createDataFrame(rowRDD,schema)
// register a temporary view for SQL queries
dataFrame.createTempView("t2")
val result: DataFrame = spark.sql("select * from t2")
result.show()
}
}
Note: I am running Ambari 2.7.4 + HDP 3.1.4. With a properly integrated cluster, the three ZooKeeper-related config.set(...) lines are not even required to connect to HBase, but when I first started the program it failed with: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid

Cause: hbase-site.xml contains the following configuration:
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase-unsecure</value>
</property>
Option 1: change the value to <value>/hbase</value> and restart HBase.
Note: the value of zookeeper.znode.parent is the znode path created in ZooKeeper.

Option 2: add config.set("zookeeper.znode.parent", "/hbase-unsecure") in the code.
Additional note: after creating the HBase-mapped table in the Hive ODS layer, I wanted to generate the corresponding table in the DWD layer with a CREATE TABLE AS SELECT ... statement, but it failed with an error:

Fix 1: pass the corresponding parameter at startup: beeline -hiveconf zookeeper.znode.parent=/hbase-unsecure or hive -hiveconf zookeeper.znode.parent=/hbase-unsecure. Fix 2: since I am on Ambari HDP 3.1.4, adding the corresponding configuration and then simply running beeline also works.

2. Reading data from Kafka with Spark Streaming and writing it to HBase:
import java.util
import com.rongrong.bigdata.utils.{KafkaZkUtils, UMSUtils}
import kafka.utils.ZkUtils
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.log4j.Logger
import org.apache.spark.streaming.{Durations, StreamingContext}
import scala.util.Try
object StandardOnlie {
private val logger: Logger = Logger.getLogger(this.getClass)
def main(args: Array[String]): Unit = {
val spark = InitializeSpark.createSparkSession("StandardOnlie", "local")
val streamingContext = new StreamingContext(spark.sparkContext, Durations.seconds(30))
val kafkaParams = Map[String, Object](
ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "node01:6667,node02:6667,node03:6667",
ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
ConsumerConfig.GROUP_ID_CONFIG -> "group-02",
ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "latest",
ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (false: java.lang.Boolean)
)
val topic: String = "djt_db.test_schema1.result"
val zkUrl = "node01:2181,node02:2181,node03:2181"
val sessionTimeout = 1000
val connectionTimeout = 1000
val zkClient = ZkUtils.createZkClient(zkUrl, sessionTimeout, connectionTimeout)
val kafkaStream = KafkaZkUtils.createDirectStream(zkClient, streamingContext, kafkaParams, topic)
// start processing each batch of messages
kafkaStream.foreachRDD(rdd => {
// process the data fetched from Kafka
logger.info("=============== Total " + rdd.count() + " events in this batch ..")
rdd.foreach(x => {
val configuration = HBaseConfiguration.create()
configuration.set("zookeeper.znode.parent", "/hbase-unsecure")
val connection = ConnectionFactory.createConnection(configuration)
// the actual payload from Kafka
var usmString = x.value()
val flag: Boolean = UMSUtils.isHeartbeatUms(usmString)
if (!flag) { // filter out heartbeat records
val usmActiontype = UMSUtils.getActionType(usmString)
println(s"Action type of this record ---> ${usmActiontype}")
println("Read Kafka message, parsing payload: " + x.value())
val data: util.Map[String, String] = UMSUtils.getDataFromUms(usmString)
//get the table
val table = connection.getTable(TableName.valueOf("test_schema1:t2"))
val rowkey: String = 123456 + "_" + data.get("a")
val put = new Put(Bytes.toBytes(rowkey))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("ums_active_"), Bytes.toBytes(data.get("ums_active_")))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("ums_id_"), Bytes.toBytes(data.get("ums_id_")))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("ums_ts_"), Bytes.toBytes(data.get("ums_ts_")))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("a"), Bytes.toBytes(data.get("a")))
put.addColumn(Bytes.toBytes("F"), Bytes.toBytes("b"), Bytes.toBytes(data.get("b")))
// write the data to HBase (wrapped in Try so a failed put does not kill the job), then close the table
Try(table.put(put))
table.close()
println(s"Parsed data ---> ${data}")
}
connection.close() // close the per-record connection
})
// update the offsets in ZooKeeper
KafkaZkUtils.saveOffsets(zkClient, topic, KafkaZkUtils.getZkPath(kafkaParams, topic), rdd)
})
streamingContext.start()
streamingContext.awaitTermination()
streamingContext.stop()
}
}
import kafka.utils.{ZKGroupTopicDirs, ZkUtils}
import org.I0Itec.zkclient.ZkClient
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.kafka.common.TopicPartition
import org.apache.log4j.Logger
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils}
object KafkaZkUtils {
private val logger: Logger = Logger.getLogger(this.getClass)
/**
* Get the consumer's offset path in ZooKeeper
* @param kafkaParams
* @param topic
* @return
*/
def getZkPath(kafkaParams: Map[String, Object], topic: String): String ={
val topicDirs = new ZKGroupTopicDirs(kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).toString, topic)
s"${topicDirs.consumerOffsetDir}"
}
/**
* Create a DirectStream
* @param zkClient
* @param streamingContext
* @param kafkaParams
* @param topic
* @return
*/
def createDirectStream(zkClient: ZkClient,streamingContext: StreamingContext, kafkaParams: Map[String, Object], topic: String): InputDStream[ConsumerRecord[String, String]] = {
val zkPath = getZkPath(kafkaParams,topic)
//read the stored offsets for the topic
val storedOffsets = readOffsets(zkClient, topic, zkPath)
val kafkaStream: InputDStream[ConsumerRecord[String, String]] = storedOffsets match {
//no offsets saved from a previous run
case None =>
KafkaUtils.createDirectStream[String, String](
streamingContext,
PreferConsistent,
ConsumerStrategies.Subscribe[String, String](Array(topic), kafkaParams)
)
case Some(fromOffsets) => {
KafkaUtils.createDirectStream[String, String](
streamingContext,
PreferConsistent,
// assign specific partitions; cannot detect partition changes dynamically
// ConsumerStrategies.Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets)
ConsumerStrategies.Subscribe[String, String](List(topic), kafkaParams, fromOffsets)
)
}
}
kafkaStream
}
/**
* Save offsets
* @param zkClient
* @param topic
* @param zkPath
* @param rdd
*/
def saveOffsets(zkClient: ZkClient,topic: String, zkPath: String, rdd: RDD[_]): Unit = {
("Saving offsets to zookeeper")
val offsetsRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
offsetsRanges.foreach(offsetRange => logger.debug(s"Using ${offsetRange}"))
val offsetsRangesStr = offsetsRanges.map(offsetRange => s"${offsetRange.partition}:${offsetRange.untilOffset}").mkString(",")
(s"Writing offsets to Zookeeper: ${offsetsRangesStr}")
ZkUtils(zkClient, false).updatePersistentPath(zkPath, offsetsRangesStr)
}
/**
* Read offsets
* @param zkClient
* @param topic
* @param zkPath
* @return
*/
def readOffsets(zkClient: ZkClient, topic: String, zkPath: String): Option[Map[TopicPartition, Long]] = {
("Reading offsets from zookeeper")
val (offsetsRangesStrOpt, _) = ZkUtils(zkClient, false).readDataMaybeNull(zkPath)
offsetsRangesStrOpt match {
case Some(offsetsRangesStr) => {
logger.debug(s"Read offset ranges: ${
offsetsRangesStr
}")
val offsets: Map[TopicPartition, Long] = offsetsRangesStr.split(",").map(s => s.split(":"))
.map({
case Array(partitionStr, offsetStr) =>
(new TopicPartition(topic, partitionStr.toInt) -> offsetStr.toLong)
// you can hard-code the offset to start reading from here; note that you also need to switch to the ConsumerStrategies.Assign line in createDirectStream above
// (new TopicPartition(topic, partitionStr.toInt) -> "20229".toLong)
}).toMap
Some(offsets)
}
case None =>
("No offsets found in Zookeeper")
None
}
}
}
I originally wanted to use foreachPartition, but I could not get it to work. In the article I was following, the author creates the connection inside foreachPartition and notes that "the HBase connection is created once per partition; a partition does not span nodes, so nothing needs to be serialized", but when I wrote it that way it threw an error:

The failing code:
rdd.foreachPartition(partitionRecords => {
val configuration = HBaseConfiguration.create()
configuration.set("zookeeper.znode.parent", "/hbase-unsecure")
val connection = ConnectionFactory.createConnection(configuration) // get the HBase connection, one per partition; a partition does not span nodes, so no serialization is needed
partitionRecords.foreach(x => {
// the actual payload from Kafka
var usmString = x.value()
I looked through a few resources ("Scala: writing to HBase through Spark: Task not serializable", the ZooKeeper error org.I0Itec.zkclient.exception.ZkMarshallingError: java.io.EOFException, "a full rundown of Spark serialization issues", and "HBase connection pools"), but I have not found a solution yet; if anyone knows one, feel free to discuss.
