HBase: a NoSQL Database Built on HDFS

1. What NoSQL is, and common NoSQL databases

1) Data is stored as key-value pairs

2) NoSQL databases generally do not support transactions

3) Common NoSQL databases

 (*) HBase: a column-oriented NoSQL database built on top of HDFS

 (*) Redis: an in-memory NoSQL database

 (*) Cassandra: similar to HBase; a column-oriented NoSQL database

 (*) MongoDB: a document-oriented NoSQL database (BSON documents; BSON is a binary encoding of JSON)


2. Architecture and table structure

 

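(The figures for this section are not reproduced here. In outline, as standard for HBase: an active HMaster coordinates a set of RegionServers, ZooKeeper tracks cluster state, each table is split into regions served by the RegionServers, and region data is persisted as HFiles on HDFS.)

A table is addressed by row key plus column family and column qualifier. For example, the 'student' table created in section 4 below has this logical layout:

       rowkey  | info:name | info:age | grade:chinese
       --------+-----------+----------+--------------
       stu001  | Tom       | 24       | 80
       stu002  | Mary      |          |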

3. Installation and configuration: three modes

                前提 (prerequisite): Hadoop is already installed

       Unpack the archive and set the environment variables: tar -zxvf hbase-0.96.2-hadoop2-bin.tar.gz -C ~/training/

                    

HBASE_HOME=/root/training/hbase-0.96.2-hadoop2
export HBASE_HOME

PATH=$HBASE_HOME/bin:$PATH
export PATH
              
              
              1) Local mode: does not require Hadoop     ----->  hadoop111
                     hbase-env.sh
                                   export JAVA_HOME=/root/training/jdk1.7.0_75
                                   
                     hbase-site.xml
                                   <property>
                                     <name>hbase.rootdir</name>
                                    <value>file:///root/training/hbase-0.96.2-hadoop2/data</value>
                                   </property>         
 
                     Start it: start-hbase.sh
              
              
              2) Pseudo-distributed mode: simulates a distributed environment on a single machine -----> hadoop111
                     hbase-env.sh:
                            export HBASE_MANAGES_ZK=true
 
                     hbase-site.xml:
                            <property>
                              <name>hbase.rootdir</name>
                             <value>hdfs://192.168.157.111:9000/hbase</value>
                            </property>
 
                            <property>
                             <name>hbase.cluster.distributed</name>
                              <value>true</value>
                            </property>
 
                            <property>
                             <name>hbase.zookeeper.quorum</name>
                              <value>192.168.157.111</value>
                            </property>
 
                            <property>
                              <name>dfs.replication</name>
                              <value>1</value>
                            </property>         
 
                     HBase Web Console: port 60010
              
              
              3) Fully distributed mode: three machines   -----> hadoop112, hadoop113, hadoop114
                            <property>
                              <name>hbase.rootdir</name>
                             <value>hdfs://192.168.157.112:9000/hbase</value>
                            </property>
 
                            <property>
                             <name>hbase.cluster.distributed</name>
                              <value>true</value>
                            </property>
 
                            <property>
                             <name>hbase.zookeeper.quorum</name>
                              <value>192.168.157.112</value>
                            </property>
 
                            <property>
                              <name>dfs.replication</name>
                              <value>2</value>
                            </property>  
 
                            <property>
                              <name>hbase.master.maxclockskew</name>
                              <value>180000</value>
                            </property>         
                            
                     regionservers (the list of slave nodes):
                            192.168.157.113
                            192.168.157.114
 
 
                     Copy the installed directory to the slave nodes:
                     scp -r hbase-0.96.2-hadoop2/ root@hadoop113:/root/training
                     scp -r hbase-0.96.2-hadoop2/ root@hadoop114:/root/training

              4) HBase HA (high availability): -----> hadoop112, hadoop113, hadoop114

                     Simply start another HMaster on a different node.
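                     For example (assuming the standard HBase scripts are on the PATH), run hbase-daemon.sh start master on hadoop113 or hadoop114; ZooKeeper then coordinates failover between the active and the backup HMaster.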


4. The command line (hbase shell) and the Java API

1) The command line (hbase shell)

              (1) Create a table: specify the table name and the column family names  -----> stored as an HDFS directory

                       create 'student','info','grade'

                             

                             create 'student',{NAME=>'info',VERSIONS=>'3'}
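                             With VERSIONS=>'3', HBase keeps up to three versions of each cell; older versions can be read back in the shell, e.g.:
                             get 'student','stu001',{COLUMN=>'info:name',VERSIONS=>3}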

                             

                      View the table structure: describe 'student'

             


              (2) Insert data:  put table_name, rowkey, column (column family name + column name), value

            

put 'student','stu001','info:name','Tom'
                            put 'student','stu001','info:age','24'
                            put 'student','stu001','grade:chinese','80'
                            put 'student','stu002','info:name','Mary'

              (3) Query data:

                            (*) scan: equivalent to  select * from emp;

                                  scan 'student'

                           

                            (*) get: equivalent to  select * from *** where rowkey=???

                             Syntax: get table_name, rowkey

                                      get 'student','stu001'
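                                      A single column can also be requested: get 'student','stu001','info:name'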

             

             

              (4) Count the rows: count 'student'

             

              (5) Empty a table: truncate table_name

                     (*) Aside: in Oracle, two ways to empty a table's data:

                                          delete from table;

                                                         truncate table ****;

              Differences:

       1. delete is DML (Data Manipulation Language): it can be rolled back

                     truncate is DDL (Data Definition Language), like create/drop table: it cannot be rolled back (DDL commits implicitly)

       2. delete removes rows one by one; truncate destroys the table and recreates it

       3. delete leaves fragmentation behind; truncate does not

       4. deleted data can be recovered with flashback; truncated data cannot

                                         

                   

(*) hbase(main):010:0> truncate 'student'
                            Truncating 'student' table (it may take a while):
                             - Disabling table...
                             - Dropping table...
                             - Creating table...
                            0 row(s) in 2.1140 seconds

              (6) Drop a table: drop 'student'

                     First: disable 'student'

             

       2) Java API: the required jars are under /root/training/hbase-0.96.2-hadoop2/lib

      

package demo;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;
 
public class TestDemoHBase {
 
       @Test
       public void testCreateTable() throws Exception{
              //set the ZooKeeper address, through which the client locates the HMaster
              //HBaseConfiguration.create() also loads the hbase-site.xml/hbase-default.xml defaults
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //get an HBase admin client: HBaseAdmin
              HBaseAdmin admin = new HBaseAdmin(conf);
              
              //describe the table
              //table name
              HTableDescriptor ht = new HTableDescriptor(TableName.valueOf("mystudent"));
              
              //create the column families
              HColumnDescriptor hc1 = new HColumnDescriptor("info");
              HColumnDescriptor hc2 = new HColumnDescriptor("grade");
              
              //add the column families to the table
              ht.addFamily(hc1);
              ht.addFamily(hc2);
              
              //create the table
              admin.createTable(ht);
              
              //close the client
              admin.close();
       }
       
       @Test
       public void testPut() throws Exception{
              //set the ZooKeeper address, through which the client locates the HMaster
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //get an HBase table client: HTable
              HTable table = new HTable(conf, "mystudent");
              
              //build a Put object, passing the rowkey
              Put put = new Put(Bytes.toBytes("stu001"));
              //put.add(family,    the column family name
              //        qualifier, the column name
              //        value)     the value
              put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Tom"));
              
              //insert the data
              table.put(put);
              
              table.close();
       }
       
       @Test
       public void testPutList() throws Exception{
              //equivalent to: insert into **** select ****
              //exercise: see the sketch after this class
       }
       
       @Test
       public void testGet() throws Exception{
              //set the ZooKeeper address, through which the client locates the HMaster
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //get an HBase table client: HTable
              HTable table = new HTable(conf, "mystudent");
              
              //build a Get object, passing the rowkey
              Get get = new Get(Bytes.toBytes("stu001"));
              
              //run the query
              Result r = table.get(get);
              
              //extract this record's name and age
              String name = Bytes.toString(r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
              String age = Bytes.toString(r.getValue(Bytes.toBytes("info"), Bytes.toBytes("age")));
              
              //print them
              System.out.println(name +"\t"+ age);
              
              table.close();
       }
       
       
       @Test
       public void testScan() throws Exception{
              //set the ZooKeeper address, through which the client locates the HMaster
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //get an HBase table client: HTable
              HTable table = new HTable(conf, "mystudent");
              
              //define a scanner
              Scan scan = new Scan();
              //scan.setFilter(filter) ----> set a filter: the "where" condition
              
           ResultScanner rs = table.getScanner(scan);
           for(Result r: rs){
                 //extract the name
                 String name = Bytes.toString(r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
                 System.out.println(name);
           }
           
           table.close();
       }
       
       @Test
       public void testDropTable() throws Exception{
              //set the ZooKeeper address, through which the client locates the HMaster
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //get an HBase admin client: HBaseAdmin
              HBaseAdmin admin = new HBaseAdmin(conf);
              
              //disable, then delete the table
              admin.disableTable("mystudent");
              admin.deleteTable("mystudent");
              
              admin.close();
       }
}
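The testPutList exercise is left open above. A minimal sketch follows, assuming the same "mystudent" table and ZooKeeper address, with sample rows invented purely for illustration; it additionally needs import java.util.ArrayList; and import java.util.List;. HTable.put(List<Put>) inserts the whole batch in one call:

       @Test
       public void testPutList() throws Exception{
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              HTable table = new HTable(conf, "mystudent");
              
              //build a batch of Put objects (sample data, for illustration only)
              List<Put> puts = new ArrayList<Put>();
              for(int i = 1; i <= 10; i++){
                     Put put = new Put(Bytes.toBytes("stu" + String.format("%03d", i)));
                     put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Student" + i));
                     puts.add(put);
              }
              
              //insert the whole batch in one call
              table.put(puts);
              
              table.close();
       }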

 

(Web Console: port 60010)

5. How data is saved ------> and the issue it raises: Region splitting

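(The figures for this section are not reproduced. In outline, as standard for HBase: a write goes first to the RegionServer's write-ahead log (HLog) and its in-memory MemStore; a full MemStore is flushed to an HFile on HDFS; and a region that grows past the configured maximum size splits into two daughter regions, which can be reassigned across RegionServers.)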

6. HBase filters: Java programs

1) Prepare the test data

2) Filter types

              (*) Column value filter: SingleColumnValueFilter

 Example: find the names of the employees whose salary is 3000:   select ename from emp where sal=3000;

             

              (*) Column-name prefix filter: ColumnPrefixFilter

                          Example: list the employee names:  select ename from emp;

             

              (*) Multiple column-name prefix filter: MultipleColumnPrefixFilter

                          Example: list the employee names and salaries: select ename,sal from emp;

             

              (*) Rowkey filter (RowFilter): find the employee whose empno is 7839

                                  select * from emp where empno=7839;

             

              (*) Combining multiple filters: FilterList

package demo.filter;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.MultipleColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;
 
public class TestHBaseFilter {
 
       @Test
       public void testSingleColumnValueFilter() throws Exception{
              //configuration: the ZooKeeper address
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //table client
              HTable table = new HTable(conf, "emp");           
              
              //create a scanner
              Scan scan = new Scan();
              
              //create the column value filter
              SingleColumnValueFilter filter = new SingleColumnValueFilter(
                            Bytes.toBytes("empinfo"),   //column family name
                            Bytes.toBytes("sal"),       //column name
                            CompareOp.EQUAL,            //enum for the comparison operator
                            Bytes.toBytes("3000"));
              //note: by default, rows that have no "sal" column at all also pass;
              //filter.setFilterIfMissing(true) would exclude them
              
              //attach the filter to the scanner
              scan.setFilter(filter);
              
              //run the query
        ResultScanner rs = table.getScanner(scan);
        for(Result r:rs){
             //print the name
             System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename"))));
        }
        
        table.close();
       }
       
       @Test
       public void testColumnPrefixFilter() throws Exception{
              //configuration: the ZooKeeper address
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //table client
              HTable table = new HTable(conf, "emp");           
              
              //create a scanner
              Scan scan = new Scan();
              
              //create a column-name prefix filter: keep only columns whose name starts with "ename"
              ColumnPrefixFilter filter = new ColumnPrefixFilter(Bytes.toBytes("ename"));
              
              scan.setFilter(filter);
              
              //run the query
        ResultScanner rs = table.getScanner(scan);
        for(Result r:rs){
             //print the name
             System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename"))));
        }
        
        table.close();
       }
       
       @Test
       public void testMultipleColumnPrefixFilter() throws Exception{
              //configuration: the ZooKeeper address
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //table client
              HTable table = new HTable(conf, "emp");           
              
              //create a scanner
              Scan scan = new Scan();
              
              //create a multiple column-name prefix filter: fetch the employee names and salaries
              //build a two-dimensional byte array holding the column-name prefixes
              byte[][] namesList = {Bytes.toBytes("ename"),Bytes.toBytes("sal")};
              
              MultipleColumnPrefixFilter filter = new MultipleColumnPrefixFilter(namesList);
              scan.setFilter(filter);
              
              //run the query
        ResultScanner rs = table.getScanner(scan);
        for(Result r:rs){
             String ename = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
             String sal = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("sal")));
             
             //print name and salary
             System.out.println(ename+"\t"+sal);
        }
        
        table.close();            
       }
       
       @Test
       public void testRowFilter() throws Exception{
              //configuration: the ZooKeeper address
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //table client
              HTable table = new HTable(conf, "emp");    
              
              //create a scanner
              Scan scan = new Scan();
              //create a RowFilter
              RowFilter filter = new RowFilter(CompareOp.EQUAL,  //the comparison operator
                                                     new RegexStringComparator("7839")); //the rowkey, matched as a regular expression
              
              scan.setFilter(filter);
              
              //run the query
        ResultScanner rs = table.getScanner(scan);
        for(Result r:rs){
             String ename = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
             String sal = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("sal")));
             
             //print name and salary
             System.out.println(ename+"\t"+sal);
        }
        
        table.close();            
       }
       
       @Test
       public void test5() throws Exception{
              //find the names of the employees whose salary is 3000
              /*
               * use two filters:
               * 1. a column value filter: employees whose salary is 3000
               * 2. a column-name prefix filter: the employee name
               */
              
              //configuration: the ZooKeeper address
              Configuration conf = HBaseConfiguration.create();
              conf.set("hbase.zookeeper.quorum", "192.168.157.111");
              
              //table client
              HTable table = new HTable(conf, "emp");    
              
              //create a scanner
              Scan scan = new Scan();
              
              //create the first filter
              SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
                            Bytes.toBytes("empinfo"),   //column family name
                            Bytes.toBytes("sal"),       //column name
                            CompareOp.EQUAL,            //enum for the comparison operator
                            Bytes.toBytes("3000"));
              
              //create the second filter
              ColumnPrefixFilter filter2 = new ColumnPrefixFilter(Bytes.toBytes("ename"));
              
              //create a FilterList
              FilterList list = new FilterList(Operator.MUST_PASS_ALL); //equivalent to "and"
              list.addFilter(filter1);
              list.addFilter(filter2);
              
              //attach both filters to the scanner
              scan.setFilter(list);
              
              //run the query
        ResultScanner rs = table.getScanner(scan);
        for(Result r:rs){
             String ename = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
             //the prefix filter drops the "sal" cells themselves, so sal prints as null here
             String sal = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("sal")));
             
             //print name and salary
             System.out.println(ename+"\t"+sal);
        }
        
        table.close();                          
              
       }
}
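One refinement worth noting: in testRowFilter, RegexStringComparator("7839") matches any rowkey that contains 7839. For an exact match, the same filter package offers BinaryComparator, e.g. new RowFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("7839"))); for a single known rowkey, though, a plain Get (section 4) is simpler.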

 

7. MapReduce programs on HBase
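This WordCount job reads its input from the HBase table 'word' (the text sits in column content:info), splits each value on spaces, and writes the per-word totals into the table 'stat' (column content:result). The Mapper and Reducer therefore extend TableMapper and TableReducer rather than the plain Hadoop base classes.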


package demo.wc;
 
import java.io.IOException;
 
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
 
//                                                   k3    v3      the Reducer's output: one table record
public class WordCountReducer extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
 
         @Override
         protected void reduce(Text k3, Iterable<LongWritable> v3,Context context)
                            throws IOException, InterruptedException {
                   // sum the counts for this word
                   long total = 0;
                   for(LongWritable l:v3){
                            total = total + l.get();
                   }
                   
                   //the output is one record in the result table
                   //build a Put object, using the word k3 as the rowkey
                   Put put = new Put(Bytes.toBytes(k3.toString()));
                   
                   put.add(Bytes.toBytes("content"), Bytes.toBytes("result"), Bytes.toBytes(String.valueOf(total)));
                   
                   //emit the record
                   context.write(new ImmutableBytesWritable(Bytes.toBytes(k3.toString())),  //the rowkey to insert under
                                           put); //the data
         }
 
}
package demo.wc; 
 
import java.io.IOException;
 
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
 
//there are no k1/v1 inputs: the input is a record from the table
//                                                k2       v2
public class WordCountMapper extends TableMapper<Text, LongWritable> {
 
         @Override
         protected void map(ImmutableBytesWritable key, Result value,Context context)
                            throws IOException, InterruptedException {
                   //the input is one record from the table
                   //key : the record's rowkey
                   //value: the record itself
                   //extract the text
                   String str = Bytes.toString(value.getValue(Bytes.toBytes("content"), Bytes.toBytes("info")));
                   
                   //split into words
                   String[] words = str.split(" ");
                   
                   //emit (word, 1) pairs
                   for(String w: words){
                            context.write(new Text(w), new LongWritable(1));
                   }
         }
 
}
package demo.wc;
 
import java.io.IOException;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
 
public class WordCountMain {
 
         public static void main(String[] args) throws Exception {
                   //configuration: the ZooKeeper address
                   Configuration conf = HBaseConfiguration.create();
                   conf.set("hbase.zookeeper.quorum", "192.168.157.111");
                   
                   //create a job
                   Job job = Job.getInstance(conf);
                   job.setJarByClass(WordCountMain.class);
                   
                   //define a scanner: read only the data the job needs
                   Scan scan = new Scan();
                   scan.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info")); //the column to read
                                     
                   //set the job's Mapper
                   //TableMapReduceUtil.initTableMapperJob(table, scan, mapper, outputKeyClass, outputValueClass, job);
                   TableMapReduceUtil.initTableMapperJob(Bytes.toBytes("word"),  //the input table
                                                                   scan,                  //the scanner
                                                                   WordCountMapper.class,
                                                                   Text.class,
                                                                   LongWritable.class,
                                                                   job
                                                                 );
                   
                   //set the job's Reducer; "stat" is the output table
                   TableMapReduceUtil.initTableReducerJob("stat", WordCountReducer.class, job);
                   
                   //run
                   job.waitForCompletion(true);
         }
 
}
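Note: both tables must exist before the job runs, e.g. created in the hbase shell with create 'word','content' and create 'stat','content', with sample input loaded via put 'word','line1','content:info','I love Beijing' (invented here purely for illustration). The job is then packaged as a jar and submitted with hadoop jar, with the HBase client jars on the classpath (e.g. via HADOOP_CLASSPATH).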