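- Before working through this article you need an HBase cluster set up. Once your cluster is ready, let's start learning the HBase shell~
- This article covers only the most frequently used shell commands. None of them are difficult; it's just a matter of practice. My HBase version is 2.1.1.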
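- The command to enter the HBase shell is
hbase shell
- The command to show help for a command inside the HBase shell is (for example help 'create'; a plain help lists all commands)
help 'xxx'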
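create: create a table
- As mentioned in the previous article, never forget that creating a table requires specifying a column family. For example, create a table named test with one column family named cf: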
hbase(main):029:0> create 'test','cf'
Created table test
Took 1.2710 seconds
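=> Hbase::Table - test
- This is consistent with what was said before: a column family must be specified, and leaving it out causes an error, because columns belong to column families.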

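- The statement above only specifies a column family, not any columns inside it. Why don't columns need to be specified?
- Unlike an RDBMS, HBase does not require you to specify columns when creating a table. An RDBMS needs them because its data has to live somewhere: without columns, what would a table even contain, and where would the data go? In HBase, columns are very flexible. If that doesn't make sense yet, don't worry; the shell operations below will make it clear. A column in HBase only appears when you insert data, and strictly speaking it isn't even "created", because no column definition is ever made (creating a table performs a create operation and produces a table definition, but that definition contains no columns). You simply insert a cell into HBase. The cell is located by table:column family:row:column, the column name becomes an attribute of the cell, and that is what gives this row a column. Whether other rows have the same column, HBase only finds out when it scans them. If this is still unclear, I'll draw a diagram when introducing put below.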
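To make the idea of a cell coordinate concrete, here is a toy in-memory model in Python. It is purely illustrative and is not how HBase stores data; the class ToyTable and its methods are hypothetical names. The table definition records only column family names. A "column" exists only as part of the key of a cell that was put, so the columns of a row can only be discovered by looking at its stored cells.

```python
class ToyTable:
    """Toy model (not HBase code): the schema is just a set of column families."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)   # the only thing the "schema" records
        self.cells = {}                 # (row, family, qualifier) -> value

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            # Mirrors HBase rejecting a put to a family that was never defined.
            raise KeyError(f"unknown column family: {family}")
        # No column definition is created; the qualifier is simply part of the key.
        self.cells[(row, family, qualifier)] = value

    def columns_of(self, row):
        # Column names can only be discovered by looking at stored cells.
        return sorted(f"{f}:{q}" for (r, f, q) in self.cells if r == row)


t = ToyTable("test", ["cf"])
t.put("row1", "cf:name", "wangziqiang")
t.put("row1", "cf:age", "20")
print(t.columns_of("row1"))   # ['cf:age', 'cf:name']
```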
list: list the tables
hbase(main):031:0> list
TABLE
test   # only the test table
1 row(s)
Took 0.0354 seconds
=> ["test"]
describe: view table properties
- View the properties of the test table
hbase(main):032:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{
NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE',
CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE',
TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0',
BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false',
IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE',
BLOCKCACHE => 'true', BLOCKSIZE => '65536'
}
1 row(s)
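Took 0.1347 seconds
- describe and desc do the same thing.
- Note that NAME in the output above is the column family name, not the table name, and every property after it applies to the cf column family. To demonstrate this, let's add another column family: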
hbase(main):034:0> alter 'test','cf2'
Updating all regions with the new schema...
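1/1 regions updated. # every region is updated: each region holds a range of rows and keeps one store per column family, so adding a family changes all of them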
Done.
Took 2.6644 seconds
- Look at the table properties again
hbase(main):035:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{
NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE',
CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE',
TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0',
BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false',
COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'
}
{
NAME => 'cf2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE',
CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE',
TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0',
BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false',
COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE =>'65536'
}
2 row(s)
Took 0.0913 seconds
- Sure enough, the properties describe column families.
put: add data
- Add specific columns inside the cf column family
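Every property printed by desc belongs to a column family, so a table's settings can be pictured as a map from family name to that family's settings. The Python below is a toy sketch of that idea, not HBase's actual descriptor classes. The helper alter and the trimmed DEFAULT_FAMILY dict are hypothetical; the dict holds only a few of the defaults shown by desc. The sketch mimics how alter either adds a family with default settings or changes one family without touching the others.

```python
# A few of the per-family defaults printed by desc (trimmed for brevity).
DEFAULT_FAMILY = {"VERSIONS": 1, "TTL": "FOREVER", "BLOCKSIZE": 65536}


def alter(descriptor, name, **settings):
    """Add family `name` with default settings if missing, then apply overrides."""
    family = descriptor.setdefault(name, dict(DEFAULT_FAMILY))  # per-family copy
    family.update(settings)
    return descriptor


test_table = {}
alter(test_table, "cf")               # like: create 'test','cf'
alter(test_table, "cf2")              # like: alter 'test','cf2'
alter(test_table, "cf", VERSIONS=5)   # like: alter 'test',{NAME=>'cf',VERSIONS=>5}
print(sorted(test_table))                                          # ['cf', 'cf2']
print(test_table["cf"]["VERSIONS"], test_table["cf2"]["VERSIONS"])  # 5 1
```

Each family gets its own copy of the defaults, so changing VERSIONS on cf leaves cf2 untouched. This matches the desc output, where cf and cf2 each carry a full set of properties.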
hbase(main):036:0> put 'test','row1','cf:name','wangziqiang'
Took 0.2273 seconds
hbase(main):037:0> put 'test','row1','cf:age',20
Took 0.0156 seconds
hbase(main):038:0> put 'test','row1','cf:height',183
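Took 0.0154 seconds
- What the shell did: in the test table, insert a row with rowkey row1, and in that row's cf column family add cells with the column names name, age and height, holding wangziqiang, 20 and 183 respectively.
- You can use scan to view the data in the table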
hbase(main):039:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543664259164, value=20
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
1 row(s)
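Took 0.0435 seconds
- Now we can see what it means that adding columns is very flexible. Here is what each part of a record means:
# rowkey   column family:column name   timestamp   value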
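row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
- About timestamps: if you don't specify one, as in the puts above, the system uses the insertion time (in milliseconds) as the timestamp. HBase also lets you set the timestamp yourself to any value, such as 123 or 321. HBase naturally favors the new: by default it shows the version with the largest timestamp, which is not necessarily the one written last.
- We said earlier that putting data to an existing rowkey and column updates the value. Is that true?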
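The timestamp rule can be sketched with a toy in-memory Python model of a single cell. ToyCell is a hypothetical name, and the sketch illustrates only the read rule; it is not HBase code. If the timestamp is omitted, it defaults to the current time in milliseconds, and a read returns the version with the largest timestamp.

```python
import time


class ToyCell:
    """Toy model of one cell (row + column) holding timestamped versions."""

    def __init__(self):
        self.versions = {}  # timestamp -> value

    def put(self, value, ts=None):
        # Without an explicit timestamp, use the current time in milliseconds,
        # the same unit as the timestamps printed by scan.
        if ts is None:
            ts = int(time.time() * 1000)
        self.versions[ts] = value

    def get(self):
        # A read returns the version with the largest timestamp,
        # not necessarily the one that was written last.
        return self.versions[max(self.versions)]


c = ToyCell()
c.put("a", ts=321)
c.put("b", ts=123)   # written later, but with a smaller timestamp
print(c.get())       # a
```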
hbase(main):043:0> put 'test','row1','cf:age',18
Took 0.0139 seconds
hbase(main):044:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543664971060, value=18 # changed
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
1 row(s)
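Took 0.0520 seconds
- It is indeed as we said, and the timestamp was updated too. So is it also true, as we said earlier, that the overwritten value is not deleted?
- To check this we have to change a table property. In the desc output above there is a property called VERSIONS. It sets the maximum number of versions HBase keeps for each cell in that family. The default is 1, which keeps only the latest value, so to see the history we need to raise it: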
hbase(main):047:0> alter 'test',{NAME=>'cf',VERSIONS=>5}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 3.5151 seconds
- Note that the operator is =>, not =. Running desc again shows that the property has changed, so now let's put some more data and look at the result:
hbase(main):059:0> put 'test','row1','cf:age',17
Took 0.0338 seconds
hbase(main):060:0> put 'test','row1','cf:age',16
Took 0.0082 seconds
hbase(main):061:0> put 'test','row1','cf:age',15
Took 0.0175 seconds
hbase(main):062:0> put 'test','row1','cf:age',20
Took 0.0152 seconds
hbase(main):063:0> get 'test','row1',{COLUMN=>'cf:age',VERSIONS=>5}
COLUMN CELL
cf:age timestamp=1543665590821, value=20
cf:age timestamp=1543665587740, value=15
cf:age timestamp=1543665582866, value=16
cf:age timestamp=1543665580576, value=17
cf:age timestamp=1543664971060, value=18
1 row(s)
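Took 0.0570 seconds
- Looks like everything we said earlier was right, haha. The get command is covered below; here it fetches up to five versions of the age cell. If you ask for more versions than the family's VERSIONS setting allows, you still get at most VERSIONS of them.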
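The interaction between the family's VERSIONS setting and the VERSIONS requested by get can be sketched in Python. This is a toy model with a hypothetical helper get_versions. It ignores the fact that VERSIONS was still 1 when the first puts happened, and it hides extra versions only at read time, whereas real HBase also drops them physically during compaction. The timestamps are copied from the cf:age output in this article.

```python
def get_versions(history, max_versions, requested):
    """Return up to `requested` newest values, capped by the family's VERSIONS.

    history: list of (timestamp, value) pairs in any order.
    """
    newest_first = sorted(history, key=lambda tv: tv[0], reverse=True)
    kept = newest_first[:max_versions]          # what the family retains
    return [value for _, value in kept[:requested]]


# The cf:age puts from this article (timestamps taken from the scan/get output).
age_history = [
    (1543664259164, "20"), (1543664971060, "18"), (1543665580576, "17"),
    (1543665582866, "16"), (1543665587740, "15"), (1543665590821, "20"),
]
print(get_versions(age_history, max_versions=5, requested=5))
# ['20', '15', '16', '17', '18']
print(len(get_versions(age_history, max_versions=5, requested=10)))  # 5
```

The first put of 20 is the sixth-newest version, so it falls outside VERSIONS=5. That is why the get output above ends at 18.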
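- Now back to an earlier question: why can't we say a column is "created"?
- Whether in HBase or an RDBMS, creating a table produces a definition of its structure. We've now learned the basics of storing data with put, so why can't a column strictly be said to be created? A column family applies to the whole table, but the columns under a family can differ from row to row, as the figure shows: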

- The following shell session confirms what the figure shows
hbase(main):069:0> put 'test','row2','cf:phone',1348888888
Took 0.0445 seconds
hbase(main):071:0> put 'test','row2','cf:addr','beijing'
Took 0.0256 seconds
hbase(main):072:0> put 'test','row3','cf:id',132
Took 0.0269 seconds
hbase(main):073:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, value=22
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row2 column=cf:addr, timestamp=1543669367525, value=beijing
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
row3 column=cf:id, timestamp=1543669389010, value=132
3 row(s)
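Took 0.0428 seconds
- Now we can see why a column can't be said to be "created": the table holds no definition of it, and every row can have different columns. HBase has no idea how the columns of one row differ from another's until it scans the table. The column name is really part of the cell's identity, in effect the cell's name (my understanding).
scan: scan a table
- We've already used scan to look at table data. In real use, however, HBase tables get very large, so you shouldn't scan a whole table; specify a range instead.
- A scan can be limited by start and end row, by timestamp range, by specific columns and more; see the HBase command help (help 'scan') for details. Below we limit a scan with a start and end row, and then with a timestamp range: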
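The per-row nature of columns can be checked on the data from the scan above with a few lines of Python. This is an illustrative snippet, not an HBase client: the cells are copied from the output, and the timestamps are omitted. It groups cells by row and shows that the full set of column names only emerges after visiting every row.

```python
from collections import defaultdict

# Cells from the scan above: (row, column, value); timestamps omitted.
cells = [
    ("row1", "cf:age", "22"), ("row1", "cf:height", "183"),
    ("row1", "cf:name", "wangziqiang"), ("row2", "cf:addr", "beijing"),
    ("row2", "cf:phone", "1348888888"), ("row3", "cf:id", "132"),
]

columns_by_row = defaultdict(set)
for row, column, _value in cells:
    columns_by_row[row].add(column)

# Each row has its own set of columns; nothing in the table lists them all.
for row in sorted(columns_by_row):
    print(row, sorted(columns_by_row[row]))
# row1 ['cf:age', 'cf:height', 'cf:name']
# row2 ['cf:addr', 'cf:phone']
# row3 ['cf:id']

# The full set of column names is only known after visiting every row.
all_columns = set().union(*columns_by_row.values())
print(sorted(all_columns))
# ['cf:addr', 'cf:age', 'cf:height', 'cf:id', 'cf:name', 'cf:phone']
```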
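The range rules used in the session that follows can be modeled in Python as a sanity check. This is a toy filter over cells copied from the scan output, and the function scan here is a hypothetical stand-in for the shell command. The start row is inclusive and the end row exclusive; a timestamp range [min, max) includes min and excludes max. The transcript confirms the latter: the cell at exactly 1543669367525 is left out.

```python
# Cells from the scan output in this section: (row, column, timestamp).
CELLS = [
    ("row1", "cf:age", 1543665757228), ("row1", "cf:height", 1543664308514),
    ("row1", "cf:name", 1543664222231), ("row2", "cf:addr", 1543669367525),
    ("row2", "cf:phone", 1543669351162), ("row3", "cf:id", 1543669389010),
]


def scan(cells, start_row=None, end_row=None, time_range=None):
    """Toy filter: start_row inclusive, end_row exclusive, time_range [min, max).

    Row keys compare lexicographically, which for these ASCII keys matches
    HBase's byte-wise ordering (so "row10" would sort before "row2").
    """
    out = []
    for row, column, ts in sorted(cells):
        if start_row is not None and row < start_row:
            continue
        if end_row is not None and row >= end_row:
            continue
        if time_range is not None and not (time_range[0] <= ts < time_range[1]):
            continue
        out.append((row, column, ts))
    return out


print(scan(CELLS, start_row="row1", end_row="row2"))   # the three cells of row1
print(scan(CELLS, time_range=(1543665757228, 1543669367525)))
# [('row1', 'cf:age', 1543665757228), ('row2', 'cf:phone', 1543669351162)]
```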
hbase(main):075:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, value=22
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row2 column=cf:addr, timestamp=1543669367525, value=beijing
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
row3 column=cf:id, timestamp=1543669389010, value=132
3 row(s)
Took 0.2155 seconds
# scan with a start row and an end row
hbase(main):076:0> scan 'test',{STARTROW=>'row1',ENDROW=>'row2'} # start row inclusive, end row exclusive
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, value=22
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
1 row(s)
Took 0.0569 seconds
# scan with a timestamp range (min inclusive, max exclusive)
hbase(main):085:0> scan 'test', {COLUMNS => 'cf', TIMERANGE => [1543665757228,1543669367525]}
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, value=22
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
2 row(s)
Took 0.0287 seconds
get: get a value
- Take the simplest case: row1, cf:name
hbase(main):086:0> get 'test','row1','cf:name'
COLUMN CELL
cf:name timestamp=1543664222231, value=wangziqiang
1 row(s)
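Took 0.0441 seconds
- get can filter too; fetching a number of historical versions, as we did earlier, is one kind of filtering. See the command help (help 'get') for details.
delete: delete table data
- Let's delete one version from the history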
hbase(main):096:0> get 'test','row1',{COLUMN=>'cf:age',VERSIONS=>5}
COLUMN CELL
cf:age timestamp=1543665757228, value=22
cf:age timestamp=1543665590821, value=20
cf:age timestamp=1543665587740, value=15
cf:age timestamp=1543665582866, value=16
cf:age timestamp=1543664971060, value=18
1 row(s)
Took 0.0279 seconds
hbase(main):097:0> delete 'test','row1','cf:age',1543664971060
Took 0.0222 seconds
hbase(main):098:0> get 'test','row1',{COLUMN=>'cf:age',VERSIONS=>5}
COLUMN CELL
cf:age timestamp=1543665757228, value=22
cf:age timestamp=1543665590821, value=20
cf:age timestamp=1543665587740, value=15
cf:age timestamp=1543665582866, value=16
1 row(s)
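Took 0.0252 seconds
- If no timestamp is given, delete removes the latest version. If we scan again, the data seems to be truly gone, but it isn't: HBase only writes a delete marker, and the deleted data can still be seen (until a major compaction physically purges it):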
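The marker behavior described here can be sketched with a toy Python model of one column's versions. ToyVersionedColumn is a hypothetical name, the timestamps are small integers for readability, and only version-delete markers are modeled; this is not HBase's storage code. delete adds a marker instead of removing data. A normal read skips marked versions, a raw read shows both the values and the markers, and a compaction step finally drops them.

```python
class ToyVersionedColumn:
    """Toy model of version deletes: markers are added, data is not removed."""

    def __init__(self):
        self.puts = {}        # timestamp -> value
        self.deleted = set()  # timestamps carrying a Delete marker

    def put(self, ts, value):
        self.puts[ts] = value

    def delete(self, ts=None):
        # Without a timestamp, mark the newest version that is still visible.
        if ts is None:
            ts = max(self.visible())
        self.deleted.add(ts)

    def visible(self):
        # Normal reads skip every version that has a Delete marker.
        return {ts: v for ts, v in self.puts.items() if ts not in self.deleted}

    def raw(self):
        # A raw read still returns masked values, plus the markers themselves;
        # newest first, with a marker listed before the value it masks.
        entries = [(ts, "value=" + v) for ts, v in self.puts.items()]
        entries += [(ts, "type=Delete") for ts in self.deleted]
        return sorted(entries, key=lambda e: (-e[0], e[1] != "type=Delete"))

    def major_compact(self):
        # Compaction physically drops masked values and the markers.
        self.puts = self.visible()
        self.deleted = set()


age = ToyVersionedColumn()
for ts, v in [(1, "18"), (2, "17"), (3, "16"), (4, "15"), (5, "20")]:
    age.put(ts, v)
age.delete(1)      # like delete with an explicit timestamp
age.delete()       # no timestamp: masks the newest visible version (ts 5)
print(sorted(age.visible().items()))  # [(2, '17'), (3, '16'), (4, '15')]
print(age.raw()[:2])                  # [(5, 'type=Delete'), (5, 'value=20')]
age.major_compact()
print(len(age.raw()))                 # 3
```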
hbase(main):105:0> scan 'test',{RAW=>TRUE,VERSIONS=>5}
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, type=Delete
row1 column=cf:age, timestamp=1543665757228, value=22
row1 column=cf:age, timestamp=1543665590821, value=20
row1 column=cf:age, timestamp=1543665587740, type=Delete
row1 column=cf:age, timestamp=1543665587740, value=15
row1 column=cf:age, timestamp=1543665582866, value=16
row1 column=cf:age, timestamp=1543665580576, type=Delete
row1 column=cf:age, timestamp=1543665580576, value=17
row1 column=cf:age, timestamp=1543664971060, type=Delete
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row2 column=cf:addr, timestamp=1543669367525, value=beijing
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
row3 column=cf:id, timestamp=1543669389010, value=132
3 row(s)
Took 0.0914 seconds
- Every deleted entry carries a type=Delete marker.
- As we've seen, delete only removes a specified column. What if we want to remove a whole row in one go? Use deleteall:
hbase(main):106:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665590821, value=20
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row2 column=cf:addr, timestamp=1543669367525, value=beijing
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
row3 column=cf:id, timestamp=1543669389010, value=132
3 row(s)
Took 0.0820 seconds
hbase(main):107:0> deleteall 'test','row2'
Took 0.0622 seconds
hbase(main):108:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665590821, value=20
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row3 column=cf:id, timestamp=1543669389010, value=132
2 row(s)
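Took 0.0210 seconds
drop: drop a table
- Unlike an RDBMS, where you can drop a table directly as long as no foreign keys get in the way, HBase tables have an enabled and a disabled state. A newly created table is enabled by default, and dropping an enabled table raises an error, so you must disable the table before dropping it. Once HBase is in production and many clients are connected to the table, disabling can be slow, because every region of the table has to be closed on the RegionServers that host it.
# disable before dropping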
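The enable/disable rules can be summarized as a small state machine. The Python below is a toy model: ToyAdmin and its methods are hypothetical, and the error texts are illustrative except for the "is disabled!" wording, which echoes the transcript. A new table starts enabled, drop is refused while the table is enabled, and reads are refused once it is disabled.

```python
class ToyAdmin:
    """Toy model of the enable/disable rules; not HBase's implementation."""

    def __init__(self):
        self.tables = {}  # table name -> True if enabled

    def create(self, name):
        self.tables[name] = True          # new tables start enabled

    def disable(self, name):
        self.tables[name] = False

    def is_disabled(self, name):
        return not self.tables[name]

    def scan(self, name):
        # Reads are refused on a disabled table.
        if not self.tables[name]:
            raise RuntimeError(f"Table {name} is disabled!")
        return []

    def drop(self, name):
        # Dropping is refused while the table is still enabled.
        if self.tables[name]:
            raise RuntimeError(f"{name} must be disabled before drop")
        del self.tables[name]

    def list_tables(self):
        return sorted(self.tables)


admin = ToyAdmin()
admin.create("test")
try:
    admin.drop("test")
except RuntimeError as err:
    print(err)                        # test must be disabled before drop
admin.disable("test")
print(admin.is_disabled("test"))      # True
admin.drop("test")
print(admin.list_tables())            # []
```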
hbase(main):109:0> disable 'test'
Took 1.3984 seconds
hbase(main):110:0> scan 'test'
ROW COLUMN+CELL
org.apache.hadoop.hbase.TableNotEnabledException: test is disabled.
at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:736)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:328)
at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
ERROR: Table test is disabled!
For usage try 'help "scan"'
Took 0.1362 seconds
hbase(main):111:0> drop 'test'
Took 0.7804 seconds
hbase(main):112:0> list
TABLE
0 row(s)
Took 0.0046 seconds
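=> []
- As shown above, after disabling you can run a data-reading command to check whether the table really is disabled; once it is, drop deletes it directly.
- You can also check whether a table is disabled with
is_disabled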
hbase(main):114:0> create 'test','cf'
Created table test
Took 1.3164 seconds
=> Hbase::Table - test
hbase(main):115:0> is_disabled 'test'
false
Took 0.0110 seconds
=> 1
hbase(main):116:0> disable 'test'
Took 0.7647 seconds
hbase(main):117:0> is_disabled 'test'
true
Took 0.0394 seconds
=> 1
hbase(main):118:0> drop 'test'
Took 0.4623 seconds
hbase(main):119:0> list
TABLE
0 row(s)
Took 0.0052 seconds
=> []
- OK, that's all the commands for this article. HBase has many, many more; I'll cover them in detail when they come up, haha.