HBase 2.2.4 is the latest official stable release as of 2020-08-03. We have configured Snappy compression on it, but it has not been rolled out to production yet, so we first need to verify the compression results in a test environment and, at the same time, benchmark read/write performance.


We benchmark HBase with YCSB 0.17.0. The tool needs little introduction; it is Yahoo!'s open-source benchmark suite built specifically for load-testing NoSQL databases.

Download and unpack ycsb-0.17.0. The tarball is about 670 MB because it bundles test bindings for every database the tool supports; if you don't need the rest, you can check out the source and build only the binding for your database, which is far smaller (see the sketch below).
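
If you go the source route, here is a sketch following the build instructions in the YCSB README (the 0.17.0 tag and the hbase20-binding module name match the release used here):

hadoop@hadoop1:~$ git clone https://github.com/brianfrankcooper/YCSB.git && cd YCSB
hadoop@hadoop1:~/YCSB$ git checkout 0.17.0
hadoop@hadoop1:~/YCSB$ mvn -pl site.ycsb:hbase20-binding -am clean package

Here I simply grab the full release tarball: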

hadoop@hadoop1:~$ wget https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
hadoop@hadoop1:~$ tar zxvf ycsb-0.17.0.tar.gz && cd  ycsb-0.17.0

The extracted directory looks like this:

hadoop@hadoop1:~/ycsb-0.17.0$ ll
total 232
drwxrwxr-x 54 hadoop hadoop  4096 Aug  3 05:49 ./
drwxr-xr-x 23 hadoop hadoop  4096 Aug  3 05:48 ../
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:48 accumulo1.6-binding/
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:48 accumulo1.7-binding/
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:48 accumulo1.8-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 aerospike-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 arangodb-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 asynchbase-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 azurecosmos-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 azuretablestorage-binding/
drwxrwxr-x  2 hadoop hadoop  4096 Aug  3 05:48 bin/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 cassandra-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 cloudspanner-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 couchbase2-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 couchbase-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 crail-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 dynamodb-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 elasticsearch5-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 elasticsearch-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 foundationdb-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 geode-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 googlebigtable-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 googledatastore-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 griddb-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 hbase098-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 hbase10-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:48 hbase12-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 hbase14-binding/
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:50 hbase20-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 hypertable-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 ignite-binding/
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:49 infinispan-binding/
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:49 jdbc-binding/
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:49 kudu-binding/
drwxrwxr-x  2 hadoop hadoop  4096 Aug  3 05:49 lib/
-rw-r--r--  1 hadoop hadoop 11358 May 19  2018 LICENSE.txt
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 maprdb-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 maprjsondb-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 memcached-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 mongodb-binding/
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:49 nosqldb-binding/
-rw-r--r--  1 hadoop hadoop   615 May 19  2018 NOTICE.txt
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 orientdb-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 postgrenosql-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 rados-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 redis-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 rest-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 riak-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 rocksdb-binding/
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:49 s3-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 solr6-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 solr-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 tablestore-binding/
drwxrwxr-x  4 hadoop hadoop  4096 Aug  3 05:49 tarantool-binding/
drwxrwxr-x  3 hadoop hadoop  4096 Aug  3 05:49 voltdb-binding/
drwxr-xr-x  2 hadoop hadoop  4096 Sep 25  2019 workloads/
hadoop@hadoop1:~/ycsb-0.17.0$

Of the directories above, hbase20-binding/ is the one the HBase benchmark uses. It does not ship with my cluster's configuration, so I have to copy $HBASE_HOME/conf/hbase-site.xml into it.
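
A minimal sketch, assuming $HBASE_HOME points at the HBase installation on this node:

hadoop@hadoop1:~/ycsb-0.17.0$ mkdir hbase20-binding/conf
hadoop@hadoop1:~/ycsb-0.17.0$ cp $HBASE_HOME/conf/hbase-site.xml hbase20-binding/conf/

Afterwards the binding directory looks like this: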

hadoop@hadoop1:~/ycsb-0.17.0/hbase20-binding$ tree
.
├── conf(created manually)
│   └── hbase-site.xml(copied from the live cluster)
├── lib
│   ├── audience-annotations-0.5.0.jar
│   ├── commons-logging-1.1.3.jar
│   ├── findbugs-annotations-1.3.9-1.jar
│   ├── hbase10-binding-0.17.0.jar
│   ├── hbase20-binding-0.17.0.jar
│   ├── hbase-shaded-client-2.0.0.jar
│   ├── htrace-core4-4.1.0-incubating.jar
│   ├── log4j-1.2.17.jar
│   └── slf4j-api-1.7.25.jar
└── README.md

2 directories, 11 files
hadoop@hadoop1:~/ycsb-0.17.0/hbase20-binding$

With that, the preparation is done. First, a look at the command usage:

hadoop@hadoop1:~/ycsb-0.17.0$ ./bin/ycsb
usage: ./bin/ycsb command database [options]

Commands:
    load           Execute the load phase
    run            Execute the transaction phase
    shell          Interactive mode

Databases:
    ...
    hbase098       https://github.com/brianfrankcooper/YCSB/tree/master/hbase098
    hbase10        https://github.com/brianfrankcooper/YCSB/tree/master/hbase10
    hbase12        https://github.com/brianfrankcooper/YCSB/tree/master/hbase12
    hbase14        https://github.com/brianfrankcooper/YCSB/tree/master/hbase14
    hbase20        https://github.com/brianfrankcooper/YCSB/tree/master/hbase20
    ...

Options:
    -P file        Specify workload file
    -cp path       Additional Java classpath entries
    -jvm-args args Additional arguments to the JVM
    -p key=value   Override workload property
    -s             Print status to stderr
    -target n      Target ops/sec (default: unthrottled)
    -threads n     Number of client threads (default: 1)

Workload Files:
    There are various predefined workloads under workloads/ directory.
    See https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties
    for the list of workload properties.
ycsb: error: too few arguments
hadoop@hadoop1:~/ycsb-0.17.0$

You also need to pick a suitable workload; the workloads/ directory ships with the following:

hadoop@hadoop1:~/ycsb-0.17.0$ tree workloads/
workloads/
├── tsworkloada
├── tsworkload_template
├── workloada  //update heavy: 50% reads, 50% updates
├── workloadb  //read mostly: 95% reads, 5% updates
├── workloadc  //read only: 100% reads
├── workloadd  //read latest: new records are inserted, and the newer a record is, the more likely it is to be read
├── workloade  //short ranges: queries small ranges of records rather than single records
├── workloadf  //read-modify-write: read a record, modify it, and write it back
└── workload_template  //the benchmark configuration template (covered at the end)

0 directories, 9 files
hadoop@hadoop1:~/ycsb-0.17.0$
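
For a sense of what these files contain, the shipped workloada, stripped of its comment header and blank lines, boils down to roughly this (values as shipped in 0.17.0):

hadoop@hadoop1:~/ycsb-0.17.0$ grep -v '^#' workloads/workloada | grep .
recordcount=1000
operationcount=1000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian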

What the parameters in the command below mean:

hbase20:  use the HBase 2.x client binding
-P:  path to the workload file (Workload A here)
-p:  set a single property (overrides the value in the workload file)
table=, columnfamily=:  the HBase table name and column family to use
recordcount=, operationcount=:  the number of records to load and the number of operations to run
-threads:  number of client threads
-s:  print status to stderr

Next, load the data into the target table.
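
YCSB does not create the table for you, so create it in the HBase shell first. A minimal sketch; the second, Snappy-compressed table is my assumption for the compression comparison, and only nosnappy is used below:

hbase(main):001:0> create 'nosnappy', {NAME => 'data'}
hbase(main):002:0> create 'snappy', {NAME => 'data', COMPRESSION => 'SNAPPY'}

With the table in place, kick off the load: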

hadoop@hadoop1:~/ycsb-0.17.0$ bin/ycsb load hbase20 -P workloads/workloada -p table=nosnappy -p columnfamily=data -p recordcount=100000 -threads 10 -s

...
Loading workload...
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Starting test.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2020-08-03 06:53:01:837 0 sec: 0 operations; est completion in 0 second 
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
2020-08-03 06:53:11:817 10 sec: 6648 operations; 664.8 current ops/sec; est completion in 2 minutes [INSERT: Count=6648, Max=69503, Min=805, Avg=1428.28, 90=2289, 99=3209, 99.9=6047, 99.99=14895]
...

(final results)
[OVERALL], RunTime(ms), 134495    total time taken to load the data
[OVERALL], Throughput(ops/sec), 743.5220640172497  throughput; round to a whole number of operations
[TOTAL_GCS_PS_Scavenge], Count, 23
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 34
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.02527975017658649
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 19
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.014126919216327744
[TOTAL_GCs], Count, 24
[TOTAL_GC_TIME], Time(ms), 53
[TOTAL_GC_TIME_%], Time(%), 0.039406669392914234
[CLEANUP], Operations, 2
[CLEANUP], AverageLatency(us), 781.5
[CLEANUP], MinLatency(us), 5
[CLEANUP], MaxLatency(us), 1558
[CLEANUP], 95thPercentileLatency(us), 1558
[CLEANUP], 99thPercentileLatency(us), 1558
[INSERT], Operations, 100000    total number of INSERT operations
[INSERT], AverageLatency(us), 1336.92224    average latency per INSERT
[INSERT], MinLatency(us), 514    minimum INSERT latency
[INSERT], MaxLatency(us), 672255  maximum INSERT latency
[INSERT], 95thPercentileLatency(us), 2455    95% of INSERTs completed within 2455 us
[INSERT], 99thPercentileLatency(us), 4399    99% of INSERTs completed within 4399 us
[INSERT], Return=OK, 100000    number of operations that returned success
hadoop@hadoop1:~/ycsb-0.17.0$

To run the actual benchmark, simply change load to run in the command above. If you want to keep the results, append a redirect to a file; by default everything is printed to standard output.
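
For example, a run-phase invocation mirroring the load above (the log file name is my own choice):

hadoop@hadoop1:~/ycsb-0.17.0$ bin/ycsb run hbase20 -P workloads/workloada -p table=nosnappy -p columnfamily=data -p operationcount=100000 -threads 10 -s >> workloada-run.log

Because -s writes status to stderr, the progress lines still show up on screen while the results land in the file.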

That covers a basic benchmark run. The configuration file mentioned earlier, workloads/workload_template, documents every available property, including the time-unit conversions and the request-distribution types; in plain terms, whether the benchmark keys should be uniformly distributed, follow a Zipfian distribution, or favor the most recently inserted records. I won't paste the file here; the comments are detailed, and even though they are in English, a little effort gets you through them. Writing your own workload from the template is straightforward, as sketched below.
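
A tiny custom workload derived from the template might look like this (the file name myworkload and the 95/5 split are my own picks; requestdistribution accepts uniform, zipfian, and latest, among others):

# myworkload: 95/5 read/update mix over a uniformly distributed key space
workload=site.ycsb.workloads.CoreWorkload
recordcount=100000
operationcount=100000
readproportion=0.95
updateproportion=0.05
requestdistribution=uniform

Run it by passing -P myworkload in place of -P workloads/workloada.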

Below are the results from my three rather modest Dell servers, for reference only:

(screenshot: benchmark results of the run above on the three Dell servers)

Benchmark numbers depend heavily on the hardware of the environment: memory speed, CPU core count, and so on. Other people's figures can only ever be "for reference"; you have to benchmark your own cluster to reach a conclusion.
