Python and HBase: reading HBase from Python and removing the b prefix

Reposted

烂漫树林 2024-05-14 12:06:05


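1. Setting up the HBase environment

Reference link
Following that guide, the setup should go through without major problems.

Create a table with the "create 'table name', 'column family name'" command, as follows: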

create 'test_table','info'

Next, enter the "list" command to look at the table we just created.

The output should include the new table "test_table".

Insert data into the table:

put 'test_table','row-1','info:name','zhangshan'
put 'test_table','row-1','info:age','25'
put 'test_table','row-2','info:name','lisi'
put 'test_table','row-2','info:age','12'

Here row-1 is the row key, the part after the column family (info:) is the column name, and the last field is the value.
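To make this layout concrete, the four put commands above can be pictured as a nested mapping from row key to column to value. This is only an illustrative sketch in plain Python and does not touch HBase; the names puts and build_rows are made up here, and bytes are used because the Thrift-based Python client used later in this post works with bytes.

```python
# Illustrative model only: HBase itself is not involved here.
# Each shell `put` is (row key, 'family:qualifier', value).
puts = [
    ("row-1", "info:name", "zhangshan"),
    ("row-1", "info:age", "25"),
    ("row-2", "info:name", "lisi"),
    ("row-2", "info:age", "12"),
]


def build_rows(puts):
    """Group puts by row key into {row_key: {column: value}}, all as bytes."""
    rows = {}
    for row_key, column, value in puts:
        cells = rows.setdefault(row_key.encode("utf-8"), {})
        cells[column.encode("utf-8")] = value.encode("utf-8")
    return rows


rows = build_rows(puts)
print(rows[b"row-1"])  # {b'info:name': b'zhangshan', b'info:age': b'25'}
```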

Use scan to view the contents of the table:

scan 'test_table'

A scan can also be restricted to a specific column:

scan 'test_table',{COLUMNS =>'info:name'}
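The same column-restricted scan can be expressed from Python through the columns argument of HappyBase's Table.scan(). The sketch below is an assumption-laden illustration: scan_column is a hypothetical helper, and FakeTable is an in-memory stand-in written only so the example runs without an HBase server.

```python
def scan_column(table, column):
    """Return {row_key: value} for one column, e.g. b'info:name'.

    `table` is expected to behave like happybase.Table: scan(columns=[...])
    yields (row_key, {column: value}) pairs.
    """
    result = {}
    for row_key, data in table.scan(columns=[column]):
        if column in data:
            result[row_key] = data[column]
    return result


class FakeTable:
    """Hypothetical in-memory stand-in so the sketch runs without HBase."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self, columns=None):
        for row_key in sorted(self.rows):
            data = self.rows[row_key]
            if columns is not None:
                data = {c: v for c, v in data.items() if c in columns}
            if data:
                yield row_key, data


fake = FakeTable({
    b"row-1": {b"info:name": b"zhangshan", b"info:age": b"25"},
    b"row-2": {b"info:name": b"lisi", b"info:age": b"12"},
})
names = scan_column(fake, b"info:name")
print(names)  # {b'row-1': b'zhangshan', b'row-2': b'lisi'}
```

Against a real server, the call would look like scan_column(connection.table('test_table'), b'info:name'), where connection is an open happybase.Connection.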

Delete a single column from a row. The column family must be one that exists in the table, which here is info:

delete 'test_table','row-1','info:name'
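For comparison, HappyBase exposes the same operation as Table.delete(row, columns=[...]). The following is a minimal sketch under stated assumptions: delete_cell is a hypothetical wrapper, and RecordingTable is an in-memory stand-in that only imitates the effect of a delete so the example can run without HBase.

```python
def delete_cell(table, row_key, column):
    """Delete one column of one row; other columns of the row are kept."""
    table.delete(row_key, columns=[column])


class RecordingTable:
    """Hypothetical stand-in that mimics the effect of delete() in memory."""

    def __init__(self, rows):
        self.rows = rows

    def delete(self, row, columns=None):
        if columns is None:
            self.rows.pop(row, None)  # no columns given: drop the whole row
            return
        for column in columns:
            self.rows.get(row, {}).pop(column, None)


demo = RecordingTable({b"row-1": {b"info:name": b"zhangshan", b"info:age": b"25"}})
delete_cell(demo, b"row-1", b"info:name")
print(demo.rows)  # {b'row-1': {b'info:age': b'25'}}
```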


2. Connecting to HBase remotely from Python

First, install thrift:
pip install thrift
Then start the Thrift server. Remember to change into HBase's bin directory first:

hbase thrift -p 9090 start


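Some readers get an error when running ./hbase-daemon.sh start thrift; that is because we are on Windows. You can use the jps command to check whether the Thrift server started successfully.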
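As a complement to jps, a quick way to check from Python that something is listening on the Thrift port is a plain TCP connection attempt. This is a rough sketch using only the standard library; is_port_open is a hypothetical helper, and an open port only shows that some process accepts connections there, not that it is the HBase Thrift server.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts
        return False
```

With the server started as above, is_port_open('localhost', 9090) would be expected to return True.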

Then install the happybase library, which lets Python work with HBase.

pip install happybase

Example: the simpler the better. First test whether we can connect at all.

# _*_ coding:utf-8 _*_
import happybase
connection = happybase.Connection('localhost', port=9090, autoconnect=False)
connection.open()
print(connection.tables())  # list all tables currently in HBase
connection.close()
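On Python 3 the table names printed by this test typically appear as bytes, for example [b'test_table'], which is the b prefix mentioned in the title. A small sketch of how the prefix could be removed by decoding; decode_names is a hypothetical helper, not part of HappyBase.

```python
def decode_names(names, encoding="utf-8"):
    """Convert a list such as [b'test_table'] into ['test_table']."""
    return [n.decode(encoding) if isinstance(n, bytes) else n for n in names]


print(decode_names([b"test_table", b"mytable"]))  # ['test_table', 'mytable']
```

In the script above this would be used as print(decode_names(connection.tables())).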

HappyBase API reference: https://happybase.readthedocs.io/en/latest/user.html#establishing-a-connection

3. Python HBase example: read data from HBase and consolidate it into a single file

1. Use Python to generate 100 rows of data in HBase

First, create a word table in HBase:
create 'mytable','cf'

# _*_ coding:utf-8 _*_
import happybase
import random


def getWords():
    # Build a string of 10 randomly chosen words, each followed by a space
    words = ['hello', 'home', 'back', 'state', 'ratio', 'code', 'phone',
             'and', 'traffic', 'approach']
    res = ''
    for _ in range(10):
        res += random.choice(words) + ' '
    return res


connection = happybase.Connection('localhost', port=9090, autoconnect=False)  # connect to HBase
connection.open()
print(connection.tables())  # list all tables currently in HBase
table = connection.table('mytable')
for i in range(1, 101):  # insert 100 rows into HBase
    r = 'row-key' + str(i)
    table.put(r.encode(encoding='utf-8'), {b'cf:word': getWords().encode(encoding='utf-8')})
connection.close()

Run scan 'mytable' in the HBase shell to check that the rows were inserted.
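The loop in the script sends one put per row. HappyBase also offers Table.batch(), which collects puts and sends them together; whether this matters for 100 rows is doubtful, so treat the following only as a sketch. put_rows_in_batch is a hypothetical helper, and InMemoryTable and InMemoryBatch are stand-ins written so the example runs without HBase.

```python
def put_rows_in_batch(table, rows, batch_size=50):
    """Write {row_key: {column: value}} through table.batch().

    Used as a context manager, the batch sends any remaining puts on exit.
    """
    with table.batch(batch_size=batch_size) as batch:
        for row_key, data in rows.items():
            batch.put(row_key, data)


class InMemoryBatch:
    """Hypothetical stand-in for a HappyBase batch object."""

    def __init__(self, store):
        self.store = store

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False  # do not swallow exceptions

    def put(self, row, data):
        self.store[row] = dict(data)


class InMemoryTable:
    """Hypothetical stand-in for happybase.Table, storing rows in a dict."""

    def __init__(self):
        self.store = {}

    def batch(self, batch_size=None):
        return InMemoryBatch(self.store)


fake_table = InMemoryTable()
put_rows_in_batch(fake_table, {b"row-key1": {b"cf:word": b"hello home "}})
print(fake_table.store)  # {b'row-key1': {b'cf:word': b'hello home '}}
```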

2. Use Python to read the word data from HBase and consolidate it into one file, which can then be fed to a Hadoop word count

# _*_ coding:utf-8 _*_
import happybase

connection = happybase.Connection('localhost', port=9090, autoconnect=False)
connection.open()
print(connection.tables())  # list all tables currently in HBase
table = connection.table('mytable')
# Open the output file, creating it if it does not exist; "with" closes it afterwards
with open("demoput.txt", "w", encoding="utf-8") as fo:
    # scan() yields (row key, data) pairs; data maps column names to values
    for key, data in table.scan():
        for j in data.values():  # each value is the word string, as bytes
            # decode() turns bytes into str, which removes the b'...' prefix
            fo.write(j.decode(encoding='utf-8'))
connection.close()

Run the script, then open demoput.txt to check the result.
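Before handing the file to Hadoop, a quick local sanity check of the word counts is possible with the standard library. This is not the Hadoop job itself, only a rough sketch; count_words is a hypothetical helper and the sample string is made up.

```python
from collections import Counter


def count_words(text):
    """Count whitespace-separated words in a string."""
    return Counter(text.split())


sample = "hello home back hello "
print(count_words(sample).most_common(1))  # [('hello', 2)]
```

For the generated file, the text could be read with open("demoput.txt", encoding="utf-8").read() and passed to count_words.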

That's all. If you have any questions, leave me a comment. This is my first blog post, and I plan to keep writing more.

