Two ways to write a pandas DataFrame to a CSV file on HDFS:

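1.

from hdfs.client import Client
client = Client("http://localhost:50070")  # WebHDFS address of the NameNode
# hdfs_url is the target file path on HDFS
client.write(hdfs_url, df.to_csv(index=False), overwrite=True, encoding='utf-8')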

2.

# encoding='utf-8' lets the writer accept the text that to_csv produces
with client.write(hdfs_url, overwrite=True, encoding='utf-8') as writer:
  df.to_csv(writer, index=False)

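Method 2 is recommended: it streams the CSV to HDFS as it is generated instead of first building the entire CSV string in memory, so it writes much more efficiently than method 1.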
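Method 2 can be wrapped in a small reusable function. The following is only a minimal sketch: the name write_df_to_hdfs is hypothetical (it is not part of the hdfs library), and it assumes that client is an hdfs Client and that df is a pandas DataFrame.

```python
def write_df_to_hdfs(client, df, hdfs_path, overwrite=True, encoding="utf-8"):
    """Stream a DataFrame to a CSV file on HDFS (method 2).

    Hypothetical helper: `client` is assumed to be an hdfs.client.Client
    and `df` a pandas DataFrame.
    """
    # Passing `encoding` makes the writer accept text, which is what
    # DataFrame.to_csv writes when it is handed a file-like object.
    with client.write(hdfs_path, overwrite=overwrite, encoding=encoding) as writer:
        # Rows are written to the open HDFS stream instead of being
        # collected into one large string first.
        df.to_csv(writer, index=False)
```

It would be called as write_df_to_hdfs(client, df, hdfs_url), with client created as in method 1.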

 

Reading text data from HDFS

from hdfs.client import Client
client = Client("http://localhost:50070")
filepath = "test.txt"
# encoding='utf-8' makes read() return str; without it, bytes are returned
with client.read(filepath, encoding='utf-8') as fs:
  content = fs.read()
  print(content)
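For large text files, reading the whole content at once may use too much memory. As a rough sketch, the file can be consumed line by line with the delimiter parameter of client.read, which requires encoding to be set as well. The name iter_hdfs_lines is hypothetical, and client is assumed to be an hdfs Client.

```python
def iter_hdfs_lines(client, hdfs_path, encoding="utf-8"):
    """Yield the lines of a text file on HDFS one at a time.

    Hypothetical helper: `client` is assumed to be an hdfs.client.Client.
    """
    # With `delimiter` set, the reader is a generator that yields one
    # decoded chunk each time the delimiter is encountered.
    with client.read(hdfs_path, encoding=encoding, delimiter="\n") as reader:
        for line in reader:
            yield line
```

A loop such as "for line in iter_hdfs_lines(client, filepath): print(line)" would then process the file without loading it in full.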

 

Reading an Excel file from HDFS

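import io
import pandas as pd
with client.read(filepath) as fs:
  # read the raw bytes, then wrap them in a file-like object for pandas
  content = fs.read()
  table = pd.read_excel(io.BytesIO(content))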