Reading and Writing Microsoft docx Files in Python with the docx Module

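Original

彭世瑜 2021-07-12 10:48:48

© Copyright belongs to the author: an original work by 51CTO blog author 彭世瑜. Contact the author for permission to reprint; unauthorized reprinting may lead to legal action.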

Reading a docx file, method 1:

Reading flow: binary object -> unzip -> read the XML file

# -*- coding: utf-8 -*-

from zipfile import ZipFile
from urllib.request import urlopen  # Python 3 location of urlopen
from io import BytesIO

# To read from a URL instead of a local file:
# url = "http://www.pythonscraping.com/pages/AwordDocument.docx"
# word_file = urlopen(url).read()
# word_file = BytesIO(word_file)

# A .docx file is a ZIP archive; the body text lives in word/document.xml
with open("AWordDocument.docx", "rb") as word_file:
    document = ZipFile(word_file)
    xml_content = document.read("word/document.xml")

# Decode the raw bytes; the result is still XML markup, not plain text
text = xml_content.decode("utf-8")
print(text)
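
The script above prints the raw XML, markup included. As a minimal sketch of how the text itself might be pulled out, the example below parses word/document.xml with the standard library's xml.etree.ElementTree and joins the w:t text runs inside each w:p paragraph. The helper names extract_paragraphs and make_sample_docx are hypothetical, and the sample archive built in memory contains only word/document.xml, so it is a stand-in for demonstration rather than a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_paragraphs(docx_bytes):
    """Return the text of each paragraph (w:p) in a .docx given as bytes."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_content = archive.read("word/document.xml")
    root = ET.fromstring(xml_content)
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more w:t elements
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs


def make_sample_docx():
    """Build a tiny in-memory archive holding only word/document.xml (demo only)."""
    body = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


paragraphs = extract_paragraphs(make_sample_docx())
print(paragraphs)
```

For a real file, the bytes could come from open("AWordDocument.docx", "rb").read() or from a downloaded response body.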

Method 2

Extract the text content

pip install python-docx

import docx

# Open the document; this returns a Document object
doc = docx.Document("AWordDocument.docx")
print(doc)

# Print the text of every paragraph
for p in doc.paragraphs:
    print(p.text)

For more details, see
Official documentation: http://python-docx.readthedocs.io/en/latest/index.html
