Reading docx heading numbers at every level with Python: getting Word heading numbers in Python

Reposted

mob64ca140761a4 2024-08-27 09:56:50


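Python third-party libraries

pyinstaller

pyinstaller options
Usage examples

Packaging as a single standalone exe
Packaging as a small exe

Building an installer with NSIS

Packaging a zip as an installer

pyinstaller packaging notes

Packaging pyecharts

jieba

Segmentation modes
Common functions

wordcloud

Overview
Main methods
Example: analyzing "The Three-Body Problem"

Segmenting the text with jieba
Removing certain words
Setting the maximum number of words
Setting the shape of the word cloud

pipenv

How pipenv manages environments
Using pipenv

Show help: pipenv -h
Show the virtual environment for the current directory: pipenv --venv
Create a virtual environment: pipenv --three / pipenv --python 3.6
Enter the pipenv shell: pipenv shell
Install a third-party library: pipenv install <library>
Show the dependency graph: pipenv graph
Run a script inside the virtual environment: pipenv run python pyScript.py

tabulate

tabulate.tabulate(data, para)

QRcode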

pyinstaller

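Run it from cmd: pyinstaller -F file.py

pyinstaller options

Option	Description
-h / --help	Show help information
-v / --version	Show version information
--distpath DIR	Output directory; defaults to ./dist
--workpath WDIR	Directory for temporary work files; defaults to ./build
-y / --noconfirm	Replace the contents of the output directory without asking for confirmation
-D / --onedir	Create a directory containing the executable and its dependency files
-F / --onefile	Create a single standalone executable
-n NAME / --name NAME	Name of the output file
-i icon.ico	Icon file used by the packaged program
-w / --windowed, --noconsole	Do not open a console window when the program runs

pyinstaller uses -D by default, which packages the program as an executable plus its dependency files; the whole directory must be copied to the target computer for the program to run. Alternatively, -F produces a single, larger executable.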
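The options in the table can also be assembled from a script, which is handy when the same build is repeated. The following is a minimal sketch, not part of the original article: the helper name build_pyinstaller_args and its parameters are hypothetical, and it only builds the argument list without running the build.

```python
def build_pyinstaller_args(script, onefile=False, windowed=False,
                           name=None, icon=None, distpath=None, workpath=None):
    """Build a pyinstaller command line from the options listed in the table."""
    args = ["pyinstaller"]
    # -F / --onefile versus the default -D / --onedir
    args.append("--onefile" if onefile else "--onedir")
    if windowed:
        args.append("--windowed")
    if name:
        args += ["--name", name]
    if icon:
        args += ["--icon", icon]
    if distpath:
        args += ["--distpath", distpath]
    if workpath:
        args += ["--workpath", workpath]
    args.append(script)
    return args


cmd = build_pyinstaller_args("square.py", onefile=True, name="square")
print(" ".join(cmd))
```

To actually run the build, the list can be passed to subprocess.run(cmd, check=True), assuming pyinstaller is installed and on PATH.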

Usage examples

The program being packaged has only three lines: a simple squaring program. All the examples below use this code.
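The original script was shown only as a screenshot, so its exact contents are not available here. As a stand-in, a squaring program of similar size might look like the sketch below. The file name square.py matches the commands used later, but the function name and the fixed sample values are assumptions (the original may well have read its input interactively).

```python
# square.py: hypothetical stand-in for the script in the lost screenshot
def square(x):
    """Return x squared."""
    return x * x


# Fixed sample values are used so the script runs unattended
for n in (2, 3, 4):
    print(n, "squared is", square(n))
```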


Packaging as a single standalone exe

The command in cmd is

pyinstaller --onefile square.py

After the build, the executable is in the dist directory. The build directory can be deleted without affecting the result.

The exe in dist can be copied anywhere and run directly by double-clicking it.

Note: the file is 7.43 MB.

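Packaging as a small exe

Packaging as a small exe means that the dependencies are not bundled into the exe but stored separately next to it (the default -D / --onedir mode).

Note: this exe is only 1.96 MB.

To use it, copy the exe together with its dependencies to the new machine and keep their relative positions unchanged. To turn the exe and its dependencies into an installer, use NSIS as described below.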

Building an installer with NSIS

NSIS can be downloaded from its official website.

[Screenshot: NSIS main window]

Packaging a zip as an installer

1. Compress the directory containing the exe and its dependencies into a zip file.

2. Package it with NSIS.

Select the file and adjust the settings.

Wait for packaging to finish.

Double-click the newly generated exe installer to install the program.

Once installation succeeds, double-click the installed program to run it.

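pyinstaller packaging notes

Packaging pyecharts

With the usual packaging command, the resulting exe fails to run and reports that xxx.json or xxx.html is missing. This is because pyecharts needs the preset data in xxx.json and the template in xxx.html when it creates an html file.

The fix is to bundle the data and templates as well. Write the code as usual and package it with the following command: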

pyinstaller -F --distpath ./release --workpath ./release/build -i img/hispatial.ico --add-data="D:/codes/venv/hs_sys_monitor_charts/Lib/site-packages/pyecharts/datasets;pyecharts/datasets" --add-data="D:/codes/venv/hs_sys_monitor_charts/Lib/site-packages/pyecharts/render/templates;pyecharts/render/templates" -n hs_system_monitor_charts monitor_charts_cli.py
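The command above hard-codes the site-packages path of one particular virtual environment. As a hedged alternative, the --add-data arguments can be derived from wherever the package is actually installed. The helper name add_data_arg below is hypothetical, and the sketch assumes the SRC and DEST parts are joined with os.pathsep, which matches the ; used in the Windows command above; newer PyInstaller releases may also accept other separators, so check the documentation of the installed version.

```python
import importlib.util
import os


def add_data_arg(package, subdir):
    """Build an --add-data argument for a data directory inside an installed package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package!r} is not an importable package")
    package_dir = list(spec.submodule_search_locations)[0]
    source = os.path.join(package_dir, *subdir.split("/"))
    # The destination inside the bundle mirrors the package layout
    dest = f"{package}/{subdir}"
    return f"--add-data={source}{os.pathsep}{dest}"


# Demonstrated with the standard-library package "email" so it runs anywhere;
# for the case above the calls would be add_data_arg("pyecharts", "datasets")
# and add_data_arg("pyecharts", "render/templates")
arg = add_data_arg("email", "mime")
print(arg)
```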

jieba

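The jieba library provides Chinese word segmentation.

Segmentation modes

jieba has three segmentation modes:
1. Precise mode: cuts the text accurately, with no redundant words in the result; suitable for text analysis.

2. Full mode: scans out every possible word in the text; fast, but cannot resolve ambiguity.

3. Search-engine mode: builds on precise mode and splits long words again to improve recall.

Common functions

Function	Description	Return value
jieba.cut(s)	Precise-mode segmentation	Iterator (generator)
jieba.lcut(s)	Precise-mode segmentation	List
jieba.cut(s, cut_all=True)	Full-mode segmentation	Iterator (generator)
jieba.lcut(s, cut_all=True)	Full-mode segmentation	List
jieba.cut_for_search(s)	Search-engine-mode segmentation	Iterator (generator)
jieba.lcut_for_search(s)	Search-engine-mode segmentation	List
jieba.add_word(w)	Add a new word to the segmentation dictionary	Example: jieba.add_word("蟒蛇语言")

For example:
1. Precise mode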

import jieba

res = jieba.cut("西北农林科技大学")
print(res)
print(type(res))


import jieba

res = jieba.lcut("人民英雄永垂不朽")
print(res)
print(type(res))

2. Full mode

import jieba

res = jieba.lcut("人民英雄永垂不朽", cut_all=True)
print(res)
print(type(res))

3. Search-engine mode

import jieba

res = jieba.lcut_for_search("人民英雄永垂不朽")
print(res)
print(type(res))

4. Adding a new word

Before adding the word

import jieba

res = jieba.lcut_for_search("python也被称作蟒蛇语言")
print(res)
print(type(res))

After adding the word

import jieba

jieba.add_word("蟒蛇语言")
res = jieba.lcut_for_search("python也被称作蟒蛇语言")
print(res)
print(type(res))


wordcloud

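Overview

An object-oriented library: instantiate a word cloud object, then call its attributes and methods.

Basic characteristics of a word cloud:
1. Words are separated by spaces
2. Font size is computed automatically from how often each word occurs
3. The default output image size is 400*200

Limitation: Chinese does not use spaces to separate words, so passing a Chinese string in directly does not produce useful output. Use jieba to segment the Chinese text first, then pass the result to the word cloud.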
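The limitation can be seen with plain string splitting. In the sketch below the token list is a hypothetical segmentation result written by hand (in practice it would come from jieba.lcut), and whitespace splitting merely stands in for the way a space-based tokenizer sees the text.

```python
sentence = "人民英雄永垂不朽"

# Without spaces the whole sentence is treated as a single "word"
print(sentence.split())

# Hypothetical segmentation result, as jieba.lcut(sentence) might return it
tokens = ["人民", "英雄", "永垂不朽"]

# Joining with spaces makes every word separately visible
spaced = " ".join(tokens)
print(spaced.split())
```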

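Main methods

Method	Description
wordcloud.WordCloud([para])	Instantiate a word cloud object
w.generate(txt)	Load text into the word cloud object
w.to_file(fileName)	Write the word cloud to an image file (png / jpg)

w: a WordCloud object

para:

width / height: width and height of the output image ( wordcloud.WordCloud(width=800, height=600) )
min_font_size / max_font_size: smallest (default 4) and largest (adjusted automatically to the image height) font size in the word cloud ( wordcloud.WordCloud(min_font_size=8, max_font_size=10) )
font_step: step between font sizes in the word cloud, default 1
font_path: path to the font file, default None. Required to display Chinese, e.g. font_path="msyh.ttc"
max_words: maximum number of words shown in the word cloud, default 200
stopwords: set of words excluded from the count ( wordcloud.WordCloud(stopwords={"qwer"}) )
mask: shape of the word cloud, loaded with an imread() function; rectangular by default.

import imageio
import wordcloud

# The mask image has a white background.
# scipy.misc.imread is missing from newer SciPy releases, so imageio is used here.
mk = imageio.imread("pic.png")
w = wordcloud.WordCloud(mask=mk)

background_color: background color of the word cloud, default black ( ...(background_color="white") )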

For example:

import wordcloud

# Instantiate the object
w = wordcloud.WordCloud()

# Load the text and generate the word cloud
w.generate("CHONGQING, Aug. 19 (Xinhua) -- Leading car manufacturers, research institutes, college teams, and individual participants are taking part in a self-driving vehicle challenge game in China, which started Thursday in southwest China's Chongqing Municipality. The four-day game, officially named the i-VISTA Autonomous Vehicle Grand Challenge, consists of five rounds of competitions on environment identification, decision analysis, and the control and execution capability of the vehicles. The highlight of the challenge will be the Advanced Driving Assistance System (ADAS) race, which invites consumers to bring their own vehicles to challenge each other in terms of the vehicles' automatic emergency braking system and automatic parking system. According to the organizer, nearly 30 vehicle models covering almost all the major intelligent automobiles on the market will likely be involved. As one of the major activities of the Smart China Expo, the challenge has been held three times previously, attracting more than 300 teams. The rules are adjusted this year so that consumers can participate with their own cars, said the organizer. The 2021 Smart China Expo, set to be held from Aug. 23 to 25 in Chongqing, aims to promote exchanges in smart technologies and international cooperation in the smart industry. ")

# Write the image
w.to_file("cw.jpg")


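Example: analyzing "The Three-Body Problem"

import wordcloud

# Read in the text of The Three-Body Problem
with open("D:/三体.txt", encoding="utf-8") as f:
    txt = f.read()

# Instantiate the object
w = wordcloud.WordCloud(font_path="msyh.ttc")

# Load the text and generate the word cloud
w.generate(txt)

# Write the image
w.to_file("D:/cw.jpg")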

Segmenting the text with jieba

import wordcloud, jieba

# Read in the text of The Three-Body Problem
with open("D:/三体.txt", encoding="utf-8") as f:
    txt = f.read()

# Segment the text with jieba and join the words with spaces
txt = " ".join(jieba.lcut(txt))

# Instantiate the object
w = wordcloud.WordCloud(font_path="msyh.ttc")

# Load the text and generate the word cloud
w.generate(txt)

# Write the image
w.to_file("D:/cw.jpg")

Removing certain words

import wordcloud, jieba

# Read in the text of The Three-Body Problem
with open("D:/三体.txt", encoding="utf-8") as f:
    txt = f.read()

# Segment the text with jieba and join the words with spaces
txt = " ".join(jieba.lcut(txt))

# Words to exclude from the word cloud
wdSet = {"这时", "现在", "当然", "是的", "所以", "不过", "我们", "他们", "一个", "自己", "现在", "没有", "可能", "知道", "如果", "然后", "只是", "看到", "这个", "什么", "不是", "就是", "已经", "可以", "这样", "这里", "那个", "出现", "你们", "还是", "一样", "这种", "很快", "一切", "东西", "一直", "两个", "同时"}

# Instantiate the object
w = wordcloud.WordCloud(font_path="msyh.ttc", stopwords=wdSet)

# Load the text and generate the word cloud
w.generate(txt)

# Write the image
w.to_file("D:/cw.jpg")

Setting the maximum number of words

import wordcloud, jieba

# Read in the text of The Three-Body Problem
with open("D:/三体.txt", encoding="utf-8") as f:
    txt = f.read()

# Segment the text with jieba and join the words with spaces
txt = " ".join(jieba.lcut(txt))

# Words to exclude from the word cloud
wdSet = {"这时", "现在", "当然", "是的", "所以", "不过", "我们", "他们", "一个", "自己", "现在", "没有", "可能", "知道", "如果", "然后", "只是", "看到", "这个", "什么", "不是", "就是", "已经", "可以", "这样", "这里", "那个", "出现", "你们", "还是", "一样", "这种", "很快", "一切", "东西", "一直", "两个", "同时"}

# Instantiate the object
w = wordcloud.WordCloud(font_path="msyh.ttc", stopwords=wdSet, max_words=20)

# Load the text and generate the word cloud
w.generate(txt)

# Write the image
w.to_file("D:/cw.jpg")
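Conceptually, stopwords and max_words act as a filter and a cut-off on the word counts. The sketch below illustrates that idea with the standard library only. It is not wordcloud's actual implementation, and the helper name top_words and the sample tokens are hypothetical.

```python
from collections import Counter


def top_words(tokens, stopwords=frozenset(), max_words=200):
    """Count tokens, drop stop words, and keep the most frequent ones."""
    counts = Counter(t for t in tokens if t.strip() and t not in stopwords)
    return counts.most_common(max_words)


# Hypothetical segmentation result standing in for jieba.lcut(txt)
tokens = ["三体", "我们", "文明", "三体", "我们", "宇宙", "三体", "文明"]

result = top_words(tokens, stopwords={"我们"}, max_words=2)
print(result)
```

Words dropped this way never reach the drawing stage, which is why a small max_words leaves the image sparsely filled.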

Setting the shape of the word cloud

Note: scipy 1.4.1 has no scipy.misc.imread() method; version 1.2.1 would have to be installed to use it.
Reason: imread() was dropped from scipy as of 1.3.0, so later versions cannot read images this way. The imageio library can be used instead.

import imageio
img = imageio.imread(myImage)  # myImage: path to the image file

[Image: the shape used as the mask]

import wordcloud, jieba
import imageio


# Read in the text of The Three-Body Problem
with open("D:/三体.txt", encoding="utf-8") as f:
    txt = f.read()

# Segment the text with jieba and join the words with spaces
txt = " ".join(jieba.lcut(txt))

# Words to exclude from the word cloud
wdSet = {"这时", "现在", "当然", "是的", "所以", "不过", "我们", "他们", "一个", "自己", "现在", "没有", "可能", "知道", "如果", "然后", "只是", "看到", "这个", "什么", "不是", "就是", "已经", "可以", "这样", "这里", "那个", "出现", "你们", "还是", "一样", "这种", "很快", "一切", "东西", "一直", "两个", "同时"}

# Read the shape image (imageio replaces scipy.misc.imread, which newer SciPy lacks)
mask = imageio.imread("D:/pic.png")

# Instantiate the object
w = wordcloud.WordCloud(font_path="msyh.ttc", stopwords=wdSet, max_words=20, mask=mask)

# Load the text and generate the word cloud
w.generate(txt)

# Write the image
w.to_file("D:/cw.jpg")

[Figure: the word cloud generated with max_words=20]

[Figure: the word cloud generated after removing the maximum-word limit]

pipenv

Provides isolated development environments, built on virtualenv and pip.

How pipenv manages environments

The virtual environment is managed through two files, Pipfile and Pipfile.lock.

Pipfile: records information about the virtual environment, including its libraries; by default it lists the libraries without pinning their versions
Pipfile.lock: locks the Python version and also records the version of each library

Using pipenv

Show help: pipenv -h

In cmd, run pipenv -h to display the help information.

Show the virtual environment for the current directory: pipenv --venv

Shows the virtual environment that exists for the current directory.

Create a virtual environment: pipenv --three / pipenv --python 3.6

Enter the pipenv shell: pipenv shell

Install a third-party library: pipenv install <library>

Show the dependency graph: pipenv graph

Run a script inside the virtual environment: pipenv run python pyScript.py

Code runs in the current virtual environment only when it is executed from the virtual environment's shell or through pipenv run. Outside of it, the Python interpreter is located through the system environment variables.
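One way to check which interpreter a script is really using is to print it from inside the script. This is a minimal sketch using only the standard library; the helper name in_virtualenv is hypothetical, and the check relies on sys.prefix differing from sys.base_prefix, which holds for venv and recent virtualenv releases.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

Running this once with pipenv run python and once with a plain python makes the difference between the two interpreters visible.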

tabulate

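Pretty-prints tabular data. It supports lists of lists, two-dimensional iterables, iterables of dicts, two-dimensional numpy arrays, pandas.DataFrame, and similar types. The output style can be customized.

For example:

import tabulate, pprint

list1 = [["Beijing", 1], ["Shanghai", 2], ["Guangzhou", 3], ["Shenzhen", 4]]

# Plain print
print(list1)

# Print with pprint
pprint.pprint(list1)

# Print with tabulate
print(tabulate.tabulate(list1))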

tabulate.tabulate(data, para)

para:

headers: the table header
tablefmt: the table style
numalign: alignment of numbers

headers:

import tabulate

list1 = [["Beijing", 1], ["Shanghai", 2], ["Guangzhou", 3], ["Shenzhen", 4]]

# Print with tabulate
print(tabulate.tabulate(list1, headers=["City", "Rank"]))

Styles supported by tablefmt

Style
"plain"
"simple"
"grid"
"fancy_grid"
"pipe"
"orgtbl"
"jira"
"presto"
"psql"
"rst"
"mediawiki"
"moinmoin"
"youtrack"
"html"
"latex"
"latex_raw"
"latex_booktabs"
"textile"

QRcode

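Install: pip install qrcode
Import: import qrcode

Generates QR codes from many kinds of data.

Usage:

img = qrcode.make(txt, border=border)

txt: the string, byte string, or similar data to encode as a QR code
border: distance from the QR code pattern to the outer edge, i.e. the width of the blank margin
img: the return value, an image object from the PIL library

Example:

import qrcode

# URL to encode
url = "https://www.baidu.com"

# Generate the QR code
img = qrcode.make(url)

# Save the image
img.save("D:/qrImg.png")

Changing the border

import qrcode

# URL to encode
url = "https://www.baidu.com"

# Generate the QR code
img = qrcode.make(url, border=1)

# Save the image
img.save("D:/qrImg.png")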
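The effect of border on the output size can be worked out by hand. The sketch below uses the standard QR layout of 17 + 4 * version modules per side and assumes the qrcode library's defaults of box_size=10 pixels per module and border=4 modules. The helper name qr_image_side is hypothetical, and version 2 is only an assumed example, since qrcode.make picks the version automatically from the data.

```python
def qr_image_side(version, box_size=10, border=4):
    """Side length in pixels of a QR code image for the given settings."""
    if not 1 <= version <= 40:
        raise ValueError("QR code versions range from 1 to 40")
    # A version-v QR code has 17 + 4*v modules per side
    modules = 17 + 4 * version
    # The border is added on both sides and is measured in modules
    return (modules + 2 * border) * box_size


for border in (4, 1):
    print("border =", border, "->", qr_image_side(2, border=border), "px")
```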
