On troubleshooting a Python program, packaged into an exe with PyInstaller, whose memory keeps growing while it runs
  1. I am using Python 3.7.9, and the exe is packaged with pyinstaller.
  2. At first I used the objgraph module, which reports how the total count of each object type grows. However, it only shows object types, not the variables or their attributes, so it is not easy to read directly (a sketch of how to dig deeper follows the output below).
#!/usr/bin/python
# -*- coding: UTF-8 -*-

if __name__ == '__main__':
    import objgraph
    objgraph.show_growth()  # baseline counts before any database work
    import MySQLdb
    # Open the database connection
    db = MySQLdb.connect("localhost", "root", "root", '', 3306)
    print(type(db))
    # Get a cursor with the cursor() method
    cursor = db.cursor()
    print(type(cursor))
    # Execute a SQL statement with execute()
    cursor.execute("SELECT VERSION()")
    # Fetch a single row with fetchone()
    data = cursor.fetchone()
    print(type(data))
    print("Database version : %s " % data)
    # Close the database connection
    db.close()
    # del db
    # del cursor
    # del data
    objgraph.show_growth()  # growth relative to the baseline

The output is shown below; the result is not very revealing.

function                       7600     +7600
dict                           3999     +3999
tuple                          3219     +3219
wrapper_descriptor             2012     +2012
weakref                        1814     +1814
list                           1746     +1746
method_descriptor              1474     +1474
builtin_function_or_method     1394     +1394
getset_descriptor              1353     +1353
type                           1028     +1028
<class 'MySQLdb.connections.Connection'>
<class 'MySQLdb.cursors.Cursor'>
<class 'tuple'>
Database version : 8.0.25 
function               7673       +73
method_descriptor      1543       +69
dict                   4054       +55
set                     535       +37
tuple                  3255       +36
FuncCodeInfo            281       +33
wrapper_descriptor     2041       +29
weakref                1839       +25
type                   1050       +22
getset_descriptor      1370       +17
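
show_growth() only reports per-type counts. To see which objects of a given type are still alive and what is keeping them alive, objgraph can also list the instances of a type and render their back-reference graph. A minimal sketch (assuming the MySQLdb connection from the script above has just been created, and that graphviz is installed so the graph can be written to a file):

import objgraph

objgraph.show_most_common_types(limit=10)   # the 10 most numerous object types
conns = objgraph.by_type('Connection')      # all live Connection instances
print(len(conns))
if conns:
    # Draw the chain of references that keeps the first Connection alive
    # (writes a .png; requires graphviz to be installed)
    objgraph.show_backrefs(conns[0], max_depth=3, filename='connection_backrefs.png')

When hunting a leak, calling by_type() before and after the suspect code and comparing the lengths is usually more telling than the raw type counters alone.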
  3. Then I found the third-party library memory_profiler, which records how much memory is consumed, and how it fluctuates, at every line of a Python script as it executes.
    Using memory_profiler is very simple: after installing it with pip install memory_profiler, just import profile from memory_profiler and apply it as a decorator to the function you want to analyze (a programmatic alternative is sketched after the output below).
#!/usr/bin/python
# -*- coding:utf-8 -*-

from matplotlib import pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from memory_profiler import profile


@profile
def iris_test():
    scores = []
    ks = []
    iris = datasets.load_iris()
    feature = iris['data']
    target = iris['target']
    # Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.2, random_state=2020)
    print(x_train.shape)
    print(x_test.shape)
    print(y_train.shape)
    print(y_test.shape)
    # 5. Instantiate the model object
    # n_neighbors is the K in KNN; different values of K lead to different classification results.
    # Hyperparameter: a model parameter whose different values directly affect the classification or prediction.
    knn = KNeighborsClassifier(n_neighbors=5)
    print(knn)
    # 6. Train the model with the training data
    # X: training features; the feature data must be two-dimensional.
    # y: training labels
    knn = knn.fit(x_train, y_train)
    # 7. Test the model with the test data
    # predict() uses the trained model to classify or predict
    y_pred = knn.predict(x_test)
    # The labels predicted by the model for the test data
    y_true = y_test
    # The true labels of the test set
    print('Predicted labels:', y_pred)
    print('True labels:', y_true)
    score = knn.score(x_test, y_test)
    print(score)
    # Cross-validate on the training set
    # cross_val_score(knn, x_train, y_train, cv=5).mean()
    for k in range(3, 20):
        knn = KNeighborsClassifier(n_neighbors=k)
        score = cross_val_score(knn, x_train, y_train, cv=6).mean()
        scores.append(score)
        ks.append(k)
    plt.plot(ks, scores)
    plt.show()
    # Compare KNN with logistic regression via cross-validation
    from sklearn.linear_model import LogisticRegression
    knn = KNeighborsClassifier(n_neighbors=5)
    print(cross_val_score(knn, x_train, y_train, cv=10).mean())
    lr = LogisticRegression()
    print(cross_val_score(lr, x_train, y_train, cv=10).mean())


if __name__ == '__main__':
    iris_test()

The output is as follows:

PS D:\develop\Pycharm\PYCharmWorkSpace\modelTest\com\cn\gjdw\test02> python.exe .\Iris_test.py          
(120, 4)
(30, 4)
(120,)
(30,)
KNeighborsClassifier()
Predicted labels: [2 0 1 1 1 1 2 1 0 0 2 1 0 2 2 0 1 1 2 0 0 2 2 0 2 1 1 1 0 0]
True labels: [2 0 1 1 1 2 2 1 0 0 2 2 0 2 2 0 1 1 2 0 0 2 1 0 2 1 1 1 0 0]
0.9
0.9833333333333332
D:\develop\Pycharm\PYCharmWorkSpace\windowsAgentInfo_for_python\venv\05\lib\site-packages\sklearn\linear_model\_logistic.py:818: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG,
(the same ConvergenceWarning is printed two more times during the 10-fold cross-validation)
0.9833333333333332
Filename: .\Iris_test.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    16    107.0 MiB    107.0 MiB           1   @profile
    17                                         def iris_test():
    18    107.0 MiB      0.0 MiB           1       scores = []
    19    107.0 MiB      0.0 MiB           1       ks = []
    20    107.0 MiB      0.0 MiB           1       iris = datasets.load_iris()
    21    107.0 MiB      0.0 MiB           1       feature = iris['data']
    22    107.0 MiB      0.0 MiB           1       target = iris['target']
    23                                             # Split into training and test sets
    24    107.1 MiB      0.1 MiB           1       x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.2, random_state=2020)
    25    107.1 MiB      0.0 MiB           1       print(x_train.shape)
    26    107.1 MiB      0.0 MiB           1       print(x_test.shape)
    27    107.1 MiB      0.0 MiB           1       print(y_train.shape)
    28    107.1 MiB      0.0 MiB           1       print(y_test.shape)
    29                                             # 5. Instantiate the model object
    30                                             # n_neighbors is the K in KNN; different values of K lead to different classification results.
    31                                             # Hyperparameter: a model parameter whose different values directly affect the classification or prediction.
    32    107.1 MiB      0.0 MiB           1       knn = KNeighborsClassifier(n_neighbors=5)
    33    107.1 MiB      0.0 MiB           1       print(knn)
    34                                             # 6. Train the model with the training data
    35                                             # X: training features; the feature data must be two-dimensional.
    36                                             # y: training labels
    37    107.3 MiB      0.2 MiB           1       knn = knn.fit(x_train, y_train)
    38                                             # 7. Test the model with the test data
    39                                             # predict() uses the trained model to classify or predict
    40    107.4 MiB      0.2 MiB           1       y_pred = knn.predict(x_test)
    41                                             # The labels predicted by the model for the test data
    42    107.4 MiB      0.0 MiB           1       y_true = y_test
    43                                             # The true labels of the test set
    44    107.5 MiB      0.0 MiB           1       print('Predicted labels:', y_pred)
    45    107.5 MiB      0.0 MiB           1       print('True labels:', y_true)
    46    107.5 MiB      0.0 MiB           1       score = knn.score(x_test, y_test)
    47    107.5 MiB      0.0 MiB           1       print(score)
    48                                             # Cross-validate on the training set
    49                                             # cross_val_score(knn, x_train, y_train, cv=5).mean()
    50    107.8 MiB      0.0 MiB          18       for k in range(3, 20):
    51    107.8 MiB      0.0 MiB          17           knn = KNeighborsClassifier(n_neighbors=k)
    52    107.8 MiB      0.4 MiB          17           score = cross_val_score(knn, x_train, y_train, cv=6).mean()
    53    107.8 MiB      0.0 MiB          17           scores.append(score)
    54    107.8 MiB      0.0 MiB          17           ks.append(k)
    55    140.7 MiB     32.9 MiB           1       plt.plot(ks, scores)
    56    143.6 MiB      2.9 MiB           1       plt.show()
    57                                             # Compare KNN with logistic regression via cross-validation
    58    143.6 MiB      0.0 MiB           1       from sklearn.linear_model import LogisticRegression
    59    143.6 MiB      0.0 MiB           1       knn = KNeighborsClassifier(n_neighbors=5)
    60    143.6 MiB      0.0 MiB           1       print(cross_val_score(knn, x_train, y_train, cv=10).mean())
    61    143.6 MiB      0.0 MiB           1       lr = LogisticRegression()
    62    144.0 MiB      0.5 MiB           1       print(cross_val_score(lr, x_train, y_train, cv=10).mean())
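
Besides the @profile decorator, memory_profiler can also be driven programmatically through its memory_usage() helper. A minimal sketch, where iris_test is the function defined above and the 0.1 s sampling interval is just an illustrative value:

from memory_profiler import memory_usage

# Sample the memory of the current process every 0.1 s while iris_test() runs
samples = memory_usage((iris_test, (), {}), interval=0.1)
print('peak: %.1f MiB, growth: %.1f MiB' % (max(samples), max(samples) - samples[0]))

The mprof run / mprof plot command line tools shipped with memory_profiler do the same kind of sampling and draw it as a curve over time, which is handy for a long-running packaged exe.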
  4. The line-by-line report above makes it clear how much memory each line of the code uses. Since it can be checked line by line, it is fairly precise and makes it easy to track down problems.
  5. The tracemalloc module is a tool for debugging the memory blocks that Python has allocated. It can provide the following information:
    the location where an object's memory was allocated;
    per-file and per-line statistics on Python's memory blocks: total size, number of blocks, and average block size;
    the difference between two memory snapshots, which helps track down memory leaks (see the sketch after the output below).
import tracemalloc
import numpy as np

tracemalloc.start()

length = 10000
test_array = np.random.randn(length)  # allocate a fixed-length random array
snapshot = tracemalloc.take_snapshot()  # take a memory snapshot
top_stats = snapshot.statistics('lineno')  # group the allocation statistics by source line

print('[Top 10]')
for stat in top_stats[:10]:  # print the 10 lines with the largest allocations
    print(stat)

The output is as follows:

D:/develop/Pycharm/PYCharmWorkSpace/modelTest/com/cn/gjdw/test02/dsg.py:18: size=78.8 KiB, count=4, average=19.7 KiB
D:/develop/Pycharm/PYCharmWorkSpace/modelTest/com/cn/gjdw/test02/dsg.py:19: size=328 B, count=3, average=109 B
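
The snapshot comparison mentioned above is what makes tracemalloc useful for leak hunting: take one snapshot before and one after the suspect code, then diff them. A minimal sketch, where the growing list is only an illustrative stand-in for a suspected leak:

import tracemalloc

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

leaky = []
for i in range(100000):
    leaky.append(str(i))  # simulate a structure that keeps growing

snapshot_after = tracemalloc.take_snapshot()

# Diff the two snapshots, grouped by source line
top_stats = snapshot_after.compare_to(snapshot_before, 'lineno')
print('[Top 10 differences]')
for stat in top_stats[:10]:
    print(stat)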

tracemalloc does show the size of each allocation, but memory_profiler is still handier to use, and its line-by-line report locates problems more precisely.