python性能分析 python 性能分析 cprofile

转载

mob6454cc6ccc8a 2023-08-29 07:26:15

文章标签 python性能分析 Line 性能分析 python 文章分类 Python 后端开发

性能分析工具的使用

cProfile

介绍

它是一种确定性的性能分析器，提供了一组API帮助开 发者收集Python程序运行的信息，更确切地说，是统计每个函数消耗的 CPU时间。同时它还提供了其他细节，比如函数被调用的次数。 cProfile只测量CPU时间，并不关心内存消耗和其他与内存相关的信 息统计。尽管如此，它是代码优化过程中一个很不错的起点，因为大多 数时候，这个分析工具都会快速地为我们提供一组优化方案。 cProfile不需要安装，因为它是语言自带的一部分。要使用它，直接 导入cProfile包即可。

例子

简单调用

import cProfile
import re

cProfile.run('re.compile("foo|bar")')

调用函数

def test():
	pass
cProfile.run('test()')

命令行使用

# 直接把分析结果打印到控制台
python -m cProfile test.py
# 把分析结果保存到文件中
python -m cProfile -o result.out test.py
# 增加排序方式
python -m cProfile -o result.out -s cumulative test.py

当命令被调用时，会执行下面这个函数：

exec(command, __main__.__dict__, __main__.__dict__)

故这种写法会报错

import cProfile
def runRe():
    import re
    cProfile.run('re.compile("foo|bar")') #改为它即可正常运行cProfile.runctx('re.compile("foo|bar")', None, locals())
runRe()

Stats类

利用stats打印函数的调用链

import cProfile
import pstats
def runRe():
    import re
    re.compile("foo|bar")

prof = cProfile.Profile()
prof.enable()
runRe()
prof.create_stats()
p = pstats.Stats(prof)
p.print_callers()

用来分析cProfile输出文件xxx.out

import pstats

# 创建Stats对象
p = pstats.Stats("result.out")

# strip_dirs(): 去掉无关的路径信息
# sort_stats(): 排序，支持的方式和上述的一致
# print_stats(): 打印分析结果，可以指定打印前几行

# 和直接运行cProfile.run("test()")的结果是一样的
p.strip_dirs().sort_stats(-1).print_stats()

# 按照函数名排序，只打印前3行函数的信息, 参数还可为小数,表示前百分之几的函数信息 
p.strip_dirs().sort_stats("name").print_stats(3)

# 按照运行时间和函数名进行排序
p.strip_dirs().sort_stats("cumulative", "name").print_stats(0.5)

# 如果想知道有哪些函数调用了sum_num
p.print_callers(0.5, "sum_num")

# 查看test()函数中调用了哪些函数
p.print_callees("test")

cProfile其它的方法

enable()：表示开始收集性能分析数据。
disable()：表示停止收集性能分析数据。
create_stats()：表示停止收集数据，并为已收集的数据创建

stats对象。

print_stats(sort=-1)：创建一个stats对象，打印分析结果。
dump_stats(filename)：把当前性能分析的内容写进一个文件。
run(cmd)：和之前介绍过的run函数相同。
runctx(cmd, globals, locals)：和之前介绍过的runctx函数

相同。

runcall(func, *args, **kwargs)：收集被调用函数func的性

能分析信息

运行结果如下（为以上简单调用的结果）

214 function calls (207 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 enum.py:283(__call__)
        2    0.000    0.000    0.000    0.000 enum.py:525(__new__)
        1    0.000    0.000    0.000    0.000 enum.py:815(__and__)
        1    0.000    0.000    0.000    0.000 re.py:232(compile)
        1    0.000    0.000    0.000    0.000 re.py:271(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:249(_compile_charset)
        1    0.000    0.000    0.000    0.000 sre_compile.py:276(_optimize_charset)
        2    0.000    0.000    0.000    0.000 sre_compile.py:453(_get_iscased)
        1    0.000    0.000    0.000    0.000 sre_compile.py:461(_get_literal_prefix)
        1    0.000    0.000    0.000    0.000 sre_compile.py:492(_get_charset_prefix)
        1    0.000    0.000    0.000    0.000 sre_compile.py:536(_compile_info)
        2    0.000    0.000    0.000    0.000 sre_compile.py:595(isstring)
        1    0.000    0.000    0.000    0.000 sre_compile.py:598(_code)
      3/1    0.000    0.000    0.000    0.000 sre_compile.py:71(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:759(compile)
        3    0.000    0.000    0.000    0.000 sre_parse.py:111(__init__)
        7    0.000    0.000    0.000    0.000 sre_parse.py:160(__len__)
       18    0.000    0.000    0.000    0.000 sre_parse.py:164(__getitem__)
        7    0.000    0.000    0.000    0.000 sre_parse.py:172(append)
      3/1    0.000    0.000    0.000    0.000 sre_parse.py:174(getwidth)
        1    0.000    0.000    0.000    0.000 sre_parse.py:224(__init__)
        8    0.000    0.000    0.000    0.000 sre_parse.py:233(__next)
        2    0.000    0.000    0.000    0.000 sre_parse.py:249(match)
        6    0.000    0.000    0.000    0.000 sre_parse.py:254(get)
        1    0.000    0.000    0.000    0.000 sre_parse.py:286(tell)
        1    0.000    0.000    0.000    0.000 sre_parse.py:417(_parse_sub)
        2    0.000    0.000    0.000    0.000 sre_parse.py:475(_parse)
        1    0.000    0.000    0.000    0.000 sre_parse.py:76(__init__)
        2    0.000    0.000    0.000    0.000 sre_parse.py:81(groups)
        1    0.000    0.000    0.000    0.000 sre_parse.py:903(fix_flags)
        1    0.000    0.000    0.000    0.000 sre_parse.py:919(parse)
        1    0.000    0.000    0.000    0.000 {built-in method _sre.compile}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
       25    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
    29/26    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.max}
        9    0.000    0.000    0.000    0.000 {built-in method builtins.min}
        6    0.000    0.000    0.000    0.000 {built-in method builtins.ord}
       48    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        5    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
        1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}

第一行告诉我们一共有214个函数调用被监控，其中207个是原生（primitive）调用，表明这些调用不涉及递归。
ncalls表示函数调用的次数。如果在这一列中有两个数值，就表示有递归调用。第二个数值是原生调用的次数，第一个数值是总调用次数。这个数值可以帮助识别潜在的bug（当数值大得异乎寻常时可能就是bug），或者可能需要进行内联函数扩展（inline expansion）的位置。
tottime是函数内部消耗的总时间（不包括调用其他函数的时间）。这列信息可以帮助开发者找到可以进行优化的、运行时间较长的循环。
percall是tottime除以ncalls，表示一个函数每次调用的平均消耗时间。
cumtime是之前所有子函数消耗时间的累计和（也包含递归调用时间）。这个数值可以帮助开发者从整体视角识别性能问题，比如算法选择错误。
另一个percall是用cumtime除以原生调用的数量，表示到该函数调用时，每个原生调用的平均消耗时间。
filename:lineno(function)显示了被分析函数所在的文件名、行号、函数名。

cProfile工具的局限

不存在透明的性能分析器。也就是说，即使像cProfile这样消耗极小 的性能分析器，仍然会对代码实际的性能造成影响。当一个事件被触发 时，事件实际发生的时间相比性能分析器查询到的系统内部时钟的时间，还是会有一些延迟。另外，当程序计数器离开性能分析器代码，回 到用户代码中继续执行时，程序也会出现滞后。 除了这些之外，作为计算机内部的任何一段代码，内部时钟都有一个精 度范围，任何小于这个精度的测量值都会被忽略。也就是说，如果进行 性能分析的代码含有许多递归调用，或者一个函数需要调用许多函数， 开发者就应该对这个函数做特殊处理，因为测量误差不断地累计最终会 变得非常显著。

Line Profiler

不为python sdk自带，需要手动下载

pip install line_profiler

我们通过cProfile定位到了具体的耗时函数，下面就需要具体定位瓶颈出在哪行代码，这个时候就到了Line Profiler出场的时候了。与cProfile相比，Line Profiler的结果更加直观，它可以告诉你一个函数中每一行的耗时。
用法

定义一个测试文件test.py

# demo.py
@profile
def foo():
    task = []
    for a in range(0, 101):
        for b in range(0, 101):
            if a + b == 100:
                task.append((a, b))
    return task
 
@profile
def run():
    for item in foo():
        pass
 
if __name__ == '__main__':
    run()

运行

kernprof -l -v test.py #-l逐行分析

结果

Wrote profile results to test.py.lprof
Timer unit: 1e-06 s

Total time: 0.0051146 s
File: test.py
Function: foo at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def foo():
     3         1          0.4      0.4      0.0      task = []
     4       102         22.6      0.2      0.4      for a in range(0, 101):
     5     10302       2317.9      0.2     45.3          for b in range(0, 101):
     6     10201       2738.2      0.3     53.5              if a + b == 100:
     7       101         35.2      0.3      0.7                  task.append((a, b))
     8         1          0.3      0.3      0.0      return task

Total time: 0.0089912 s
File: test.py
Function: run at line 11

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    11                                           @profile
    12                                           def run():
    13       102       8970.8     87.9     99.8      for item in foo():
    14       101         20.4      0.2      0.2          pass

Timer unit: 1e-06 s：时间单位；
Total time: 0.004891 s：总时间；
Hit：代码运行次数；
%Time：代码占了它所在函数的消耗的时间百分比，通常直接看这一列。

memory_profile

下载

pip3 install memory_profiler

测试代码

def sum_num(max_num):
    total = 0
    for i in range(max_num):
        total += i
    return total


@profile
def test():
    total = 0
    for i in range(40000):
        total += i

    t1 = sum_num(10000000)
    t2 = sum_num(200000)
    t3 = sum_num(300000)
    t4 = sum_num(400000)
    t5 = sum_num(500000)
    test2()

    return total


def test2():
    total = 0
    for i in range(40000):
        total += i

    t6 = sum_num(600000)
    t7 = sum_num(700000)

    return total


test()

python -m memory_profiler test.py

运行结果

Filename: test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     8   39.660 MiB   39.660 MiB           1   @profile
     9                                         def test():
    10   39.660 MiB    0.000 MiB           1       total = 0
    11   39.660 MiB    0.000 MiB       40001       for i in range(40000):
    12   39.660 MiB    0.000 MiB       40000           total += i
    13
    14   39.660 MiB    0.000 MiB           1       t1 = sum_num(10000000)
    15   39.660 MiB    0.000 MiB           1       t2 = sum_num(200000)
    16   39.660 MiB    0.000 MiB           1       t3 = sum_num(300000)
    17   39.660 MiB    0.000 MiB           1       t4 = sum_num(400000)
    18   39.660 MiB    0.000 MiB           1       t5 = sum_num(500000)
    19   39.660 MiB    0.000 MiB           1       test2()
    20
    21   39.660 MiB    0.000 MiB           1       return total

Mem usage : 运行内存大小；
Increment : 运行当前代码后，增加的内存。

装饰器捕获函数运行时间

模板

import time
from functools import wraps
def simple_profiling(fn):
    @wraps(fn)  # 对外暴露调用装饰器函数的函数名和docstring
    def wrapped(*args, **kwargs):
        t1 = time.time()
        result = fn(*args, **kwargs)
        t2 = time.time()
        print (
            "@simple_profiling:" + fn.func_name + " took " + str(t2 - t1) + " seconds")
        return result
    return wrapped
@simple_profiling
def foo(a, b, c)
    pass