Locating Performance Bottlenecks in Python Programs


Created: 2018-08-09 22:57

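Before optimizing code, you need to know where the bottleneck is, that is, where the program spends most of its running time. For more complex code, tools can help find it. The Python standard library has shipped three profilers: cProfile, profile, and hotshot. Each records how a program behaves at run time and produces statistics that help locate performance bottlenecks. hotshot exists only in Python 2 and was removed in Python 3. cProfile is a C extension with the same interface as profile and much lower overhead, so it is usually the better choice for long-running programs.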

profile

profile is simple to use: import it, then pass the statement to be profiled as a string to profile.run(). For example:

import profile

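def profileTest():
    # Slice and print a string 20000 times so the profiler has work to measure.
    for i in range(20000):
        sent = "a sentence for measuring a find function"
        print(sent[16:])

if __name__ == "__main__":
    # The statement is passed as a string and executed under the profiler.
    profile.run("profileTest()")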

Output:

         20005 function calls in 0.047 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.047    0.047 :0(exec)
    20000    0.031    0.000    0.031    0.000 :0(print)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.000    0.000    0.047    0.047 <string>:1(<module>)
        1    0.000    0.000    0.047    0.047 profile:0(profileTest())
        0    0.000             0.000          profile:0(profiler)
        1    0.016    0.016    0.047    0.047 writefile.py:38(profileTest)

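The columns in the output mean the following:

ncalls: the number of times the function was called;
tottime: the total time spent in the function itself, excluding time spent in the functions it calls;
percall (the first one): tottime / ncalls;
cumtime: the cumulative time spent in the function and everything it calls, from entry to return;
percall (the second one): the average time per call including callees, that is, cumtime divided by the number of primitive calls, which equals ncalls unless the function is recursive;
filename:lineno(function): the file, line number, and name of each function.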
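The difference between tottime and cumtime is easiest to see with one function that only delegates to another. The following is a minimal sketch, not part of the original article: the functions parent and child are made up for illustration, and cProfile is used in place of profile (same interface) to keep the overhead low.

```python
import cProfile
import pstats


def child():
    # Does the actual work, so the time is charged to child's own tottime.
    total = 0
    for i in range(50000):
        total += i * i
    return total


def parent():
    # Does almost nothing itself: its tottime stays small,
    # while its cumtime includes the three child() calls.
    results = []
    for _ in range(3):
        results.append(child())
    return results


profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, tottime, cumtime, callers).
by_name = {key[2]: value for key, value in stats.stats.items()}

for name in ("parent", "child"):
    _, ncalls, tottime, cumtime, _ = by_name[name]
    print(f"{name}: ncalls={ncalls} tottime={tottime:.4f} cumtime={cumtime:.4f}")
```

The exact timings depend on the machine, but parent should show a cumtime close to child's cumtime and a tottime near zero, which marks child as the place to optimize.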

To save the results to a file instead of printing them, pass a file name as the second argument, for example profile.run("profileTest()", "testprof").

pstats

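When profile saves its results to a binary file, the pstats module can turn that data into text reports. It supports several report layouts and is a practical tool for the text console. Usage is simple: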

import pstats

p = pstats.Stats('testprof')
p.sort_stats("name").print_stats()

The output is similar to the report shown earlier.

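The sort_stats() method sorts the profiling data and accepts multiple sort keys. For example, sort_stats('name', 'file') sorts by function name first and then by file name. Common sort keys include calls (number of calls), time (time spent inside the function itself), and cumulative (cumulative time including callees). pstats also provides an interactive command-line browser: run python -m pstats and type help to learn more.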
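As a rough sketch of sorting on more than one key and trimming the report, the example below dumps stats to a temporary file and reads them back with pstats. The functions build_words and join_words and the temporary path are invented for illustration, and cProfile stands in for profile to write the stats file.

```python
import cProfile
import io
import os
import pstats
import tempfile


def build_words():
    return ["word%d" % i for i in range(2000)]


def join_words():
    return " ".join(build_words())


# Write the raw stats to a file, as profile.run(statement, filename) would.
stats_path = os.path.join(tempfile.mkdtemp(), "testprof")
profiler = cProfile.Profile()
profiler.runcall(join_words)
profiler.dump_stats(stats_path)

# Load the file and send the report to a string buffer instead of stdout.
report = io.StringIO()
p = pstats.Stats(stats_path, stream=report)
# strip_dirs() shortens file paths; sort by cumulative time, then by name;
# print_stats(5) limits the report to the first five entries.
p.strip_dirs().sort_stats("cumulative", "name").print_stats(5)
print(report.getvalue())
```

Sorting by cumulative first puts the entry points that account for the most total time at the top, which is usually the quickest way to see where to look next.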

For large applications, presenting the profiling results graphically is both practical and intuitive. Common visualization tools include Gprof2Dot, visualpytune, and KCacheGrind.

https://www.ibm.com/developerworks/cn/linux/l-cn-python-optim/index.html

Using timeit

from timeit import timeit
import re

def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text):
    # Note: re.match only matches at the start of the string.
    if re.match(text, string):
        pass

def best_find(string, text):
    if text in string:
        pass

# timeit runs each statement 1,000,000 times by default and returns the total time in seconds.
print(timeit("find(string, text)", "from __main__ import find; string='lookforme'; text='look'"))
print(timeit("re_find(string, text)", "from __main__ import re_find; string='lookforme'; text='look'"))
print(timeit("best_find(string, text)", "from __main__ import best_find; string='lookforme'; text='look'"))

Output:

0.25795071095385774
0.8158124762311382
0.10521701806419292

So for substring searches, use the in operator: it is easier to read, and it is also faster.
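A single timeit run can be skewed by whatever else the machine is doing. One common way to reduce that noise, sketched here as an addition to the original comparison, is timeit.repeat(), keeping the smallest of several measurements. The helper names with_find and with_in and the repeat counts are arbitrary choices for illustration.

```python
import timeit

string = "lookforme"
text = "look"


def with_find():
    return string.find(text) > -1


def with_in():
    return text in string


# repeat() performs the whole measurement several times and returns a list
# of totals; the minimum is the run least disturbed by other processes.
for func in (with_find, with_in):
    timings = timeit.repeat(func, number=100000, repeat=5)
    print(f"{func.__name__}: best of 5 = {min(timings):.4f} s per 100000 calls")
```

Passing the function object directly avoids the "from __main__ import ..." setup string, at the cost of including the function-call overhead in both measurements.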

https://stackoverflow.com/questions/4901523/whats-a-faster-operation-re-match-search-or-str-find/4901653#4901653
