pythoncorr module corrcoef python

Reposted

mob64ca1418aeab 2024-01-04 15:03:52


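In high-performance computing, Python is often not fast enough. There are several ways to speed it up, such as Numba or Cython. This post gives a brief introduction to the basics of Cython; for the details, the official documentation is still the best reference.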

The code in this post is run in Jupyter Notebook and requires Cython to be installed:

pip install cython

1. Basic usage

Let's start with a baseline test: a loop that computes Fibonacci numbers.

from time import time

def fib_loop(n):
    a = 0
    b = 1
    for i in range(n + 1):
        a, b = b, a + b
    return a

start = time()
result = fib_loop(10000)
end = time()
print(f'Elapsed: {end-start}')

[out]:
Elapsed: 0.00708460807800293
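A single time() reading like the one above is noisy, especially for a run that takes only a few milliseconds. As a minimal sketch (not part of the original notebook), the standard-library timeit module can repeat the measurement and keep the best round. The helper name best_time and the repeat counts are assumptions chosen for illustration.

```python
import timeit


def fib_loop(n):
    # Same loop as above: returns the (n + 1)-th Fibonacci number.
    a, b = 0, 1
    for _ in range(n + 1):
        a, b = b, a + b
    return a


def best_time(func, *args, repeat=5, number=10):
    """Best average seconds per call over `repeat` rounds of `number` calls.

    `best_time` is a hypothetical helper for this post, not a timeit API.
    """
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    print(f"fib_loop(10000): {best_time(fib_loop, 10000):.6f} s per call")
```

Taking the minimum rather than the mean filters out rounds that were slowed down by other activity on the machine. The same helper could be pointed at a compiled function once it exists.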

Next we make a few small changes: we declare the data types and the return type. The code is split into several blocks to mimic the cells of a Jupyter Notebook.

from time import time
%load_ext Cython

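1. When writing Cython in Jupyter, start the cell with %%cython so that Jupyter treats the cell as Cython code.

2. A function that will be called from Python should be defined with cpdef (a plain def also works, but without the fast C-level entry point).

3. Declare variable types (with cdef) and the return type up front wherever possible; this makes the code run faster.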

%%cython
cpdef int c_fib_loop(int n):
    cdef int a = 0
    cdef int b = 1
    for i in range(n + 1):
        a, b = b, a + b
    return a

start = time()
ret = c_fib_loop(10000)
end = time()
print(f'Elapsed: {end-start}')

[out]:
Elapsed: 0.00011348724365234375

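The speedup is roughly 0.007085 / 0.0001135 ≈ 62x, which is substantial (the exact figure varies from run to run). Keep in mind that the two functions are not doing identical work: a C int is 32 bits on typical platforms, so for n = 10000 c_fib_loop returns an overflowed value, while fib_loop computes the exact arbitrary-precision integer.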
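A C int is 32 bits wide on typical platforms, so a function declared as cpdef int can only return exact Fibonacci values for small n. The sketch below emulates that in pure Python to show where the results start to diverge. The names wrap_int32 and fib_loop_int32 are hypothetical helpers, and the wrap-around behavior is an assumption about common compilers (signed overflow is formally undefined in C).

```python
INT32_MAX = 2**31 - 1


def fib_loop(n):
    # Exact version: Python ints grow as needed.
    a, b = 0, 1
    for _ in range(n + 1):
        a, b = b, a + b
    return a


def wrap_int32(x):
    """Reduce x to the signed 32-bit range (two's-complement wrap-around)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def fib_loop_int32(n):
    # Same loop, but every sum is wrapped the way a 32-bit int would be.
    a, b = 0, 1
    for _ in range(n + 1):
        a, b = b, wrap_int32(a + b)
    return a


def largest_exact_n():
    """Largest n for which fib_loop(n) still fits in a signed 32-bit int."""
    n = 0
    while fib_loop(n + 1) <= INT32_MAX:
        n += 1
    return n


if __name__ == "__main__":
    n = largest_exact_n()
    print(n, fib_loop(n))                      # last exact result
    print(fib_loop_int32(n + 1), fib_loop(n + 1))  # first mismatch
```

If exact large values are needed, one option is to keep the accumulator variables as Python objects (untyped) and type only the loop counter, at the cost of most of the speedup.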

2. Calling helper functions and NumPy

Here is a heap sort written in pure Python.

import random
from time import time

# helper function
def heapify(arr, n, i):
    largest = i
    l = 2 * i + 1  # left = 2*i + 1
    r = 2 * i + 2  # right = 2*i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]  # swap
        heapify(arr, n, largest)

def heapSort(arr):
    n = len(arr)
    for i in range(n, -1, -1):
        heapify(arr, n, i)
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]  # swap
        heapify(arr, i, 0)

arr = [x for x in range(0, 10000)]
random.seed(42)
random.shuffle(arr)
start = time()
heapSort(arr)
end = time()
print(f'Elapsed: {end-start}')

[out]:
Elapsed: 0.14600777626037598

Now we rewrite the algorithm above in Cython.

Here the helper function heapify is defined with cdef, so Python cannot call it directly.

%%cython
import numpy as np
cimport numpy as np

cdef c_heapify(np.int32_t[:] arr, np.int32_t n, np.int32_t i):
    cdef np.int32_t l, r, largest
    largest = i
    l = 2 * i + 1  # left = 2*i + 1
    r = l + 1      # right = 2*i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]  # swap
        c_heapify(arr, n, largest)

cpdef c_heapSort(np.int32_t[:] arr):
    cdef np.int32_t i, n
    n = len(arr)
    for i in range(n, -1, -1):
        c_heapify(arr, n, i)
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]  # swap
        c_heapify(arr, i, 0)

import random
from time import time
import numpy as np

arr = [x for x in range(0, 10000)]
random.seed(42)
random.shuffle(arr)
arr = np.array(arr, dtype=np.int32)
start = time()
c_heapSort(arr)
end = time()
print(f'Elapsed: {end-start}')

[out]:
Elapsed: 0.003444671630859375
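Timing alone does not show that a rewritten sort still sorts. As a rough sketch, the check below shuffles a list, sorts it in place with the function under test, and compares the result with Python's built-in sorted(). The names sorts_correctly and make_input are hypothetical, and heap_sort here is a pure-Python stand-in (with an iterative heapify) so the block runs without Cython.

```python
import random


def heapify(arr, n, i):
    """Sift arr[i] down within arr[:n] (iterative form of the helper above)."""
    while True:
        largest = i
        l = 2 * i + 1  # left child
        r = 2 * i + 2  # right child
        if l < n and arr[largest] < arr[l]:
            largest = l
        if r < n and arr[largest] < arr[r]:
            largest = r
        if largest == i:
            return
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest


def heap_sort(arr):
    n = len(arr)
    # Build the max-heap, starting from the last node that has a child.
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Move the current maximum to the end and shrink the heap.
    for i in range(n - 1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        heapify(arr, i, 0)


def sorts_correctly(sort_in_place, size=1000, seed=42, make_input=list):
    """Return True if `sort_in_place` sorts a shuffled range(size)."""
    values = list(range(size))
    random.Random(seed).shuffle(values)
    data = make_input(values)  # e.g. convert to an int32 array here
    sort_in_place(data)
    return list(data) == sorted(values)


if __name__ == "__main__":
    print(sorts_correctly(heap_sort))
```

For the compiled version, the same check could presumably be run as sorts_correctly(c_heapSort, make_input=lambda v: np.array(v, dtype=np.int32)), since the function expects an int32 buffer rather than a list.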

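Notes:

1. A function defined with cdef cannot be called directly from Python code, but it can be called from a function defined with cpdef.

2. To use NumPy's types in Cython declarations, you need to cimport numpy in addition to importing it.

3. Cython's C int usually corresponds to NumPy's int32 (a Python int, by contrast, has arbitrary precision). NumPy type names used in Cython declarations take a _t suffix.

For example:
np.int32_t        # NumPy 32-bit integer
np.int32_t[:]     # one-dimensional array of NumPy 32-bit integers
np.int32_t[:, :]  # two-dimensional array of NumPy 32-bit integers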

The full correspondence is listed below (cnp is the name bound by cimport numpy as cnp; the code above binds it as np instead).

NumPy dtype      NumPy Cython type    C Cython type identifier
np.bool_         None                 None
np.int_          cnp.int_t            long
np.intc          None                 int
np.intp          cnp.intp_t           ssize_t
np.int8          cnp.int8_t           signed char
np.int16         cnp.int16_t          signed short
np.int32         cnp.int32_t          signed int
np.int64         cnp.int64_t          signed long long
np.uint8         cnp.uint8_t          unsigned char
np.uint16        cnp.uint16_t         unsigned short
np.uint32        cnp.uint32_t         unsigned int
np.uint64        cnp.uint64_t         unsigned long long
np.float_        cnp.float64_t        double
np.float32       cnp.float32_t        float
np.float64       cnp.float64_t        double
np.complex_      cnp.complex128_t     double complex
np.complex64     cnp.complex64_t      float complex
np.complex128    cnp.complex128_t     double complex
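The widths behind the C type names depend on the platform: long, for instance, is 64 bits on most Linux and macOS builds but 32 bits on 64-bit Windows. As a rough sketch using only the standard library, ctypes can report the widths on the current machine. The C_TYPES dictionary and c_type_bits function are illustrative names, not part of Cython or NumPy.

```python
import ctypes

# C type names from the table, paired with their ctypes equivalents.
C_TYPES = {
    "signed char": ctypes.c_byte,
    "signed short": ctypes.c_short,
    "signed int": ctypes.c_int,
    "long": ctypes.c_long,
    "signed long long": ctypes.c_longlong,
    "ssize_t": ctypes.c_ssize_t,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
}


def c_type_bits():
    """Map each C type name to its width in bits on the current platform."""
    return {name: ctypes.sizeof(ctype) * 8 for name, ctype in C_TYPES.items()}


if __name__ == "__main__":
    for name, bits in c_type_bits().items():
        print(f"{name:<18}{bits:>3} bits")
```

This is one reason the fixed-width names such as np.int32_t are a safer choice for array element types than plain int or long.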

3. Compiling

Compiling requires two files: a setup.py file and a .pyx file.

The .pyx file contains the Cython code written above; here we name it c_func.pyx.
cpdef int c_fib_loop(int n):
    cdef int a = 0
    cdef int b = 1
    for i in range(n + 1):
        a, b = b, a + b
    return a
The setup.py file:
# -*- coding: utf-8 -*-
"""
-------------------------------------------------
   File Name: setup
   Description :
   Author : Asdil
   date: 2018/11/28
-------------------------------------------------
   Change Activity:
                   2018/11/28:
-------------------------------------------------
"""
__author__ = 'Asdil'
# distutils was removed in Python 3.12; setuptools provides the same setup()
from setuptools import setup
from Cython.Build import cythonize

setup(name='c_func',
      ext_modules=cythonize("c_func.pyx"))

# If the .pyx file uses `cimport numpy as np` (that line cannot be commented
# out inside the .pyx), replace the code above with the following:
# from setuptools import setup
# from Cython.Build import cythonize
# import numpy as np
# import os
# os.environ["C_INCLUDE_PATH"] = np.get_include()
# setup(name='c_func', ext_modules=cythonize("c_func.pyx"))
# Then, in the directory containing the two files, run on the command line:
# python setup.py build_ext --inplace
To compile, run the following on the command line in the directory containing these two files:
python setup.py build_ext --inplace

[Figure: before compiling]

[Figure: after compiling]

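After compiling, several new files appear in the directory:

1. A build directory, which can be ignored.

2. c_func.c, which does not need to be modified either.

3. c_func.cpython-36m-x86_64-linux-gnu.so (the exact name depends on your Python version and platform); this is the file we need.

4. The code has been uploaded to GitLab; you can download it, compile it, and then run it from the command line.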
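Once the extension is built, it is imported like any other module. The sketch below shows one common pattern, under the assumption that the module is named c_func as above: try the compiled function first and fall back to a pure-Python equivalent when the build is not available. It also prints the file suffix this interpreter expects for extension modules, which explains the long .so name. The names fib and backend are chosen for illustration.

```python
import sysconfig

# Suffix that compiled extension modules get on this interpreter,
# e.g. ".cpython-36m-x86_64-linux-gnu.so" for the build shown above.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

try:
    # Succeeds only if c_func was built and is on the import path.
    from c_func import c_fib_loop as fib
    backend = "c_func (compiled)"
except ImportError:
    def fib(n):
        # Pure-Python fallback with the same loop as c_fib_loop.
        a, b = 0, 1
        for _ in range(n + 1):
            a, b = b, a + b
        return a
    backend = "pure Python fallback"

if __name__ == "__main__":
    print(f"extension suffix: {ext_suffix}")
    print(f"backend: {backend}")
    print(fib(10))
```

Run from the directory that contains the .so file, the import should pick up the compiled version; elsewhere the fallback keeps the script usable.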
