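Lately I have often needed to compute the hypergeometric p-value for a single type of pathway on its own, so here is a note on how to do it with a Python package.

The enrichment p-value is computed with the hypergeom.sf method from scipy.stats.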

hsaxxxxx AA and Linoleic metabolism KEGG pathways Pathways KEGG (Homo sapiens (human)) 59 17 3586 141 3.32E-11

------------

| | set | in set | background | in background |
|---|---|---|---|---|
| pathway | 59 | 17 | 3586 | 141 |
| description | k | x | m+n | m |

**x:** the number of white balls drawn without replacement from an urn that contains both black and white balls
**m:** the number of white balls in the urn
**n:** the number of black balls in the urn
**k:** the number of balls drawn from the urn
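Written out with these symbols, the enrichment p-value is the upper tail of the hypergeometric distribution, that is, the probability of drawing at least x white balls. This is the standard textbook form, given here as a sketch for reference:

$$
P(X \ge x) = \sum_{i=x}^{\min(k,\,m)} \frac{\binom{m}{i}\binom{n}{k-i}}{\binom{m+n}{k}}
$$

For the example row above, x = 17, k = 59, m = 141 and n = 3586 - 141 = 3445.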

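    from scipy import stats

    # hypergeom.sf(k, M, n, N) returns P(X > k), so pass 17 - 1 = 16 to get P(X >= 17).
    # M = background (3586), n = in background (141), N = set (59).
    p_value = stats.hypergeom.sf(16, 3586, 141, 59)
    print(p_value)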
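To make the off-by-one step harder to forget, the call can be wrapped in a small helper that takes the four table columns directly. This is only a minimal sketch: the function names `enrichment_p_value` and `enrichment_p_value_by_sum` are made up for this note, and the second one exists only to cross-check the first by summing the probability mass function explicitly.

```python
from scipy import stats


def enrichment_p_value(in_set, set_size, background, in_background):
    """Upper-tail hypergeometric p-value P(X >= in_set)."""
    # scipy's argument order is sf(k, M, n, N):
    #   M = population size, n = successes in the population, N = draws.
    # sf(k) is P(X > k), so subtract 1 to include in_set itself.
    return float(stats.hypergeom.sf(in_set - 1, background, in_background, set_size))


def enrichment_p_value_by_sum(in_set, set_size, background, in_background):
    """Same quantity, computed by summing the pmf from in_set upwards."""
    dist = stats.hypergeom(background, in_background, set_size)
    upper = min(set_size, in_background)
    return float(sum(dist.pmf(i) for i in range(in_set, upper + 1)))


# Values from the example row: in set = 17, set = 59,
# background = 3586, in background = 141.
p_sf = enrichment_p_value(17, 59, 3586, 141)
p_sum = enrichment_p_value_by_sum(17, 59, 3586, 141)
print(p_sf, p_sum)
```

Both numbers should agree up to floating-point error, and they can be compared with the 3.32E-11 reported in the example row.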

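Implementation in R

    phyper(x - 1, m, n, k, lower.tail=FALSE)

With lower.tail=FALSE, phyper returns P(X > q), so the first argument has to be x - 1, just as in the Python call. For the example row this is phyper(16, 141, 3586 - 141, 59, lower.tail=FALSE).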
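The two functions take their arguments in a different order, which is easy to mix up: R wants the white and black ball counts separately, while scipy wants the urn total first. A rough sketch of the correspondence in Python follows; `phyper_upper` is a hypothetical helper name invented here, not part of scipy.

```python
from scipy import stats


def phyper_upper(x, m, n, k):
    """Mimic R's phyper(x, m, n, k, lower.tail=FALSE), i.e. P(X > x)."""
    # R: m white balls, n black balls, k balls drawn.
    # scipy: sf(x, M, n, N) with M = m + n (urn total),
    #        n = m (white balls), N = k (balls drawn).
    return float(stats.hypergeom.sf(x, m + n, m, k))


m = 141          # in background
n = 3586 - 141   # background minus in background
p = phyper_upper(17 - 1, m, n, 59)

# The same value obtained with scipy's own argument order.
direct = float(stats.hypergeom.sf(16, 3586, 141, 59))
print(p, direct)
```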