0.准备工作
实在懒得写R包安装代码,自己搞定一下哦~
library(clusterProfiler)
library(DOSE)
library(org.Hs.eg.db)
1.KEGG富集分析
KEGG富集分析可以用enrichKEGG很方便地完成:
data(geneList, package="DOSE")
gene 2]
kk organism = 'hsa',
pvalueCutoff = 0.05)
富集分析的输出结果是一个enrichResult对象,里面有一列记录了富集到每个通路上的基因:
class(kk)
## [1] "enrichResult"
## attr(,"package")
## [1] "DOSE"
head(kk@result$geneID)
## [1] "8318/991/9133/890/983/4085/7272/1111/891/4174/9232"
## [2] "991/9133/983/4085/51806/6790/891/9232/3708/5241"
## [3] "2305/4605/9133/890/983/51806/1111/891/776/3708"
## [4] "3627/10563/6373/4283/6362/6355/9547/1524"
## [5] "4312/9415/9370/5105/2167/3158/5346"
## [6] "9133/890/983/4085/6790/891/5241"
可以看到,都是ENTREZID,是不是不够直观?所以Y叔用了一个函数,setReadable
,用一下神清气爽,ENTREZID转成symbol了。
kk2 = setReadable(kk,
OrgDb = "org.Hs.eg.db",
keyType = "ENTREZID")
head(kk2@result$geneID)
## [1] "CDC45/CDC20/CCNB2/CCNA2/CDK1/MAD2L1/TTK/CHEK1/CCNB1/MCM5/PTTG1"
## [2] "CDC20/CCNB2/CDK1/MAD2L1/CALML5/AURKA/CCNB1/PTTG1/ITPR1/PGR"
## [3] "FOXM1/MYBL2/CCNB2/CCNA2/CDK1/CALML5/CHEK1/CCNB1/CACNA1D/ITPR1"
## [4] "CXCL10/CXCL13/CXCL11/CXCL9/CCL18/CCL8/CXCL14/CX3CR1"
## [5] "MMP1/FADS2/ADIPOQ/PCK1/FABP4/HMGCS2/PLIN1"
## [6] "CCNB2/CCNA2/CDK1/MAD2L1/AURKA/CCNB1/PGR"
2.多组的富集分析
将要比较的每组基因ID组成向量,合并成列表,比如下面这个:
data(gcSample)
lapply(gcSample, head)
## $X1
## [1] "4597" "7111" "5266" "2175" "755" "23046"
##
## $X2
## [1] "23450" "5160" "7126" "26118" "8452" "3675"
##
## $X3
## [1] "894" "7057" "22906" "3339" "10449" "6566"
##
## $X4
## [1] "5573" "7453" "5245" "23450" "6500" "4926"
##
## $X5
## [1] "5982" "7318" "6352" "2101" "8882" "7803"
##
## $X6
## [1] "5337" "9295" "4035" "811" "23365" "4629"
##
## $X7
## [1] "2621" "2665" "5690" "3608" "3550" "533"
##
## $X8
## [1] "2665" "4735" "1327" "3192" "5573" "9528"
富集比较,输出出来的结果里仍然是有geneID的,也是entrezID
ck "enrichKEGG")
head(as.data.frame(ck)$geneID)
## [1] "991/1869/890/1871/701/990/10926/9088/8317/9700/9134/1029/2810/699/11200/23594/8555/4173"
## [2] "4067/3383/7128/1869/890/1871/578/864/637/9641/6891/355/9134/5971/916/956/6850/7187/3551/919/4734/958/6772"
## [3] "100/6891/3932/973/916/925/958/64421"
## [4] "3383/991/1869/890/1871/113/701/9700/3932/9134/5971/916/4487/3600/1029/3551/8498/4488/11200/4776/4214/958"
## [5] "7057/3339/1299/3695/1101/3679/3910/3696/3693"
## [6] "2919/4982/3977/6375/8200/608/8792/3568/2057/1438/8718/655/652/10220/50615/51561/7042"
3.发现问题并解决它
试着把它setReadable转换一下,发生了报错:
kk2 = setReadable(ck,
OrgDb = "org.Hs.eg.db",
keyType = "ENTREZID")
## Error in if (x@readable) return(x): 参数长度为零
难道是不支持compareClusterResult这种对象类型吗?
还真不是,试试就知道了:
kk2 = setReadable(as.data.frame(ck),
OrgDb = "org.Hs.eg.db",
keyType = "ENTREZID")
## Error in setReadable(as.data.frame(ck), OrgDb = "org.Hs.eg.db", keyType = "ENTREZID"): input should be an 'enrichResult' , 'gseaResult' or 'compareClusterResult' object...
报错信息里有哦,compareClusterResult是支持的。
那哪里错了?看上一个报错,提到了x@readable
,readable是ck的一个元素,enrichGO也有个参数叫readable,如果设置readable=T,就和setReadable实现的事情一样(目前,enrichKEGG并不支持这个参数)。
查看ck里的readable元素:
ck@readable
## logical(0)
这就是问题。本来readable应该有一个逻辑值,要么T要么F,但他现在是空的,报错信息里也说了,x@readablec长度为零。
解决办法还是试试,把他补上行不行?
ck@readable = F
ck2 = setReadable(ck,
OrgDb = "org.Hs.eg.db",
keyType = "ENTREZID")
head(as.data.frame(ck2)$geneID)
## [1] "CDC20/E2F1/CCNA2/E2F3/BUB1B/CDC6/DBF4/PKMYT1/CDC7/ESPL1/CCNE2/CDKN2A/SFN/BUB1/CHEK2/ORC6/CDC14B/MCM4"
## [2] "LYN/ICAM1/TNFAIP3/E2F1/CCNA2/E2F3/BAK1/RUNX3/BID/IKBKE/TAP2/FAS/CCNE2/RELB/CD3E/ENTPD3/SYK/TRAF3/IKBKB/CD247/NEDD4/CD40/STAT1"
## [3] "ADA/TAP2/LCK/CD79A/CD3E/CD8A/CD40/DCLRE1C"
## [4] "ICAM1/CDC20/E2F1/CCNA2/E2F3/ADCY7/BUB1B/ESPL1/LCK/CCNE2/RELB/CD3E/MSX1/IL15/CDKN2A/IKBKB/RANBP3/MSX2/CHEK2/NFATC4/MAP3K1/CD40"
## [5] "THBS1/HSPG2/COL9A3/ITGB7/CHAD/ITGA7/LAMA4/ITGB8/ITGB5"
## [6] "CXCL1/TNFRSF11B/LIFR/XCL1/GDF5/TNFRSF17/TNFRSF11A/IL5RA/EPOR/CSF2RA/TNFRSF25/BMP7/BMP4/GDF11/IL21R/IL23A/TGFB2"
不试试怎么知道行不行呢,很多问题就是这样,试试出真知。
现在的clusterProfiler版本是3.16.0哦。说不定Y叔以后会更新一下,这个缺口就不存在了,但“试试”这个思维精髓值得记住?~