来看看能被我称为"神器"的,到底是什么?
APE:在R语言中分析系统发育和进化 (e.g. 在获得全基因组Alignment之后)
Emmanuel Paradis, Julien Claude & Korbinian Strimmer
法国蒙彼利埃大学 古生物生物学和系统发育实验室;慕尼黑大学统计学系
翻译APE发表的1篇原始文章,穿插一些代码、辅助信息,来了解APE的概况:
DOI: 10.1093/bioinformatics/btg412
APE (Analysis of Phylogenetics and Evolution) 是1个用R语言编写的用于分子进化和系统发育分析的免费软件包,提供了读写数据和操作系统发育树的实用函数,以及几种用于系统发育和进化分析的高级方法 (e.g. 比较和群体遗传方法)。
APE利用了许多用于统计和图形的R函数,并为开发、实现进一步的进化过程分析的统计方法,提供了灵活的框架。可从官方的R包存档 (http://cran.r-project.org/src/contrib/PACKAGES.html#ape,现在可能为 https://cran.r-project.org/web/packages/ape/index.html)获得。
从广义上说,系统发育分析涵盖了非常广泛的方法,从计算进化距离、重建基因树、估计分叉(Divergence)日期,到比较数据分析、进化速率估计和多样性(Diversification)分析。所有这些不同的任务都有一个特殊的共同点:严重依赖计算统计 (Computational statistics)。R系统是一个免费的独立于平台的开源分析环境,最近(2004)已经形成统计计算和图形(Graphics)事实上的标准 (Ihaka & Gentleman, 1996)。
R的1个优点是,通过编写专门的包,可以很容易地针对特定应用领域进行定制(Tailored to a particular application area)。特别是,R在生物信息学中的有用性已经在基因表达数据(即转录组)的分析中得到了令人印象深刻的证明(http://www.bioconductor.org)。APE是首次在系统发育和进化数据的分析中发挥R的优势。APE专注于使用系统发育和系谱(Genealogical)树作为输入的统计分析。
在v1.1版中,APE提供了读取、写入、绘制、操作系统发育树,并在系统发育框架中分析比较数据(Comparative data)、多样化分析、计算等位基因和核苷酸数据的距离、读取核苷酸序列和其它几个工具的功能,如Mantel测试、最小生成树的计算、群体遗传学参数的估计。
表1概述了APE中目前可用的函数。注意,有些方法 (如比较法、天际线图/Skyline plot等) 以前只能在专门的软件中使用。外部的树重建程序(如PHYLIP)可以通过标准的Shell命令从R调用。
表 1. APE v1.1版中可用的特定函数
mst 最小生成树 - Minimum Spanning Tree
phylo 进化树在ape包中的数据对象的类(Class)名称 - A single phylogenetic tree may have several representations in the Newick format and in the "phylo" class of objects used in ‘ape’.skyline estimate of effective population size from an estimated phylogeny. ltt.plot Lineages Through Time Plot. coalescent 溯祖. hivtree 193个HIV-1序列的系统发育树数据. opsin 视紫蛋白数据.
R语言的另1个优点是可以直接获得具有发表质量的图形输出,特别是使用PostScript设备时。例如,APE中的系统发育绘图功能处理颜色、线的粗细、字体、标签的间距,这些可为每个分枝(Branch)单独定义,因此可在单个系统发育图上表示3个不同的变量。
APE还产生复杂的群体遗传学图,如广义的天际线图(Generalized skyline plot, Strimmer和Pybus, 2001),只需1个命令。与任何R包一样,APE是命令行驱动的 (重现、维护、共享非常方便)。函数由用户调用,可能带有参数和选项。在R中使用APE的任何会话(Session)都从此命令开始(当然,需要提前安装):
library(ape)
这使得APE的函数(执行特定的统计、绘图等功能)在R环境中可用。下面的命令可显示这些函数的列表 (名称和简述):
library(help = ape)
APE中所有可用的R函数(表1)都以R超文本格式记录,有关它们使用的信息可以通过应用help命令检索,例如:
help(read.tree)
以标准Newick插句格式保存在磁盘上的文本文件(e.g. tree1.txt)中的进化树可被读取:
tree1 <- read.tree('tree1.txt')
这会将系统进化树存储在名为"tree1"的类(Class)'phylo'对象中。该对象中存储的信息(e.g. 分枝长度)可通过输入`tree1`(命令)来检查,并且可通过执行以下(命令),来获得进化分枝图(Cladogram)形式的图形输出:
plot(tree1)
这实际上调用了APE的plot.phylo函数,绘制了系统发育树tree1。由于R的面向对象的性质,命令plot(x) (泛型函数?)可能会给出1个完全不同的结果,取决于对象所属的类 (class of x)。默认情况下,树被绘制在图形窗口上(Graphical window),但可以根据操作系统以各种文件格式导出。
除了这些简单的示例之外,在面向对象的结构中表示系统发育树可以直接操作进化分析中使用的各种计算(方法)的系统发育数据。目前在APE中实施的一些方法:系统发育独立对比(Felsenstein, 1985; Harvey and Pagel, 1991),拟合生卒模型 (Fitting birth–death models. Nee et al., 1994; Pybus and Harvey, 2000),群体遗传分析(Nee et al., 1995;Strimmer and Pybus, 2001),进化速率的非参数平滑 (Sanderson, 1997),利用Klastorin的方法在系统发育树中估计基因分组组 (Misawa and Tajima, 2000)。
此外,在R函数hclust中实现的基于距离的聚类方法(Distance based clustering)可被APE使用,进而通过函数转换并生成类'phylo'类对象和'hclust'类对象。
APE中的类和方法 (如'phylo') 可以很容易地进一步扩展,来包括其它功能,例如:注释系统发育树。因此,APE不仅是一个数据分析包,也是一个开发和实现新方法的(开发)环境。此外,用C、C++或Fortran77编写的程序可从R中链接和调用,对于计算机密集型计算很有用。
(附) APE绘制天际线图 (示例,未美化)
# mcmc.popsize函数 - 可逆跳跃MCMC (马尔可夫链蒙特卡罗) 推断群体数量史 (Demographic History)
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# run mcmc chain
mcmc.out <- mcmc.popsize(tree.hiv, nstep=100, thinning=1, burn.in=0,progress.bar=FALSE) # toy run
#mcmc.out <- mcmc.popsize(tree.hiv, nstep=10000, thinning=5, burn.in=500) # remove comments!!
# make list of population size versus time
popsize <- extract.popsize(mcmc.out)
# plot and compare with skyline plot
sk <- skyline(tree.hiv)
plot(sk, lwd=1, lty=3, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
lines(popsize, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
mcmc.popsize函数 - 可逆跳跃MCMC (马尔可夫链蒙特卡罗) 推断群体数量史 (Demographic History)
# skyline函数 - 使用天际线图估计有效种群大小(Effective Population Size)
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# corresponding coalescent intervals
ci <- coalescent.intervals(tree.hiv) # from tree
# collapsed intervals
cl1 <- collapsed.intervals(ci,0)
cl2 <- collapsed.intervals(ci,0.0119)
#### classic skyline plot ####
sk1 <- skyline(cl1) # from collapsed intervals
sk1 <- skyline(ci) # from coalescent intervals
sk1 <- skyline(tree.hiv) # from tree
sk1
plot(skyline(tree.hiv))
skylineplot(tree.hiv) # shortcut
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
#### generalized skyline plot ####
sk2 <- skyline(cl2) # from collapsed intervals
sk2 <- skyline(ci, 0.0119) # from coalescent intervals
sk2 <- skyline(tree.hiv, 0.0119) # from tree
sk2
plot(sk2)
# classic and generalized skyline plot together in one plot
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)
# find optimal epsilon parameter using AICc criterion
find.skyline.epsilon(ci)
sk3 <- skyline(ci, -1) # negative epsilon also triggers estimation of epsilon
sk3$epsilon
skyline函数 - 使用天际线图估计有效种群大小(Effective Population Size)
# skylineplot函数 - 绘制天际线图
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
#### classic skyline plot
skylineplot(tree.hiv) # shortcut
#### plot classic and generalized skyline plots and estimate epsilon
sk.opt <- skylineplot.deluxe(tree.hiv)
sk.opt$epsilon
#### classic and generalized skyline plot ####
sk1 <- skyline(tree.hiv)
sk2 <- skyline(tree.hiv, 0.0119)
# use years rather than substitutions as unit for the time axis
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)
#### various skyline plots for different epsilons
layout(mat= matrix(1:6,2,3,byrow=TRUE))
ci <- coalescent.intervals(tree.hiv)
plot(skyline(ci, 0.0));title(main="0.0")
plot(skyline(ci, 0.007));title(main="0.007")
plot(skyline(ci, 0.0119),col=4);title(main="0.0119")
plot(skyline(ci, 0.02));title(main="0.02")
plot(skyline(ci, 0.05));title(main="0.05")
plot(skyline(ci, 0.1));title(main="0.1")
layout(mat= matrix(1:1,1,1,byrow=TRUE))
skylineplot函数 - 绘制天际线图
(附) 关于程辑包‘ape v5.7-1’的信息
library(help = ape)
Version: 5.7-1
Date: 2023-03-13
Built: R 4.3.0; x86_64-w64-mingw32; windows
Description: Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized/罚分 likelihood, dating trees with non-contemporaneous/非同期 sequences, translating DNA into AA sequences, and assessing sequence alignments.
Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
全部索引
AAbin == Amino Acid Sequences
CADM.global == Congruence/一致 among distance matrices
DNAbin == Manipulate DNA Sequences in Bit-Level Format
DNAbin2indel == Recode Blocks of Indels
FastME == Tree Estimation Based on the Minimum Evolution Algorithm
Initialize.corPhyl == Initialize a 'corPhyl' Structure Object
LTT == Theoretical Lineage-Through Time Plots
MPR == Most Parsimonious Reconstruction
Moran.I == Moran's I Autocorrelation Index
SDM == Construction of Consensus Distance Matrix With SDM
ace == Ancestral Character Estimation
add.scale.bar == Add a Scale Bar to a Phylogeny Plot
additive == Incomplete Distance Matrix Filling
alex == Alignment Explorer With Multiple Devices
all.equal.DNAbin == Compare DNA Sets
all.equal.phylo == Global Comparison of two Phylogenies
alview == Print DNA or AA Sequence Alignement
ape-package == Analyses of Phylogenetics and Evolution
apetools == Tools to Explore Files
as.alignment == Conversion Among DNA Sequence Internal Formats
as.bitsplits == Split Frequencies and Conversion Among Split Classes
as.matching == Conversion Between Phylo and Matching Objects
as.phylo == Conversion Among Tree and Network Objects
as.phylo.formula == Conversion from Taxonomy Variables to Phylogenetic Trees
axisPhylo == Axis on Side of Phylogeny
balance == Balance of a Dichotomous/二歧分枝 Phylogenetic Tree
base.freq == Base frequencies from DNA Sequences
bd.ext == Extended Version of the Birth-Death Models to Estimate Speciation and Extinction Rates (估算物种形成和灭绝速率)
bd.time == Time-Dependent Birth-Death Models
binaryPGLMM == Phylogenetic Generalized Linear Mixed Model for Binary Data
bind.tree == Binds Trees
bionj == Tree Estimation Based on an Improved Version of the NJ Algorithm
bird.families == Phylogeny of the Families of Birds From Sibley and Ahlquist
bird.orders == Phylogeny of the Orders of Birds From Sibley and Ahlquist
birthdeath == Estimation of Speciation and Extinction Rates With Birth-Death Models
boot.phylo == Tree Bipartition and Bootstrapping Phylogenies
branching.times == Branching Times of a Phylogenetic Tree
c.phylo == Building Lists of Trees
carnivora == Carnivora/食肉类 body sizes and life history traits
checkAlignment == Check DNA Alignments
checkLabel == Checking Labels
checkValidPhylo == Check the Structure of a "phylo" Object
cherry == Number of Cherries and Null Models of Trees
chiroptera == Bat Phylogeny
chronoMPL == Molecular Dating With Mean Path Lengths
chronopl == Molecular Dating With Penalized Likelihood
chronos == Molecular Dating by Penalised Likelihood and Maximum Likelihood
clustal == Multiple Sequence Alignment with External Applications
coalescent.intervals == Coalescent/溯祖 Intervals
collapse.singles == Collapse Single Nodes
collapsed.intervals == Collapsed Coalescent Intervals
compar.cheverud == Cheverud's Comparative Method
compar.gee == Comparative Analysis with GEEs
compar.lynch == Lynch's Comparative Method
compar.ou == Ornstein-Uhlenbeck Model for Continuous Characters
comparePhylo == Compare Two "phylo" Objects
compute.brlen == Branch Lengths Computation
compute.brtime == Compute and Set Branching Times
consensus == Concensus Trees
cophenetic.phylo == Pairwise Distances from a Phylogenetic Tree
cophyloplot == Plots two phylogenetic trees face to face with links between the tips.
corBlomberg == Blomberg et al.'s Correlation Structure
corBrownian == Brownian Correlation Structure
corClasses == Phylogenetic Correlation Structures
corGrafen == Grafen's (1989) Correlation Structure
corMartins == Martins's (1997) Correlation Structure
corPagel == Pagel's "lambda" Correlation Structure
corphylo == Correlations among Multiple Traits with Phylogenetic Signal
correlogram.formula == Phylogenetic Correlogram/相关图
data.nex == NEXUS Data Example
dbd == Probability Density Under Birth-Death Models
def == Definition of Vectors for Plotting or Annotating
degree == Vertex Degrees in Trees and Networks
del.gaps == Delete Alignment Gaps in DNA Sequences
delta.plot == Delta Plots
dist.dna == Pairwise Distances from DNA Sequences
dist.gene == Pairwise Distances from Genetic Data
dist.topo == Topological Distances Between Two Trees
diversi.gof == Tests of Constant Diversification Rates (恒定多样化比率)
diversi.time == Analysis of Diversification with Survival Models
diversity.contrast.test == Diversity Contrast Test
dnds == dN/dS Ratio
drop.tip == Remove Tips in a Phylogenetic Tree
edges == Draw Additional Edges on a Plotted Tree
evonet == Evolutionary Networks
ewLasso == Incomplete distances and edge weights of unrooted topology
gammaStat == Gamma-Statistic of Pybus and Harvey
getAnnotationsGenBank == Read Annotations from GenBank
hivtree == Phylogenetic Tree of 193 HIV-1 Sequences
howmanytrees == Calculate Numbers of Phylogenetic Trees
identify.phylo == Graphical Identification of Nodes and Tips
image.DNAbin == Plot of DNA Sequence Alignement
is.binary == Test for Binary Tree
is.compatible == Check Compatibility of Splits
is.monophyletic == Is Group Monophyletic(单源)
is.ultrametric == Test if a Tree is Ultrametric/超度量/超矩阵
kronoviz == Plot Multiple Chronograms(编年/计时图) on the Same Scale
label2table == Label Management
ladderize == Ladderize(阶梯化) a Tree
latag2n == Leading and Trailing Alignment Gaps to N
lmorigin == Multiple regression through the origin
ltt.plot == Lineages Through Time Plot
makeLabel == Label Management
makeNodeLabel == Makes Node Labels
mantel.test == Mantel Test for Similarity of Two Matrices
mat3 == Three Matrices
mat5M3ID == Five Trees
mat5Mrand == Five Independent Trees
matexpo == Matrix Exponential/指数
mcconwaysims.test == McConway-Sims Test of Homogeneous Diversification
mcmc.popsize == Reversible Jump MCMC to Infer Demographic/群体数量 History
mixedFontLabel == Mixed Font Labels for Plotting
mrca == Find Most Recent Common Ancestors Between Pairs
mst == Minimum Spanning Tree
multi2di == Collapse and Resolve Multichotomies/多歧分枝
multiphylo == Manipulating Lists of Trees
mvr == Minimum Variance Reduction
nj == Neighbor-Joining Tree Estimation
njs == Tree Reconstruction from Incomplete Distances With NJ* or bio-NJ*
node.dating == node.dating
node.depth == Depth and Heights of Nodes and Tips
nodelabels == Labelling the Nodes, Tips, and Edges of a Tree
nodepath == Find Paths of Nodes
parafit == Test of host-parasite coevolution (宿主-寄主共进化)
pcoa == Principal Coordinate Analysis
phydataplot == Tree Annotation
phymltest == Fits a Bunch of Models with PhyML
pic == Phylogenetically Independent Contrasts
pic.ortho == Phylogenetically Independent Orthonormal/标准正交 Contrasts
plot.correlogram == Plot a Correlogram/相关图
plot.phylo == Plot Phylogenies
plot.phylo.extra == Extra Fuctions to Plot and Annotate Phylogenies
plot.varcomp == Plot Variance Components
plotTreeTime == Plot Tree With Time Axis
print.phylo == Compact/密集 Display of a Phylogeny
rDNAbin == Random DNA Sequences
rTraitCont == Continuous Character Simulation
rTraitDisc == Discrete Character Simulation
rTraitMult == Multivariate/多变 Character Simulation
read.GenBank == Read DNA Sequences from GenBank via Internet
read.caic == Read Tree File in CAIC Format
read.dna == Read DNA Sequences in a File
read.gff == Read GFF Files
read.nexus == Read Tree File in Nexus Format
read.nexus.data == Read Character Data In NEXUS Format
read.tree == Read Tree File in Parenthetic Format
reconstruct == Continuous Ancestral Character Estimation (连续祖先特征估计)
reorder.phylo == Internal Reordering (内部重排) of Trees
richness.yule.test == Test of Diversification-Shift With the Yule Process
rlineage == Tree Simulation Under the Time-Dependent Birth-Death Models
root == Roots Phylogenetic Trees
rotate == Swapping/交换 Sister Clades
rtree == Generate Random Trees
rtt == Root a Tree by Root-to-Tip Regression
seg.sites == Find Segregating/分隔 Sites in DNA Sequences
skyline == Skyline Plot Estimate of Effective Population Size
skylineplot == Drawing Skyline Plot Graphs
slowinskiguyer.test == Slowinski-Guyer Test of Homogeneous Diversification
solveAmbiguousBases == Solve Ambiguous Bases in DNA Sequences
speciesTree == Species Tree Estimation
stree == Generates Systematic Regular Trees (系统规则树)
subtreeplot == Zoom on a Portion of a Phylogeny by Successive/连续 Clicks
subtrees == All subtrees of a Phylogenetic Tree
summary.phylo == Print Summary of a Phylogeny
trans == Translation from DNA to Amino Acid Sequences
treePop == Tree Popping/爆开
trex == Tree Explorer With Multiple Devices
triangMtd == Tree Reconstruction Based on the Triangles/三角 Method
unique.multiPhylo == Revomes Duplicate Trees
updateLabel == Update Labels
varCompPhylip == Variance Components with Orthonormal Contrasts
varcomp == Compute Variance Component Estimates
vcv == Phylogenetic Variance-covariance or Correlation Matrix
vcv2phylo == Variance-Covariance Matrix to Tree
weight.taxo == Define Similarity Matrix
where == Find Patterns in DNA Sequences
which.edge == Identifies Edges of a Tree
woodmouse == Cytochrome b Gene Sequences of Woodmice
write.dna == Write DNA Sequences in a File
write.nexus == Write Tree File in Nexus Format
write.nexus.data == Write Character Data in NEXUS Format
write.tree == Write Tree File in Parenthetic Format
yule == Fits the Yule Model to a Phylogenetic Tree
yule.cov == Fits the Yule Model With Covariates
yule.time == Fits the Time-Dependent Yule Model
zoom == Zoom on a Portion of a Phylogeny
(附) 实战及示例数据简要测试
R包ape
ape包用于系统发育和进化分析,提供了以下功能:
1.读、写、操作、分析、模拟系统发育树及DNA序列;2.计算DNA距离;3.翻译为AA(氨基酸?)序列;4.基于距离法估计(进化)树;5.比较及多样性分析;6.自编程新的系统发育方法。
文档:用library(help = ape)
显示完整函数列表;案例代码 x:/Program Files/R/R-x.x.x/library/ape/doc;更多信息 https://cran.r-project.org/web/packages/ape/ape.pdf DOI: 10.1093/bioinformatics/btg412 doi: 10.1093/bioinformatics/bty633 https://hal.ird.fr/ird-01887318 http://ape-package.ird.fr/, https://github.com/emmanuelparadis/ape http://ape-package.ird.fr/ 等
library(ape)
# library(help = ape)
ape包的read.dna函数:
从文件中读取DNA序列,返回1个矩阵或DNA序列列表,分别将读取的分类群(Taxa)的名称作为rowname
或names
序列以二进制格式返回(默认),或小写字母(选项as.character = TRUE
)
# 读取序列
ex.dna <- ape::read.dna("./result/pairsnp/core.Temporal.aln.fasta", format = "fasta", as.character=F)
ex.dna.sub <- ape::read.dna("./result/pairsnp/core.Temporal.aln.noREF.fasta", format = "fasta", as.character=F)
# 选项format:"fasta", "sequential", "clustal", "interleaved"(序列名含空格时), 或任何明确的缩写
# 查看
str(ex.dna)
## 'DNAbin' raw [1:22, 1:984] g g g g ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...
## ..$ : NULL
ex.dna[1:5,1:5,drop=F]
## 5 DNA sequences in binary format stored in a matrix.
##
## All sequences of same length: 5
##
## Labels:
## ERR2245277
## ERR2245279
## ERR2245288
## ERR2245293
## ERR2245294
##
## Base composition:
## a c g t
## 0.0 0.6 0.4 0.0
## (Total: 25 bases)
alview(ex.dna, file = "", uppercase = TRUE, showpos = TRUE)
## 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111122222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222223333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444455555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555556666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888889999999999999999999999999999999999999999999999999999999999999999999999999999999999999
## 000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999000000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999000000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999000000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000111111111122222222223333333333444444444455555555556666666666777777777788888
## 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234
## ERR2245277 GCCCGTCGCTGGTGGGTAATAATGAAGAGCGTGGTGCAGTCTAAGGGCGTATGGACCCTTGCGCCCACTCTACCGTGCGGACGTCTCACTCTATATGCTCGACACCGTTCGTTCGACTCGCCCCCGTAGGCTCGGCCACAAGTGCCGTCGCAGGATTCAGCTCGAGCGCCCGAGCAGATCCAACCCCTTCTTCCTTGGGGTTCCTCGGTTGCTCGAATTCGCTACGCCCCCAGGGCAGATCCAGCGCTCGCAGGGTAACGGTCGTACCAGGGGTAAGTGGCACCAAGCAGCAGGGTCCGGGACCTGCACGCCGGACCGGCGCTACCTCGCTTCTGCGTTGGTCGTCTCATTGATCGCCGTTCCACGTAACCCAGCTATGCTAGCAGACTCACAAACTGGGCGACATCGATACCCCCAGACTTTTCACTGGCTCAGATGTCAAGCACGGACTGGTGACTCAGACTACCATTTTTTTGGCTCCTTGGTGTCCTAACCGCGCACGAAGCCGGGCTTACTGACACATGGTGCGCAGCTACGACGCAAGAAACATCGGAACCGCGCGCCTACGCGTAAAGAAGGTGGGGAAGGCCGTTCAGGCGTGGCTGTGACGCCATAAGAGGTCCAGGCGCAGCATGGGCCCTGCGTACAGAGGACCCCAAAAAAGAAAGGTGTAAGCCCCTCGTGATTGGCGTGTTCACAGCAACCGGGAGGGCGTACCGTCGGGACTCCAAACATCCCTGGATGTTCGTGTGCGCCCGGGAGTGCCCCTAGCGGCGCATAGTCCTCGCGGCTCAGTCAATTTCACTTCCTTGCATCTATGGGGGTCCCGCATTTTTGGTTAGCCGCGCGAGGGTCACCCAACGTTTGATATTGCCGTAGCTAGTGAGTGATGTGCCTTCCCCTGTCAACGATGCGAATCGTACGTAGGGGCGGGGTGCTGCTAGTAGGGAACATTGAAAAAAAATGCAATGTACTGGCCAAAGC
## ERR2245279 ...........................................................C.....................................................................................................................G.............................................G..............................................................................................................................................................C.....................................................................................C...................................................G........................................................................................................A......................................G........G...............................A............A..................................................G...........................................................G...............A..............................................C...........................................................................................
## ERR2245288 ........A.....................AC......A...G..........T.....C...G.......................G.............G...............C...............A...G...................T...................G.......G.G......................C............G...T..............G.......................T................................AC...............T...............T.......................A.A...A.......T...........C.......................G......A.....A................................A.C.............C..............................G....................G...........................C................G...............A.............A.............A..............CA..G.............................G.....GG.....C.G...TT.....G.............G......A...A........A......T...........................................G..T.T.......................G...........C....T..C..........G...............A.........G..A.......A..G..C...................C.......G..C.T..................................................G..........C................
## ERR2245293 ........A......................C...........................C...GT...........T..........G.............G....A..........C...............A...G..................G....................G.........G......................C............GT............A....G.......................T..A.............................AC...............................T.......A...............A.A..TA.......T...........C.......................G............A..................................C.............C......A.......................G....................G..A........................C................G...............................TA..........A...............AA.G.............................G.....GG.....C.G...TT.....G.............G....T.A...A........A..............G................................T..G..T.T..G..............T.....G...........C....T..C......T...G.......A...A...AA........G.............G..C.....A.....T.......C.......G..C.T...........................A......................G..........C................
## ERR2245294 ...............................C...........................C...G.....................................G...........T...............................................................G.....................T.......................G.A..............T..............................................................................................T......................A.......................C.....................C................................................C..............C...........T.......................................G...........................................C............................................................A......................................G......C.G..........G............TG......A............A...............C..................................G................................................C..........G...............A..........................C...................C.......G.............................................A....................C................
## ERR2245295 ...........................................................C.....................................................................................................................G.............................................G................................................................................................................................................C.............C.....................................................................................C...................................................G........................................................................................................A......................................G........G...............................A............A..................................................G...........................................................G...............A..............................................C...........................................................................................
## ERR2245344 ..........................AC.T.C.........................T.C...G.....................................G.......T.......C........................................G...........T......G..G......G.T.................................G............................T.............T..................G..........................TT.......T.......T...................C....A...A.......................C..............A....G.................................C.................C.............C..............................G....................G...............A..........T.......................................T.....................................A....T..................G..............G......C.G....T.....G.............G......A.......A....A..................................................G.G.................A............................C..........G...............A......................CG..C......T............C.......G..C...................................A.A..............G..........C................
## ERR2245378 ...............................C...........................C.....................................................................................................................G.............................................G......................................................................................................................................A.....G.................C.............................T.......................................................C...................................................G......................A.......T.......A.................................................................A......................................G........G...............................A......G.....A..................................................G....................T................T.....................G...............A..........................C...................C.........................................................................................C.
## ERR2245380 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......T...............T......................A...................................T................................A...A..........A........C.......G.........G..................A....T.........T...................C.............C.................A..........A.G....................G...................T...............................................A....................................A..G...................................GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................A.A.............C.......C..........G...............A...................G...G..C...................C.......G..C........A...........................................G...C......C................
## ERR2245386 ...............................C...........................CA..G.................T...................G...........................................................................G.............................A...............G..............................................A............G..........................................................................A.......................C..........................G................T.........................................C...................................................G..............................................A.....................................................T...A......................................G......C.G..........G...A.........G......A.A..........A..................................................G................................................C....A.....G...............A...............G..........C...................C.......G............................A..A..................................C................
## ERR2245388 ...........................................................C.....................................................................................................................G.............................................G..............................................................................................................................................................C.....................................................................................C...................................................G........................................................................................................A......................................G........G...............................A............A..................................................G...........................................................G...............A..............................................C...........................................................................................
## ERR2245393 T.......A......................C...........................C...G....................T..G.............G...............C...............A...G...................T...................G.......G.G......................C.......G....G..................G.................A.....T................................AC...............................T.......................A.A...A.......T...........C.......................G......A.....A.................A..............A.C.............C..............................G....................G...........................C................G...T..........T............................A.........A.....A..G..........................A..G.....GG.....C.G...TT.....G.............G......A...A........A........T..............G..........................G..T.T.......................G...........C....T..C..........G..C............A.........G.............G..C..................AC.......G..C.T..................................................G..........C........A.......
## ERR2245400 ...............................C...........................C.....................................................................................................................G.............................................G......................................................................................................................................A.....G.................C.............................T......................................A................C.................................................T.G..............................T.......A.................................................................A......................................G........G...............................A......G.....A..................................................G....................T................T.....................G...............A..........................C...................C.........................................................................................C.
## ERR2245401 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......................T......................A....................................................................A...A...................C.......G.........G..................A..............T...................C..........T..C............................A.G....................G..............................................................T.........................................A..G......................T............GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................................C.......C..........G...............A.......................G..C...................C.......G..C........A...........................................G..........C................
## ERR2245402 .....C.........................CA..........................C...G...............A.....................G...............................................................A...........G.............................................G.........................................................................A.........................................C...............A..A.......................CA......................................................G.............................C..........T........A...............................G.....................................A..................................................................A.....T................................G...T..C.G..........G.............G......A............A...........................T....C...........A.....G......G.....A...................................C...A.C....G...............A..........................C...................C.......G..................................................................C................
## ERR2245406 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......................T......................A....................................................................A...A...................C.......G.........G..................A..............T...................C..........T..C............................A.G....................G..............................................................T..........................T..............A..G......................T............GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................................C.......C..........G...............A.......................G..C...................C.......G..C........A...........................................G..........C................
## ERR2245411 .............................T.C.........................T.C...G.........T...............C...........G.......T.......C...........................................................G..G......G.T.................................G............................T.............T..................G...................T.......T.......T.....T.....................C....A...A.......................C...........G.......G.................................C.................C.............C..............................G....................G..........................T.............................................................................A....T.................................G......C.G....T.....G.............G.....TA............A........................A........G................G.G.................A.............G..............C..........G...............A......................CG..C...................C.......G..C...................................A................G..........C................
## ERR2245418 .............................T.C.........................T.C...G.........T...............C...........G.......T.......C...........................................................G..G......G.T.................................G............................T.............T..................G...................T.......T.......T...........................C....A...A.......................C...........G.......G.................................C.................C.............C..............................G....................G..........................T.............................................................................A....T.................................G......C.G....T.....G.............G.....TA............A........................A........G................G.G.................A.............G..............C..........G...............A......................CG..C...................C.......G..C...................................A................G..........C................
## ERR2245420 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......................T......................A....................................................................A...A...................C.......G.........G..................A..............T...................C..........T..C............................A.G....................G..............................................................T.........................................A..G......................T............GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................................C.......C..........G...............A.......................G..C...................C.......G..C........A...........................................G..........C................
## ERR2245425 .............................T.C.........................T.C...G.........T...............C...........G.......T.......C...........................................................G..G......G.T.................................G............................T.............T..................G...................T.......T.......T...........................C....A...A.......................C...........G.......G.................................C.................C.............C..............................G....................G..........................T.............................................................................A....T.................................G......C.G....T.....G.............G.....TA............A........................A........G................G.G.................A.............G..............C..........G...............A......................CG..C...................C.......G..C...................................A................G..........C................
## ERR2512377 ....................C......................................C...........C..C.............................T..............................................G.........................G.............................................G..................G.T.............................G..CA..............................A................................T.......C........C....................C.C.....C................C............G.........................G.....................T.C...............................................T...G...A.............G.........C................................................................C...........A................A...........T.........G........G...............................A............A..................................................G.............T.............................................G...............A..............................................C................A..........C............................G.........G....................C...
## Reference .TGGA.TC.GCCCCAAGGGCCGGA.C..T..C..CATG.CTC.GCCATACGGT.GTT.C..TCG.TGTAACCG.CC.TC.C.AG.CTGG.GCCCGGCA.TAGTGTT.CG.ACC.TGACGCGAAGTTGGTCACG.CGT.TCCCG.TAACTCTGTTGC..GTTCTAG.TCTT..GATCA.CT.CGTG.TGC.CCGTCCCAA.GGTTGTC.CC.TGTCCGG.TATC...G.GGAGCCATG.GC..GCTATCTA.C.ATCTGTT.CTAGCTGC..TACGTACAATCT.T.AAG.ACAAACT.A.CGGTC.TGGATA..GG.AATT.CCT.C.T.CC.CC.ACC..CTCCACAG.CACG...TACC.AC.TCG.T.AG.TCGCCGCC..CAGTCA.TGG.TCAAA.A.T.C.AG.GG..TTGCGAGCAC.C.CCCTCGC.G...GGGATGGATGGC...CTTCAGTGTCG.TG.GCACCC.AAC..CCCTC.C.AGCGTAAA.GGAACCTGTACTACCCTCTG.G.GC.AGATAGGCGCGG.CGC.GG.GGC.CGG.CGGAGC..GATG..T.T.CGGCAGCAAC..A.CCA..TACCGG.T..CACTCTCCGA.TTGCCCAGA.C.TC...AGG..CCAAATGGCAAACGTGC.T.TTT.GGGTGGGT.GCC.GACG.ATA.TCTACAGCCA.TC.ACAGG.GCGGG...A.C.TA..GTGA.ATAAAC.C.AGGCA..AGGCCAG..CC.CCAC..TGTGCACGAC.GG.AC.A.TC.A..GGAC.TCTAAT..G.G.C.GGCGCG.GGC.GCCTTGGTCGCAAT....GAAGCG.ACCA.CCG.TGA..TAGCTACG.TT.GG.ACC..G.GCCCGTACG..CGTC.GAGAGG..CTTCGTTGTCCG.GGGACC.TAGGGGCCGTAACCAA.GA..CCATC.T.GACG.CAGGTGCCAGCGG.GGGGCACGGCCCTGC.TGGCC.G
image(ex.dna[, 1:100], cex.lab = 0.5, cex.axis = 0.7)
checkAlignment(ex.dna, check.gaps = TRUE, plot = TRUE, what = 1:4)
##
## Number of sequences: 22
## Number of sites: 984
##
## No gap in alignment.
##
## Number of segregating sites (including gaps): 984
## Number of sites with at least one substitution: 984
## Number of sites with 1, 2, 3 or 4 observed bases:
## 1 2 3 4
## 0 984 0 0
由fasta到DNAbin的另1个可选方法
# adegenet包的fasta2DNAbin函数:读取fasta格式的(序列)对齐,扩展名 .fasta .fa .fas
# 输出DNAbin对象 (来自ape包的高效DNA表示),含完整的序列对齐(Alignment) 或 SNP(参数`snpOnly`)
# 此函数可提高内存效率,并且可以读取比Ape包的read.dna函数更大的数据集
ex.dna.f2d <- adegenet::fasta2DNAbin("./result/pairsnp/core.Temporal.aln.fasta", quiet=FALSE, chunkSize=10, snpOnly=FALSE)
##
## Converting FASTA alignment into a DNAbin object...
##
##
## Finding the size of a single genome...
##
##
## genome size is: 984 nucleotides
##
## ( 2 lines per genome )
##
## Importing sequences...
## ............................................................
## Forming final object...
##
## ...done.
str(ex.dna.f2d) # DNA sequences in binary format stored in a matrix
## 'DNAbin' raw [1:22, 1:984] g g g g ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...
## ..$ : NULL
identical(ex.dna.f2d, ex.dna)
## [1] TRUE
all.equal.DNAbin(ex.dna.f2d, ex.dna, plot=T)
## [1] TRUE
identical(ex.dna.sub, ex.dna)
## [1] FALSE
all.equal.DNAbin(ex.dna.sub, ex.dna, plot=T)
## [1] "Number of sequences different:"
## [2] "21 sequences in 1st object; 22 sequences in 2nd object."
## [3] "Subset your data for further comparison."
ape的更多函数:
可在文档(ape.pdf)中搜索更多的应用,e.g. 搜索”DNAbin” (查看能分析DNAbin的所有函数) https://cran.r-project.org/web/packages/ape/ape.pdf
# 其它函数:read.fastq可下载、读取Fastq;read.GenBank;read.gff;as.DNAbin;dist.dna;clustal/muscle(多序列比对)
# dnds dN/dS Ratio
# dnds(ex.dna, code = 1, codonstart = 1, quiet = FALSE, details = FALSE, return.categories = FALSE)
# Error: sequences are not unique
# data(woodmouse) # Woodmice的细胞色素b的基因序列
# res <- dnds(woodmouse, quiet = TRUE) # NOT correct
# 多序列比对图
# image.DNAbin Plot of DNA Sequence Alignement
# image(woodmouse)
# rug(seg.sites(woodmouse), -0.02, 3, 1)
# image(woodmouse, "n", "blue") # show missing data
# image(woodmouse, c("g", "c"), "green") # G+C
# par(mfcol = c(2, 2))
# ### barcoding style:
# for (x in c("a", "g", "c", "t"))
# image(woodmouse, x, "black", cex.lab = 0.5, cex.axis = 0.7)
# par(mfcol = c(1, 1))
# ### zoom on a portion of the data:
# image(woodmouse[11:15, 1:50], c("a", "n"), c("blue", "grey"))
# grid(50, 5, col = "black")
# ### see the guanines on a black background:
# image(woodmouse, "g", "yellow", "black")
# 多序列比对图 注释到 进化树上
# phydataplot(x, phy, style = "bars", offset = 1, scaling = 1, continuous = FALSE, width = NULL, legend = "below",
# funcol = rainbow, ...)
# ring(x, phy, style = "ring", offset = 1, ...)
# If you want to plot a DNA alignment in the same way than image.DNAbin, try style = "mosaic".
## plot an alignment with a NJ tree:
data(woodmouse)
trw <- nj(dist.dna(woodmouse))
plot(trw, x.lim = 0.1, align.tip = TRUE, font = 1)
phydataplot(woodmouse[, 1:100], trw, "m", 0.02, border = NA)
# (序列)标签管理:character、DNAbin、phylo、multiPhylo
# makeLabel(x, ...)
# ## S3 method for class 'character'
# makeLabel(x, len = 99, space = "_", make.unique = TRUE,
# illegal = "():;,[]", quote = FALSE, ...)
# ## S3 method for class 'phylo'
# makeLabel(x, tips = TRUE, nodes = TRUE, ...)
# ## S3 method for class 'multiPhylo'
# makeLabel(x, tips = TRUE, nodes = TRUE, ...)
# ## S3 method for class 'DNAbin'
# makeLabel(x, ...)
# updateLabel - Update Labels
# where - Find Patterns in DNA Sequences
# nexus2DNAbin - Convert the output from the previous function into the class "DNAbin".
# write.nexus.data - Write Character Data in NEXUS Format
# 获得 class ‘phylo’
# tree1 <- read.tree(‘tree1.txt’)
# This stores the phylogenetic tree is in an object named tree1 of class ‘phylo’.
# write.tree Write Tree File in Parenthetic Format
trw <- bionj(dist.dna(ex.dna, model = "raw"))
plot(trw)
str(trw)
## List of 4
## $ edge : int [1:41, 1:2] 23 25 26 37 37 26 25 27 28 32 ...
## $ edge.length: num [1:41] 6.78e-05 1.76e-04 1.95e-02 7.50e-01 1.42e-02 ...
## $ tip.label : chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...
## $ Nnode : int 20
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
write.tree(trw, file = "./tree.txt", append = FALSE, digits = 10, tree.names = FALSE)
# phy - an object of class "phylo" or "multiPhylo".
# mcmc.popsize函数 - 可逆跳跃MCMC (马尔可夫链蒙特卡罗) 推断群体数量史 (Demographic History)
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# run mcmc chain
mcmc.out <- mcmc.popsize(tree.hiv, nstep=100, thinning=1, burn.in=0,progress.bar=FALSE) # toy run
#mcmc.out <- mcmc.popsize(tree.hiv, nstep=10000, thinning=5, burn.in=500) # remove comments!!
# make list of population size versus time
popsize <- extract.popsize(mcmc.out)
# plot and compare with skyline plot
sk <- skyline(tree.hiv)
plot(sk, lwd=1, lty=3, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
lines(popsize, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
# skyline函数 - 使用天际线图估计有效种群大小(Effective Population Size)
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# corresponding coalescent intervals
ci <- coalescent.intervals(tree.hiv) # from tree
# collapsed intervals
cl1 <- collapsed.intervals(ci,0)
cl2 <- collapsed.intervals(ci,0.0119)
#### classic skyline plot ####
sk1 <- skyline(cl1) # from collapsed intervals
sk1 <- skyline(ci) # from coalescent intervals
sk1 <- skyline(tree.hiv) # from tree
sk1
## $time
## [1] 0.021161 0.049822 0.050058 0.054444 0.057580 0.057620 0.058582 0.059336
## [9] 0.059726 0.061847 0.063012 0.065150 0.065542 0.067767 0.068246 0.068285
## [17] 0.069341 0.069631 0.069725 0.070130 0.070506 0.071398 0.077168 0.077774
## [25] 0.078085 0.078439 0.078488 0.079791 0.079902 0.080093 0.080206 0.080907
## [33] 0.080919 0.080923 0.081266 0.081732 0.082433 0.082434 0.082720 0.083124
## [41] 0.083172 0.083436 0.083818 0.084541 0.084566 0.084629 0.085711 0.086004
## [49] 0.087462 0.087519 0.087787 0.088609 0.088644 0.088785 0.088823 0.089216
## [57] 0.089479 0.089540 0.090237 0.090248 0.090579 0.090777 0.091414 0.091575
## [65] 0.092297 0.092865 0.092881 0.093773 0.094006 0.094130 0.094174 0.094384
## [73] 0.094521 0.094683 0.094749 0.095068 0.095107 0.095333 0.095334 0.095369
## [81] 0.096163 0.096270 0.096432 0.096562 0.097150 0.097278 0.097341 0.097342
## [89] 0.099386 0.099467 0.099582 0.099726 0.099948 0.101084 0.101878 0.101892
## [97] 0.101961 0.101962 0.102399 0.102590 0.103060 0.103714 0.103878 0.103926
## [105] 0.104386 0.104484 0.104691 0.105017 0.105018 0.105020 0.105181 0.106054
## [113] 0.106195 0.107063 0.107287 0.107629 0.108329 0.108551 0.108702 0.109333
## [121] 0.109996 0.111546 0.111869 0.112580 0.112666 0.113317 0.113663 0.113664
## [129] 0.113731 0.113986 0.114132 0.114392 0.114561 0.114632 0.114774 0.116003
## [137] 0.116109 0.117194 0.117310 0.117514 0.117728 0.118538 0.118650 0.118782
## [145] 0.118990 0.119083 0.120265 0.120408 0.120491 0.120494 0.122054 0.122422
## [153] 0.122572 0.123109 0.124283 0.125240 0.127902 0.127903 0.127904 0.127999
## [161] 0.129007 0.130366 0.130367 0.130470 0.131217 0.131462 0.132291 0.135095
## [169] 0.135794 0.135795 0.135914 0.140307 0.143055 0.144389 0.146425 0.148331
## [177] 0.150189 0.151362 0.155704 0.164770 0.166301 0.166846 0.176701 0.178108
## [185] 0.179167 0.181334 0.192334 0.196999 0.197000 0.204899 0.204900 0.209112
##
## $interval.length
## [1] 0.021161 0.028661 0.000236 0.004386 0.003136 0.000040 0.000962 0.000754
## [9] 0.000390 0.002121 0.001165 0.002138 0.000392 0.002225 0.000479 0.000039
## [17] 0.001056 0.000290 0.000094 0.000405 0.000376 0.000892 0.005770 0.000606
## [25] 0.000311 0.000354 0.000049 0.001303 0.000111 0.000191 0.000113 0.000701
## [33] 0.000012 0.000004 0.000343 0.000466 0.000701 0.000001 0.000286 0.000404
## [41] 0.000048 0.000264 0.000382 0.000723 0.000025 0.000063 0.001082 0.000293
## [49] 0.001458 0.000057 0.000268 0.000822 0.000035 0.000141 0.000038 0.000393
## [57] 0.000263 0.000061 0.000697 0.000011 0.000331 0.000198 0.000637 0.000161
## [65] 0.000722 0.000568 0.000016 0.000892 0.000233 0.000124 0.000044 0.000210
## [73] 0.000137 0.000162 0.000066 0.000319 0.000039 0.000226 0.000001 0.000035
## [81] 0.000794 0.000107 0.000162 0.000130 0.000588 0.000128 0.000063 0.000001
## [89] 0.002044 0.000081 0.000115 0.000144 0.000222 0.001136 0.000794 0.000014
## [97] 0.000069 0.000001 0.000437 0.000191 0.000470 0.000654 0.000164 0.000048
## [105] 0.000460 0.000098 0.000207 0.000326 0.000001 0.000002 0.000161 0.000873
## [113] 0.000141 0.000868 0.000224 0.000342 0.000700 0.000222 0.000151 0.000631
## [121] 0.000663 0.001550 0.000323 0.000711 0.000086 0.000651 0.000346 0.000001
## [129] 0.000067 0.000255 0.000146 0.000260 0.000169 0.000071 0.000142 0.001229
## [137] 0.000106 0.001085 0.000116 0.000204 0.000214 0.000810 0.000112 0.000132
## [145] 0.000208 0.000093 0.001182 0.000143 0.000083 0.000003 0.001560 0.000368
## [153] 0.000150 0.000537 0.001174 0.000957 0.002662 0.000001 0.000001 0.000095
## [161] 0.001008 0.001359 0.000001 0.000103 0.000747 0.000245 0.000829 0.002804
## [169] 0.000699 0.000001 0.000119 0.004393 0.002748 0.001334 0.002036 0.001906
## [177] 0.001858 0.001173 0.004342 0.009066 0.001531 0.000545 0.009855 0.001407
## [185] 0.001059 0.002167 0.011000 0.004665 0.000001 0.007899 0.000001 0.004212
##
## $population.size
## [1] 392.071008 525.528096 4.282220 78.750630 55.714176 0.703120
## [7] 16.730142 12.972570 6.637800 35.709156 19.400745 35.214998
## [13] 6.385680 35.844750 7.630949 0.614367 16.448256 4.466000
## [19] 1.431150 6.095655 5.594128 13.117752 83.866950 8.705190
## [25] 4.414956 4.965912 0.679189 17.844585 1.501830 2.552906
## [31] 1.491939 9.141741 0.154560 0.050880 4.308423 5.779798
## [37] 8.584446 0.012090 3.413410 4.759524 0.558144 3.029664
## [43] 4.326150 8.079525 0.275650 0.685314 11.610942 3.101405
## [49] 15.221520 0.586872 2.721004 8.229042 0.345450 1.371930
## [55] 0.364458 3.715029 2.450108 0.559980 6.304365 0.098021
## [61] 2.905518 1.711908 5.424055 1.349985 5.960832 4.616704
## [67] 0.128016 7.024500 1.805750 0.945624 0.330132 1.550010
## [73] 0.994620 1.156680 0.463386 2.202057 0.264654 1.507420
## [79] 0.006555 0.225435 5.024432 0.665112 0.989010 0.779350
## [85] 3.460968 0.739584 0.357273 0.005565 11.160240 0.433836
## [91] 0.604095 0.741744 1.121100 5.623200 3.851694 0.066542
## [97] 0.321264 0.004560 1.951205 0.834861 2.010660 2.737644
## [103] 0.671580 0.192240 1.801360 0.375144 0.774387 1.191530
## [109] 0.003570 0.006972 0.547883 2.899233 0.456840 2.742880
## [115] 0.690144 1.027026 2.048200 0.632700 0.419025 1.704331
## [121] 1.742364 3.961800 0.802655 1.717065 0.201756 1.482978
## [127] 0.765006 0.002145 0.139360 0.514080 0.285138 0.491660
## [133] 0.309270 0.125670 0.242962 2.031537 0.169176 1.670900
## [139] 0.172260 0.291924 0.294892 1.074060 0.142800 0.161700
## [145] 0.244608 0.104904 1.277742 0.148005 0.082170 0.002838
## [151] 1.408680 0.316848 0.123000 0.418860 0.869934 0.672771
## [157] 1.772892 0.000630 0.000595 0.053295 0.532224 0.674064
## [163] 0.000465 0.044805 0.303282 0.092610 0.290979 0.911300
## [169] 0.209700 0.000276 0.030107 1.014783 0.577080 0.253460
## [175] 0.348156 0.291618 0.252688 0.140760 0.455910 0.825006
## [181] 0.119418 0.035970 0.542025 0.063315 0.038124 0.060676
## [187] 0.231000 0.069975 0.000010 0.047394 0.000003 0.004212
##
## $parameter.count
## [1] 192
##
## $epsilon
## [1] 0
##
## $logL
## [1] 1408.966
##
## $logL.AICc
## [1] NA
##
## attr(,"class")
## [1] "skyline"
plot(skyline(tree.hiv))
skylineplot(tree.hiv) # shortcut
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
#### generalized skyline plot ####
sk2 <- skyline(cl2) # from collapsed intervals
sk2 <- skyline(ci, 0.0119) # from coalescent intervals
sk2 <- skyline(tree.hiv, 0.0119) # from tree
sk2
## $time
## [1] 0.021161 0.049822 0.061847 0.077168 0.089216 0.101878 0.113986 0.127902
## [9] 0.140307 0.155704 0.176701 0.192334 0.209112
##
## $interval.length
## [1] 0.021161 0.028661 0.012025 0.015321 0.012048 0.012662 0.012108 0.013916
## [9] 0.012405 0.015397 0.020997 0.015633 0.016778
##
## $population.size
## [1] 392.07100800 525.52809600 26.43747675 18.16241385 4.32071145
## [6] 2.19342354 1.06974257 0.55211856 0.27727433 0.33138171
## [11] 0.38060475 0.09827875 0.02431880
##
## $parameter.count
## [1] 13
##
## $epsilon
## [1] 0.0119
##
## $logL
## [1] 1239.478
##
## $logL.AICc
## [1] 1225.456
##
## attr(,"class")
## [1] "skyline"
plot(sk2)
# classic and generalized skyline plot together in one plot
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)
# find optimal epsilon parameter using AICc criterion
find.skyline.epsilon(ci)
## Searching for the optimal epsilon... epsilon = 0.01191938
## [1] 0.01191938
sk3 <- skyline(ci, -1) # negative epsilon also triggers estimation of epsilon
## Searching for the optimal epsilon... epsilon = 0.01191938
sk3$epsilon
## [1] 0.01191938
# skylineplot函数 - 绘制天际线图
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
#### classic skyline plot
skylineplot(tree.hiv) # shortcut
#### plot classic and generalized skyline plots and estimate epsilon
sk.opt <- skylineplot.deluxe(tree.hiv)
## Searching for the optimal epsilon... epsilon = 0.01191938
sk.opt$epsilon
## [1] 0.01191938
#### classic and generalized skyline plot ####
sk1 <- skyline(tree.hiv)
sk2 <- skyline(tree.hiv, 0.0119)
# use years rather than substitutions as unit for the time axis
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)
#### various skyline plots for different epsilons
layout(mat= matrix(1:6,2,3,byrow=TRUE))
ci <- coalescent.intervals(tree.hiv)
plot(skyline(ci, 0.0));title(main="0.0")
plot(skyline(ci, 0.007));title(main="0.007")
plot(skyline(ci, 0.0119),col=4);title(main="0.0119")
plot(skyline(ci, 0.02));title(main="0.02")
plot(skyline(ci, 0.05));title(main="0.05")
plot(skyline(ci, 0.1));title(main="0.1")
layout(mat= matrix(1:1,1,1,byrow=TRUE))
# 其它有关R包:Felsenstein, J. (1993) Phylip (Phylogeny Inference Package) version 3.5c. http://evolution.genetics.washington.edu/phylip/phylip.html
ape包的dist.dna
函数:计算DNA序列成对距离 (Pairwise Distances from DNA Sequences)
输入含DNA序列的矩阵或列表(List),类(Class)必须是DNAbin。若DNA序列被存储为字符(Character),可使用as.DNAbin
函数来转换
dna.bin.dist = ape::dist.dna(ex.dna, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = F)
dna.bin.dist.sub = ape::dist.dna(ex.dna.sub, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = F)
dna.bin.dist.f2d = ape::dist.dna(ex.dna.f2d, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = F)
# as.matrix:对角线或方矩阵
# model:"raw", "N", "TS", "TV", "JC69", "K80" (the default), "F81", "K81", "F84", "BH87", "T92", "TN93", "GG95", "logdet", "paralin", "indel", or "indelblock"
# raw, N: 每对序列之间差异位点的比例,或数量。有助于绘制“饱和图(Saturation plots)”。选项variance(方差)和gamma对其没有影响,但pairwise.deletion可以
# 故model为"N"时,返回整数 (SNP距离绝对值)
str(dna.bin.dist) # 'dist'
## 'dist' num [1:231] 15 80 90 38 16 67 27 66 41 15 ...
## - attr(*, "Size")= int 22
## - attr(*, "Labels")= chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...
## - attr(*, "Upper")= logi FALSE
## - attr(*, "Diag")= logi FALSE
## - attr(*, "call")= language ape::dist.dna(x = ex.dna, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq| __truncated__
## - attr(*, "method")= chr "N"
dna.bin.dist # 是否为汉明距离 (序列之间核苷酸差异的数量)?
## ERR2245277 ERR2245279 ERR2245288 ERR2245293 ERR2245294 ERR2245295
## ERR2245279 15
## ERR2245288 80 65
## ERR2245293 90 75 42
## ERR2245294 38 23 66 76
## ERR2245295 16 1 66 76 24
## ERR2245344 67 52 77 87 53 53
## ERR2245378 27 12 71 81 29 13
## ERR2245380 66 51 62 72 52 52
## ERR2245386 41 26 69 79 27 27
## ERR2245388 15 0 65 75 23 1
## ERR2245393 82 67 26 44 68 68
## ERR2245400 28 13 72 82 30 14
## ERR2245401 58 43 54 64 44 44
## ERR2245402 47 32 75 85 33 33
## ERR2245406 59 44 55 65 45 45
## ERR2245411 63 48 73 83 49 49
## ERR2245418 62 47 72 82 48 48
## ERR2245420 58 43 54 64 44 44
## ERR2245425 62 47 72 82 48 48
## ERR2512377 48 33 94 104 56 34
## Reference 772 785 784 794 786 786
## ERR2245344 ERR2245378 ERR2245380 ERR2245386 ERR2245388 ERR2245393
## ERR2245279
## ERR2245288
## ERR2245293
## ERR2245294
## ERR2245295
## ERR2245344
## ERR2245378 58
## ERR2245380 63 57
## ERR2245386 56 32 55
## ERR2245388 52 12 51 26
## ERR2245393 79 73 64 71 67
## ERR2245400 59 3 58 33 13 74
## ERR2245401 55 49 14 47 43 56
## ERR2245402 62 38 61 36 32 77
## ERR2245406 56 50 15 48 44 57
## ERR2245411 22 54 59 52 48 75
## ERR2245418 21 53 58 51 47 74
## ERR2245420 55 49 14 47 43 56
## ERR2245425 21 53 58 51 47 74
## ERR2512377 85 45 82 59 33 96
## Reference 793 791 782 789 785 786
## ERR2245400 ERR2245401 ERR2245402 ERR2245406 ERR2245411 ERR2245418
## ERR2245279
## ERR2245288
## ERR2245293
## ERR2245294
## ERR2245295
## ERR2245344
## ERR2245378
## ERR2245380
## ERR2245386
## ERR2245388
## ERR2245393
## ERR2245400
## ERR2245401 50
## ERR2245402 39 53
## ERR2245406 51 1 54
## ERR2245411 55 51 58 52
## ERR2245418 54 50 57 51 1
## ERR2245420 50 0 53 1 51 50
## ERR2245425 54 50 57 51 1 0
## ERR2512377 46 74 65 75 81 80
## Reference 792 774 795 773 793 792
## ERR2245420 ERR2245425 ERR2512377
## ERR2245279
## ERR2245288
## ERR2245293
## ERR2245294
## ERR2245295
## ERR2245344
## ERR2245378
## ERR2245380
## ERR2245386
## ERR2245388
## ERR2245393
## ERR2245400
## ERR2245401
## ERR2245402
## ERR2245406
## ERR2245411
## ERR2245418
## ERR2245420
## ERR2245425 50
## ERR2512377 74 80
## Reference 774 792 752
# δ plot
delta.plot(dna.bin.dist, k = 20, plot = TRUE, which = 1:2)
# Holland, B. R., Huber, K. T., Dress, A. and Moulton, V. (2002) Delta plots: a tool for analyzing phylogenetic distance data. Molecular Biology and Evolution, 12, 2051–2059.
ex.dna=ex.dna.sub
# Tree Estimation Based on an Improved Version of the NJ Algorithm, Gascuel (1997)
# trw <- bionj(dist.dna(ex.dna.sub))
# trw <- bionj(dist.dna(ex.dna)) # 默认参数时 报错,可能是因为存在距离较远的Taxa或毒株
# trw <- bionj(dist.dna(ex.dna, model = "N")) # Error - at least one distance was greater than 100
trw <- bionj(dist.dna(ex.dna, model = "raw"))
plot(trw)
# f <- function(x) nj(dist.dna(x, model = "raw"))
# tr <- f(ex.dna.sub)
# ### Are bootstrap values stable?
# for (i in 1:5)
# print(boot.phylo(tr, ex.dna.sub, f, quiet = TRUE))
# ### How many partitions in 100 random trees of 10 labels?...
# TR <- rmtree(100, 10)
# pp10 <- prop.part(TR)
# length(pp10)
# ### ... and in 100 random trees of 20 labels?
# TR <- rmtree(100, 20)
# pp20 <- prop.part(TR)
# length(pp20)
# plot(pp10, pch = "x", col = 2)
# plot(pp20, pch = "x", col = 2)
fun <- function(x) as.phylo(hclust(dist.dna(x, model="raw"), "average")) # upgma() in phangorn
tree <- fun(ex.dna.sub)
## get 100 bootstrap trees:
bstrees <- boot.phylo(tree, ex.dna.sub, fun, trees = TRUE)$trees
##
Running bootstraps: 100 / 100
## Calculating bootstrap values... done.
## get proportions of each clade:
clad <- prop.clades(tree, bstrees, rooted = TRUE)
## get proportions of each bipartition:
boot <- prop.clades(tree, bstrees)
layout(1)
par(mar = rep(2, 4))
plot(tree, main = "Bipartition vs. Clade Support Values")
drawSupportOnEdges(boot)
nodelabels(clad)
legend("bottomleft", legend = c("Bipartitions", "Clades"), pch = 22,
pt.bg = c("green", "lightblue"), pt.cex = 2.5)
d <- dist.dna(ex.dna.sub, model="N")
delta.plot(d)
layout(1)
delta.plot(d, 40, which = 1)
# Tree Estimation Based on the Minimum Evolution Algorithm, Desper and Gascuel (2002).
trw <- fastme.bal(dist.dna(ex.dna, model="N"))
plot(trw)
rt <- dist.dna(ex.dna.sub, model="raw", variance = TRUE)
v <- attr(rt, "variance")
tr <- mvr(rt, v) # Phylogenetic tree construction based on the minimum variance reduction.
plot(tr, "u")
## Warning in plot.phylo(tr, "u"): 39 branch length(s) NA(s): branch lengths
## ignored in the plot
# Neighbor-Joining Tree Estimation, Saitou and Nei (1987).
trw <- nj(dist.dna(ex.dna, model="N"))
plot(trw)
## NJ树+多序列比对 - plot an alignment with a NJ tree:
trw <- nj(dist.dna(ex.dna.sub, model="raw"))
# 绘制带虚线、且右端对齐的树
plot(trw, x.lim = 0.1, align.tip = TRUE, font = 1)
# Tree Annotation
phydataplot(ex.dna[, 1:50], trw, "m", 0.05, border = NA)
## from ?boot.phylo:
f <- function(x) nj(dist.dna(x, model="raw"))
tw <- f(ex.dna) # NJ tree with K80 distance
set.seed(1)
## bootstrap with 100 replications:
(bp <- boot.phylo(tw, ex.dna, f, quiet = TRUE))
## [1] NA 23 44 88 90 100 89 23 100 100 100 100 95 26 100 100 29 100 100
## the first value relates to the root node and is always 100
## it is ignored below:
plot(tw, "u") # 辐射状的树
drawSupportOnEdges(bp)
## more readable but the tree is really unrooted:
plot(tw)
drawSupportOnEdges(bp)
op <- par(mfcol = 1:2)
par(op)
# tr <- rtree(10)
tr=tw
# tr$edge.length[c(1, 18)] <- 100
op <- par(mfcol = 1:2)
plot(tr); axisPhylo()
# 绘制长枝打断的树
plotBreakLongEdges(tr, 2); axisPhylo()
# tr <- triangMtd(dist.dna(ex.dna.sub, model="raw"))
# plot(tr)