来看看能被我称为"神器"的,到底是什么?

APE:在R语言中分析系统发育和进化 (e.g. 在获得全基因组Alignment之后)

Emmanuel Paradis, Julien Claude & Korbinian Strimmer

法国蒙彼利埃大学 古生物生物学和系统发育实验室;慕尼黑大学统计学系

R包神器 | 系统发育和进化分析 - ape (一)_sed

翻译APE发表的1篇原始文章,穿插一些代码、辅助信息,来了解APE的概况:

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_02

DOI: 10.1093/bioinformatics/btg412

APE (Analysis of Phylogenetics and Evolution) 是1个用R语言编写的用于分子进化和系统发育分析的免费软件包,提供了读写数据和操作系统发育树的实用函数,以及几种用于系统发育和进化分析的高级方法 (e.g. 比较和群体遗传方法)。

APE利用了许多用于统计和图形的R函数,并为开发、实现进一步的进化过程分析的统计方法,提供了灵活的框架。可从官方的R包存档 (http://cran.r-project.org/src/contrib/PACKAGES.html#ape,现在可能为 https://cran.r-project.org/web/packages/ape/index.html)获得。

从广义上说,系统发育分析涵盖了非常广泛的方法,从计算进化距离、重建基因树、估计分叉(Divergence)日期,到比较数据分析、进化速率估计和多样性(Diversification)分析。所有这些不同的任务都有一个特殊的共同点:严重依赖计算统计 (Computational statistics)。R系统是一个免费的独立于平台的开源分析环境,最近(2004)已经形成统计计算和图形(Graphics)事实上的标准 (Ihaka & Gentleman, 1996)。

R的1个优点是,通过编写专门的包,可以很容易地针对特定应用领域进行定制(Tailored to a particular application area)。特别是,R在生物信息学中的有用性已经在基因表达数据(即转录组)的分析中得到了令人印象深刻的证明(http://www.bioconductor.org)。APE是首次在系统发育和进化数据的分析中发挥R的优势。APE专注于使用系统发育和系谱(Genealogical)树作为输入的统计分析。

在v1.1版中,APE提供了读取、写入、绘制、操作系统发育树,并在系统发育框架中分析比较数据(Comparative data)、多样化分析、计算等位基因和核苷酸数据的距离、读取核苷酸序列和其它几个工具的功能,如Mantel测试、最小生成树的计算、群体遗传学参数的估计。

表1概述了APE中目前可用的函数。注意,有些方法 (如比较法、天际线图/Skyline plot等) 以前只能在专门的软件中使用。外部的树重建程序(如PHYLIP)可以通过标准的Shell命令从R调用。

R包神器 | 系统发育和进化分析 - ape (一)_sed_03

表 1. APE v1.1版中可用的特定函数

mst 最小生成树 - Minimum Spanning Tree
phylo 进化树在ape包中的数据对象的类(Class)名称 - A single phylogenetic tree may have several representations in the Newick format and in the "phylo" class of objects used in ‘ape’.skyline estimate of effective population size from an estimated phylogeny. ltt.plot Lineages Through Time Plot. coalescent 溯祖. hivtree 193个HIV-1序列的系统发育树数据. opsin 视紫蛋白数据.

R语言的另1个优点是可以直接获得具有发表质量的图形输出,特别是使用PostScript设备时。例如,APE中的系统发育绘图功能处理颜色、线的粗细、字体、标签的间距,这些可为每个分枝(Branch)单独定义,因此可在单个系统发育图上表示3个不同的变量。

APE还产生复杂的群体遗传学图,如广义的天际线图(Generalized skyline plot, Strimmer和Pybus, 2001),只需1个命令。与任何R包一样,APE是命令行驱动的 (重现、维护、共享非常方便)。函数由用户调用,可能带有参数和选项。在R中使用APE的任何会话(Session)都从此命令开始(当然,需要提前安装):

library(ape)

这使得APE的函数(执行特定的统计、绘图等功能)在R环境中可用。下面的命令可显示这些函数的列表 (名称和简述):

library(help = ape)

APE中所有可用的R函数(表1)都以R超文本格式记录,有关它们使用的信息可以通过应用help命令检索,例如:

help(read.tree)

以标准Newick插句格式保存在磁盘上的文本文件(e.g. tree1.txt)中的进化树可被读取:

tree1 <- read.tree('tree1.txt')

这会将系统进化树存储在名为"tree1"的类(Class)'phylo'对象中。该对象中存储的信息(e.g. 分枝长度)可通过输入`tree1`(命令)来检查,并且可通过执行以下(命令),来获得进化分枝图(Cladogram)形式的图形输出:

plot(tree1)

这实际上调用了APE的plot.phylo函数,绘制了系统发育树tree1。由于R的面向对象的性质,命令plot(x) (泛型函数?)可能会给出1个完全不同的结果,取决于对象所属的类 (class of x)。默认情况下,树被绘制在图形窗口上(Graphical window),但可以根据操作系统以各种文件格式导出。

除了这些简单的示例之外,在面向对象的结构中表示系统发育树可以直接操作进化分析中使用的各种计算(方法)的系统发育数据。目前在APE中实施的一些方法:系统发育独立对比(Felsenstein, 1985; Harvey and Pagel, 1991),拟合生卒模型 (Fitting birth–death models. Nee et al., 1994; Pybus and Harvey, 2000),群体遗传分析(Nee et al., 1995;Strimmer and Pybus, 2001),进化速率的非参数平滑 (Sanderson, 1997),利用Klastorin的方法在系统发育树中估计基因分组组 (Misawa and Tajima, 2000)。

此外,在R函数hclust中实现的基于距离的聚类方法(Distance based clustering)可被APE使用,进而通过函数转换并生成类'phylo'类对象和'hclust'类对象

APE中的类和方法 (如'phylo') 可以很容易地进一步扩展,来包括其它功能,例如:注释系统发育树。因此,APE不仅是一个数据分析包,也是一个开发和实现新方法的(开发)环境。此外,用C、C++或Fortran77编写的程序可从R中链接和调用,对于计算机密集型计算很有用。

(附) APE绘制天际线图 (示例,未美化)

# mcmc.popsize函数 - 可逆跳跃MCMC (马尔可夫链蒙特卡罗) 推断群体数量史 (Demographic History)
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# run mcmc chain
mcmc.out <- mcmc.popsize(tree.hiv, nstep=100, thinning=1, burn.in=0,progress.bar=FALSE) # toy run
#mcmc.out <- mcmc.popsize(tree.hiv, nstep=10000, thinning=5, burn.in=500) # remove comments!!
# make list of population size versus time
popsize <- extract.popsize(mcmc.out)
# plot and compare with skyline plot
sk <- skyline(tree.hiv)
plot(sk, lwd=1, lty=3, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
lines(popsize, show.years=TRUE, subst.rate=0.0023, present.year = 1997)

R包神器 | 系统发育和进化分析 - ape (一)_sed_04

mcmc.popsize函数 - 可逆跳跃MCMC (马尔可夫链蒙特卡罗) 推断群体数量史 (Demographic History)

# skyline函数 - 使用天际线图估计有效种群大小(Effective Population Size)
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# corresponding coalescent intervals
ci <- coalescent.intervals(tree.hiv) # from tree
# collapsed intervals
cl1 <- collapsed.intervals(ci,0)
cl2 <- collapsed.intervals(ci,0.0119)
#### classic skyline plot ####
sk1 <- skyline(cl1) # from collapsed intervals
sk1 <- skyline(ci) # from coalescent intervals
sk1 <- skyline(tree.hiv) # from tree
sk1
plot(skyline(tree.hiv))
skylineplot(tree.hiv) # shortcut
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
#### generalized skyline plot ####
sk2 <- skyline(cl2) # from collapsed intervals
sk2 <- skyline(ci, 0.0119) # from coalescent intervals
sk2 <- skyline(tree.hiv, 0.0119) # from tree
sk2
plot(sk2)
# classic and generalized skyline plot together in one plot
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)
# find optimal epsilon parameter using AICc criterion
find.skyline.epsilon(ci)
sk3 <- skyline(ci, -1) # negative epsilon also triggers estimation of epsilon
sk3$epsilon

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_05

skyline函数 - 使用天际线图估计有效种群大小(Effective Population Size)

# skylineplot函数 - 绘制天际线图
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
#### classic skyline plot
skylineplot(tree.hiv) # shortcut
#### plot classic and generalized skyline plots and estimate epsilon
sk.opt <- skylineplot.deluxe(tree.hiv)
sk.opt$epsilon
#### classic and generalized skyline plot ####
sk1 <- skyline(tree.hiv)
sk2 <- skyline(tree.hiv, 0.0119)
# use years rather than substitutions as unit for the time axis
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)
#### various skyline plots for different epsilons
layout(mat= matrix(1:6,2,3,byrow=TRUE))
ci <- coalescent.intervals(tree.hiv)
plot(skyline(ci, 0.0));title(main="0.0")
plot(skyline(ci, 0.007));title(main="0.007")
plot(skyline(ci, 0.0119),col=4);title(main="0.0119")
plot(skyline(ci, 0.02));title(main="0.02")
plot(skyline(ci, 0.05));title(main="0.05")
plot(skyline(ci, 0.1));title(main="0.1")
layout(mat= matrix(1:1,1,1,byrow=TRUE))

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_06

skylineplot函数 - 绘制天际线图

(附) 关于程辑包‘ape v5.7-1’的信息

library(help = ape)

Version: 5.7-1

Date: 2023-03-13

Built: R 4.3.0; x86_64-w64-mingw32; windows

Description:        Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized/罚分 likelihood, dating trees with non-contemporaneous/非同期 sequences, translating DNA into AA sequences, and assessing sequence alignments.

Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

全部索引

AAbin == Amino Acid Sequences

CADM.global == Congruence/一致 among distance matrices

DNAbin == Manipulate DNA Sequences in Bit-Level Format

DNAbin2indel == Recode Blocks of Indels

FastME == Tree Estimation Based on the Minimum Evolution Algorithm

Initialize.corPhyl == Initialize a 'corPhyl' Structure Object

LTT == Theoretical Lineage-Through Time Plots

MPR == Most Parsimonious Reconstruction

Moran.I == Moran's I Autocorrelation Index

SDM == Construction of Consensus Distance Matrix With SDM

ace == Ancestral Character Estimation

add.scale.bar == Add a Scale Bar to a Phylogeny Plot

additive == Incomplete Distance Matrix Filling

alex == Alignment Explorer With Multiple Devices

all.equal.DNAbin == Compare DNA Sets

all.equal.phylo == Global Comparison of two Phylogenies

alview == Print DNA or AA Sequence Alignement

ape-package == Analyses of Phylogenetics and Evolution

apetools == Tools to Explore Files

as.alignment == Conversion Among DNA Sequence Internal Formats

as.bitsplits == Split Frequencies and Conversion Among Split Classes

as.matching == Conversion Between Phylo and Matching Objects

as.phylo == Conversion Among Tree and Network Objects

as.phylo.formula == Conversion from Taxonomy Variables to Phylogenetic Trees

axisPhylo == Axis on Side of Phylogeny

balance == Balance of a Dichotomous/二歧分枝 Phylogenetic Tree

base.freq == Base frequencies from DNA Sequences

bd.ext == Extended Version of the Birth-Death Models to Estimate Speciation and Extinction Rates (估算物种形成和灭绝速率)

bd.time == Time-Dependent Birth-Death Models

binaryPGLMM == Phylogenetic Generalized Linear Mixed Model for Binary Data

bind.tree == Binds Trees

bionj == Tree Estimation Based on an Improved Version of the NJ Algorithm

bird.families == Phylogeny of the Families of Birds From Sibley and Ahlquist

bird.orders == Phylogeny of the Orders of Birds From Sibley and Ahlquist

birthdeath == Estimation of Speciation and Extinction Rates With Birth-Death Models

boot.phylo == Tree Bipartition and Bootstrapping Phylogenies

branching.times == Branching Times of a Phylogenetic Tree

c.phylo == Building Lists of Trees

carnivora == Carnivora/食肉类 body sizes and life history traits

checkAlignment == Check DNA Alignments

checkLabel == Checking Labels

checkValidPhylo == Check the Structure of a "phylo" Object

cherry == Number of Cherries and Null Models of Trees

chiroptera == Bat Phylogeny

chronoMPL == Molecular Dating With Mean Path Lengths

chronopl == Molecular Dating With Penalized Likelihood

chronos == Molecular Dating by Penalised Likelihood and Maximum Likelihood

clustal == Multiple Sequence Alignment with External Applications

coalescent.intervals == Coalescent/溯祖 Intervals

collapse.singles == Collapse Single Nodes

collapsed.intervals == Collapsed Coalescent Intervals

compar.cheverud == Cheverud's Comparative Method

compar.gee == Comparative Analysis with GEEs

compar.lynch == Lynch's Comparative Method

compar.ou == Ornstein-Uhlenbeck Model for Continuous Characters

comparePhylo == Compare Two "phylo" Objects

compute.brlen == Branch Lengths Computation

compute.brtime == Compute and Set Branching Times

consensus == Concensus Trees

cophenetic.phylo == Pairwise Distances from a Phylogenetic Tree

cophyloplot == Plots two phylogenetic trees face to face with links between the tips.

R包神器 | 系统发育和进化分析 - ape (一)_r语言_07

corBlomberg == Blomberg et al.'s Correlation Structure

corBrownian == Brownian Correlation Structure

corClasses == Phylogenetic Correlation Structures

corGrafen == Grafen's (1989) Correlation Structure

corMartins == Martins's (1997) Correlation Structure

corPagel == Pagel's "lambda" Correlation Structure

corphylo == Correlations among Multiple Traits with Phylogenetic Signal

correlogram.formula == Phylogenetic Correlogram/相关图

data.nex == NEXUS Data Example

dbd == Probability Density Under Birth-Death Models

def == Definition of Vectors for Plotting or Annotating

degree == Vertex Degrees in Trees and Networks

del.gaps == Delete Alignment Gaps in DNA Sequences

delta.plot == Delta Plots

dist.dna == Pairwise Distances from DNA Sequences

dist.gene == Pairwise Distances from Genetic Data

dist.topo == Topological Distances Between Two Trees

diversi.gof == Tests of Constant Diversification Rates (恒定多样化比率)

diversi.time == Analysis of Diversification with Survival Models

diversity.contrast.test == Diversity Contrast Test

dnds == dN/dS Ratio

drop.tip == Remove Tips in a Phylogenetic Tree

edges == Draw Additional Edges on a Plotted Tree

evonet == Evolutionary Networks

ewLasso == Incomplete distances and edge weights of unrooted topology

gammaStat == Gamma-Statistic of Pybus and Harvey

getAnnotationsGenBank == Read Annotations from GenBank

hivtree == Phylogenetic Tree of 193 HIV-1 Sequences

howmanytrees == Calculate Numbers of Phylogenetic Trees

identify.phylo == Graphical Identification of Nodes and Tips

image.DNAbin == Plot of DNA Sequence Alignement

is.binary == Test for Binary Tree

is.compatible == Check Compatibility of Splits

is.monophyletic == Is Group Monophyletic(单源)

is.ultrametric == Test if a Tree is Ultrametric/超度量/超矩阵

kronoviz == Plot Multiple Chronograms(编年/计时图) on the Same Scale

label2table == Label Management

ladderize == Ladderize(阶梯化) a Tree

latag2n == Leading and Trailing Alignment Gaps to N

lmorigin == Multiple regression through the origin

ltt.plot == Lineages Through Time Plot

makeLabel == Label Management

makeNodeLabel == Makes Node Labels

mantel.test == Mantel Test for Similarity of Two Matrices

mat3 == Three Matrices

mat5M3ID == Five Trees

mat5Mrand == Five Independent Trees

matexpo == Matrix Exponential/指数

mcconwaysims.test == McConway-Sims Test of Homogeneous Diversification

mcmc.popsize == Reversible Jump MCMC to Infer Demographic/群体数量 History

mixedFontLabel == Mixed Font Labels for Plotting

mrca == Find Most Recent Common Ancestors Between Pairs

mst == Minimum Spanning Tree

multi2di == Collapse and Resolve Multichotomies/多歧分枝

multiphylo == Manipulating Lists of Trees

mvr == Minimum Variance Reduction

nj == Neighbor-Joining Tree Estimation

njs == Tree Reconstruction from Incomplete Distances With NJ* or bio-NJ*

node.dating == node.dating

node.depth == Depth and Heights of Nodes and Tips

nodelabels == Labelling the Nodes, Tips, and Edges of a Tree

nodepath == Find Paths of Nodes

parafit == Test of host-parasite coevolution (宿主-寄主共进化)

pcoa == Principal Coordinate Analysis

phydataplot == Tree Annotation

phymltest == Fits a Bunch of Models with PhyML

pic == Phylogenetically Independent Contrasts

pic.ortho == Phylogenetically Independent Orthonormal/标准正交 Contrasts

plot.correlogram == Plot a Correlogram/相关图

plot.phylo == Plot Phylogenies

plot.phylo.extra == Extra Fuctions to Plot and Annotate Phylogenies

plot.varcomp == Plot Variance Components

plotTreeTime == Plot Tree With Time Axis

print.phylo == Compact/密集 Display of a Phylogeny

rDNAbin == Random DNA Sequences

rTraitCont == Continuous Character Simulation

rTraitDisc == Discrete Character Simulation

rTraitMult == Multivariate/多变 Character Simulation

read.GenBank == Read DNA Sequences from GenBank via Internet

read.caic == Read Tree File in CAIC Format

read.dna == Read DNA Sequences in a File

read.gff == Read GFF Files

read.nexus == Read Tree File in Nexus Format

read.nexus.data == Read Character Data In NEXUS Format

read.tree == Read Tree File in Parenthetic Format

reconstruct == Continuous Ancestral Character Estimation (连续祖先特征估计)

reorder.phylo == Internal Reordering (内部重排) of Trees

richness.yule.test == Test of Diversification-Shift With the Yule Process

rlineage == Tree Simulation Under the Time-Dependent Birth-Death Models

root == Roots Phylogenetic Trees

rotate == Swapping/交换 Sister Clades

rtree == Generate Random Trees

rtt == Root a Tree by Root-to-Tip Regression

seg.sites == Find Segregating/分隔 Sites in DNA Sequences

skyline == Skyline Plot Estimate of Effective Population Size

skylineplot == Drawing Skyline Plot Graphs

slowinskiguyer.test == Slowinski-Guyer Test of Homogeneous Diversification

solveAmbiguousBases == Solve Ambiguous Bases in DNA Sequences

speciesTree == Species Tree Estimation

stree == Generates Systematic Regular Trees (系统规则树)

subtreeplot == Zoom on a Portion of a Phylogeny by Successive/连续 Clicks

subtrees == All subtrees of a Phylogenetic Tree

summary.phylo == Print Summary of a Phylogeny

trans == Translation from DNA to Amino Acid Sequences

treePop == Tree Popping/爆开

trex == Tree Explorer With Multiple Devices

triangMtd == Tree Reconstruction Based on the Triangles/三角 Method

unique.multiPhylo == Revomes Duplicate Trees

updateLabel == Update Labels

varCompPhylip == Variance Components with Orthonormal Contrasts

varcomp == Compute Variance Component Estimates

vcv == Phylogenetic Variance-covariance or Correlation Matrix

vcv2phylo == Variance-Covariance Matrix to Tree

weight.taxo == Define Similarity Matrix

where == Find Patterns in DNA Sequences

which.edge == Identifies Edges of a Tree

woodmouse == Cytochrome b Gene Sequences of Woodmice

write.dna == Write DNA Sequences in a File

write.nexus == Write Tree File in Nexus Format

write.nexus.data == Write Character Data in NEXUS Format

write.tree == Write Tree File in Parenthetic Format

yule == Fits the Yule Model to a Phylogenetic Tree

yule.cov == Fits the Yule Model With Covariates

yule.time == Fits the Time-Dependent Yule Model

zoom == Zoom on a Portion of a Phylogeny

(附) 实战及示例数据简要测试

R包ape

ape包用于系统发育和进化分析,提供了以下功能:

1.读、写、操作、分析、模拟系统发育树及DNA序列;2.计算DNA距离;3.翻译为AA(氨基酸?)序列;4.基于距离法估计(进化)树;5.比较及多样性分析;6.自编程新的系统发育方法。

文档:用library(help = ape)显示完整函数列表;案例代码 x:/Program Files/R/R-x.x.x/library/ape/doc;更多信息 https://cran.r-project.org/web/packages/ape/ape.pdf DOI: 10.1093/bioinformatics/btg412 doi: 10.1093/bioinformatics/bty633 https://hal.ird.fr/ird-01887318 http://ape-package.ird.fr/, https://github.com/emmanuelparadis/ape http://ape-package.ird.fr/ 等

library(ape)
# library(help = ape)

ape包的read.dna函数:

从文件中读取DNA序列,返回1个矩阵或DNA序列列表,分别将读取的分类群(Taxa)的名称作为rownamenames

序列以二进制格式返回(默认),或小写字母(选项as.character = TRUE)

# 读取序列
ex.dna <- ape::read.dna("./result/pairsnp/core.Temporal.aln.fasta", format = "fasta", as.character=F)
ex.dna.sub <- ape::read.dna("./result/pairsnp/core.Temporal.aln.noREF.fasta", format = "fasta", as.character=F)
# 选项format:"fasta", "sequential", "clustal", "interleaved"(序列名含空格时), 或任何明确的缩写

# 查看
str(ex.dna)
##  'DNAbin' raw [1:22, 1:984] g g g g ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...
##   ..$ : NULL
ex.dna[1:5,1:5,drop=F]
## 5 DNA sequences in binary format stored in a matrix.
## 
## All sequences of same length: 5 
## 
## Labels:
## ERR2245277
## ERR2245279
## ERR2245288
## ERR2245293
## ERR2245294
## 
## Base composition:
##   a   c   g   t 
## 0.0 0.6 0.4 0.0 
## (Total: 25 bases)
alview(ex.dna, file = "", uppercase = TRUE, showpos = TRUE)
##            000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111122222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222223333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444455555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555556666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888889999999999999999999999999999999999999999999999999999999999999999999999999999999999999
##            000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999000000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999000000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999000000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000111111111122222222223333333333444444444455555555556666666666777777777788888
##            123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234
## ERR2245277 GCCCGTCGCTGGTGGGTAATAATGAAGAGCGTGGTGCAGTCTAAGGGCGTATGGACCCTTGCGCCCACTCTACCGTGCGGACGTCTCACTCTATATGCTCGACACCGTTCGTTCGACTCGCCCCCGTAGGCTCGGCCACAAGTGCCGTCGCAGGATTCAGCTCGAGCGCCCGAGCAGATCCAACCCCTTCTTCCTTGGGGTTCCTCGGTTGCTCGAATTCGCTACGCCCCCAGGGCAGATCCAGCGCTCGCAGGGTAACGGTCGTACCAGGGGTAAGTGGCACCAAGCAGCAGGGTCCGGGACCTGCACGCCGGACCGGCGCTACCTCGCTTCTGCGTTGGTCGTCTCATTGATCGCCGTTCCACGTAACCCAGCTATGCTAGCAGACTCACAAACTGGGCGACATCGATACCCCCAGACTTTTCACTGGCTCAGATGTCAAGCACGGACTGGTGACTCAGACTACCATTTTTTTGGCTCCTTGGTGTCCTAACCGCGCACGAAGCCGGGCTTACTGACACATGGTGCGCAGCTACGACGCAAGAAACATCGGAACCGCGCGCCTACGCGTAAAGAAGGTGGGGAAGGCCGTTCAGGCGTGGCTGTGACGCCATAAGAGGTCCAGGCGCAGCATGGGCCCTGCGTACAGAGGACCCCAAAAAAGAAAGGTGTAAGCCCCTCGTGATTGGCGTGTTCACAGCAACCGGGAGGGCGTACCGTCGGGACTCCAAACATCCCTGGATGTTCGTGTGCGCCCGGGAGTGCCCCTAGCGGCGCATAGTCCTCGCGGCTCAGTCAATTTCACTTCCTTGCATCTATGGGGGTCCCGCATTTTTGGTTAGCCGCGCGAGGGTCACCCAACGTTTGATATTGCCGTAGCTAGTGAGTGATGTGCCTTCCCCTGTCAACGATGCGAATCGTACGTAGGGGCGGGGTGCTGCTAGTAGGGAACATTGAAAAAAAATGCAATGTACTGGCCAAAGC
## ERR2245279 ...........................................................C.....................................................................................................................G.............................................G..............................................................................................................................................................C.....................................................................................C...................................................G........................................................................................................A......................................G........G...............................A............A..................................................G...........................................................G...............A..............................................C...........................................................................................
## ERR2245288 ........A.....................AC......A...G..........T.....C...G.......................G.............G...............C...............A...G...................T...................G.......G.G......................C............G...T..............G.......................T................................AC...............T...............T.......................A.A...A.......T...........C.......................G......A.....A................................A.C.............C..............................G....................G...........................C................G...............A.............A.............A..............CA..G.............................G.....GG.....C.G...TT.....G.............G......A...A........A......T...........................................G..T.T.......................G...........C....T..C..........G...............A.........G..A.......A..G..C...................C.......G..C.T..................................................G..........C................
## ERR2245293 ........A......................C...........................C...GT...........T..........G.............G....A..........C...............A...G..................G....................G.........G......................C............GT............A....G.......................T..A.............................AC...............................T.......A...............A.A..TA.......T...........C.......................G............A..................................C.............C......A.......................G....................G..A........................C................G...............................TA..........A...............AA.G.............................G.....GG.....C.G...TT.....G.............G....T.A...A........A..............G................................T..G..T.T..G..............T.....G...........C....T..C......T...G.......A...A...AA........G.............G..C.....A.....T.......C.......G..C.T...........................A......................G..........C................
## ERR2245294 ...............................C...........................C...G.....................................G...........T...............................................................G.....................T.......................G.A..............T..............................................................................................T......................A.......................C.....................C................................................C..............C...........T.......................................G...........................................C............................................................A......................................G......C.G..........G............TG......A............A...............C..................................G................................................C..........G...............A..........................C...................C.......G.............................................A....................C................
## ERR2245295 ...........................................................C.....................................................................................................................G.............................................G................................................................................................................................................C.............C.....................................................................................C...................................................G........................................................................................................A......................................G........G...............................A............A..................................................G...........................................................G...............A..............................................C...........................................................................................
## ERR2245344 ..........................AC.T.C.........................T.C...G.....................................G.......T.......C........................................G...........T......G..G......G.T.................................G............................T.............T..................G..........................TT.......T.......T...................C....A...A.......................C..............A....G.................................C.................C.............C..............................G....................G...............A..........T.......................................T.....................................A....T..................G..............G......C.G....T.....G.............G......A.......A....A..................................................G.G.................A............................C..........G...............A......................CG..C......T............C.......G..C...................................A.A..............G..........C................
## ERR2245378 ...............................C...........................C.....................................................................................................................G.............................................G......................................................................................................................................A.....G.................C.............................T.......................................................C...................................................G......................A.......T.......A.................................................................A......................................G........G...............................A......G.....A..................................................G....................T................T.....................G...............A..........................C...................C.........................................................................................C.
## ERR2245380 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......T...............T......................A...................................T................................A...A..........A........C.......G.........G..................A....T.........T...................C.............C.................A..........A.G....................G...................T...............................................A....................................A..G...................................GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................A.A.............C.......C..........G...............A...................G...G..C...................C.......G..C........A...........................................G...C......C................
## ERR2245386 ...............................C...........................CA..G.................T...................G...........................................................................G.............................A...............G..............................................A............G..........................................................................A.......................C..........................G................T.........................................C...................................................G..............................................A.....................................................T...A......................................G......C.G..........G...A.........G......A.A..........A..................................................G................................................C....A.....G...............A...............G..........C...................C.......G............................A..A..................................C................
## ERR2245388 ...........................................................C.....................................................................................................................G.............................................G..............................................................................................................................................................C.....................................................................................C...................................................G........................................................................................................A......................................G........G...............................A............A..................................................G...........................................................G...............A..............................................C...........................................................................................
## ERR2245393 T.......A......................C...........................C...G....................T..G.............G...............C...............A...G...................T...................G.......G.G......................C.......G....G..................G.................A.....T................................AC...............................T.......................A.A...A.......T...........C.......................G......A.....A.................A..............A.C.............C..............................G....................G...........................C................G...T..........T............................A.........A.....A..G..........................A..G.....GG.....C.G...TT.....G.............G......A...A........A........T..............G..........................G..T.T.......................G...........C....T..C..........G..C............A.........G.............G..C..................AC.......G..C.T..................................................G..........C........A.......
## ERR2245400 ...............................C...........................C.....................................................................................................................G.............................................G......................................................................................................................................A.....G.................C.............................T......................................A................C.................................................T.G..............................T.......A.................................................................A......................................G........G...............................A......G.....A..................................................G....................T................T.....................G...............A..........................C...................C.........................................................................................C.
## ERR2245401 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......................T......................A....................................................................A...A...................C.......G.........G..................A..............T...................C..........T..C............................A.G....................G..............................................................T.........................................A..G......................T............GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................................C.......C..........G...............A.......................G..C...................C.......G..C........A...........................................G..........C................
## ERR2245402 .....C.........................CA..........................C...G...............A.....................G...............................................................A...........G.............................................G.........................................................................A.........................................C...............A..A.......................CA......................................................G.............................C..........T........A...............................G.....................................A..................................................................A.....T................................G...T..C.G..........G.............G......A............A...........................T....C...........A.....G......G.....A...................................C...A.C....G...............A..........................C...................C.......G..................................................................C................
## ERR2245406 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......................T......................A....................................................................A...A...................C.......G.........G..................A..............T...................C..........T..C............................A.G....................G..............................................................T..........................T..............A..G......................T............GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................................C.......C..........G...............A.......................G..C...................C.......G..C........A...........................................G..........C................
## ERR2245411 .............................T.C.........................T.C...G.........T...............C...........G.......T.......C...........................................................G..G......G.T.................................G............................T.............T..................G...................T.......T.......T.....T.....................C....A...A.......................C...........G.......G.................................C.................C.............C..............................G....................G..........................T.............................................................................A....T.................................G......C.G....T.....G.............G.....TA............A........................A........G................G.G.................A.............G..............C..........G...............A......................CG..C...................C.......G..C...................................A................G..........C................
## ERR2245418 .............................T.C.........................T.C...G.........T...............C...........G.......T.......C...........................................................G..G......G.T.................................G............................T.............T..................G...................T.......T.......T...........................C....A...A.......................C...........G.......G.................................C.................C.............C..............................G....................G..........................T.............................................................................A....T.................................G......C.G....T.....G.............G.....TA............A........................A........G................G.G.................A.............G..............C..........G...............A......................CG..C...................C.......G..C...................................A................G..........C................
## ERR2245420 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......................T......................A....................................................................A...A...................C.......G.........G..................A..............T...................C..........T..C............................A.G....................G..............................................................T.........................................A..G......................T............GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................................C.......C..........G...............A.......................G..C...................C.......G..C........A...........................................G..........C................
## ERR2245425 .............................T.C.........................T.C...G.........T...............C...........G.......T.......C...........................................................G..G......G.T.................................G............................T.............T..................G...................T.......T.......T...........................C....A...A.......................C...........G.......G.................................C.................C.............C..............................G....................G..........................T.............................................................................A....T.................................G......C.G....T.....G.............G.....TA............A........................A........G................G.G.................A.............G..............C..........G...............A......................CG..C...................C.......G..C...................................A................G..........C................
## ERR2512377 ....................C......................................C...........C..C.............................T..............................................G.........................G.............................................G..................G.T.............................G..CA..............................A................................T.......C........C....................C.C.....C................C............G.........................G.....................T.C...............................................T...G...A.............G.........C................................................................C...........A................A...........T.........G........G...............................A............A..................................................G.............T.............................................G...............A..............................................C................A..........C............................G.........G....................C...
## Reference  .TGGA.TC.GCCCCAAGGGCCGGA.C..T..C..CATG.CTC.GCCATACGGT.GTT.C..TCG.TGTAACCG.CC.TC.C.AG.CTGG.GCCCGGCA.TAGTGTT.CG.ACC.TGACGCGAAGTTGGTCACG.CGT.TCCCG.TAACTCTGTTGC..GTTCTAG.TCTT..GATCA.CT.CGTG.TGC.CCGTCCCAA.GGTTGTC.CC.TGTCCGG.TATC...G.GGAGCCATG.GC..GCTATCTA.C.ATCTGTT.CTAGCTGC..TACGTACAATCT.T.AAG.ACAAACT.A.CGGTC.TGGATA..GG.AATT.CCT.C.T.CC.CC.ACC..CTCCACAG.CACG...TACC.AC.TCG.T.AG.TCGCCGCC..CAGTCA.TGG.TCAAA.A.T.C.AG.GG..TTGCGAGCAC.C.CCCTCGC.G...GGGATGGATGGC...CTTCAGTGTCG.TG.GCACCC.AAC..CCCTC.C.AGCGTAAA.GGAACCTGTACTACCCTCTG.G.GC.AGATAGGCGCGG.CGC.GG.GGC.CGG.CGGAGC..GATG..T.T.CGGCAGCAAC..A.CCA..TACCGG.T..CACTCTCCGA.TTGCCCAGA.C.TC...AGG..CCAAATGGCAAACGTGC.T.TTT.GGGTGGGT.GCC.GACG.ATA.TCTACAGCCA.TC.ACAGG.GCGGG...A.C.TA..GTGA.ATAAAC.C.AGGCA..AGGCCAG..CC.CCAC..TGTGCACGAC.GG.AC.A.TC.A..GGAC.TCTAAT..G.G.C.GGCGCG.GGC.GCCTTGGTCGCAAT....GAAGCG.ACCA.CCG.TGA..TAGCTACG.TT.GG.ACC..G.GCCCGTACG..CGTC.GAGAGG..CTTCGTTGTCCG.GGGACC.TAGGGGCCGTAACCAA.GA..CCATC.T.GACG.CAGGTGCCAGCGG.GGGGCACGGCCCTGC.TGGCC.G
image(ex.dna[, 1:100], cex.lab = 0.5, cex.axis = 0.7)

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_08

checkAlignment(ex.dna, check.gaps = TRUE, plot = TRUE, what = 1:4)
## 
## Number of sequences: 22 
## Number of sites: 984 
## 
## No gap in alignment.
## 
## Number of segregating sites (including gaps): 984
## Number of sites with at least one substitution: 984
## Number of sites with 1, 2, 3 or 4 observed bases:
##   1   2   3   4 
##   0 984   0   0

R包神器 | 系统发育和进化分析 - ape (一)_kylin_09

由fasta到DNAbin的另1个可选方法

# adegenet包的fasta2DNAbin函数:读取fasta格式的(序列)对齐,扩展名 .fasta .fa .fas
# 输出DNAbin对象 (来自ape包的高效DNA表示),含完整的序列对齐(Alignment) 或 SNP(参数`snpOnly`)
# 此函数可提高内存效率,并且可以读取比Ape包的read.dna函数更大的数据集

ex.dna.f2d <- adegenet::fasta2DNAbin("./result/pairsnp/core.Temporal.aln.fasta", quiet=FALSE, chunkSize=10, snpOnly=FALSE)
## 
##  Converting FASTA alignment into a DNAbin object... 
## 
## 
##  Finding the size of a single genome... 
## 
## 
##  genome size is: 984 nucleotides 
## 
## ( 2  lines per genome )
## 
##  Importing sequences... 
## ............................................................
##  Forming final object... 
## 
## ...done.
str(ex.dna.f2d) # DNA sequences in binary format stored in a matrix
##  'DNAbin' raw [1:22, 1:984] g g g g ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...
##   ..$ : NULL
identical(ex.dna.f2d, ex.dna)
## [1] TRUE
all.equal.DNAbin(ex.dna.f2d, ex.dna, plot=T)
## [1] TRUE
identical(ex.dna.sub, ex.dna)
## [1] FALSE
all.equal.DNAbin(ex.dna.sub, ex.dna, plot=T)
## [1] "Number of sequences different:"                         
## [2] "21 sequences in 1st object; 22 sequences in 2nd object."
## [3] "Subset your data for further comparison."

ape的更多函数:

可在文档(ape.pdf)中搜索更多的应用,e.g. 搜索”DNAbin” (查看能分析DNAbin的所有函数) https://cran.r-project.org/web/packages/ape/ape.pdf

# 其它函数:read.fastq可下载、读取Fastq;read.GenBank;read.gff;as.DNAbin;dist.dna;clustal/muscle(多序列比对)

# dnds dN/dS Ratio
# dnds(ex.dna, code = 1, codonstart = 1, quiet = FALSE, details = FALSE, return.categories = FALSE)
# Error: sequences are not unique

# data(woodmouse) # Woodmice的细胞色素b的基因序列
# res <- dnds(woodmouse, quiet = TRUE) # NOT correct

# 多序列比对图
# image.DNAbin Plot of DNA Sequence Alignement
# image(woodmouse)
# rug(seg.sites(woodmouse), -0.02, 3, 1)
# image(woodmouse, "n", "blue") # show missing data
# image(woodmouse, c("g", "c"), "green") # G+C
# par(mfcol = c(2, 2))
# ### barcoding style:
# for (x in c("a", "g", "c", "t"))
# image(woodmouse, x, "black", cex.lab = 0.5, cex.axis = 0.7)
# par(mfcol = c(1, 1))
# ### zoom on a portion of the data:
# image(woodmouse[11:15, 1:50], c("a", "n"), c("blue", "grey"))
# grid(50, 5, col = "black")
# ### see the guanines on a black background:
# image(woodmouse, "g", "yellow", "black")

# 多序列比对图 注释到 进化树上
# phydataplot(x, phy, style = "bars", offset = 1, scaling = 1, continuous = FALSE, width = NULL, legend = "below",
# funcol = rainbow, ...)
# ring(x, phy, style = "ring", offset = 1, ...)
# If you want to plot a DNA alignment in the same way than image.DNAbin, try style = "mosaic".
## plot an alignment with a NJ tree:
data(woodmouse)
trw <- nj(dist.dna(woodmouse))
plot(trw, x.lim = 0.1, align.tip = TRUE, font = 1)
phydataplot(woodmouse[, 1:100], trw, "m", 0.02, border = NA)

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_10

# (序列)标签管理:character、DNAbin、phylo、multiPhylo
# makeLabel(x, ...)
# ## S3 method for class 'character'
# makeLabel(x, len = 99, space = "_", make.unique = TRUE,
# illegal = "():;,[]", quote = FALSE, ...)
# ## S3 method for class 'phylo'
# makeLabel(x, tips = TRUE, nodes = TRUE, ...)
# ## S3 method for class 'multiPhylo'
# makeLabel(x, tips = TRUE, nodes = TRUE, ...)
# ## S3 method for class 'DNAbin'
# makeLabel(x, ...)

# updateLabel - Update Labels
# where - Find Patterns in DNA Sequences

# nexus2DNAbin - Convert the output from the previous function into the class "DNAbin".

# write.nexus.data - Write Character Data in NEXUS Format

# 获得 class ‘phylo’
# tree1 <- read.tree(‘tree1.txt’)
# This stores the phylogenetic tree is in an object named tree1 of class ‘phylo’. 

# write.tree Write Tree File in Parenthetic Format
trw <- bionj(dist.dna(ex.dna, model = "raw"))
plot(trw)

R包神器 | 系统发育和进化分析 - ape (一)_kylin_11

str(trw)
## List of 4
##  $ edge       : int [1:41, 1:2] 23 25 26 37 37 26 25 27 28 32 ...
##  $ edge.length: num [1:41] 6.78e-05 1.76e-04 1.95e-02 7.50e-01 1.42e-02 ...
##  $ tip.label  : chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...
##  $ Nnode      : int 20
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"
write.tree(trw, file = "./tree.txt", append = FALSE, digits = 10, tree.names = FALSE)
# phy - an object of class "phylo" or "multiPhylo".

# mcmc.popsize函数 - 可逆跳跃MCMC (马尔可夫链蒙特卡罗) 推断群体数量史 (Demographic History)
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# run mcmc chain
mcmc.out <- mcmc.popsize(tree.hiv, nstep=100, thinning=1, burn.in=0,progress.bar=FALSE) # toy run
#mcmc.out <- mcmc.popsize(tree.hiv, nstep=10000, thinning=5, burn.in=500) # remove comments!!
# make list of population size versus time
popsize <- extract.popsize(mcmc.out)
# plot and compare with skyline plot
sk <- skyline(tree.hiv)
plot(sk, lwd=1, lty=3, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
lines(popsize, show.years=TRUE, subst.rate=0.0023, present.year = 1997)

R包神器 | 系统发育和进化分析 - ape (一)_kylin_12

# skyline函数 - 使用天际线图估计有效种群大小(Effective Population Size)
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# corresponding coalescent intervals
ci <- coalescent.intervals(tree.hiv) # from tree
# collapsed intervals
cl1 <- collapsed.intervals(ci,0)
cl2 <- collapsed.intervals(ci,0.0119)
#### classic skyline plot ####
sk1 <- skyline(cl1) # from collapsed intervals
sk1 <- skyline(ci) # from coalescent intervals
sk1 <- skyline(tree.hiv) # from tree
sk1
## $time
##   [1] 0.021161 0.049822 0.050058 0.054444 0.057580 0.057620 0.058582 0.059336
##   [9] 0.059726 0.061847 0.063012 0.065150 0.065542 0.067767 0.068246 0.068285
##  [17] 0.069341 0.069631 0.069725 0.070130 0.070506 0.071398 0.077168 0.077774
##  [25] 0.078085 0.078439 0.078488 0.079791 0.079902 0.080093 0.080206 0.080907
##  [33] 0.080919 0.080923 0.081266 0.081732 0.082433 0.082434 0.082720 0.083124
##  [41] 0.083172 0.083436 0.083818 0.084541 0.084566 0.084629 0.085711 0.086004
##  [49] 0.087462 0.087519 0.087787 0.088609 0.088644 0.088785 0.088823 0.089216
##  [57] 0.089479 0.089540 0.090237 0.090248 0.090579 0.090777 0.091414 0.091575
##  [65] 0.092297 0.092865 0.092881 0.093773 0.094006 0.094130 0.094174 0.094384
##  [73] 0.094521 0.094683 0.094749 0.095068 0.095107 0.095333 0.095334 0.095369
##  [81] 0.096163 0.096270 0.096432 0.096562 0.097150 0.097278 0.097341 0.097342
##  [89] 0.099386 0.099467 0.099582 0.099726 0.099948 0.101084 0.101878 0.101892
##  [97] 0.101961 0.101962 0.102399 0.102590 0.103060 0.103714 0.103878 0.103926
## [105] 0.104386 0.104484 0.104691 0.105017 0.105018 0.105020 0.105181 0.106054
## [113] 0.106195 0.107063 0.107287 0.107629 0.108329 0.108551 0.108702 0.109333
## [121] 0.109996 0.111546 0.111869 0.112580 0.112666 0.113317 0.113663 0.113664
## [129] 0.113731 0.113986 0.114132 0.114392 0.114561 0.114632 0.114774 0.116003
## [137] 0.116109 0.117194 0.117310 0.117514 0.117728 0.118538 0.118650 0.118782
## [145] 0.118990 0.119083 0.120265 0.120408 0.120491 0.120494 0.122054 0.122422
## [153] 0.122572 0.123109 0.124283 0.125240 0.127902 0.127903 0.127904 0.127999
## [161] 0.129007 0.130366 0.130367 0.130470 0.131217 0.131462 0.132291 0.135095
## [169] 0.135794 0.135795 0.135914 0.140307 0.143055 0.144389 0.146425 0.148331
## [177] 0.150189 0.151362 0.155704 0.164770 0.166301 0.166846 0.176701 0.178108
## [185] 0.179167 0.181334 0.192334 0.196999 0.197000 0.204899 0.204900 0.209112
## 
## $interval.length
##   [1] 0.021161 0.028661 0.000236 0.004386 0.003136 0.000040 0.000962 0.000754
##   [9] 0.000390 0.002121 0.001165 0.002138 0.000392 0.002225 0.000479 0.000039
##  [17] 0.001056 0.000290 0.000094 0.000405 0.000376 0.000892 0.005770 0.000606
##  [25] 0.000311 0.000354 0.000049 0.001303 0.000111 0.000191 0.000113 0.000701
##  [33] 0.000012 0.000004 0.000343 0.000466 0.000701 0.000001 0.000286 0.000404
##  [41] 0.000048 0.000264 0.000382 0.000723 0.000025 0.000063 0.001082 0.000293
##  [49] 0.001458 0.000057 0.000268 0.000822 0.000035 0.000141 0.000038 0.000393
##  [57] 0.000263 0.000061 0.000697 0.000011 0.000331 0.000198 0.000637 0.000161
##  [65] 0.000722 0.000568 0.000016 0.000892 0.000233 0.000124 0.000044 0.000210
##  [73] 0.000137 0.000162 0.000066 0.000319 0.000039 0.000226 0.000001 0.000035
##  [81] 0.000794 0.000107 0.000162 0.000130 0.000588 0.000128 0.000063 0.000001
##  [89] 0.002044 0.000081 0.000115 0.000144 0.000222 0.001136 0.000794 0.000014
##  [97] 0.000069 0.000001 0.000437 0.000191 0.000470 0.000654 0.000164 0.000048
## [105] 0.000460 0.000098 0.000207 0.000326 0.000001 0.000002 0.000161 0.000873
## [113] 0.000141 0.000868 0.000224 0.000342 0.000700 0.000222 0.000151 0.000631
## [121] 0.000663 0.001550 0.000323 0.000711 0.000086 0.000651 0.000346 0.000001
## [129] 0.000067 0.000255 0.000146 0.000260 0.000169 0.000071 0.000142 0.001229
## [137] 0.000106 0.001085 0.000116 0.000204 0.000214 0.000810 0.000112 0.000132
## [145] 0.000208 0.000093 0.001182 0.000143 0.000083 0.000003 0.001560 0.000368
## [153] 0.000150 0.000537 0.001174 0.000957 0.002662 0.000001 0.000001 0.000095
## [161] 0.001008 0.001359 0.000001 0.000103 0.000747 0.000245 0.000829 0.002804
## [169] 0.000699 0.000001 0.000119 0.004393 0.002748 0.001334 0.002036 0.001906
## [177] 0.001858 0.001173 0.004342 0.009066 0.001531 0.000545 0.009855 0.001407
## [185] 0.001059 0.002167 0.011000 0.004665 0.000001 0.007899 0.000001 0.004212
## 
## $population.size
##   [1] 392.071008 525.528096   4.282220  78.750630  55.714176   0.703120
##   [7]  16.730142  12.972570   6.637800  35.709156  19.400745  35.214998
##  [13]   6.385680  35.844750   7.630949   0.614367  16.448256   4.466000
##  [19]   1.431150   6.095655   5.594128  13.117752  83.866950   8.705190
##  [25]   4.414956   4.965912   0.679189  17.844585   1.501830   2.552906
##  [31]   1.491939   9.141741   0.154560   0.050880   4.308423   5.779798
##  [37]   8.584446   0.012090   3.413410   4.759524   0.558144   3.029664
##  [43]   4.326150   8.079525   0.275650   0.685314  11.610942   3.101405
##  [49]  15.221520   0.586872   2.721004   8.229042   0.345450   1.371930
##  [55]   0.364458   3.715029   2.450108   0.559980   6.304365   0.098021
##  [61]   2.905518   1.711908   5.424055   1.349985   5.960832   4.616704
##  [67]   0.128016   7.024500   1.805750   0.945624   0.330132   1.550010
##  [73]   0.994620   1.156680   0.463386   2.202057   0.264654   1.507420
##  [79]   0.006555   0.225435   5.024432   0.665112   0.989010   0.779350
##  [85]   3.460968   0.739584   0.357273   0.005565  11.160240   0.433836
##  [91]   0.604095   0.741744   1.121100   5.623200   3.851694   0.066542
##  [97]   0.321264   0.004560   1.951205   0.834861   2.010660   2.737644
## [103]   0.671580   0.192240   1.801360   0.375144   0.774387   1.191530
## [109]   0.003570   0.006972   0.547883   2.899233   0.456840   2.742880
## [115]   0.690144   1.027026   2.048200   0.632700   0.419025   1.704331
## [121]   1.742364   3.961800   0.802655   1.717065   0.201756   1.482978
## [127]   0.765006   0.002145   0.139360   0.514080   0.285138   0.491660
## [133]   0.309270   0.125670   0.242962   2.031537   0.169176   1.670900
## [139]   0.172260   0.291924   0.294892   1.074060   0.142800   0.161700
## [145]   0.244608   0.104904   1.277742   0.148005   0.082170   0.002838
## [151]   1.408680   0.316848   0.123000   0.418860   0.869934   0.672771
## [157]   1.772892   0.000630   0.000595   0.053295   0.532224   0.674064
## [163]   0.000465   0.044805   0.303282   0.092610   0.290979   0.911300
## [169]   0.209700   0.000276   0.030107   1.014783   0.577080   0.253460
## [175]   0.348156   0.291618   0.252688   0.140760   0.455910   0.825006
## [181]   0.119418   0.035970   0.542025   0.063315   0.038124   0.060676
## [187]   0.231000   0.069975   0.000010   0.047394   0.000003   0.004212
## 
## $parameter.count
## [1] 192
## 
## $epsilon
## [1] 0
## 
## $logL
## [1] 1408.966
## 
## $logL.AICc
## [1] NA
## 
## attr(,"class")
## [1] "skyline"
plot(skyline(tree.hiv))
skylineplot(tree.hiv) # shortcut

R包神器 | 系统发育和进化分析 - ape (一)_kylin_13

plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997)

R包神器 | 系统发育和进化分析 - ape (一)_kylin_14

#### generalized skyline plot ####
sk2 <- skyline(cl2) # from collapsed intervals
sk2 <- skyline(ci, 0.0119) # from coalescent intervals
sk2 <- skyline(tree.hiv, 0.0119) # from tree
sk2
## $time
##  [1] 0.021161 0.049822 0.061847 0.077168 0.089216 0.101878 0.113986 0.127902
##  [9] 0.140307 0.155704 0.176701 0.192334 0.209112
## 
## $interval.length
##  [1] 0.021161 0.028661 0.012025 0.015321 0.012048 0.012662 0.012108 0.013916
##  [9] 0.012405 0.015397 0.020997 0.015633 0.016778
## 
## $population.size
##  [1] 392.07100800 525.52809600  26.43747675  18.16241385   4.32071145
##  [6]   2.19342354   1.06974257   0.55211856   0.27727433   0.33138171
## [11]   0.38060475   0.09827875   0.02431880
## 
## $parameter.count
## [1] 13
## 
## $epsilon
## [1] 0.0119
## 
## $logL
## [1] 1239.478
## 
## $logL.AICc
## [1] 1225.456
## 
## attr(,"class")
## [1] "skyline"
plot(sk2)

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_15

# classic and generalized skyline plot together in one plot
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_16

# find optimal epsilon parameter using AICc criterion
find.skyline.epsilon(ci)
## Searching for the optimal epsilon... epsilon = 0.01191938
## [1] 0.01191938
sk3 <- skyline(ci, -1) # negative epsilon also triggers estimation of epsilon
## Searching for the optimal epsilon... epsilon = 0.01191938
sk3$epsilon
## [1] 0.01191938
# skylineplot函数 - 绘制天际线图
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
#### classic skyline plot
skylineplot(tree.hiv) # shortcut

R包神器 | 系统发育和进化分析 - ape (一)_kylin_17

#### plot classic and generalized skyline plots and estimate epsilon
sk.opt <- skylineplot.deluxe(tree.hiv)
## Searching for the optimal epsilon... epsilon = 0.01191938

R包神器 | 系统发育和进化分析 - ape (一)_r语言_18

sk.opt$epsilon
## [1] 0.01191938
#### classic and generalized skyline plot ####
sk1 <- skyline(tree.hiv)
sk2 <- skyline(tree.hiv, 0.0119)
# use years rather than substitutions as unit for the time axis
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)

R包神器 | 系统发育和进化分析 - ape (一)_sed_19

#### various skyline plots for different epsilons
layout(mat= matrix(1:6,2,3,byrow=TRUE))
ci <- coalescent.intervals(tree.hiv)
plot(skyline(ci, 0.0));title(main="0.0")
plot(skyline(ci, 0.007));title(main="0.007")
plot(skyline(ci, 0.0119),col=4);title(main="0.0119")
plot(skyline(ci, 0.02));title(main="0.02")
plot(skyline(ci, 0.05));title(main="0.05")
plot(skyline(ci, 0.1));title(main="0.1")

R包神器 | 系统发育和进化分析 - ape (一)_r语言_20

layout(mat= matrix(1:1,1,1,byrow=TRUE))

# 其它有关R包:Felsenstein, J. (1993) Phylip (Phylogeny Inference Package) version 3.5c. http://evolution.genetics.washington.edu/phylip/phylip.html

ape包的dist.dna函数:计算DNA序列成对距离 (Pairwise Distances from DNA Sequences)

输入含DNA序列的矩阵或列表(List),类(Class)必须是DNAbin。若DNA序列被存储为字符(Character),可使用as.DNAbin函数来转换

dna.bin.dist = ape::dist.dna(ex.dna, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = F)
dna.bin.dist.sub = ape::dist.dna(ex.dna.sub, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = F)
dna.bin.dist.f2d = ape::dist.dna(ex.dna.f2d, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = F)
# as.matrix:对角线或方矩阵
# model:"raw", "N", "TS", "TV", "JC69", "K80" (the default), "F81", "K81", "F84", "BH87", "T92", "TN93", "GG95", "logdet", "paralin", "indel", or "indelblock"
# raw, N: 每对序列之间差异位点的比例,或数量。有助于绘制“饱和图(Saturation plots)”。选项variance(方差)和gamma对其没有影响,但pairwise.deletion可以
# 故model为"N"时,返回整数 (SNP距离绝对值)

str(dna.bin.dist) # 'dist'
##  'dist' num [1:231] 15 80 90 38 16 67 27 66 41 15 ...
##  - attr(*, "Size")= int 22
##  - attr(*, "Labels")= chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...
##  - attr(*, "Upper")= logi FALSE
##  - attr(*, "Diag")= logi FALSE
##  - attr(*, "call")= language ape::dist.dna(x = ex.dna, model = "N", variance = FALSE, gamma = FALSE,      pairwise.deletion = FALSE, base.freq| __truncated__
##  - attr(*, "method")= chr "N"
dna.bin.dist # 是否为汉明距离 (序列之间核苷酸差异的数量)?
##            ERR2245277 ERR2245279 ERR2245288 ERR2245293 ERR2245294 ERR2245295
## ERR2245279         15                                                       
## ERR2245288         80         65                                            
## ERR2245293         90         75         42                                 
## ERR2245294         38         23         66         76                      
## ERR2245295         16          1         66         76         24           
## ERR2245344         67         52         77         87         53         53
## ERR2245378         27         12         71         81         29         13
## ERR2245380         66         51         62         72         52         52
## ERR2245386         41         26         69         79         27         27
## ERR2245388         15          0         65         75         23          1
## ERR2245393         82         67         26         44         68         68
## ERR2245400         28         13         72         82         30         14
## ERR2245401         58         43         54         64         44         44
## ERR2245402         47         32         75         85         33         33
## ERR2245406         59         44         55         65         45         45
## ERR2245411         63         48         73         83         49         49
## ERR2245418         62         47         72         82         48         48
## ERR2245420         58         43         54         64         44         44
## ERR2245425         62         47         72         82         48         48
## ERR2512377         48         33         94        104         56         34
## Reference         772        785        784        794        786        786
##            ERR2245344 ERR2245378 ERR2245380 ERR2245386 ERR2245388 ERR2245393
## ERR2245279                                                                  
## ERR2245288                                                                  
## ERR2245293                                                                  
## ERR2245294                                                                  
## ERR2245295                                                                  
## ERR2245344                                                                  
## ERR2245378         58                                                       
## ERR2245380         63         57                                            
## ERR2245386         56         32         55                                 
## ERR2245388         52         12         51         26                      
## ERR2245393         79         73         64         71         67           
## ERR2245400         59          3         58         33         13         74
## ERR2245401         55         49         14         47         43         56
## ERR2245402         62         38         61         36         32         77
## ERR2245406         56         50         15         48         44         57
## ERR2245411         22         54         59         52         48         75
## ERR2245418         21         53         58         51         47         74
## ERR2245420         55         49         14         47         43         56
## ERR2245425         21         53         58         51         47         74
## ERR2512377         85         45         82         59         33         96
## Reference         793        791        782        789        785        786
##            ERR2245400 ERR2245401 ERR2245402 ERR2245406 ERR2245411 ERR2245418
## ERR2245279                                                                  
## ERR2245288                                                                  
## ERR2245293                                                                  
## ERR2245294                                                                  
## ERR2245295                                                                  
## ERR2245344                                                                  
## ERR2245378                                                                  
## ERR2245380                                                                  
## ERR2245386                                                                  
## ERR2245388                                                                  
## ERR2245393                                                                  
## ERR2245400                                                                  
## ERR2245401         50                                                       
## ERR2245402         39         53                                            
## ERR2245406         51          1         54                                 
## ERR2245411         55         51         58         52                      
## ERR2245418         54         50         57         51          1           
## ERR2245420         50          0         53          1         51         50
## ERR2245425         54         50         57         51          1          0
## ERR2512377         46         74         65         75         81         80
## Reference         792        774        795        773        793        792
##            ERR2245420 ERR2245425 ERR2512377
## ERR2245279                                 
## ERR2245288                                 
## ERR2245293                                 
## ERR2245294                                 
## ERR2245295                                 
## ERR2245344                                 
## ERR2245378                                 
## ERR2245380                                 
## ERR2245386                                 
## ERR2245388                                 
## ERR2245393                                 
## ERR2245400                                 
## ERR2245401                                 
## ERR2245402                                 
## ERR2245406                                 
## ERR2245411                                 
## ERR2245418                                 
## ERR2245420                                 
## ERR2245425         50                      
## ERR2512377         74         80           
## Reference         774        792        752
# δ plot
delta.plot(dna.bin.dist, k = 20, plot = TRUE, which = 1:2)

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_21

# Holland, B. R., Huber, K. T., Dress, A. and Moulton, V. (2002) Delta plots: a tool for analyzing phylogenetic distance data. Molecular Biology and Evolution, 12, 2051–2059.
ex.dna=ex.dna.sub

# Tree Estimation Based on an Improved Version of the NJ Algorithm, Gascuel (1997)
# trw <- bionj(dist.dna(ex.dna.sub))
# trw <- bionj(dist.dna(ex.dna)) # 默认参数时 报错,可能是因为存在距离较远的Taxa或毒株
# trw <- bionj(dist.dna(ex.dna, model = "N")) # Error - at least one distance was greater than 100
trw <- bionj(dist.dna(ex.dna, model = "raw"))
plot(trw)

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_22

# f <- function(x) nj(dist.dna(x, model = "raw"))
# tr <- f(ex.dna.sub)
# ### Are bootstrap values stable?
# for (i in 1:5)
# print(boot.phylo(tr, ex.dna.sub, f, quiet = TRUE))
# ### How many partitions in 100 random trees of 10 labels?...
# TR <- rmtree(100, 10)
# pp10 <- prop.part(TR)
# length(pp10)
# ### ... and in 100 random trees of 20 labels?
# TR <- rmtree(100, 20)
# pp20 <- prop.part(TR)
# length(pp20)
# plot(pp10, pch = "x", col = 2)
# plot(pp20, pch = "x", col = 2)

fun <- function(x) as.phylo(hclust(dist.dna(x, model="raw"), "average")) # upgma() in phangorn
tree <- fun(ex.dna.sub)
## get 100 bootstrap trees:
bstrees <- boot.phylo(tree, ex.dna.sub, fun, trees = TRUE)$trees
## 
Running bootstraps:       100 / 100
## Calculating bootstrap values... done.
## get proportions of each clade:
clad <- prop.clades(tree, bstrees, rooted = TRUE)
## get proportions of each bipartition:
boot <- prop.clades(tree, bstrees)
layout(1)
par(mar = rep(2, 4))
plot(tree, main = "Bipartition vs. Clade Support Values")
drawSupportOnEdges(boot)
nodelabels(clad)
legend("bottomleft", legend = c("Bipartitions", "Clades"), pch = 22,
pt.bg = c("green", "lightblue"), pt.cex = 2.5)

R包神器 | 系统发育和进化分析 - ape (一)_r语言_23

d <- dist.dna(ex.dna.sub, model="N")
delta.plot(d)

R包神器 | 系统发育和进化分析 - ape (一)_kylin_24

layout(1)
delta.plot(d, 40, which = 1)

R包神器 | 系统发育和进化分析 - ape (一)_r语言_25

# Tree Estimation Based on the Minimum Evolution Algorithm, Desper and Gascuel (2002).
trw <- fastme.bal(dist.dna(ex.dna, model="N"))
plot(trw)

R包神器 | 系统发育和进化分析 - ape (一)_kylin_26

rt <- dist.dna(ex.dna.sub, model="raw", variance = TRUE)
v <- attr(rt, "variance")
tr <- mvr(rt, v) # Phylogenetic tree construction based on the minimum variance reduction.
plot(tr, "u")
## Warning in plot.phylo(tr, "u"): 39 branch length(s) NA(s): branch lengths
## ignored in the plot

R包神器 | 系统发育和进化分析 - ape (一)_sed_27

# Neighbor-Joining Tree Estimation, Saitou and Nei (1987).
trw <- nj(dist.dna(ex.dna, model="N"))
plot(trw)

R包神器 | 系统发育和进化分析 - ape (一)_sed_28

## NJ树+多序列比对 - plot an alignment with a NJ tree:
trw <- nj(dist.dna(ex.dna.sub, model="raw"))
# 绘制带虚线、且右端对齐的树
plot(trw, x.lim = 0.1, align.tip = TRUE, font = 1)
# Tree Annotation
phydataplot(ex.dna[, 1:50], trw, "m", 0.05, border = NA)

R包神器 | 系统发育和进化分析 - ape (一)_kylin_29

## from ?boot.phylo:
f <- function(x) nj(dist.dna(x, model="raw"))
tw <- f(ex.dna) # NJ tree with K80 distance
set.seed(1)
## bootstrap with 100 replications:
(bp <- boot.phylo(tw, ex.dna, f, quiet = TRUE))
##  [1]  NA  23  44  88  90 100  89  23 100 100 100 100  95  26 100 100  29 100 100
## the first value relates to the root node and is always 100
## it is ignored below:
plot(tw, "u") # 辐射状的树
drawSupportOnEdges(bp)

R包神器 | 系统发育和进化分析 - ape (一)_开发语言_30

## more readable but the tree is really unrooted:
plot(tw)
drawSupportOnEdges(bp)

R包神器 | 系统发育和进化分析 - ape (一)_sed_31

op <- par(mfcol = 1:2)
par(op)

# tr <- rtree(10)
tr=tw
# tr$edge.length[c(1, 18)] <- 100
op <- par(mfcol = 1:2)
plot(tr); axisPhylo()
# 绘制长枝打断的树
plotBreakLongEdges(tr, 2); axisPhylo()

R包神器 | 系统发育和进化分析 - ape (一)_ci_32

# tr <- triangMtd(dist.dna(ex.dna.sub, model="raw"))
# plot(tr)