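Common errors found when checking a pedigree include:

  • Duplicated individual IDs
  • Sire/dam overlap (the same ID is used as both a sire and a dam)
  • Loops in the pedigree (an individual appears among its own ancestors)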

How can these problems be checked quickly?

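I recommend the check_pedigree function from my R package learnasreml. It is simple to use and its output is easy to read. It checks:


  • whether any individual ID is duplicated
  • whether any ID appears as both a sire and a dam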
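
To make the two checks concrete, here is a minimal base-R sketch of the same idea. It is not the learnasreml implementation: the helper name basic_ped_checks, its missing_codes argument, and the toy data are all assumptions made for illustration, and the pedigree is assumed to have its ID, sire, and dam in the first three columns.

```r
# Hypothetical helper, not part of learnasreml: base-R checks for
# duplicated IDs and for IDs used as both sire and dam.
basic_ped_checks <- function(ped, missing_codes = c("0", "*", "", "NA")) {
  id   <- as.character(ped[[1]])
  sire <- as.character(ped[[2]])
  dam  <- as.character(ped[[3]])

  # Drop missing-parent codes before comparing the two parent columns
  sire <- sire[!is.na(sire) & !(sire %in% missing_codes)]
  dam  <- dam[!is.na(dam) & !(dam %in% missing_codes)]

  list(
    n_rows        = nrow(ped),
    n_individuals = length(unique(id)),
    duplicated_id = unique(id[duplicated(id)]),
    sire_and_dam  = intersect(sire, dam)
  )
}

# Toy pedigree (made up): "c" is recorded twice, "d1" is both sire and dam
toy <- data.frame(
  ID   = c("a", "b", "c", "c"),
  Sire = c("0", "s1", "s1", "d1"),
  Dam  = c("0", "d1", "d1", "d2"),
  stringsAsFactors = FALSE
)
res <- basic_ped_checks(toy)
res$duplicated_id  # "c"
res$sire_and_dam   # "d1"
```

Returning the offending IDs rather than just a TRUE/FALSE flag makes it easy to go back to the source records and fix them.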

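For detecting loops in a pedigree, I recommend the tidyped function from the visPedigree package written by Sheng Luan. The sections below show how these functions are used.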
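
A loop means that walking up from an individual through its parents eventually leads back to that same individual. The base-R sketch below illustrates that idea only; it is not how visPedigree does it. The helper find_ped_loops, its missing_codes argument, and the toy pedigree are assumptions for illustration.

```r
# Hypothetical helper, not the visPedigree implementation.
# Returns the IDs that appear among their own ancestors.
find_ped_loops <- function(ped, missing_codes = c("0", "*", "", "NA")) {
  id   <- as.character(ped[[1]])
  sire <- as.character(ped[[2]])
  dam  <- as.character(ped[[3]])

  # Known parents of one individual (the first record wins if an ID repeats)
  parents_of <- function(x) {
    i <- match(x, id)
    if (is.na(i)) return(character(0))  # parent not listed as an individual
    p <- c(sire[i], dam[i])
    p[!is.na(p) & !(p %in% missing_codes)]
  }

  in_loop <- vapply(unique(id), function(start) {
    seen  <- character(0)
    queue <- parents_of(start)
    while (length(queue) > 0) {
      cur   <- queue[1]
      queue <- queue[-1]
      if (identical(cur, start)) return(TRUE)  # walked back to the start
      if (cur %in% seen) next                  # ancestor already visited
      seen  <- c(seen, cur)
      queue <- c(queue, parents_of(cur))
    }
    FALSE
  }, logical(1))

  unique(id)[in_loop]
}

# Toy pedigree (made up): A's sire is C, C's sire is B, B's sire is A
loop_ped <- data.frame(
  ID   = c("A", "B", "C", "D"),
  Sire = c("C", "A", "B", "0"),
  Dam  = c("0", "0", "0", "0"),
  stringsAsFactors = FALSE
)
find_ped_loops(loop_ped)  # "A" "B" "C"
```

This walk repeats work for every individual, so it is only meant for small pedigrees or for understanding the check; for real data a dedicated package is the better choice.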

1. A normal pedigree

This example uses the harvey.ped data from the asreml package:

> head(ped)
Calf Sire Dam
1 101 Sire_1 0
2 102 Sire_1 0
3 103 Sire_1 0
4 104 Sire_1 0
5 105 Sire_1 0
6 106 Sire_1 0
> learnasreml::check_pedigree(ped)
系谱共有行数: 65
个体共有个数: 65
父本共有个数: 9
母本共有个数: 0
个体没有重复!
父母本个体没有交叉!

The output shows that the pedigree has 65 rows, no duplicated individuals, and no overlap between sires and dams.

Using the visPedigree package:

> pped = visPedigree::tidyped(ped = ped)
Warning message:
In checkped(ped_inter, addgen) :
In the sire or dam column, Blank, Zero, asterisk, or character NA is recognized as a missing parent and is replaced with the missing value NA.
> head(pped)
Ind Sire Dam Gen Sex IndNum SireNum DamNum
1: Sire_1 <NA> <NA> 1 male 1 0 0
2: Sire_2 <NA> <NA> 1 male 2 0 0
3: Sire_3 <NA> <NA> 1 male 3 0 0
4: Sire_4 <NA> <NA> 1 male 4 0 0
5: Sire_5 <NA> <NA> 1 male 5 0 0
6: Sire_6 <NA> <NA> 1 male 6 0 0
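
The transcript above shows two things happening: missing-parent codes such as 0 are turned into NA, and sires that never appear in the individual column are added as founder rows at the top. The base-R sketch below mimics only those two steps. It is not visPedigree's code; the helper normalize_ped, its missing_codes argument, and the two-row toy data are assumptions for illustration.

```r
# Hypothetical helper illustrating the idea; not visPedigree's code.
normalize_ped <- function(ped, missing_codes = c("", "0", "*", "NA")) {
  out <- data.frame(
    Ind  = as.character(ped[[1]]),
    Sire = as.character(ped[[2]]),
    Dam  = as.character(ped[[3]]),
    stringsAsFactors = FALSE
  )
  # Recode missing-parent markers to a real NA
  out$Sire[out$Sire %in% missing_codes] <- NA
  out$Dam[out$Dam %in% missing_codes]   <- NA

  # Parents that never appear as individuals become founder rows
  parents  <- unique(c(out$Sire, out$Dam))
  founders <- setdiff(parents[!is.na(parents)], out$Ind)
  founder_rows <- data.frame(
    Ind  = founders,
    Sire = rep(NA_character_, length(founders)),
    Dam  = rep(NA_character_, length(founders)),
    stringsAsFactors = FALSE
  )
  rbind(founder_rows, out)
}

# Toy data (made up) in the same layout as the head(ped) output shown earlier
toy <- data.frame(
  Calf = c("101", "102"),
  Sire = c("Sire_1", "Sire_1"),
  Dam  = c("0", "0"),
  stringsAsFactors = FALSE
)
tidy <- normalize_ped(toy)
tidy
```

The sketch does not compute generation numbers, sex, or the numeric ID columns that tidyped adds.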


2. A pedigree with duplicated individuals

Checking the pedigree with nadiv:

> ped = data.frame(ID = c(1:10,5,8), Sire = paste0("A",1:12), Dam = paste0("B",1:12))
> ped
ID Sire Dam
1 1 A1 B1
2 2 A2 B2
3 3 A3 B3
4 4 A4 B4
5 5 A5 B5
6 6 A6 B6
7 7 A7 B7
8 8 A8 B8
9 9 A9 B9
10 10 A10 B10
11 5 A11 B11
12 8 A12 B12
> nadiv::prepPed(ped)
Error in nadiv::prepPed(ped) :
some individuals appear more than once in the pedigree

The call fails with an error saying that some IDs are duplicated, but it does not say which ones.
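
When an error message does not name the offending IDs, they can be listed with base R alone. The snippet below rebuilds the same toy pedigree as in the transcript above; the variable names dup_ids and dup_rows are just illustrative.

```r
# Same toy pedigree as in the transcript above
ped <- data.frame(ID   = c(1:10, 5, 8),
                  Sire = paste0("A", 1:12),
                  Dam  = paste0("B", 1:12))

# IDs that occur more than once
dup_ids <- unique(ped$ID[duplicated(ped$ID)])
dup_ids   # 5 8

# Row numbers of every record carrying a duplicated ID
dup_rows <- which(ped$ID %in% dup_ids)
dup_rows  # 5 8 11 12
```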

Checking the pedigree with visPedigree:

> visPedigree::tidyped(ped)
Error in checkped(ped_inter, addgen) : Please check the pedigree!
In addition: Warning messages:
1: In checkped(ped_inter, addgen) :
The 2 duplicated individual IDs are found in the pedigree. Only the first 2 records are shown.
2: In checkped(ped_inter, addgen) : 5, A5, B5
3: In checkped(ped_inter, addgen) : 8, A8, B8

The warnings show that individuals 5 and 8 are duplicated.

Checking the pedigree with learnasreml:

> learnasreml::check_pedigree(ped)
系谱共有行数: 12
个体共有个数: 10
父本共有个数: 12
母本共有个数: 12
个体重复数为: 2 个,分别是: 5 8
父母本个体没有交叉!

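This output is the friendliest of the three: it is in Chinese and names the duplicated IDs, 5 and 8.

Extract the records for these two IDs: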

> library(dplyr)
> ped %>% filter(ID %in% c(5,8))
ID Sire Dam
1 5 A5 B5
2 8 A8 B8
3 5 A11 B11
4 8 A12 B12

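The IDs are duplicated but their parents differ, so this is most likely an error in the pedigree records rather than a harmless repeated row.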
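
The distinction between an exact repeated row and an ID recorded with different parents can also be checked in code. The following is a minimal base-R sketch on the same toy pedigree; the variable names are illustrative and the approach is only one possible way to do it.

```r
# Same toy pedigree as above
ped <- data.frame(ID   = c(1:10, 5, 8),
                  Sire = paste0("A", 1:12),
                  Dam  = paste0("B", 1:12))

# Base-R equivalent of the dplyr filter above
dup_ids  <- unique(ped$ID[duplicated(ped$ID)])
dup_recs <- ped[ped$ID %in% dup_ids, ]

# After dropping exact repeated rows, an ID that still has more than
# one record disagrees with itself on sire or dam
distinct_recs <- unique(dup_recs)
n_versions    <- table(distinct_recs$ID)
conflicting   <- names(n_versions)[n_versions > 1]
conflicting   # "5" "8"
```

Exact repeats could simply be dropped with unique(ped), whereas conflicting IDs like these have to be resolved against the original records.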

How to install learnasreml

# Installation: install devtools if it is missing, then install from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("dengfei2013/learnasreml")