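This post is compiled from a talk given a few years ago by Dr. Alencar Xavier, a quantitative geneticist at Corteva Agriscience. His work in statistical genetics centers on genome-assisted breeding, with an emphasis on the theoretical and computational side of data-driven plant breeding, such as using multiple sources of information for modeling, prediction, and selection. His research covers the development and implementation of new quantitative genetics methods that use mixed models, Bayesian methods, machine learning, and high-performance computing. More about him and his work can be found at: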

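Corteva, formed a few years ago through the merger and reorganization of DuPont Pioneer and Dow AgroSciences, is now one of the two largest seed companies in the world and needs little introduction for anyone working in breeding. Let's see how a scientist at a front-line company approaches breeding.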

Introduction: opportunities and challenges of using genomic information in plant breeding

A surge in data and falling sequencing costs.


Breeding pipeline


Modeling genetic merit

Single-step modeling in animal GS versus single-stage modeling in plant GS

Applications

To avoid translation errors, the original English text is reproduced here.


Germplasm classification (PCA, Clustering, Unsupervised ML, FST)

Characterization

Characterize diversity using unsupervised learning methods.

Heterotic group

Classify (if known) or infer (if unknown) heterotic groups on individuals and populations.

Signatures of selection

Use FST (or related methods) to identify signatures of selection, adaptation and domestication.
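
As a rough illustration of the FST idea, the sketch below computes a per-locus FST = (HT - HS) / HT from allele frequencies in two populations. The talk does not say which estimator is used; this is the simple two-population heterozygosity form with equal weights, and the population names and frequencies are made up for the example.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(freq_a, freq_b):
    """Per-locus FST = (HT - HS) / HT for two equally weighted populations.

    Returns None for loci that are monomorphic in the pooled population,
    where FST is undefined.
    """
    result = []
    for pa, pb in zip(freq_a, freq_b):
        hs = (heterozygosity(pa) + heterozygosity(pb)) / 2.0  # mean within-population
        ht = heterozygosity((pa + pb) / 2.0)                  # pooled population
        result.append((ht - hs) / ht if ht > 0 else None)
    return result


# Hypothetical allele frequencies at four loci (illustration only).
freq_elite = [0.50, 0.90, 1.00, 0.20]
freq_landrace = [0.50, 0.10, 0.00, 0.20]
fst = fst_two_pops(freq_elite, freq_landrace)

for locus, value in enumerate(fst, start=1):
    print(f"locus {locus}: FST = {value:.2f}")
```

Loci where the two groups share the same frequency give an FST of zero, while loci fixed for opposite alleles give an FST of one; in practice the values are scanned along the genome and unusually high windows are treated as candidate selection signatures.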

Incorporation (GWAS, haplotype analysis)

Trait discovery

Finding new QTLs via association analysis on breeding data and designed populations.

Introduction of diversity

Screening non-elite (or elite from elsewhere) germplasm for pre-breeding.

Haplotype enrichment

Assess genome of non-elite material to add diversity to regions where elite germplasm is fixed.

Genomic selection (BayesABC, Supervised ML, etc.)

F2 enrichment (WF)

Entire population is genotyped with few markers and selected for specific QTL (e.g. disease resistance)

Pre-selection (WF/AF)

Entire population is genotyped and 0% is phenotyped. Selection is based on the genomic merit
estimated from a predefined estimation set that is either made by design or built from breeding data.

Test-and-shelf (WF/AF)

Entire population is genotyped and X% is phenotyped. Within-season selection is based on the
genomic merit estimated with a genomic model from phenotyped individuals.
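
The pre-selection and test-and-shelf schemes both come down to fitting a genomic model on the phenotyped individuals and predicting genomic merit for every genotyped one. Below is a minimal sketch using ridge regression on marker effects (often called RR-BLUP) as a stand-in for the BayesABC and ML models named in the heading. The function names, the simulated data, and the fixed penalty `lam` are all assumptions for illustration; in practice the penalty would be estimated from variance components or by cross-validation.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Solve (Xc'Xc + lam*I) b = Xc'(y - mean) on column-centered genotypes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    lhs = Xc.T @ Xc + lam * np.eye(X.shape[1])
    effects = np.linalg.solve(lhs, Xc.T @ (y - y_mean))
    return effects, x_mean, y_mean


def genomic_merit(X_new, effects, x_mean, y_mean):
    """Predicted merit for any genotyped individual, phenotyped or not."""
    return y_mean + (np.asarray(X_new, dtype=float) - x_mean) @ effects


# Simulated population: everything below is hypothetical data.
rng = np.random.default_rng(1)
n_lines, n_markers, n_phenotyped = 200, 50, 60
X = rng.integers(0, 3, size=(n_lines, n_markers))       # 0/1/2 genotype codes
true_effects = rng.normal(0.0, 1.0, n_markers)
y = X @ true_effects + rng.normal(0.0, 2.0, n_lines)

# Only a subset ("X%") is phenotyped; the model is trained on that subset.
train = rng.choice(n_lines, n_phenotyped, replace=False)
effects, x_mean, y_mean = ridge_marker_effects(X[train], y[train], lam=10.0)

# Every genotyped line gets a predicted merit; keep the top 20.
gebv = genomic_merit(X, effects, x_mean, y_mean)
selected = np.argsort(gebv)[::-1][:20]
```

Setting the phenotyped fraction to zero and training on an external estimation set instead gives the pre-selection case; the prediction step is unchanged.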

Advancement (WF/AF)

Entire population is genotyped and phenotyped. Selection is based on the genetic merit of the
individuals using one or more seasons of data from those individuals.

Product placement (AF)

Similar to advancement but GxE takes the spotlight from G.

Recycling (Simulation and optimization)

Selection of parents

Selection of high BV individuals with complementary polygene or traits.

Select combinations

Given a set of candidate parents (100% genotyped), combinations are chosen by clustering,
simulated crosses, or a predefined criterion (OHV or OPV).

Quantitative assessment (Variance component analysis)

Heritability

Narrow-sense and GxE (e.g. compound symmetry)

Genetic variance decomposition

Classic (Vg = Va + Vd + Vi) and hybrid (Vg = VGCA1 + VGCA2 + VSCA)

Genetic correlations

Across traits or within-trait across environments

Effective population size

Eigen analysis of the G matrix
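
The slide gives no detail on how the eigen analysis is done. One possible reading, sketched below under that assumption, is to build a genomic relationship matrix with VanRaden's first method and count how many of its largest eigenvalues are needed to explain a chosen share of the variance, a count that has been used as a proxy for the dimensionality of the genomic information and hence for effective population size. The 98% threshold, the function names, and the simulated genotypes are illustrative choices, not values from the talk.

```python
import numpy as np


def vanraden_g(M):
    """G = ZZ' / (2 * sum p(1-p)), with Z the 0/1/2 genotypes centered by 2p."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0              # allele frequency per marker
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def n_eigen_for_variance(G, frac=0.98):
    """Number of largest eigenvalues of G whose sum reaches `frac` of the total."""
    eig = np.clip(np.linalg.eigvalsh(G), 0.0, None)[::-1]   # descending, no negatives
    cum = np.cumsum(eig) / eig.sum()
    return int(min(np.searchsorted(cum, frac) + 1, len(eig)))


# Hypothetical genotypes: 30 individuals, 200 markers.
rng = np.random.default_rng(7)
M = rng.integers(0, 3, size=(30, 200))
G = vanraden_g(M)
k = n_eigen_for_variance(G, frac=0.98)
print(f"{k} eigenvalues explain 98% of the variation in G")
```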

Genetic progress and rate of genetic gains

Assess multiple years
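
A common way to summarize genetic progress over multiple years is the slope of a linear regression of the yearly mean (of breeding values or of a check-adjusted trait) on year. The sketch below does that with ordinary least squares; the yearly means are invented, and expressing the gain relative to the first year is just one convention among several.

```python
def linear_trend(years, means):
    """Ordinary least-squares slope and intercept of means regressed on years."""
    n = len(years)
    year_bar = sum(years) / n
    mean_bar = sum(means) / n
    sxy = sum((x - year_bar) * (m - mean_bar) for x, m in zip(years, means))
    sxx = sum((x - year_bar) ** 2 for x in years)
    slope = sxy / sxx
    return slope, mean_bar - slope * year_bar


# Hypothetical yearly means of estimated breeding values (trait units).
years = [2015, 2016, 2017, 2018, 2019]
mean_bv = [100.0, 102.0, 104.0, 106.0, 108.0]

gain_per_year, intercept = linear_trend(years, mean_bv)
relative_gain = 100.0 * gain_per_year / mean_bv[0]
print(f"gain: {gain_per_year:.2f} units/year ({relative_gain:.1f}% of the {years[0]} mean)")
```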

Evaluate breeding strategies

Simulations and retrospective studies to ask what-if questions

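Challenges

Key challenges

  • Improve accuracy through "modeling + population design + experimental design".
  • Make better use of GxE and gain a better understanding of TPEs.
  • Use environmental data (soil, weather, management) in genomic models.
  • Handle multi-parental crosses.
  • Collaborate effectively and keep breeding consistent across programs.
  • Teach breeders how to use genomic data.
  • Data management: easy access to any type of data and to visualization tools.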

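Iteratively tuning the breeding design

  • Determine the number of replicates and trial locations at each breeding stage.
  • Decide from which breeding stage parents are recycled.
  • Identify the stage at which "GxE" outweighs "G".
  • Strategies for raising heritability and optimizing GS models.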
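
The trade-off between replicates and locations in the list above is often explored with the textbook entry-mean heritability, H² = Vg / (Vg + Vge / n_loc + Ve / (n_loc × n_rep)). The sketch below evaluates it over a small grid; the variance components are hypothetical, and the talk does not say that this particular formula is what Corteva uses.

```python
def entry_mean_heritability(v_g, v_ge, v_e, n_loc, n_rep):
    """H^2 on an entry-mean basis: Vg / (Vg + Vge/n_loc + Ve/(n_loc*n_rep))."""
    return v_g / (v_g + v_ge / n_loc + v_e / (n_loc * n_rep))


# Hypothetical variance components: genetic, genotype-by-environment, residual.
V_G, V_GE, V_E = 1.0, 1.0, 2.0

grid = {}
for n_loc in (1, 2, 4, 8):
    for n_rep in (1, 2):
        grid[(n_loc, n_rep)] = entry_mean_heritability(V_G, V_GE, V_E, n_loc, n_rep)
        print(f"locations={n_loc} reps={n_rep} H2={grid[(n_loc, n_rep)]:.2f}")
```

With these made-up components, adding locations raises heritability faster than adding replicates within a location, because only extra locations shrink the GxE term.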

For breeders

  • Know your germplasm.
  • Know your target environments.
  • Have clear breeding goals.

Summary

  • GS is applied differently in advancement, recycling, and incorporation.
  • Trial setup and breeding design play a key role in GS.
  • The breeding pipeline is dynamic and needs continual improvement.

References on optimizing breeding programs:

Rincent et al. (2012). Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals. Genetics, 192(2), 715-728.

Isidro et al. (2015). Training set optimization under population structure in genomic selection. TAG 128(1), 145-158.

Habier (2016). Improved molecular breeding methods. US20160321396A1.

Ou and Liao (2019). Training set determination for genomic selection. TAG 132(10), 2781-2792.

Brauner et al. (2019). Genomic prediction with multiple biparental families. TAG

Author: 生物信息与育种

