Hu Z, Olatoye MO, Marla S, Morris GP
Mining crop genomic variation can facilitate genetic research on complex traits and molecular breeding. In sorghum [Sorghum bicolor (L.) Moench], several large-scale single nucleotide polymorphism (SNP) datasets have been generated by genotyping-by-sequencing (GBS) of ApeKI reduced-representation libraries. However, data reuse has been impeded by differences in reference genome coordinates among datasets. To facilitate reuse of these data, we constructed and characterized an integrated dataset of 459,304 SNPs for 10,323 sorghum genotypes on the version 3.1 reference genome. SNPs were highly enriched in subtelomeric chromosome arms and in genic regions (48% of SNPs), and their distribution was highly correlated (r = 0.82) with the distribution of ApeKI restriction sites. Genetic structure reflected population differences by botanical race, as well as familial structure among recombinant inbred lines (RILs). As expected, linkage disequilibrium decayed faster in the diversity panel than in the RILs, given the greater opportunity for recombination in diverse populations. To validate the quality and utility of the integrated SNP dataset, we conducted genome-wide association studies (GWAS) of genebank phenotype data, precisely mapping several known genes (e.g., … and …) and identifying novel associations for other traits. We further validated the dataset with GWAS of new and published plant height and flowering time data in a nested association mapping population, precisely mapping known genes and identifying epistatic interactions underlying both traits. These findings validate this integrated SNP dataset as a useful genomics resource for sorghum genetics and breeding.
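The reported correlation between SNP density and ApeKI site density can be illustrated with a minimal sketch. ApeKI recognizes G^CWGC (W = A or T), so the sketch counts GCAGC/GCTGC occurrences in fixed-size genomic bins, counts SNPs in the same bins, and computes a Pearson correlation between the two vectors. This is not the authors' pipeline: the function names, the bin size, and the toy sequence and SNP positions are hypothetical, and a real analysis would run over each chromosome of the v3.1 reference FASTA.

```python
import re
from math import sqrt

# ApeKI recognition site G^CWGC (W = A or T). GCAGC and GCTGC are reverse
# complements of each other, so scanning one strand finds sites on both.
# The lookahead lets overlapping sites (e.g. GCAGCAGC) each be counted.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")


def n_bins(seq_len, bin_size):
    """Number of fixed-size bins needed to cover seq_len bases."""
    return (seq_len + bin_size - 1) // bin_size


def apeki_sites_per_bin(seq, bin_size):
    """Count ApeKI site start positions falling in each bin (hypothetical helper)."""
    counts = [0] * n_bins(len(seq), bin_size)
    for m in APEKI_SITE.finditer(seq.upper()):
        counts[m.start() // bin_size] += 1
    return counts


def snps_per_bin(positions, seq_len, bin_size):
    """Count SNPs (0-based positions) falling in each bin (hypothetical helper)."""
    counts = [0] * n_bins(seq_len, bin_size)
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def pearson_r(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: three 10-bp bins containing 1, 2 and 0 ApeKI sites.
toy_seq = "GCAGCAAAAA" + "GCTGCGCAGC" + "AAAAAAAAAA"
toy_snps = [1, 12, 14]
sites = apeki_sites_per_bin(toy_seq, 10)
snps = snps_per_bin(toy_snps, len(toy_seq), 10)
print(sites, snps, pearson_r(sites, snps))
```

Because GBS reads originate next to restriction sites, SNP discovery is expected to track site density. Binning both counts on the same coordinates is what makes them comparable. In practice the bin size (for example, 1 Mb) is an analysis choice, and the correlation obtained will depend on it.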