SorghumBase release 2 is out with 13 new sorghum genomes

SorghumBase Release 2.0
Released: November 15, 2021

SorghumBase (https://www.sorghumbase.org), a web portal for comparative plant genomics focused on sorghum crop varieties, has released its second version. The release provides access to 13 new sorghum genomes (Tao et al., 2021), for a total of 18 sorghum reference genomes, along with 6 other species to support phylogenetic analyses and cross-species comparisons.

The new sorghum genomes include six varieties of Sorghum bicolor ssp. bicolor (IS12661, IS3614-3, IS38525, IS929, Ji2731, and R931945-2-2), two S. bicolor ssp. bicolor (margaritiferum) (IS19953 and PI525695), three S. bicolor ssp. verticilliflorum (AusTRCF317961, 353, and PI536008), one S. bicolor ssp. drummondii (PI532566), and one S. propinquum (S369-1). These 13 genomes, together with the original 5 S. bicolor ssp. bicolor genomes (BTx623, Rio, Tx2783, Tx430, and Tx436) and 6 plant outgroup species (Japonica rice, B73 maize assembly versions 4 and 5, Arabidopsis thaliana, grapevine, a vascular plant, and a single-celled green alga), were used to build 30,475 protein-coding gene family trees.

Gene expression and orthology-based pathway projections are available for the S. bicolor BTx623 reference genome via the SorghumBase search interface. The SorghumBase knowledgebase also includes five pairwise DNA alignments, one aligning each of the original five sorghum genomes to Japonica rice. Additional pairwise DNA alignments between BTx623 and each of A. thaliana, barley, and grapevine, as well as a synteny map for BTx623 and Japonica rice, are available on the Gramene website.

We continue to host genetic variation data sets for over 6.7 million sorghum single nucleotide polymorphisms (SNPs) (Morris et al., 2013; Mace et al., 2013) and 1.5 million point mutations induced by ethyl methanesulfonate (EMS) (Xin et al., 2008); 27,884 structural variants (Zheng et al., 2011) from the Database of Genomic Variants archive (DGVa); and nearly 6,000 QTLs and GWAS from the Sorghum QTL Atlas.

The genome databases were built in direct collaboration with the Gramene and Ensembl Plants projects. Other data sets were made available through collaborations with the Expression Atlas, the Sorghum QTL Atlas, and the Plant Reactome databases. Core funding for the project is provided by the Agricultural Research Service of the U.S. Department of Agriculture (USDA ARS 8062-21000-041-00D) to the Ware Lab at Cold Spring Harbor Laboratory.

We are grateful to the sorghum research community, and especially to Emma Mace, David Jordan, and Yongfu Tao for generously sharing their data and for their valuable contributions, and to Gloria Burow and Scott Sattler for providing excellent feedback on the site.

For additional details, please see the release notes at the SorghumBase website.