Sorghum bicolor RTx430

Sorghum bicolor RTx430 is a grain sorghum inbred line commonly used as a pollinator in hybrid production. Its genome is known to be rich in repeats.

 

Germplasm

Plant Introduction (PI) number for Sorghum bicolor (L.) Moench subsp. bicolor ‘RTx430’ in the U.S. National Plant Germplasm System (GRIN-Global): PI 655996.

 

Image

Image source: The GRIN database, https://npgsweb.ars-grin.gov/gringlobal/ImgDisplay?id=1125783

 

Statistics (Source: NCBI, April 2021)

Sorghum line Tx430
Assembly information
Assembly name Corteva_Sorghum_ONT_TX430_1.0
Assembly date n/a
Assembly accession n/a
WGS accession QWKM00000000
Assembly provider
Sequencing description Sequencing technologies: Oxford Nanopore MinION
  Sequencing method
  Genome coverage: 90.0x
Assembly description Assembly methods: SMARTdenovo v. June-2017; CANU v. 1.6
  Construction of pseudomolecules
Finishing strategy
NCBI submission Submitted: 28-AUG-2018
  Publication: Deschamps et al. (2018)
 
Assembly statistics
Number of contigs 723
Total assembly length (Mb) 666
Contig N50 (Mb) 3
 
Annotation statistics
Total number of genes 36,937
Total number of transcripts 49,928
Average gene length (bp) 3,252
Exons per transcript 4

 

Assembly

The Sorghum bicolor Tx430 genome assembly was produced by combining Oxford Nanopore reads from a MinION sequencer with Bionano Genomics Direct Label and Stain (DLS) optical maps, as described by Deschamps et al. (2018) (see also https://www.corteva.com). The final chromosome-scale de novo assembly consists of 29 scaffolds, most of which span entire chromosome arms. It has a scaffold N50 of 33.28 Mbp and covers 90% of the expected genome length.

NCBI accession: https://www.ncbi.nlm.nih.gov/assembly/GCA_003482435.1
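
For readers unfamiliar with the N50 metric quoted above, the following is a minimal sketch of how it is commonly computed from a list of scaffold or contig lengths. The function name and the toy lengths are illustrative assumptions; they are not taken from the Tx430 assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Toy lengths (in Mb), invented for illustration only.
toy_lengths = [40, 30, 20, 5, 5]
print(n50(toy_lengths))  # prints 30
```

In practice the lengths would be read from the assembly FASTA or its index; the same function applies to contigs and scaffolds alike.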

 

Annotation

Genome annotation was carried out as described by Deschamps et al. (2018). First, genome repeats were masked using RepeatMasker and a curated, sorghum-specific repeats file from Repbase. The repeat-masked genome was then used as input to two categories of gene predictors.

The de novo gene prediction programs Fgenesh (Solovyev et al., 1994), Augustus (Stanke and Waack, 2003), and SNAP (Bromberg and Rost, 2007) were run with default parameters, using monocot, maize, and rice training sets, respectively. The EST, cDNA, and long-read evidence-based gene structure modelers GMAP and PASA, as well as the protein evidence-based gene structure modeler SPALN, were also run. Long-read sequences of the sorghum line BTx623 from NCBI, along with other sorghum ESTs and cDNAs, were used as the evidence set for PASA. Non-sorghum Poales EST and cDNA sequences from NCBI, together with monocot transcripts from Phytozome, were used as additional evidence from closely related species for gene prediction with GMAP. UniRef100 plant protein sequences were used as the evidence dataset for gene structure prediction with SPALN.

All gene annotation files were run through EVidenceModeler, and the output was used to polish the gene boundaries in PASA. The final PASA annotation file was combined with the tRNA predictions from tRNAscan-SE to obtain the final structural annotation file, along with FASTA sequences of proteins, CDS, cDNA, and genes. For additional details, see Deschamps et al. (2018).
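
Summary figures such as the gene count, transcript count, average gene length, and exons per transcript can in principle be recomputed from a structural annotation file of the kind described above. The sketch below assumes the file is in GFF3 format with `gene`, `mRNA`, and `exon` feature types and `Parent` attributes on exons; the actual feature names in the Tx430 annotation file are not stated here, and the function name and toy records are hypothetical.

```python
from collections import defaultdict


def summarize_gff3(lines):
    """Compute simple annotation statistics from GFF3 lines.

    Assumes feature types 'gene', 'mRNA' and 'exon' (an assumption,
    not a documented property of the Tx430 annotation file).
    """
    gene_lengths = []
    transcripts = 0
    exons_per_transcript = defaultdict(int)

    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)

        if ftype == "gene":
            gene_lengths.append(end - start + 1)  # GFF3 coordinates are 1-based, inclusive
        elif ftype == "mRNA":
            transcripts += 1
        elif ftype == "exon":
            # An exon may list several parents, separated by commas.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    exons_per_transcript[parent] += 1

    n_genes = len(gene_lengths)
    n_with_exons = len(exons_per_transcript)
    return {
        "genes": n_genes,
        "transcripts": transcripts,
        "mean_gene_length": sum(gene_lengths) / n_genes if n_genes else 0.0,
        # Averaged over transcripts that have at least one exon record.
        "mean_exons_per_transcript": (
            sum(exons_per_transcript.values()) / n_with_exons if n_with_exons else 0.0
        ),
    }


# Toy records invented for illustration; not real Tx430 features.
toy_gff3 = [
    "##gff-version 3",
    "chr1\ttoy\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "chr1\ttoy\tmRNA\t1\t1000\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\ttoy\texon\t1\t200\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\ttoy\texon\t400\t1000\t.\t+\t.\tID=e2;Parent=t1",
]
print(summarize_gff3(toy_gff3))
```

For a real file, the lines could be supplied by iterating over an open file handle instead of the toy list.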

 

Literature References

Bromberg, Yana, and Burkhard Rost. 2007. “SNAP: Predict Effect of Non-Synonymous Polymorphisms on Function.” Nucleic Acids Research 35 (11): 3823–35. PMID: 17526529. https://doi.org/10.1093/nar/gkm238.

Casa, Alexandra M., Gael Pressoir, Patrick J. Brown, Sharon E. Mitchell, William L. Rooney, Mitchell R. Tuinstra, Cleve D. Franks, and Stephen Kresovich. 2008. “Community Resources and Strategies for Association Mapping in Sorghum.” Crop Science 48 (1): 30–40. https://doi.org/10.2135/cropsci2007.02.0080.

Deschamps, Stéphane, Yun Zhang, Victor Llaca, Liang Ye, Gregory May, and Haining Lin. 2018. “A Chromosome-Scale Assembly of the Sorghum Genome Using Nanopore Sequencing and Optical Mapping.” PMID: 30451840. https://doi.org/10.1101/327817.

Solovyev, V. V., A. A. Salamov, and C. B. Lawrence. 1994. “Predicting Internal Exons by Oligonucleotide Composition and Discriminant Analysis of Spliceable Open Reading Frames.” Nucleic Acids Research 22 (24): 5156–63. PMID: 7816600. https://doi.org/10.1093/nar/22.24.5156.

Stanke, Mario, and Stephan Waack. 2003. “Gene Prediction with a Hidden Markov Model and a New Intron Submodel.” Bioinformatics 19 Suppl 2 (October): ii215–25. PMID: 14534192. https://doi.org/10.1093/bioinformatics/btg1080.