Sequential SNP Prioritization Algorithm (SSPA): A Correlation-Based Post-GWAS Framework for Trait-Associated SNP Discovery in Sorghum
The Sequential SNP Prioritization Algorithm (SSPA) refines GWAS results by leveraging correlation-based feature engineering to prioritize SNPs associated with complex traits in sorghum, providing valuable insights for functional genomics research and enabling downstream machine learning applications.
Keywords: plant phenotype; Sorghum bicolor; maximum canopy height; maximum growth rate; GWAS; SNP prioritization; feature engineering
This work advances genome-to-phenotype discovery by introducing the Sequential SNP Prioritization Algorithm (SSPA), a correlation-based feature engineering framework that extends the reach of GWAS beyond stringent significance thresholds. By integrating phenotypic similarity with genetic relatedness, SSPA effectively uncovers informative variants that would otherwise be overlooked, providing a scalable path toward more comprehensive identification of genotype–phenotype associations. – D. Pal
Genome-wide association studies (GWAS) are widely applied to uncover the genetic basis of complex traits, but their effectiveness is often constrained by stringent statistical thresholds that risk excluding informative variants. To address this limitation, researchers from Michigan State University and collaborating institutions developed the Sequential SNP Prioritization Algorithm (SSPA), a post-GWAS feature engineering approach that prioritizes SNPs and genes with potential phenotypic effects, starting from GWAS results filtered at permissive thresholds. SSPA evaluates the correlation between phenotypic similarity, derived from normalized trait measurements, and genetic relatedness across accessions. This correlation-based strategy ranks SNPs that contribute to phenotypic variation but show weaker individual associations in GWAS, so that informative variants are not overlooked. The team tested SSPA using phenotypic data for canopy height and growth rate across 274 Sorghum bicolor accessions cultivated at the Maricopa Agricultural Center, Arizona, and found moderately strong correlations (0.69–0.71) between phenotype and SNP similarity matrices. These findings highlight the potential of SSPA to complement GWAS by refining the pool of exploratory candidate SNPs for downstream analyses.
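The idea of comparing a phenotype similarity matrix with a SNP similarity matrix can be illustrated with a small sketch. This is not the published SSPA implementation: the function names, the min-max normalization, the distance-based similarity, and the per-SNP Pearson scoring are all assumptions chosen for illustration, and the toy data are invented.

```python
import numpy as np

def phenotype_similarity(traits):
    """Pairwise similarity between accessions from min-max normalized traits.

    traits: array of shape (accessions, traits). Normalization and the
    distance-to-similarity conversion are illustrative assumptions.
    """
    t = np.asarray(traits, dtype=float)
    span = t.max(axis=0) - t.min(axis=0)
    span[span == 0] = 1.0                      # avoid dividing by zero
    norm = (t - t.min(axis=0)) / span
    dist = np.linalg.norm(norm[:, None, :] - norm[None, :, :], axis=2)
    if dist.max() == 0:
        return np.ones_like(dist)
    return 1.0 - dist / dist.max()

def snp_similarity(genotypes):
    """Pairwise similarity from genotype dosages coded 0/1/2 (accessions x SNPs)."""
    g = np.asarray(genotypes, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :]).mean(axis=2)  # range 0..2
    return 1.0 - diff / 2.0

def upper_triangle_correlation(a, b):
    """Pearson correlation over the off-diagonal upper triangles of two matrices."""
    iu = np.triu_indices_from(a, k=1)
    x, y = a[iu], b[iu]
    if x.std() == 0 or y.std() == 0:
        return 0.0                             # correlation undefined; treat as 0
    return float(np.corrcoef(x, y)[0, 1])

def rank_snps(genotypes, traits):
    """Rank SNPs by how well each SNP's similarity matrix tracks the phenotype's."""
    pheno = phenotype_similarity(traits)
    g = np.asarray(genotypes)
    scores = [
        upper_triangle_correlation(snp_similarity(g[:, [j]]), pheno)
        for j in range(g.shape[1])
    ]
    order = np.argsort(scores)[::-1]           # highest correlation first
    return [(int(j), scores[j]) for j in order]

# Toy data (invented): 4 accessions, 1 trait, 2 SNPs.
toy_traits = [[1.0], [1.1], [3.0], [3.2]]
toy_genotypes = [[0, 0], [0, 2], [2, 0], [2, 2]]
ranked = rank_snps(toy_genotypes, toy_traits)
```

In this toy case the first SNP separates the short accessions from the tall ones, so it ranks above the second SNP, whose pattern is unrelated to the trait.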
Application of SSPA across multiple datasets revealed both environment-specific and conserved genetic signals. For instance, overlaps in prioritized SNPs between Arizona and Ethiopian trials likely reflect shared arid climatic conditions, while divergence from South Carolina datasets emphasizes environmental effects on phenotype–genotype associations. Several identified SNPs were located near genes with homology-based evidence of a role in growth regulation, including SORBI_3001G265600, a sorghum ortholog of a tobacco gene linked to plant height. Beyond its capacity to highlight biologically meaningful loci, SSPA reduces genomic dimensionality, providing a tractable input set for machine learning models. Although current limitations include linear correlation assumptions and limited phenotypic variation among accessions, SSPA demonstrates strong potential as a scalable prioritization tool. Future improvements may involve incorporating weighted correlation metrics, nonlinear feature selection, and explicit modeling of environmental influences to further enhance SNP discovery and trait prediction.
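One simple way to quantify how much prioritized SNP sets overlap between trials is a pairwise Jaccard index. The sketch below is only an illustration of that comparison, not the analysis used in the paper; the site labels and SNP identifiers are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlap(prioritized):
    """Jaccard index for every pair of sites in a {site: SNP ids} mapping."""
    names = sorted(prioritized)
    return {
        (x, y): jaccard(prioritized[x], prioritized[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }

# Hypothetical prioritized SNP sets for three placeholder sites.
example_sets = {
    "site_a": {"snp1", "snp2", "snp3"},
    "site_b": {"snp2", "snp3", "snp4"},
    "site_c": {"snp9"},
}
overlap = pairwise_overlap(example_sets)
```

A high index between two sites would suggest shared signals, while an index near zero would point to environment-specific prioritization.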
SorghumBase example:

Reference:
Pal D, Schaper K, Thompson A, Guo J, Jaiswal P, Lisle C, Cooper L, LeBauer D, Thessen AE, Ross A. Post-GWAS Prioritization of Genome–Phenome Association in Sorghum. Agronomy. 2024; 14(12):2894. https://doi.org/10.3390/agronomy14122894.
Related Project Websites:
- Arun Ross’s lab (iPRoBe) at Michigan State University: https://iprobe.cse.msu.edu/index.php
- Translational and Integrative Sciences Lab at the University of North Carolina at Chapel Hill: https://tislab.org/
- Jaiswal Lab at Oregon State University: https://jaiswallab.cgrb.oregonstate.edu/
- Thompson Lab, Maize research at Michigan State University: https://www.thompsonmaizelab.org/
- Arizona Experimental Station at University of Arizona: https://datascience.cct.arizona.edu/projects
- Project Page: https://genophenoenvo.github.io/
- Project GitHub Repo: https://github.com/genophenoenvo


