Interpretable Deep Learning Models Reveal Regulatory Elements Predicting Gene Expression in Diverse Plant Species
Gene expression regulation is a complex process involving multiple factors, including interactions between transcription factors and cis-regulatory elements (CREs). Traditional molecular biology techniques struggle to fully elucidate these mechanisms. Convolutional neural networks (CNNs) have the potential to systematically investigate sequence-to-regulation relationships. In this study, researchers from the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), the Institute of Bio- and Geosciences (IBG-4), the Cluster of Excellence on Plant Sciences (CEPLAS), South Westphalia University of Applied Sciences, the University of Göttingen, and the Center of Integrated Breeding Research (CiBreed) developed interpretable deep learning models to predict high and low gene expression from gene flanking regions in four plant species: Arabidopsis thaliana, Solanum lycopersicum, Sorghum bicolor, and Zea mays. The single-species models achieved over 80% accuracy and identified important regulatory elements in the flanking regions that influence gene expression levels. Multi-species models performed well across species and identified both conserved and species-specific regulatory sequence features. This deep learning approach allows automated motif extraction from raw sequences, and the researchers demonstrated its applicability by revealing causal relationships between genetic variations and gene expression changes in tomato genomes. Furthermore, the models accurately predicted genotype-specific expression of key functional gene groups, highlighting known phenotypic and metabolic differences between a pair of domesticated and wild tomato species. This study underscores the potential of deep learning in exploring gene regulation and genetic variation, offering a powerful tool for functional genomics and phenotypic trait prediction.
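To give a rough sense of how a CNN can pick up motifs from raw sequence, the minimal sketch below one-hot encodes a DNA string and slides a single convolutional filter along it, followed by global max pooling. It is an illustration only, not the architecture or code used in the study: the function names, the hand-set "TATA" filter, and the example flanking sequence are all hypothetical, and a trained model would learn many such filters from data.

```python
# Illustrative sketch only: this is NOT the model from the study.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Unknown bases such as N become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; one score per position."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores


def max_pool(scores):
    """Global max pooling: return (position, score) of the best match."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical filter that responds to "TATA"; a trained CNN would learn
# its filter weights rather than have them set by hand.
tata_kernel = one_hot("TATA")

# Hypothetical flanking sequence, made up for this example.
flank = "GCGCTATAAAGGC"
position, score = max_pool(conv_scan(one_hot(flank), tata_kernel))
print(position, score)
```

In a full model, the pooled outputs of many filters would feed further layers that classify a gene as high or low expression; interpretation methods then trace which sequence positions drove that prediction.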
SorghumBase examples:
Reference:
Peleke FF, Zumkeller SM, Gültas M, Schmitt A, Szymański J. Deep learning the cis-regulatory code for gene expression in selected model plants. Nat Commun. 2024 Apr 25;15(1):3488. PMID: 38664394. doi: 10.1038/s41467-024-47744-0.
Related Project Websites:
- Szymański lab at Leibniz Institute of Plant Genetics and Crop Plant Research (IPK): https://www.ipk-gatersleben.de/en/research/molecular-genetics/network-analysis-and-modelling
- Armin Schmitt’s page at the University of Göttingen: https://publications.goettingen-research-online.de/cris/rp/rp93174