A novel transfer learning framework for sorghum biomass prediction using UAV-based remote sensing data and genetic markers.

Wang T, Crawford MM, Tuinstra MR

Published: 28 April 2023 in Frontiers in plant science
Keywords: LiDAR, biomass prediction, genetic markers, hyperspectral, long short-term memory, recurrent neural network, transfer learning
Pubmed ID: 37113602
DOI: 10.3389/fpls.2023.1138479

Yield for biofuel crops is measured in terms of biomass, so measurements throughout the growing season are crucial in breeding programs, yet traditionally time- and labor-consuming since they involve destructive sampling. Modern remote sensing platforms, such as unmanned aerial vehicles (UAVs), can carry multiple sensors and collect numerous phenotypic traits with efficient, non-invasive field surveys. However, modeling the complex relationships between the observed phenotypic traits and biomass remains a challenging task, as the ground reference data are very limited for each genotype in the breeding experiment. In this study, a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) model is proposed for sorghum biomass prediction. The architecture is designed to exploit the time series remote sensing and weather data, as well as static genotypic information. As a large number of features have been derived from the remote sensing data, feature importance analysis is conducted to identify and remove redundant features. A strategy to extract representative information from high-dimensional genetic markers is proposed. To enhance generalization and minimize the need for ground reference data, transfer learning strategies are proposed for selecting the most informative training samples from the target domain. Consequently, a pre-trained model can be refined with limited training samples. Field experiments were conducted over a sorghum breeding trial planted in multiple years with more than 600 testcross hybrids. The results show that the proposed LSTM-based RNN model can achieve high accuracies for single year prediction. Further, with the proposed transfer learning strategies, a pre-trained model can be refined with limited training samples from the target domain and predict biomass with an accuracy comparable to that from a trained-from-scratch model for both multiple experiments within a given year and across multiple years.