Simultaneous Parameter Learning and Bi-clustering for Multi-Response Models.

Yu M, Natesan Ramamurthy K, Thompson A, Lozano AC

Published: 14 August 2019 in Frontiers in Big Data
Keywords: bi-clustering, convex clustering, genome-wide association studies, high-throughput phenotyping, multitask learning, sparse linear regression
PubMed ID: 33693350
DOI: 10.3389/fdata.2019.00027

We consider multi-response and multi-task regression models in which the parameter matrix to be estimated is expected to have an unknown grouping structure. The groupings can be along tasks, along features, or both; the last case indicates a bi-cluster or "checkerboard" structure. Discovering this grouping structure along with parameter inference is useful in several applications, such as multi-response Genome-Wide Association Studies (GWAS). By inferring this additional structure, we can obtain valuable information on the underlying data mechanisms (e.g., relationships among genotypes and phenotypes in GWAS). In this paper, we propose two formulations, based on convex regularization penalties, that simultaneously learn the parameter matrix and its group structures. We present optimization approaches to solve the resulting problems and provide numerical convergence guarantees. Extensive experiments demonstrate much better clustering quality than other methods, and our approaches are also validated on real datasets concerning phenotypes and genotypes of plant varieties.
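To make the idea of a convex penalty that encourages a checkerboard structure more concrete, the sketch below evaluates a generic objective of this kind. It is an illustrative assumption, not the authors' formulation: it combines a least-squares loss 0.5 * ||Y - X W||_F^2, an L1 sparsity term, and a convex-biclustering-style fusion term that sums L2 distances between every pair of rows (features) and every pair of columns (tasks) of W with uniform weights. The function names `fusion_penalty` and `objective` and the parameters `lam_sparse` and `lam_fuse` are hypothetical. Rows or columns that belong to the same group contribute zero to the fusion term, which is why minimizing such a penalty pulls W toward a block structure.

```python
import numpy as np
from itertools import combinations


def fusion_penalty(W):
    """Hypothetical convex bi-clustering term (uniform pair weights, assumed).

    Sums the L2 distance between every pair of rows (features) and every
    pair of columns (tasks) of the p x q parameter matrix W.
    """
    p, q = W.shape
    row_term = sum(np.linalg.norm(W[i] - W[j]) for i, j in combinations(range(p), 2))
    col_term = sum(np.linalg.norm(W[:, k] - W[:, l]) for k, l in combinations(range(q), 2))
    return row_term + col_term


def objective(W, X, Y, lam_sparse, lam_fuse):
    """Illustrative penalized multi-response least-squares objective (assumed form).

    X: n x p design matrix, Y: n x q response matrix, W: p x q parameters.
    """
    residual = Y - X @ W
    loss = 0.5 * np.sum(residual ** 2)          # squared Frobenius-norm loss
    sparsity = lam_sparse * np.abs(W).sum()     # element-wise L1 penalty
    return loss + sparsity + lam_fuse * fusion_penalty(W)


# A 4 x 4 matrix with two feature groups and two task groups (a checkerboard).
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 2.0],
              [0.0, 0.0, 2.0, 2.0]])
print(fusion_penalty(W))  # only the 4 cross-group row pairs and 4 cross-group column pairs contribute
```

In this toy example, identical rows and identical columns add nothing to the fusion term, so only the eight cross-group pairs (each at distance sqrt(10)) are penalized. Actually solving the regularized problem would require an optimizer (for example, a splitting method that handles the non-smooth terms), which this sketch does not attempt.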