Mathematics and Statistics Seminar Series - Alejandra Avalos Pacheco (Vienna University of Technology): Multi-study Factor Regression Models for Heterogenous Data with Applications to Nutritional Epidemiology and Cancer Genomics

23 November 2022, 12:30
Location: Caniana - Aula 14
Speaker(s): Alejandra Avalos Pacheco (Vienna University of Technology)
Department seminars
Contact person: Sirio Legramanti (sirio.legramanti@unibg.it)
Organizing internal units: Dipartimento di Scienze Economiche

Title: Multi-study Factor Regression Models for Heterogenous Data with Applications to Nutritional Epidemiology and Cancer Genomics

Abstract: Data integration of multiple studies can be key to understanding and gaining knowledge in statistical research. However, such data present both biological and artifactual sources of variation, also known as covariate effects. Covariate effects can be complex, leading to systematic biases. In this talk I will present novel sparse latent factor regression (FR) and multi-study factor regression (MSFR) models to integrate such heterogeneous data. The FR model provides a tool for data exploration via dimensionality reduction and sparse low-rank covariance estimation while correcting for a range of covariate effects. MSFR models are extensions of FR that jointly estimate a covariance structure capturing the group-specific covariances in addition to the common component, while learning covariate effects from observed variables such as demographic information. I will discuss the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. Our approach provides a flexible methodology for sparse factor regression that is not limited to data with covariate effects. I will present several examples, with a focus on bioinformatics applications. We show the usefulness of our methods in two main tasks: first, giving a visual representation of the latent factors of the data, i.e. an unsupervised dimension reduction task; and second, (i) supervised survival analysis, using the factors obtained by our method as predictors for the cancer genomic data, and (ii) dietary pattern analysis, associating each factor with a measure of overall diet quality related to cardiometabolic disease risk for a Hispanic community health nutritional-data study.
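
The abstract does not state the model equations. As a rough illustration only, the sketch below assumes a commonly used multi-study factor regression form, x = B z + Phi f + Lambda_s g + e for an observation in study s, where z holds observed covariates, Phi holds loadings shared by all studies, Lambda_s holds loadings specific to study s, and e has diagonal variance psi_s. Under that assumption the covariance of study s, after removing covariate effects, is Phi Phi^T + Lambda_s Lambda_s^T + diag(psi_s). All names and toy values here are hypothetical and are not taken from the talk.

```python
import random


def outer_sum(loadings):
    """Return L @ L.T for a p x k loading matrix stored as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in loadings]
            for ri in loadings]


def implied_covariance(phi, lam_s, psi_s):
    """Assumed form: Sigma_s = Phi Phi^T + Lambda_s Lambda_s^T + diag(psi_s)."""
    common = outer_sum(phi)        # component shared by all studies
    specific = outer_sum(lam_s)    # component specific to study s
    p = len(phi)
    return [[common[i][j] + specific[i][j] + (psi_s[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def simulate(beta, z, phi, lam_s, psi_s, rng):
    """Draw one observation x = B z + Phi f + Lambda_s g + e (assumed model)."""
    f = [rng.gauss(0, 1) for _ in phi[0]]     # common latent factors
    g = [rng.gauss(0, 1) for _ in lam_s[0]]   # study-specific latent factors
    return [
        sum(b * zc for b, zc in zip(beta[i], z))       # covariate effect
        + sum(a * fc for a, fc in zip(phi[i], f))      # common part
        + sum(a * gc for a, gc in zip(lam_s[i], g))    # study-specific part
        + rng.gauss(0, psi_s[i] ** 0.5)                # idiosyncratic noise
        for i in range(len(phi))
    ]


# Toy values (hypothetical): p = 3 variables, 1 common factor,
# 1 study-specific factor, 1 covariate.
phi = [[1.0], [0.5], [0.0]]
lam_1 = [[0.0], [0.0], [2.0]]
psi_1 = [0.1, 0.1, 0.1]
beta = [[0.3], [0.0], [-0.2]]

sigma_1 = implied_covariance(phi, lam_1, psi_1)
x = simulate(beta, [1.0], phi, lam_1, psi_1, random.Random(0))
```

In this toy setup the first two variables co-vary only through the common factor, while the third variable's variance comes almost entirely from the study-specific factor. Sparse priors of the kind mentioned in the abstract would act on the entries of the loading matrices, which this sketch simply fixes by hand.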
