Addressing Sparsity in Metabolomics Data Analysis
Project Summary
Comprehensive profiling of the small molecule repertoire in a sample is referred to as metabolomics, and it is being used to address a variety of scientific questions in biomedical studies. Metabolomics offers more immediate measures of the physiology of an individual, and more direct examination of the effects of exposures such as nutrition, smoking and bacterial infections. For human health, metabolomics studies are being used to investigate disease mechanisms, discover biomarkers, diagnose disease, and monitor treatment responses. Metabolomics is increasingly recognized as an important component of precision medicine initiatives to complement and enhance collected genomic data. This is critical because the metabolome cannot be predicted from knowledge of the genome, transcriptome or proteome, yet it provides important information on the phenotype. Recent technological advances in mass spectrometry-based metabolomics have allowed for more comprehensive and sensitive measurements of metabolites. We focus on untargeted ultra-high pressure liquid chromatography coupled to mass spectrometry, which is one of the more commonly used methods. Despite the technological advances, the bottleneck for taking full advantage of metabolomics data is often the paucity and incompleteness of analytical tools and databases. Our goal is to develop novel statistical methods and software for the research community to improve the utilization of metabolomics data. There are many steps in a metabolomics data analysis pipeline, and we will focus on the downstream steps of normalization and of univariate, multivariate and pathway analyses. In particular, we will address the high levels of sparsity, which is one of the more distinctive aspects of metabolomics data compared to other -omics data sets.
For metabolomics data, there is sparsity in individual metabolites due to a large percentage of missing data for biological or technical reasons, and sparsity in connections between metabolites due to high collinearity and sparsely connected networks in metabolic pathways. The methods and software we develop will maximize the potential of metabolomics to provide new discoveries in disease etiology, diagnosis, and drug development.