TY - JOUR
T1 - Constrained Covariance Matrices With a Biologically Realistic Structure
T2 - Comparison of Methods for Generating High-Dimensional Gaussian Graphical Models
AU - Emmert-Streib, Frank
AU - Tripathi, Shailesh
AU - Dehmer, Matthias
N1 - Funding Information:
We would like to thank Robert Castelo and Ricardo de Matos Simoes for fruitful discussions. MD thanks the Austrian Science Funds for supporting this work (project P 30031).
Publisher Copyright:
Copyright © 2019 Emmert-Streib, Tripathi and Dehmer.
PY - 2019/4/12
Y1 - 2019/4/12
N2 - High-dimensional data from molecular biology possess an intricate correlation structure that is imposed by the molecular interactions between genes and their products, which form various types of gene networks. This fact is particularly well known for gene expression data, because there is a sufficient number of large-scale data sets available that are amenable to a sensible statistical analysis confirming this assertion. The purpose of this paper is twofold. First, we investigate three methods for generating constrained covariance matrices with a biologically realistic structure. Such covariance matrices play a pivotal role in designing novel statistical methods for high-dimensional biological data, because they allow one to define Gaussian graphical models (GGMs) for the simulation of realistic data, including their correlation structure. We study local and global characteristics of these covariance matrices and of the concentration/partial correlation matrices derived from them. Second, we connect these results, obtained from a probabilistic perspective, to statistical results of studies aiming to estimate gene regulatory networks from biological data. This connection sheds light on the well-known heterogeneity of statistical estimation methods for inferring gene regulatory networks and provides an explanation for the difficulties in inferring molecular interactions between highly connected genes.
KW - data science
KW - Gaussian graphical models
KW - gene regulatory networks
KW - genomics
KW - machine learning
KW - network science
KW - statistics
UR - http://www.scopus.com/inward/record.url?scp=85075359387&partnerID=8YFLogxK
U2 - 10.3389/fams.2019.00017
DO - 10.3389/fams.2019.00017
M3 - Article
AN - SCOPUS:85075359387
SN - 2297-4687
VL - 5
JO - Frontiers in Applied Mathematics and Statistics
JF - Frontiers in Applied Mathematics and Statistics
M1 - 17
ER -