Constrained Covariance Matrices With a Biologically Realistic Structure: Comparison of Methods for Generating High-Dimensional Gaussian Graphical Models

Frank Emmert-Streib, Shailesh Tripathi, Matthias Dehmer

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

High-dimensional data from molecular biology possess an intricate correlation structure that is imposed by the molecular interactions between genes and their products, which form various types of gene networks. This fact is particularly well known for gene expression data, because enough large-scale data sets are available that are amenable to a sensible statistical analysis confirming this assertion. The purpose of this paper is twofold. First, we investigate three methods for generating constrained covariance matrices with a biologically realistic structure. Such covariance matrices play a pivotal role in designing novel statistical methods for high-dimensional biological data, because they allow one to define Gaussian graphical models (GGM) for the simulation of realistic data, including their correlation structure. We study local and global characteristics of these covariance matrices and of the derived concentration/partial correlation matrices. Second, we connect these results, obtained from a probabilistic perspective, to statistical results of studies aiming to estimate gene regulatory networks from biological data. This connection sheds light on the well-known heterogeneity of statistical estimation methods for inferring gene regulatory networks and provides an explanation for the difficulties in inferring molecular interactions between highly connected genes.
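To make the idea of a constrained covariance matrix concrete, the following is a minimal Python sketch of one generic construction. It is not one of the three methods compared in the paper. It assumes a random Erdős-Rényi graph as a stand-in for a biologically realistic network, and it enforces positive definiteness through strict diagonal dominance. The function names, the edge-weight range, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def random_sparse_precision(p, edge_prob=0.1, seed=0):
    """Build a sparse, positive definite precision (concentration) matrix.

    Assumption: the graph is Erdos-Renyi; a realistic study would plug in
    a gene-network topology here instead.
    """
    rng = np.random.default_rng(seed)
    # Symmetric boolean adjacency matrix without self-loops.
    upper = np.triu(rng.random((p, p)) < edge_prob, k=1)
    adjacency = upper | upper.T
    # Symmetric edge weights with random sign (hypothetical range 0.3 to 0.8).
    weights = rng.uniform(0.3, 0.8, size=(p, p)) * rng.choice([-1.0, 1.0], size=(p, p))
    weights = np.triu(weights, k=1)
    weights = weights + weights.T
    # Zero entries encode conditional independence (missing edges).
    omega = np.where(adjacency, weights, 0.0)
    # Strict diagonal dominance with a positive diagonal implies positive definiteness.
    np.fill_diagonal(omega, np.abs(omega).sum(axis=1) + 1.0)
    return omega, adjacency


def partial_correlations(omega):
    """Partial correlations: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


p = 20
omega, adjacency = random_sparse_precision(p=p, edge_prob=0.15, seed=1)

# The covariance matrix is the inverse of the precision matrix.
sigma = np.linalg.inv(omega)
sigma = (sigma + sigma.T) / 2.0  # remove tiny asymmetries from inversion

# Simulate data from the Gaussian graphical model.
rng = np.random.default_rng(2)
samples = rng.multivariate_normal(np.zeros(p), sigma, size=500)

pcor = partial_correlations(omega)
```

In this sketch the graph constrains the precision matrix rather than the covariance matrix directly: a missing edge gives a zero partial correlation, while the corresponding covariance entry is generally nonzero. Diagonal dominance is a convenient way to guarantee a valid matrix, but it tends to shrink partial correlations for nodes with many neighbours, since the diagonal entry grows with the number of edges.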

Original language: English
Article number: 17
Journal: Frontiers in Applied Mathematics and Statistics
Volume: 5
Publication status: Published - 12 Apr 2019

Keywords

  • data science
  • Gaussian graphical models
  • gene regulatory networks
  • genomics
  • machine learning
  • network science
  • statistics
