TY - JOUR
T1 - Harnessing the complexity of gene expression data from cancer
T2 - From single gene to structural pathway methods
AU - Emmert-Streib, Frank
AU - Tripathi, Shailesh
AU - Matos Simoes, Ricardo D.
N1 - Funding Information:
ST is supported by a studentship from the National Institute of Immunology. FES and RDMS are supported by the Engineering and Physical Sciences Research Council (EPSRC) and DEL.
PY - 2012/12/10
Y1 - 2012/12/10
N2 - High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems-related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, such analyses depend strongly not only on the sample size and the correlation structure of a data set but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data, provide links to software implementations and tools, and also address the general problem of multiple hypothesis testing. Further, we provide recommendations for the selection of such analysis methods.
AB - High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems-related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, such analyses depend strongly not only on the sample size and the correlation structure of a data set but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data, provide links to software implementations and tools, and also address the general problem of multiple hypothesis testing. Further, we provide recommendations for the selection of such analysis methods.
KW - Cancer data
KW - Cancer genomics
KW - Correlation structure
KW - Gene expression data
KW - Pathway methods
KW - Statistical analysis methods
KW - Gene Expression Profiling/methods
KW - Data Interpretation, Statistical
KW - Humans
KW - Gene Expression Regulation, Neoplastic
KW - Neoplasms/genetics
KW - Oligonucleotide Array Sequence Analysis/methods
KW - Sample Size
KW - Software
UR - http://www.scopus.com/inward/record.url?scp=84872650417&partnerID=8YFLogxK
U2 - 10.1186/1745-6150-7-44
DO - 10.1186/1745-6150-7-44
M3 - Review article
C2 - 23227854
AN - SCOPUS:84872650417
SN - 1745-6150
VL - 7
SP - 44
JO - Biology Direct
JF - Biology Direct
M1 - 44
ER -