The study of protein-protein interactions (PPIs) promises to reveal fundamental molecular mechanisms of cell function and of many diseases. The last decade has seen a tremendous increase in the number of known PPIs, with hundreds of thousands of them now available in public databases. However, it is estimated that about 50% of reported PPIs are actually false positives, i.e. experimental artifacts without biological significance. Reliable verification of PPIs is therefore indispensable and currently an important topic in bioinformatics. Motivated by recent insights into PPI evolution, this work proposes a new homology-based approach to PPI validation. The underlying idea is that most PPIs originate from gene duplications and have not evolved de novo between previously non-interacting proteins. Such an evolutionary relationship between PPIs implies that for most true-positive PPIs, many homologous PPIs exist, both within the same species and in all other species. On top of this assumption, a statistical hypothesis test is formulated and applied to a large, integrated data set of known PPIs. Under the null hypothesis, i.e. the hypothesis that a given PPI is a false positive, the number of PPIs among homologous proteins is expected to correspond to the number of PPIs among randomly chosen proteins. If the former number is significantly higher, the null hypothesis is rejected. The resulting P-value, termed the Interaction P-Value (IPV), serves as the test statistic for detecting statistically significant results. The classification performance of the IPV is assessed on three gold standard data sets and compared with two existing homology-based classifiers. At a specificity of 80%, the achieved sensitivity ranges from 76% to 84%. The statistical analysis of homologous PPIs presented here suggests that homology-based PPI validation on large, integrated PPI data sets has great potential.
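To make the hypothesis test more concrete, here is a minimal sketch of an IPV-style score. The abstract does not state the exact null distribution, so this assumes a simple binomial model: under the null hypothesis every protein pair interacts with the same background probability (known PPIs divided by all possible pairs), and the candidate pairs are those formed between the two proteins' homologs. The homolog map is assumed to be precomputed (e.g. from sequence similarity), and the function names `homologous_pairs`, `binomial_upper_tail` and `interaction_p_value` are hypothetical. This is an illustration of the idea, not the authors' implementation.

```python
from math import comb


def homologous_pairs(a, b, homologs):
    """Candidate pairs (x, y) with x in {a} plus homologs of a and y in {b}
    plus homologs of b, excluding self-pairs and the queried pair itself.
    `homologs` is an assumed, precomputed dict: protein -> set of homologs."""
    side_a = {a} | homologs.get(a, set())
    side_b = {b} | homologs.get(b, set())
    query = frozenset((a, b))
    return {
        frozenset((x, y))
        for x in side_a
        for y in side_b
        if x != y and frozenset((x, y)) != query
    }


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def interaction_p_value(a, b, homologs, interactions, num_proteins):
    """Hypothetical IPV-style score for the candidate PPI (a, b).
    `interactions` is a set of frozenset pairs (the known PPI data set)."""
    # Assumed null model: every pair interacts with one background
    # probability, estimated as (#known PPIs) / (#possible protein pairs).
    total_pairs = num_proteins * (num_proteins - 1) // 2
    p_random = len(interactions) / total_pairs
    candidates = homologous_pairs(a, b, homologs)
    observed = sum(1 for pair in candidates if pair in interactions)
    # A small value means more PPIs among homologs than random pairs would give.
    return binomial_upper_tail(len(candidates), observed, p_random)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    interactions = {frozenset(p) for p in [("A", "B"), ("A1", "B1"), ("A2", "B1"), ("C", "D")]}
    homologs = {"A": {"A1", "A2"}, "B": {"B1"}}
    print(interaction_p_value("A", "B", homologs, interactions, num_proteins=20))
```

In the toy example, two of the five homologous candidate pairs interact although the background interaction rate is only 4/190, so the score falls well below 0.01 and the null hypothesis (false positive) would be rejected at the usual significance levels. The exact binomial tail is used here only because it is simple and dependency-free; a real analysis on hundreds of thousands of PPIs might prefer a library routine for the tail probability.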
Publication status: Published (2007)
- protein-protein interaction