Project Details


Mass spectrometry (MS) is the method most often used to identify proteins in biological samples: proteins are digested into peptides, which are subsequently analyzed. Within the last decade, a new generation of mass spectrometers has been developed that is capable of acquiring mass spectra with high resolution and high mass accuracy. This has significantly changed the characteristics of mass spectra; however, this development has not been accompanied by corresponding progress in peptide identification algorithms capable of fully exploiting the available information.
In this bioinformatics research project, we therefore propose to develop a set of novel identification algorithms that are specifically designed for the analysis of modern mass spectra and that incorporate multiple sources of information.
Preliminary research results are promising. The project consortium, consisting of the Proteomics Group at IMP Vienna and the Bioinformatics Research Group at FH OÖ (Campus Hagenberg), has already conducted successful joint research in the analysis of MS data: identification rates comparable or even superior to those of Mascot, the current gold standard, have been achieved using a first version of a scoring function designed by the proposing consortium.
Encouraged by these preliminary results, we are convinced that considering additional sources of information will further improve identification rates of mass spectra. This project is therefore dedicated to research on a combination of the following novel approaches: We plan to use machine learning techniques to analyze peptide elution times, fragmentation patterns and mass accuracy characteristics specific to the instrument; in addition, observed m/z values will be recalibrated based on the mass error of highly reliable identifications, and the remaining mass error with regard to the learned distribution will be incorporated into the scoring function. Sophisticated peak picking strategies will also be designed using machine learning. These improvements will help increase identification rates in challenging situations such as hybrid spectra and exhaustive searches for a wide range of post-translational modifications. Such exhaustive searches lead to exponentially growing search spaces and an accompanying drop in spectrum identification rates, because the information in mass spectra alone is not sufficient to cope with the increased search space. Instead of applying brute-force methods, we plan to solve this problem using construction heuristics, i.e., evolutionary algorithms that realize intelligent search strategies for large numbers of unknown post-translational modifications based on a combination of database search and de novo identification.
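The recalibration idea mentioned above can be illustrated with a minimal sketch. It is not the consortium's method: it assumes the systematic mass error is a single constant offset in parts per million (ppm) and estimates it as the median error over reliable identifications. All function names and the example values are hypothetical.

```python
from statistics import median


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error of an observation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def estimate_shift_ppm(reliable_pairs):
    """Estimate the systematic error from (observed, theoretical) m/z pairs.

    Assumption: the error is one constant ppm offset, and the median over
    highly reliable identifications is a robust estimate of it.
    """
    return median(ppm_error(obs, theo) for obs, theo in reliable_pairs)


def recalibrate(observed_mz, shift_ppm):
    """Remove the estimated systematic ppm offset from an observed m/z."""
    return observed_mz / (1.0 + shift_ppm * 1e-6)


def residual_ppm(observed_mz, theoretical_mz, shift_ppm):
    """Mass error remaining after recalibration, in ppm."""
    return ppm_error(recalibrate(observed_mz, shift_ppm), theoretical_mz)


# Hypothetical example: three reliable identifications, each measured
# 5 ppm too high relative to its theoretical m/z.
theoretical = [500.0, 750.0, 1000.0]
reliable_pairs = [(t * (1.0 + 5e-6), t) for t in theoretical]

shift = estimate_shift_ppm(reliable_pairs)
corrected = [recalibrate(obs, shift) for obs, _ in reliable_pairs]
```

In a scoring function, the residual error returned by `residual_ppm` would be evaluated against an error distribution learned for the instrument; that step is omitted here because the project description does not specify it.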
All research results achieved in this project will be published and made freely available to the bioinformatics and proteomics communities. Improving identification rates of peptides in general and of unknown modifications in particular will permit a deeper insight into the proteome; computer science shall thus form a new basis for finding answers to important medical and biological questions.
Short title: SESAM
Effective start/end date: 01.03.2013 → 29.02.2016