Extensions to Peptide Spectrum Match Validation using Machine Learning

Georg Pirklbauer, Stephan Winkler, Karl Mechtler, Viktoria Dorfer

Research output: Chapter in Book/Report/Conference proceedings › Conference contribution


A substantial part of proteomics mass spectrometry experiments aims at the identification of peptide sequences. This is often achieved by sequence database searching, resulting in peptide-spectrum matches (PSMs). Validation of PSMs is a crucial topic in the community. Attributing confidence to PSMs allows for the retention of only statistically relevant identifications. Searching a target and a decoy database emerged as a practical way of estimating confidence and is universally accepted in the literature [1]. Based on the target-decoy approach, Käll et al. described a statistical framework that allows for the assignment of statistically sound confidence scores [2]. Based on this scoring, they developed Percolator, an algorithm that boosts the number of confidently identified peptides at an arbitrary false positive rate cutoff [3]. The algorithm relies on a support vector machine and is widely accepted as a standard post-processing procedure in proteomics mass spectrometry experiments. Since the development of Percolator, alternatives to the support vector machine have been developed and brought to maturity. We believe that an increase in the number of confidently identified PSMs can be achieved by combining the ideas of the Percolator algorithm with newer machine learning techniques. We utilised random forests [4] in an iterative approach similar to the Percolator algorithm. Compared to the standard target-decoy approach, we were able to increase the number of confidently identified PSMs at 1% FDR by 18% on a standard HeLa sample.
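As a rough illustration of the target-decoy baseline mentioned in the abstract, the sketch below estimates the FDR at each score threshold as the ratio of decoy to target hits, converts it to a q-value, and counts the target PSMs that pass a cutoff. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `(score, is_decoy)` input layout, the simple decoys/targets estimator, and the example scores are all hypothetical.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) sorted by descending score.

    Assumptions (hypothetical, for illustration only): each PSM is a
    (score, is_decoy) pair, a higher score means a better match, and the
    FDR at a threshold is estimated as decoys / targets above it.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate when accepting every PSM down to this rank.
    fdrs = []
    targets = 0
    decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: the smallest FDR at which this PSM would still be accepted,
    # i.e. the minimum FDR over this rank and all lower-scoring ranks.
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def count_confident_targets(
    psms: List[Tuple[float, bool]], fdr_cutoff: float = 0.01
) -> int:
    """Count target PSMs whose q-value is at or below the cutoff."""
    return sum(
        1
        for _, is_decoy, q in target_decoy_qvalues(psms)
        if not is_decoy and q <= fdr_cutoff
    )


# Made-up example scores; True marks a decoy hit.
example = [(10.0, False), (9.0, False), (8.0, False),
           (7.0, True), (6.0, False), (5.0, True)]
confident = count_confident_targets(example, fdr_cutoff=0.01)
```

A Percolator-style method would replace the raw search-engine score with a learned score before this counting step; the sketch covers only the baseline counting and ignores tied scores.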
Original language: English
Title of host publication: Proceedings of the German Conference on Bioinformatics
Publisher: Computational Systems Biology
Publication status: Published - 2018
Event: German Conference on Bioinformatics - Vienna, Austria
Duration: 25 Sept 2018 to 28 Sept 2018


Conference: German Conference on Bioinformatics