PhoStar: Identifying Tandem Mass Spectra of Phosphorylated Peptides before Database Search

Research output: Contribution to journal / Article / peer-review

8 Citations (Scopus)

Abstract

Standard proteomics workflows use tandem mass spectrometry followed by sequence database search to analyze complex biological samples. Proteins carrying post-translational modifications such as phosphorylation are typically identified by allowing variable modifications in the searched sequences. Accounting for these variations increases the combinatorial search space exponentially, which leads to longer processing times and more false-positive identifications. PhoStar, the tool presented here, uses a supervised machine learning approach to identify spectra that originate from phosphorylated peptides before database search. The phosphorylation prediction model was trained and validated on a large set of high-confidence spectra collected from publicly available experimental data, reaching an accuracy of 97.6%. Its performance was further validated by predicting phosphorylation in the complete NIST human and mouse higher-energy collisional dissociation (HCD) spectral libraries, with accuracies of 98.2% and 97.9%, respectively. We demonstrate the application of PhoStar by using it to filter spectra before database search. In database searches of HeLa samples, the peptide search space was reduced by 27–66% while still finding at least 97% of the total peptide identifications (at 1% FDR) obtained with a standard workflow.
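
The search-space argument in the abstract can be illustrated with a small sketch. It is not PhoStar's implementation. It assumes that phosphorylation is searched as a variable modification on S, T, and Y residues, and that a classifier has already assigned each spectrum a phosphorylation probability. The function names, the 0.5 threshold, and the example peptide are hypothetical and chosen only for illustration.

```python
from math import comb
from typing import Optional

# Assumption: S, T and Y are the residues searched as phosphorylation sites.
PHOSPHO_SITES = "STY"


def count_modified_forms(peptide: str, max_mods: Optional[int] = None) -> int:
    """Count the phospho-forms of one peptide under a variable modification."""
    n_sites = sum(peptide.count(residue) for residue in PHOSPHO_SITES)
    limit = n_sites if max_mods is None else min(max_mods, n_sites)
    # Choose i of the n candidate sites to modify, for i = 0..limit.
    # Without a cap this sums to 2**n_sites, hence the exponential growth.
    return sum(comb(n_sites, i) for i in range(limit + 1))


def split_by_prediction(spectra, threshold=0.5):
    """Split (spectrum_id, phospho_probability) pairs at a hypothetical threshold.

    Returns the IDs to search with the variable modification enabled,
    followed by the IDs to search without it.
    """
    with_mod, without_mod = [], []
    for spectrum_id, phospho_probability in spectra:
        target = with_mod if phospho_probability >= threshold else without_mod
        target.append(spectrum_id)
    return with_mod, without_mod


if __name__ == "__main__":
    print(count_modified_forms("SAMPLETYK"))              # 3 candidate sites
    print(count_modified_forms("SAMPLETYK", max_mods=1))  # capped at one site
    print(split_by_prediction([("scan_1", 0.93), ("scan_2", 0.08)]))
```

Only spectra in the first group would need the larger, modification-aware search. Spectra predicted as unmodified can be matched against the unmodified sequences alone, which is one plausible reading of how filtering before database search reduces the peptide search space.
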
Original language: English
Pages (from-to): 290-295
Number of pages: 6
Journal: Journal of Proteome Research
Volume: 17
Issue number: 1
Publication status: Published - 5 Jan 2018

Keywords

  • machine learning
  • mass spectrometry
  • phosphorylation
  • post-translational modification
  • proteomics
  • random forest classification
  • search space reduction
