Extending MS Amanda to Process DIA Data and Apply Rescoring with Machine Learning

  • Severin Johannes Pichler

Student thesis: Master's Thesis

Abstract

Precise peptide identification is essential in mass spectrometry-based proteomics, providing critical insights into biological processes and supporting applications such as early
disease diagnosis and drug development. This work aimed to extend MS Amanda, a database search tool for data-dependent acquisition (DDA) data, so that it can
also identify peptides in data-independent acquisition (DIA) data effectively. Additionally, to further enhance peptide identification, the search results were rescored using different
machine learning techniques.
MS Amanda was modified to identify multiple peptides per spectrum by using an iterative analysis process. In each iteration, matched ions from the previous round were
removed, allowing different peptides to be identified in subsequent iterations. Furthermore, the MS2PIP, DeepLC, and Mokapot models of MS2Rescore were retrained on DIA
data to optimize their performance. A variety of machine learning algorithms, including CatBoost, SVM, and neural networks, were explored to identify the most effective
combination for peptide identification.
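The iterative analysis described in this paragraph can be illustrated with a minimal sketch. It is not MS Amanda's implementation: the scoring here is a plain count of matched fragment ions, and all names (`match_peaks`, `iterative_search`, `tol`, `min_matches`, `max_iterations`) are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Tuple


def match_peaks(peaks: List[float], fragments: List[float], tol: float) -> List[int]:
    """Return indices of observed peaks within tol (Da) of any theoretical fragment."""
    return [
        i for i, mz in enumerate(peaks)
        if any(abs(mz - f) <= tol for f in fragments)
    ]


def iterative_search(
    peaks: List[float],
    candidates: Dict[str, List[float]],
    tol: float = 0.02,
    min_matches: int = 3,
    max_iterations: int = 5,
) -> List[Tuple[str, int]]:
    """Identify several peptides in one spectrum by repeated matching.

    Assumption: the score is the number of matched peaks, a stand-in
    for a real search engine score.
    """
    remaining = list(peaks)      # peaks not yet explained by a peptide
    unused = dict(candidates)    # candidates not yet reported
    hits: List[Tuple[str, int]] = []

    for _ in range(max_iterations):
        best_name, best_idx = None, []
        for name, fragments in unused.items():
            idx = match_peaks(remaining, fragments, tol)
            if len(idx) > len(best_idx):
                best_name, best_idx = name, idx

        # Stop when no candidate explains enough of the remaining peaks.
        if best_name is None or len(best_idx) < min_matches:
            break

        hits.append((best_name, len(best_idx)))

        # Remove the matched ions so the next round can find a different peptide.
        matched = set(best_idx)
        remaining = [mz for i, mz in enumerate(remaining) if i not in matched]
        del unused[best_name]

    return hits
```

In this sketch a peak shared by two co-fragmented peptides is credited only to the peptide found first, which is one consequence of removing matched ions between iterations.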
While MS Amanda was able to match peptides to their corresponding spectra, its overall peptide identification performance lagged behind state-of-the-art tools like DIA-NN and Spectronaut. Even after rescoring the peptide-spectrum
matches (PSMs) with both default and specialized models, the number of unique peptide
identifications remained lower than that of these tools.
Date of Award: 2024
Original language: English (American)
Supervisor: Viktoria Dorfer
