Abstract
This master's thesis investigates whether recurrent neural networks are suitable for classifying coiled coil proteins into dimers and trimers. Coiled coil proteins are of great interest because of their special structure and their diverse medical applications, especially in drug research. To answer this research question, the thesis combines theoretical aspects of coiled coil proteins and an overview of currently available tools for coiled coil protein sequence analysis with an experimental recurrent neural network model.
In the practical part, a bidirectional recurrent neural network model named CoRNN was developed. Despite challenges with overfitting and an unbalanced dataset, CoRNN could be optimized by introducing a fold group dataset. During training, CoRNN performed especially well, achieving excellent ROC-AUC, precision, and recall scores. Two validation sets were then used to compare CoRNN with the existing coiled coil classification tools PrOCoil and CoCoPRED and to check whether the classification could be performed successfully. While CoRNN and PrOCoil successfully classified all protein sequences of both validation sets, CoCoPRED failed because it could not process shorter sequences and produced incorrect classifications. The results show that CoRNN is a suitable model for classifying coiled coil proteins into dimers and trimers and can compete with currently available tools.
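The ROC-AUC, precision, and recall scores mentioned here are standard measures for a binary classifier, and on an unbalanced dataset they are more informative than plain accuracy. The following is a minimal, self-contained sketch of how these scores can be computed for a dimer/trimer classifier. It assumes that trimers are encoded as the positive class (1) and dimers as the negative class (0), and that the model outputs a score in [0, 1]. These encodings, the function names, and the example data are hypothetical illustrations, not taken from CoRNN itself.

```python
from typing import Sequence


def precision_recall(labels: Sequence[int], preds: Sequence[int]) -> tuple[float, float]:
    """Precision and recall for the positive class (assumed: 1 = trimer, 0 = dimer)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = trimer, 0 = dimer; scores are made-up model outputs.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

print(precision_recall(labels, preds))  # (0.666..., 0.666...)
print(roc_auc(labels, scores))          # 0.777... (7 of 9 positive/negative pairs ranked correctly)
```

Unlike precision and recall, ROC-AUC is independent of the decision threshold because it only compares score rankings. The pairwise loop is quadratic in the number of examples, which is acceptable for small validation sets. For larger sets, a sort-based rank computation or a library routine would be the usual choice.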
| Field | Value |
|---|---|
| Date of Award | 2024 |
| Original language | German (Austria) |
| Supervisor | Ulrich Bodenhofer |