Comparative Analysis of Machine Learning Models predicting the Anatomical Location (Proximal vs. Distal) of Colorectal Carcinomas using NGS and Clinical Data

  • Ines Neidhard

    Student thesis: Master's Thesis

    Abstract

    The anatomical location of colorectal carcinoma (CRC) influences treatment outcomes as well as
    overall survival, underlining the importance of correctly classifying tumours as proximal
    or distal. Thus, four Machine Learning algorithms (Decision Trees, k-Nearest Neighbors,
    Logistic Regression, and Support Vector Machines) were trained via nested cross-validation in
    order to classify colorectal carcinomas into right-sided or left-sided, by using NGS and clinical
    data provided by the Ordensklinikum Linz Barmherzige Schwestern, a hospital in Upper Austria.
    This data comprised more than 2770 CRC patients. However, joining the data sets revealed a
    large loss of somatic mutation data spanning multiple years of analyses, which reduced the
    number of patients to just 112. The NGS data had to be restricted to six genes
    (KRAS, NRAS, BRAF, PIK3CA, TP53, and APC) in order to mitigate the risk of overfitting.
    For each ML algorithm, the model achieving the best test accuracy was determined. All four
    of these models exceeded the baseline accuracy of 64%. While the Logistic Regression (LR),
    SVM, and kNN models all achieved the same highest overall accuracy of 86.96%, the LR model
    achieved both the highest balanced accuracy (84.17%) and the highest AUC (0.84).
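
    The abstract compares models by overall accuracy, balanced accuracy, and a baseline. The
    following minimal Python sketch illustrates how these three quantities differ on an
    imbalanced two-class problem. It assumes that the baseline is the accuracy of always
    predicting the majority class, which the abstract does not state explicitly. The labels
    and predictions are hypothetical toy values, not data or results from the thesis, and the
    function names are chosen for illustration only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent class (assumed baseline)."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Hypothetical toy labels: 6 distal and 4 proximal tumours (not thesis data).
y_true = ["distal"] * 6 + ["proximal"] * 4
# Hypothetical predictions: 5 of 6 distal and 3 of 4 proximal are correct.
y_pred = ["distal"] * 5 + ["proximal"] + ["proximal"] * 3 + ["distal"]

print(f"baseline:          {majority_baseline(y_true):.4f}")
print(f"accuracy:          {accuracy(y_true, y_pred):.4f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.4f}")
```

    Because balanced accuracy averages the recall of each class, it falls below overall
    accuracy whenever the minority class is classified less reliably than the majority class.
    This would be consistent with the reported balanced accuracy (84.17%) lying below the
    reported overall accuracy (86.96%).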
    Date of Award: 2024
    Original language: English (American)
    Supervisor: Gerald Lirk
