Abstract
In recent years, advancements in DNA, RNA, and protein analysis technologies have improved the quantification and decoding of nucleotide and amino acid sequences. In particular, high-throughput technologies in the fields of genomics, transcriptomics, and proteomics have facilitated the generation of large amounts of data, opening vast opportunities in biological and medical research. Accordingly, the need for performant and accurate bioinformatics algorithms and pipelines has grown. In her research, the author of this thesis focused on the development of efficient algorithms and frameworks for the analysis of disease-associated genomics data derived from high-throughput technologies such as next-generation sequencing (NGS) and microarrays. The results are two frameworks, one for large-scale UMI-tagged immune repertoire NGS data analysis and one for small-scale UMI-tagged NGS data, as well as a workflow for gene expression data analysis. These three efficient and robust data processing pipelines were developed within this thesis and aim to ensure reproducibility and transparency. In particular, the ImmunoDataAnalyzer (IMDA) allows for automated pre-processing and processing of immune repertoire NGS data; the Interface for Point Mutation Identification (IMPI) aids in identifying low-frequency point mutations; and the gene expression analysis workflow enables automated investigation of publicly available gene expression datasets from NCBI's Gene Expression Omnibus (GEO). Each framework was implemented to answer a different biological research question but was designed in a versatile manner to enable its use by a broader community. Further, state-of-the-art third-party tools that are well accepted within the community have been integrated to ensure accurate data evaluation.
An additional emphasis was placed on integrating machine learning (ML) algorithms or providing output suitable for ML frameworks. The first part of this thesis outlines the context and the objectives of the studies, followed by a comprehensive review of the articles published by the author, which are provided in the appendices. The thesis concludes with a summary and discussion of the findings and an outlook for future research.
Original language | English (American) |
---|---|
Qualification | Dr. rer. nat. |
Award date | 22 May 2024 |
Publication status | Published - 22 May 2024 |