Optimization of Program Code Using a Large Language Model (Optimierung von Programmcode durch Einsatz eines Large Language Models)

  • Nino Andre Rinnerberger

    Student thesis: Master's Thesis

    Abstract

    The quality of source code is a critical factor in software development, significantly influencing maintainability, security, and efficiency. Despite modern tools such as GitHub
    Copilot or ChatGPT, structural issues such as anti-patterns and code smells continue to occur regularly and often go unnoticed. Since source code is often considered
    intellectual property and protected by copyright, this work focuses on analyzing and optimizing such issues through the development of a prototype that uses locally executed
    Large Language Models (LLMs) to analyze Python code. Python is examined in this
    context because existing research predominantly focuses on object-oriented languages
    such as Java; a systematic analysis of Python code in the context of LLM-based pattern
    recognition remains rare.
    The developed prototype identifies problematic code patterns in Python and generates concrete suggestions for improvement. To integrate expert knowledge, In-Context
    Learning is employed: a lightweight, flexibly extendable method that requires no
    training or fine-tuning of existing models and allows knowledge about anti-patterns
    and code smells to be conveyed directly via prompts. Particular emphasis is placed
    on protecting sensitive data and ensuring independence from cloud-based services: the
    entire process is based on open-source components and runs locally on consumer hardware. The evaluation was conducted using a custom dataset, initially generated with
    ChatGPT-4o and subsequently reviewed and refined manually. This dataset served to
    compare local models such as Qwen2.5, IBM Granite, and Codestral with cloud-based
    models such as ChatGPT-4o and DeepSeek v3.
    The results show that models like ChatGPT-4o and Qwen2.5 achieve high detection accuracy while maintaining acceptable response times. Notably, the comparable
    performance of Qwen2.5 and the cloud-based ChatGPT-4o highlights that locally
    executable LLMs represent an effective and privacy-friendly alternative to commercial
    AI-based code analysis tools.
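
    The In-Context Learning approach described above can be illustrated with a minimal sketch. The smell names, few-shot examples, and the `build_prompt` helper below are illustrative assumptions, not the thesis prototype's actual prompts: knowledge about anti-patterns is conveyed purely through examples embedded in the prompt, with no training or fine-tuning of the model.

    ```python
    # Minimal In-Context Learning sketch for code-smell detection.
    # All example smells and wording here are hypothetical placeholders.

    FEW_SHOT_EXAMPLES = [
        (
            "def f(a, b, c, d, e, g, h):\n    return a + b + c + d + e + g + h",
            "Long Parameter List: the function takes seven positional "
            "parameters; consider grouping them into a dataclass.",
        ),
        (
            "try:\n    risky()\nexcept Exception:\n    pass",
            "Swallowed Exception: a bare 'except ... pass' hides errors; "
            "log or re-raise instead.",
        ),
    ]

    def build_prompt(code_under_review: str) -> str:
        """Assemble a few-shot prompt that conveys anti-pattern knowledge
        in-context, without any model training or fine-tuning."""
        parts = [
            "You are a Python code reviewer. Identify anti-patterns and "
            "code smells and suggest concrete improvements.\n"
        ]
        for snippet, finding in FEW_SHOT_EXAMPLES:
            parts.append(f"Code:\n{snippet}\nFinding: {finding}\n")
        # Leave the final "Finding:" open for the model to complete.
        parts.append(f"Code:\n{code_under_review}\nFinding:")
        return "\n".join(parts)

    prompt = build_prompt("def g(x):\n    return eval(x)")
    print(prompt.count("Finding:"))  # one per example plus the open slot -> 3
    ```

    In the prototype, such a prompt would then be sent to a locally hosted open-source model; the example list can be extended with further anti-patterns at any time, which is the flexible extendability the abstract refers to.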
    Date of Award: 2025
    Original language: German (Austria)
    Supervisor: Harald Lampesberger

    Study program

    • Secure Information Systems
