Supporting the Modeling Process of Bayesian Networks with Large Language Models (original title: Unterstützung des Modellierungsprozesses von Bayes’schen Netzwerken durch Large Language Models)

  • Michael Trenker

    Student thesis: Master's Thesis

    Abstract

    This master’s thesis investigates the creation of node probability tables for Bayesian
    networks in the context of risk management and evaluates the extent to which Large
    Language Models (LLMs) can replace traditional expert interviews.
    To this end, the theoretical foundations of probability, causality, Bayes’ theorem,
    Bayesian statistics, and various methods for generating node probability tables are first
    presented.
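    For orientation, the central relation these foundations rest on is Bayes’ theorem, and
    each entry of a node probability table is a conditional probability of the same kind
    (standard textbook notation, not a formula specific to this thesis):

    \[
    P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
    \qquad
    P\bigl(X = x \mid \mathrm{Pa}(X) = \mathbf{pa}\bigr),
    \]

    where \(\mathrm{Pa}(X)\) denotes the parent nodes of \(X\) in the network.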
    Since the quality of such tables strongly depends on subjective assessments and
    classical metrics are only of limited applicability, two hypotheses were formulated and
    empirically tested:
    The first hypothesis states that LLMs can reliably identify whether an event B
    influences the probability of another event A (positively, negatively, or not at all). The
    second hypothesis posits that LLMs provide consistent probability estimates when the
    context remains unchanged.
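    The second hypothesis suggests an experiment of the following shape: ask the same question
    repeatedly with an unchanged context and summarise how much the returned probabilities
    vary. The sketch below is a minimal illustration of that idea, not the thesis’s actual
    setup; query_probability is a hypothetical stand-in for the LLM client that was used, the
    prompt wording is invented, and the standard deviation is only one possible consistency
    measure.

    import random
    from statistics import mean, stdev

    def query_probability(prompt: str, temperature: float = 0.2) -> float:
        # Placeholder: in a real experiment this would call an LLM with the given
        # temperature and parse a probability from its answer. Here we simulate
        # noisy estimates so the sketch runs on its own.
        return min(1.0, max(0.0, random.gauss(0.35, 0.05 + temperature * 0.1)))

    def consistency(prompt: str, runs: int = 10, temperature: float = 0.2) -> tuple[float, float]:
        # Repeat the identical query and report the mean estimate together with
        # its standard deviation (lower spread = more consistent answers).
        estimates = [query_probability(prompt, temperature) for _ in range(runs)]
        return mean(estimates), stdev(estimates)

    if __name__ == "__main__":
        prompt = ("Event B (a detection capability is in place) is given. "
                  "Estimate the probability of event A as a number between 0 and 1.")
        m, s = consistency(prompt, runs=10, temperature=0.2)
        print(f"mean estimate: {m:.3f}, standard deviation: {s:.3f}")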
    Finally, a direct comparison was conducted between node probability tables created
    by LLMs and those generated by human experts. The underlying Bayesian network addresses
    a MITRE ATT&CK technique and various detection capabilities. The surveyed experts had
    differing levels of experience, making it possible to evaluate which level of expertise
    produces tables that most closely resemble those generated by LLMs.
    The results show that most Large Language Models are capable of correctly assessing event dependencies and their impact on occurrence probabilities with up to 78%
    accuracy. However, no correlation was found between the correctness of classification
    and the model’s self-assessed confidence in its responses. It was also observed that the
    chosen prompting strategy influences the success of probability estimation.
    Regarding the second hypothesis, LLMs demonstrated a high degree of consistency in their
    estimated probabilities when questioned repeatedly. Unsurprisingly, a lower temperature
    setting had a positive effect on response consistency.
    In the direct comparison of node probability tables, it was found that human experts
    tend to make more conservative estimates. In contrast, Large Language Models more
    frequently overestimate probabilities compared to human experts. The tables generated
    by LLMs most closely resembled those created by an expert with theoretical general
    knowledge in information security and initial practical experience in cyber defense.
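    As a small illustration of the kind of table-level comparison described above, one simple
    way to quantify how closely an LLM-generated node probability table matches an expert’s is
    the mean absolute difference of corresponding entries. The values below are invented for
    the sketch and the metric is an assumption; the thesis’s actual comparison procedure is
    not reproduced here.

    # CPT entries for one node, keyed by the joint state of its parent nodes.
    expert_cpt = {("detected", "patched"): 0.05,
                  ("detected", "unpatched"): 0.25,
                  ("undetected", "patched"): 0.15,
                  ("undetected", "unpatched"): 0.60}

    llm_cpt = {("detected", "patched"): 0.10,
               ("detected", "unpatched"): 0.40,
               ("undetected", "patched"): 0.20,
               ("undetected", "unpatched"): 0.80}

    # Mean absolute difference, plus the number of entries the LLM rates higher,
    # i.e. where it is less conservative than the expert.
    diffs = [abs(llm_cpt[k] - expert_cpt[k]) for k in expert_cpt]
    mad = sum(diffs) / len(diffs)
    higher = sum(llm_cpt[k] > expert_cpt[k] for k in expert_cpt)

    print(f"mean absolute difference: {mad:.3f}")
    print(f"entries where the LLM estimate is higher: {higher}/{len(expert_cpt)}")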
    Date of Award: 2025
    Original language: German (Austria)
    Supervisor: Eckehard Hermann

    Study program

    • Secure Information Systems
