Abstract
This master’s thesis investigates the creation of node probability tables for Bayesian networks in the context of risk management and evaluates the extent to which Large Language Models (LLMs) can replace traditional expert interviews.
To this end, the theoretical foundations of probability, causality, Bayes’ theorem,
Bayesian statistics, and various methods for generating node probability tables are first
presented.
Since the quality of such tables strongly depends on subjective assessments and
classical metrics are only of limited applicability, two hypotheses were formulated and
empirically tested:
The first hypothesis states that LLMs can reliably identify whether an event B
influences the probability of another event A (positively, negatively, or not at all). The
second hypothesis posits that LLMs provide consistent probability estimates when the
context remains unchanged.
Finally, a direct comparison was conducted between node probability tables created
by LLMs and those generated by human experts. The underlying Bayesian network addresses a MITRE ATT&CK technique and various detection capabilities. The surveyed experts had differing levels of experience, in order to evaluate which level of expertise yields tables most closely aligned with those generated by LLMs.
The results show that most Large Language Models are capable of correctly assessing event dependencies and their impact on occurrence probabilities with up to 78%
accuracy. However, no correlation was found between the correctness of classification
and the model’s self-assessed confidence in its responses. It was also observed that the
chosen prompting strategy influences the success of probability estimation.
Similarly, LLMs demonstrated a high degree of consistency in their estimated probabilities when questioned repeatedly. Unsurprisingly, a lower temperature setting positively affected response consistency.
In the direct comparison of node probability tables, it was found that human experts
tend to make more conservative estimates. In contrast, Large Language Models more
frequently overestimate probabilities compared to human experts. The tables generated
by LLMs most closely resembled those created by an expert with theoretical general
knowledge in information security and initial practical experience in cyber defense.
| Date of Award | 2025 |
|---|---|
| Original language | German (Austria) |
| Supervisor | Eckehard Hermann (Supervisor) |
Study program
- Secure Information Systems