Abstract
Large Language Models (LLMs) have gained worldwide recognition with the introduction of the ChatGPT chatbot. This paper explores how this technology can be utilized in the medical field for the automated generation of medical reports. Writing medical reports after hospital treatments is time-consuming for medical staff. Automating this process can save time and reduce the workload of doctors. The goal of this work is to create high-quality medical reports with LLMs based on provided clinical findings.
In addition to existing real-world data, synthetic data for clinical findings and medical
reports is generated. A method is developed to evaluate the quality of the generated
medical reports based on the criteria of structure, accuracy, and completeness.
Experiments using OpenAI’s GPT-4o model compare differently structured few-shot
prompts. In few-shot learning, the model is provided with examples of clinical findings and corresponding medical reports, allowing it to learn the underlying connections between input and output, as well as the structure of the reports (a minimal prompt of this form is sketched below). In the experiments, prompts with varying numbers of examples are tested. Additionally, custom GPTs that are adapted to the context of medical report generation are explored. The findings show that prompts created using prompt engineering techniques lead to good results, while custom GPTs perform worse than comparable prompts. The few-shot learning technique enables good results with only a few examples. Often, a single example is sufficient for
the model to recognize the desired structure of medical reports. The best results can
be achieved with a detailed prompt that also describes the context, assigns a role to
the model, and includes three examples. Since misinformation in the responses of an
LLM cannot be entirely ruled out, the generated medical reports must be reviewed by
medical professionals in practice.
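
To illustrate the few-shot prompting setup described above, the following is a minimal sketch using the OpenAI Python client with a single example pair; the findings and report texts are hypothetical placeholders rather than data from this work.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Hypothetical placeholder texts; the actual clinical findings and reports
# used in this work are not reproduced here.
example_findings = "Findings: unremarkable chest X-ray, no infiltrates."
example_report = "Medical report: The chest X-ray shows no pathological changes."
new_findings = "Findings: mild cardiomegaly, no pleural effusion."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Context description and role assignment for the model
        {
            "role": "system",
            "content": "You are an experienced physician. Based on the "
                       "provided clinical findings, write a structured "
                       "medical report.",
        },
        # One few-shot example: findings as user input, report as model output
        {"role": "user", "content": example_findings},
        {"role": "assistant", "content": example_report},
        # The new findings for which a report should be generated
        {"role": "user", "content": new_findings},
    ],
)
print(response.choices[0].message.content)
```

The best-performing prompt described above corresponds to this structure, with a context description, a role assignment, and three example pairs instead of one.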
| Date of Award | 2024 |
|---|---|
| Original language | German (Austria) |
| Supervisor | Stephan Dreiseitl (Supervisor) |