TY - GEN
T1 - Using LLMs and Websearch in Order to Perform Fact Checking on Texts Generated by LLMs
AU - Sandler, Simone
AU - Krauss, Oliver
AU - Stöckl, Andreas
PY - 2025/4/25
Y1 - 2025/4/25
AB - Finding out whether a given text contains false information is not an easy task. On large corpora, such as the tremendous amount of text generated by LLMs, fact checking is prohibitively expensive. To address this challenge, we propose a novel approach that combines fact checking by LLMs with web search. The method applies not only to single sentences, to which it assigns a true, false, or unknown label, but also to whole paragraphs, for which a score between 0 and 1 representing truthfulness is calculated from the sentence labels. The process begins by extracting claims from the text using GPT-3. These claims are then validated, with Google search results used to supplement the GPT-3 results. When validating our method against a corpus of 122 LLM-generated text samples, we achieve an accuracy of 0.79. To compare our work to other approaches, we also applied our fact checking to the FEVER dataset, achieving an accuracy of 0.78, which is comparable to the current best accuracy of 0.79 on that dataset. This demonstrates the potential of our proposed approach for automated fact checking.
UR - https://link.springer.com/chapter/10.1007/978-3-031-82957-4_27
M3 - Conference contribution
VL - 15173
BT - Lecture Notes in Computer Science
ER -