Disambiguating Unknown Entities in News Articles

  • Jakob Sebastian Lammel

Student thesis: Master's Thesis

Abstract

Entity Linking (EL) is a well-known technique that connects mentions of entities in
documents to a knowledge graph. However, many mentions have no corresponding entry, and natural language processing
applications often overlook them even though they carry valuable information. This issue is particularly prevalent in the news domain, where many entities lack entries in knowledge graphs because
they are not considered important enough or have only just emerged.
Existing approaches tend to address both linked and unlinked mentions simultaneously, which prevents the full utilization of modern EL techniques. This thesis presents
a novel method that focuses exclusively on disambiguating unlinked mentions in news
articles.
The proposed approach employs an agglomerative clustering algorithm to group
these mentions. The pairwise similarities required for clustering are derived from three factors:
the semantic similarity of the mentions' contexts, the similarity of the linked entities present in their documents, and the
textual similarity of the mentions themselves.
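The grouping step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the equal-style weighting, the average-linkage rule, the merge threshold, the choice of cosine, Jaccard, and difflib string similarity, and all mention data below are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Overlap between two sets of linked entities found in the documents."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(m1, m2, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three factors; the weights are hypothetical."""
    w_ctx, w_ent, w_txt = weights
    context = cosine(m1["context_vec"], m2["context_vec"])
    entities = jaccard(m1["linked_entities"], m2["linked_entities"])
    text = SequenceMatcher(None, m1["text"].lower(), m2["text"].lower()).ratio()
    return w_ctx * context + w_ent * entities + w_txt * text


def cluster(mentions, threshold=0.6):
    """Average-linkage agglomerative clustering over mention indices.

    Repeatedly merges the two most similar clusters until the best
    remaining pair falls below the (assumed) threshold.
    """
    clusters = [[i] for i in range(len(mentions))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(similarity(mentions[i], mentions[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        best_score, a, b = max(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best_score < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Made-up unlinked mentions: tiny context vectors stand in for real embeddings.
mentions = [
    {"text": "Anna Berger", "context_vec": [0.9, 0.1, 0.0], "linked_entities": {"city_a", "party_x"}},
    {"text": "A. Berger", "context_vec": [0.8, 0.2, 0.1], "linked_entities": {"city_a"}},
    {"text": "Acme Robotics", "context_vec": [0.0, 0.2, 0.9], "linked_entities": {"city_b"}},
]

print(cluster(mentions))  # → [[0, 1], [2]]
```

With these made-up inputs the two "Berger" mentions end up in one cluster and the unrelated organization stays on its own. Comparing all cluster pairs in every round is only workable for small inputs; a corpus of the size mentioned in this abstract would need an indexed similarity search instead.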
To ensure practical applicability, this thesis further explores and evaluates the use
of modern database systems, specifically vector and graph databases. The approach
is tested on a dataset consisting of one million news articles from both German and
English news outlets.
Date of Award: 2024
Original language: English (American)
Supervisor: Erik Pitzer
