Improving language-dependent named entity detection

Gerald Petz, Werner Wetzlinger, Dietmar Nedbal

Research output: Chapter in Book/Report/Conference proceedings, Conference contribution, peer-reviewed

2 Citations (Scopus)

Abstract

Named Entity Recognition (NER) and Named Entity Linking (NEL) are two research areas that have made significant advances in recent years. Most of this research has focused on the English language. As a result, some of these improvements are language-dependent and do not necessarily lead to better results when applied to other languages. This paper therefore presents TOMO, an approach to language-aware named entity detection, and evaluates it for German. The evaluation required a German gold standard dataset, which we developed from the English dataset used in the OKE 2016 challenge. We evaluated the named entity detection task on the web-based platform GERBIL, and the results show that our approach achieved higher F1 values than the other annotators. This indicates that language-dependent features improve the overall quality of the spotter.
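The abstract compares annotators by their F1 values on the detection (spotting) task. As a rough illustration of what such a score measures, the sketch below computes micro-averaged precision, recall and F1 for entity mention detection, assuming that a spotted mention counts as correct only if its character offsets exactly match a gold mention. This is a generic sketch, not TOMO's code or GERBIL's implementation, and GERBIL's own matching rules may differ; the helper name `micro_f1` and the example sentence are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A mention is a (start, end) character-offset pair; exact-span matching is an assumption.
Span = Tuple[int, int]


def micro_f1(documents: Iterable[Tuple[Set[Span], Set[Span]]]) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1) over (gold, predicted) pairs."""
    tp = fp = fn = 0
    for gold, predicted in documents:
        tp += len(gold & predicted)   # spotted mentions that match a gold span exactly
        fp += len(predicted - gold)   # spotted mentions with no matching gold span
        fn += len(gold - predicted)   # gold mentions the spotter missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical German example: "Angela Merkel besuchte Linz."
    gold = {(0, 13), (23, 27)}        # "Angela Merkel", "Linz"
    predicted = {(0, 13), (7, 13)}    # "Angela Merkel", "Merkel" (wrong span)
    print(micro_f1([(gold, predicted)]))  # (0.5, 0.5, 0.5)
```

Micro-averaging pools the counts across all documents, so documents with many mentions weigh more; a macro average would instead average the per-document scores.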

Original language: English
Title of host publication: Machine Learning and Knowledge Extraction - 1st IFIP TC 5, WG 8.4, 8.9, 12.9 International Cross-Domain Conference, CD-MAKE 2017, Proceedings
Editors: A. Min Tjoa, Andreas Holzinger, Peter Kieseberg, Edgar Weippl
Publisher: Springer
Pages: 330-345
Number of pages: 16
ISBN (Print): 9783319668079
Publication status: Published - 2017
Event: 1st IFIP TC 5, WG 8.4, 8.9, 12.9 International Cross-Domain Conference on Machine Learning and Knowledge Extraction, CD-MAKE 2017 - Reggio, Italy
Duration: 29 Aug 2017 to 1 Sept 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10410 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st IFIP TC 5, WG 8.4, 8.9, 12.9 International Cross-Domain Conference on Machine Learning and Knowledge Extraction, CD-MAKE 2017
Country/Territory: Italy
City: Reggio
Period: 29.08.2017 to 01.09.2017

Keywords

  • Dataset development
  • Entity detection
  • Entity recognition
  • Gold standard
  • Language-dependent
  • NER
