Keyword Clustering in Biomedical Information Retrieval Using Evolutionary Algorithms

Translated title of the contribution: Keyword Clustering in Biomedical Information Retrieval Using Evolutionary Algorithms

Viktoria Dorfer, Stephan Winkler, Thomas Kern, Sophie Anna Blank, Gerald Petz, Patrizia Faschang

Research output: Chapter in Book/Report/Conference proceedings › Conference contribution

Abstract

As the amount of data available in the life sciences grows exponentially, intelligent search strategies are needed to support information retrieval. Here we describe a new keyword clustering method: given a set of documents D, keyword clusters are optimized so that each cluster groups keywords that frequently occur together in the documents of D. In the near future, these keyword clusters are intended to serve as a solid basis for a new PubMed search tool that uses query expansion and incorporates user feedback to refine the search process. We have defined several important quality measures for candidate clusterings: the data set coverage; the cluster confidence, which measures the proportion of a cluster's keywords that are found together in the same documents; and the document confidence, which measures how many keywords are shared by the documents assigned to a cluster through its keywords. Several evolutionary algorithms have been applied to this optimization task, including evolution strategies (ES) and a multi-objective genetic algorithm (NSGA-II), the latter chosen because the optimization objectives partly conflict. To test the approach, we used data published for the TREC-9 conference comprising 36,890 entries, from which we extracted the most significant keywords for clustering using tf-idf weighting. In first optimization results, the best solution obtained with a (10+1)-ES achieves 23.5% data set coverage, 45.2% cluster confidence, and 23.4% document confidence; with NSGA-II we obtained, for example, solutions with respective values of 71%, 56%, and 37%.
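To make the three quality measures more concrete, the following Python sketch computes them for a candidate clustering. The abstract does not give exact formulas, so every definition here is an assumption rather than the authors' method: a document is treated as assigned to a cluster when it contains at least `min_hits` of the cluster's keywords (a hypothetical parameter), data set coverage is the fraction of documents assigned to any cluster, cluster confidence is the mean share of a cluster's keywords present in each assigned document, and document confidence is the mean pairwise Jaccard similarity between the keyword sets of documents assigned to the same cluster. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical helpers illustrating ONE possible reading of the abstract's
# quality measures; not the authors' code. Each document is a set of keywords,
# each cluster a (frozen)set of keywords.


def assigned_docs(cluster, docs, min_hits=2):
    """Indices of documents containing at least `min_hits` of the cluster's keywords (assumed rule)."""
    return [i for i, kws in enumerate(docs) if len(kws & cluster) >= min_hits]


def coverage(clusters, docs, min_hits=2):
    """Assumed data set coverage: fraction of documents assigned to at least one cluster."""
    covered = set()
    for cluster in clusters:
        covered.update(assigned_docs(cluster, docs, min_hits))
    return len(covered) / len(docs) if docs else 0.0


def cluster_confidence(clusters, docs, min_hits=2):
    """Assumed cluster confidence: mean share of a cluster's keywords found in each
    assigned document, pooled over all (cluster, document) pairs."""
    shares = [
        len(docs[i] & cluster) / len(cluster)
        for cluster in clusters
        if cluster  # skip empty clusters to avoid division by zero
        for i in assigned_docs(cluster, docs, min_hits)
    ]
    return mean(shares) if shares else 0.0


def jaccard(a, b):
    """Jaccard similarity of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def document_confidence(clusters, docs, min_hits=2):
    """Assumed document confidence: mean pairwise Jaccard similarity of documents
    assigned to the same cluster, pooled over all clusters."""
    sims = [
        jaccard(docs[i], docs[j])
        for cluster in clusters
        for i, j in combinations(assigned_docs(cluster, docs, min_hits), 2)
    ]
    return mean(sims) if sims else 0.0


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    docs = [
        {"insulin", "diabetes", "glucose"},
        {"insulin", "glucose", "obesity"},
        {"asthma", "allergy"},
        {"diabetes", "obesity"},
    ]
    clusters = [frozenset({"insulin", "glucose", "diabetes"})]
    print(round(coverage(clusters, docs), 3))             # 0.5
    print(round(cluster_confidence(clusters, docs), 3))   # 0.833
    print(round(document_confidence(clusters, docs), 3))  # 0.5
```

Under these assumed definitions the objectives pull against each other: adding a keyword to a cluster can only enlarge its set of assigned documents (raising coverage), but it tends to lower the share of the cluster's keywords found in each document. This is one way to see why a multi-objective method such as NSGA-II suits the task.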
Original language: German
Title of host publication: Proceedings of the 19th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 10th European Conference on Computational Biology (ECCB)
Publisher: International Society for Computational Biology
Publication status: Published (2011)
Event: 19th Annual International Conference on Intelligent Systems for Molecular Biology and 10th European Conference on Computational Biology, Vienna, Austria
Duration: 17 Jul 2011 to 19 Jul 2011

