Abstract
In this paper we describe the identification of variable interaction networks based on the analysis of medical data. The main goal is to generate mathematical models for medical parameters from the other parameters available in the data set. For each variable we identify the features that are most relevant for modeling it; in this context, the relevance of a variable can be defined either via the frequency of its occurrence in models identified by evolutionary machine learning methods or via the decrease in modeling quality after removing it from the data set. Several data-based modeling approaches implemented in HeuristicLab have been applied to identify estimators for selected continuous and discrete medical variables and cancer diagnoses: genetic programming, linear regression, k-nearest-neighbor regression, support vector machines (optimized using evolutionary algorithms), and random forests. In the empirical section of this paper we describe interaction networks identified for a medical database storing data of more than 600 patients. Regardless of the modeling approach used, the most important influence factors can be identified and displayed in interaction networks, which can be interpreted without domain knowledge in machine learning or informatics in general.
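The two relevance definitions named in the abstract can be illustrated with a minimal Python sketch. It is not the HeuristicLab implementation used in the paper. The function names, the representation of a model as a set of variable names, the variable names `age`, `bmi`, `glucose`, `a`, `b`, `y`, the toy data, and the choice of leave-one-out mean squared error with a plain k-nearest-neighbor regressor are all assumptions made for illustration.

```python
from collections import Counter
from itertools import chain


def frequency_relevance(models):
    """Relative frequency with which each variable occurs across models.

    `models` is a list of sets; each set holds the input variables used by
    one identified model (a hypothetical representation, not from the paper).
    """
    counts = Counter(chain.from_iterable(models))
    return {var: n / len(models) for var, n in counts.items()}


def knn_loo_mse(rows, target, inputs, k=3):
    """Leave-one-out mean squared error of k-NN regression on `target`."""
    total = 0.0
    for i, query in enumerate(rows):
        others = [r for j, r in enumerate(rows) if j != i]
        # Squared Euclidean distance over the selected input variables.
        others.sort(key=lambda r: sum((r[v] - query[v]) ** 2 for v in inputs))
        prediction = sum(r[target] for r in others[:k]) / k
        total += (prediction - query[target]) ** 2
    return total / len(rows)


def removal_relevance(rows, target, k=3):
    """Increase in error observed when each input variable is removed."""
    inputs = [v for v in rows[0] if v != target]
    baseline = knn_loo_mse(rows, target, inputs, k)
    return {
        v: knn_loo_mse(rows, target, [u for u in inputs if u != v], k) - baseline
        for v in inputs
    }


def interaction_network(rows, threshold=0.0, k=3):
    """Directed, weighted edges (input, target, relevance) above `threshold`."""
    edges = []
    for target in rows[0]:
        for var, gain in removal_relevance(rows, target, k).items():
            if gain > threshold:
                edges.append((var, target, gain))
    return edges


# Toy data (assumed): y depends only on a; b is unrelated to y.
rows = [{"a": float(i), "b": float((2 * i) % 5), "y": 2.0 * i} for i in range(12)]

# Hypothetical sets of variables used by four identified models.
models = [{"age", "bmi"}, {"age"}, {"bmi", "glucose"}, {"age", "glucose"}]

freq = frequency_relevance(models)
rel = removal_relevance(rows, "y")
edges = interaction_network(rows)
```

In this sketch an edge from one variable to another means that removing the first variable worsens the model for the second; drawing the edges with their weights gives a simple interaction network. Any of the other modeling approaches listed in the abstract could be substituted for the k-NN regressor without changing the relevance calculation.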
Translated title of the contribution | Variable Interaction Networks in Medical Data |
---|---|
Original language | German |
Pages (from-to) | 265-270 |
Number of pages | 6 |
Journal | International Journal of Privacy and Health Information Management (IJPHIM) |
DOIs | |
Publication status | Published - 2013 |
Keywords
- Data mining
- Evolutionary algorithms
- Medical data analysis
- Variable interaction networks