Evolutionary Identification of Cancer Predictors Using Clustered Data

Research output: Chapter in Book/Report/Conference proceedings > Conference contribution


In this paper, we discuss the effects of using pre-clustered data on the identification of estimation models for cancer diagnoses. Based on patients’ data records, which include standard blood parameters, tumor markers, and information about the diagnosis of tumors, the goal is to identify mathematical models for estimating cancer diagnoses. We have applied a hybrid clustering and classification approach that first identifies data clusters (using standard patient data and tumor markers) and then learns prediction models on the basis of these data clusters. In the empirical section, we analyze the clusters of patient data samples formed using k-means clustering: we identify the optimal number of clusters and investigate the homogeneity of these clusters. Several evolutionary modeling approaches implemented in HeuristicLab have then been applied to identify estimators for selected cancer diagnoses: linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms), as well as genetic programming. As we show in the results section, the investigated diagnoses of breast cancer, melanoma, and respiratory system cancer can be estimated correctly in up to 84.2%, 80.3%, and 94.1% of the analyzed test cases, respectively. Without tumor markers, up to 78.2%, 78%, and 93.3% of the test samples are estimated correctly, respectively.
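The hybrid idea described in the abstract (cluster first, then learn one model per cluster) can be illustrated with a minimal sketch. This is not the authors' HeuristicLab implementation: it uses a plain k-means with a deterministic farthest-first initialization and an unoptimized k-nearest-neighbor vote as a stand-in for the evolutionary modeling methods. The data is a synthetic toy grid, and all function names (`farthest_first`, `kmeans`, `fit_hybrid`, `predict_hybrid`, `accuracy`) are hypothetical.

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    centroids = farthest_first(points, k)
    assign = None
    for _ in range(max_iters):
        new_assign = [nearest(p, centroids) for p in points]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assign

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training samples."""
    ranked = sorted(zip(train_x, train_y), key=lambda t: math.dist(t[0], x))[:k]
    votes = [y for _, y in ranked]
    return max(set(votes), key=votes.count)

def fit_hybrid(xs, ys, n_clusters):
    """Cluster the samples, then keep one training set (local model) per cluster."""
    centroids, assign = kmeans(xs, n_clusters)
    models = {c: ([x for x, a in zip(xs, assign) if a == c],
                  [y for y, a in zip(ys, assign) if a == c])
              for c in range(n_clusters)}
    return centroids, models

def predict_hybrid(centroids, models, x):
    """Route x to its nearest cluster and use that cluster's local model."""
    cx, cy = models[nearest(x, centroids)]
    return knn_predict(cx, cy, x)

def accuracy(predicted, expected):
    """Fraction of test samples classified correctly."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)

# Synthetic toy data: two well-separated groups with different label rules.
group_a = [(a, b) for a in range(4) for b in range(4)]            # label depends on a
group_b = [(20 + a, 20 + b) for a in range(4) for b in range(4)]  # label depends on b
xs = group_a + group_b
ys = [int(a >= 2) for a, _ in group_a] + [int(b >= 22) for _, b in group_b]

centroids, models = fit_hybrid(xs, ys, n_clusters=2)
test_x = [(0.2, 1.1), (2.9, 1.8), (21.1, 20.1), (21.2, 22.9)]
expected = [0, 1, 0, 1]
preds = [predict_hybrid(centroids, models, x) for x in test_x]
print(preds, accuracy(preds, expected))
```

The accuracy function mirrors the kind of figure reported in the abstract (the share of test cases estimated correctly); the numbers produced by this toy example have no relation to the reported results.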
Original language: English
Title of host publication: Companion Publication of the 2013 Genetic and Evolutionary Computation Conference, GECCO'13 Companion
Publisher: ACM Sigevo
Publication status: Published - 2013
Event: Genetic and Evolutionary Computation Conference - Amsterdam, Netherlands
Duration: 6 Jul 2013 to 10 Jul 2013


Conference: Genetic and Evolutionary Computation Conference
