TY - GEN
T1 - Evolutionary identification of cancer predictors using clustered data: a case study for breast cancer, melanoma, and cancer in the respiratory system
AU - Winkler, Stephan M.
AU - Affenzeller, Michael
AU - Stekel, Herbert
PY - 2013
Y1 - 2013
AB - In this paper we discuss the effects of using pre-clustered data on the identification of estimation models for cancer diagnoses. Based on patients' data records including standard blood parameters, tumor markers, and information about the diagnosis of tumors, the goal is to identify mathematical models for estimating cancer diagnoses. We have applied a hybrid clustering and classification approach that first identifies data clusters (using standard patient data and tumor markers) and then learns prediction models on the basis of these data clusters. In the empirical section we analyze the clusters of patient data samples formed using k-means clustering: the optimal number of clusters is identified, and we investigate the homogeneity of these clusters. Several evolutionary modeling approaches implemented in HeuristicLab have been applied for subsequently identifying estimators for selected cancer diagnoses: linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms), as well as genetic programming. As we show in the results section, the investigated diagnoses of breast cancer, melanoma, and respiratory system cancer can be estimated correctly in up to 84.2%, 80.3%, and 94.1% of the analyzed test cases, respectively; without tumor markers, up to 78.2%, 78%, and 93.3% of the test samples are correctly estimated, respectively.
KW - Cancer diagnosis estimation
KW - Clustering
KW - Data mining
KW - Machine learning
KW - Statistical analysis
KW - Tumor marker data
UR - http://www.scopus.com/inward/record.url?scp=84882337582&partnerID=8YFLogxK
U2 - 10.1145/2464576.2466809
DO - 10.1145/2464576.2466809
M3 - Conference contribution
AN - SCOPUS:84882337582
SN - 9781450319645
T3 - GECCO 2013 - Proceedings of the 2013 Genetic and Evolutionary Computation Conference Companion
SP - 1463
EP - 1470
BT - GECCO 2013 - Proceedings of the 2013 Genetic and Evolutionary Computation Conference Companion
T2 - 15th Annual Conference on Genetic and Evolutionary Computation, GECCO 2013
Y2 - 6 July 2013 through 10 July 2013
ER -