TY - GEN
T1 - Classification of Tumor Marker Values Using Heuristic Data Mining Methods
AU - Winkler, Stephan
AU - Affenzeller, Michael
AU - Jacak, Witold
AU - Stekel, Herbert
PY - 2010
Y1 - 2010
N2 - Tumor markers are substances that are found in blood, urine, or body tissues and that are used as indicators for tumors; elevated tumor marker values can indicate the presence of cancer, but there can also be other causes. We have used a medical database compiled at the blood laboratory of the General Hospital Linz, Austria, which contains several blood values as well as several tumor marker values for thousands of patients. We have used several data-based modeling approaches to identify mathematical models for estimating selected tumor marker values on the basis of routinely available blood values; in detail, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified and are analyzed in this paper. The documented tumor marker values are classified as "normal" or "elevated"; our goal is to design classifiers for the respective binary classification problems. As we show in the results section, for the medical modeling tasks described here, genetic programming (GP) performs best among the techniques that are able to identify nonlinearities; we also see that GP results show less overfitting than those produced using other methods.
AB - Tumor markers are substances that are found in blood, urine, or body tissues and that are used as indicators for tumors; elevated tumor marker values can indicate the presence of cancer, but there can also be other causes. We have used a medical database compiled at the blood laboratory of the General Hospital Linz, Austria, which contains several blood values as well as several tumor marker values for thousands of patients. We have used several data-based modeling approaches to identify mathematical models for estimating selected tumor marker values on the basis of routinely available blood values; in detail, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified and are analyzed in this paper. The documented tumor marker values are classified as "normal" or "elevated"; our goal is to design classifiers for the respective binary classification problems. As we show in the results section, for the medical modeling tasks described here, genetic programming (GP) performs best among the techniques that are able to identify nonlinearities; we also see that GP results show less overfitting than those produced using other methods.
KW - Classification
KW - Data mining
KW - Machine learning
KW - Statistical analysis
KW - Tumor marker data
UR - http://www.scopus.com/inward/record.url?scp=77955932153&partnerID=8YFLogxK
U2 - 10.1145/1830761.1830826
DO - 10.1145/1830761.1830826
M3 - Conference contribution
SN - 9781450300735
T3 - Proceedings of the 12th Annual Genetic and Evolutionary Computation Conference, GECCO '10 - Companion Publication
SP - 1915
EP - 1922
BT - Proceedings of the 12th Annual Genetic and Evolutionary Computation Conference, GECCO '10 - Companion Publication
PB - ACM SIGEVO
T2 - Proceedings of the Genetic and Evolutionary Computation Conference GECCO 2010
Y2 - 7 July 2010 through 11 July 2010
ER -