TY - GEN
T1 - Feature selection for unsupervised learning via comparison of distance matrices
AU - Dreiseitl, Stephan
N1 - Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2013
Y1 - 2013
N2 - Feature selection for unsupervised learning is generally harder than for supervised learning, because the former lacks the class information of the latter, and thus an obvious way by which to measure the quality of a feature subset. In this paper, we propose a new method based on representing data sets by their distance matrices, and judging feature combinations by how well the distance matrix using only these features resembles the distance matrix of the full data set. Using artificial data for which the relevant features were known, we observed that the results depend on the data dimensionality, the fraction of relevant features, the overlap between clusters in the relevant feature subspaces, and how to measure the similarity of distance matrices. Our method consistently achieved higher than 80% detection rates of relevant features for a wide variety of experimental configurations.
AB - Feature selection for unsupervised learning is generally harder than for supervised learning, because the former lacks the class information of the latter, and thus an obvious way by which to measure the quality of a feature subset. In this paper, we propose a new method based on representing data sets by their distance matrices, and judging feature combinations by how well the distance matrix using only these features resembles the distance matrix of the full data set. Using artificial data for which the relevant features were known, we observed that the results depend on the data dimensionality, the fraction of relevant features, the overlap between clusters in the relevant feature subspaces, and how to measure the similarity of distance matrices. Our method consistently achieved higher than 80% detection rates of relevant features for a wide variety of experimental configurations.
KW - dimensionality reduction
KW - distance matrix similarity
KW - feature extraction
KW - Unsupervised feature selection
UR - http://www.scopus.com/inward/record.url?scp=84892615135&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-53856-8_26
DO - 10.1007/978-3-642-53856-8_26
M3 - Conference contribution
SN - 9783642538551
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 203
EP - 210
BT - Computer Aided Systems Theory, EUROCAST 2013 - 14th International Conference, Revised Selected Papers
PB - Springer
T2 - 14th International Conference on Computer Aided Systems Theory, EUROCAST 2013
Y2 - 10 February 2013 through 15 February 2013
ER -