TY - JOUR
T1 - Defining objective clusters for rabies virus sequences using affinity propagation clustering
AU - Fischer, Susanne
AU - Freuling, Conrad M.
AU - Müller, Thomas
AU - Pfaff, Florian
AU - Bodenhofer, Ulrich
AU - Höper, Dirk
AU - Fischer, Mareike
AU - Marston, Denise A.
AU - Fooks, Anthony R.
AU - Mettenleiter, Thomas C.
AU - Conraths, Franz J.
AU - Homeier-Bachmann, Timo
N1 - Publisher Copyright: © 2018 Fischer et al.
PY - 2018/1
Y1 - 2018/1
AB - Rabies is caused by lyssaviruses and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses encounter technical difficulties in defining verifiable criteria for cluster allocation in phylogenetic trees, which calls for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works with any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data; however, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e., New World, Arctic/Arctic-like, Cosmopolitan, and Asian, as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform, transparent manner, not only for RABV but also for other comparative sequence analyses.
KW - Algorithms
KW - Cluster Analysis
KW - Computational Biology/methods
KW - Phylogeny
KW - RNA, Viral/genetics
KW - Rabies virus/classification
KW - Sequence Homology, Nucleic Acid
UR - http://www.scopus.com/inward/record.url?scp=85041487506&partnerID=8YFLogxK
U2 - 10.1371/journal.pntd.0006182
DO - 10.1371/journal.pntd.0006182
M3 - Article
C2 - 29357361
AN - SCOPUS:85041487506
SN - 1935-2727
VL - 12
JO - PLoS Neglected Tropical Diseases
JF - PLoS Neglected Tropical Diseases
IS - 1
M1 - e0006182
ER -