TY - JOUR
T1 - cFinder: definition and quantification of multiple haplotypes in a mixed sample
AU - Niklas, Norbert
AU - Hafenscher, Julia
AU - Barna, Agnes
AU - Wiesinger, Karin
AU - Pröll, Johannes
AU - Dreiseitl, Stephan
AU - Preuner-Stix, Sandra
AU - Valent, Peter
AU - Lion, Thomas
AU - Gabriel, Christian
N1 - Funding Information:
This work was supported by the Austrian Science Fund (FWF), SFB Grants F4704‑B20 and F4705‑B20.
Publisher Copyright:
© 2015 Niklas et al.
PY - 2015/9/7
Y1 - 2015/9/7
AB - Background: Next-generation sequencing allows the genetic composition of a mixed sample to be determined. For instance, resistance testing for BCR-ABL1 requires identifying clones and defining compound mutations; together with exact quantification, this information may complement diagnosis and therapy decisions. This applies not only to oncology but also to the determination of viral, bacterial or fungal infections. Retrieving multiple haplotypes (more than two) and their proportions from sequencing data with conventional software is difficult and cumbersome, and requires multiple manual steps. Results: We therefore developed cFinder, a tool that automatically detects haplotypes and accurately quantifies them within a single sample. BCR-ABL1 samples containing multiple clones were used for testing; cFinder identified all previously found clones together with their abundance and even refined some results. Additionally, reads containing multiple haplotypes were simulated using GemSIM, and detection was very close to linear (R² = 0.96). Our aim is not to infer haploblocks statistically, but to characterize the composition of one sample precisely. cFinder therefore reports the connections of variants (haplotypes) together with their read counts and relative occurrence (percentage). cFinder can be downloaded from http://sourceforge.net/projects/cfinder/. Conclusions: cFinder implements an efficient algorithm that can run on a low-performance desktop computer. Furthermore, it takes paired-end information into account (if available) and is generally applicable to any current next-generation sequencing technology and alignment strategy. To our knowledge, this is the first software that enables researchers without extensive bioinformatic support to identify multiple haplotypes and determine how they contribute to a sample.
KW - Clone quantification
KW - Haplotype identification
KW - Mixed sample
KW - Next-generation sequencing
KW - Software
KW - Reproducibility of Results
KW - Humans
KW - Computational Biology/methods
KW - Haplotypes/genetics
KW - Sequence Analysis, DNA/methods
KW - Genetic Variation
KW - Algorithms
KW - Sequence Alignment/methods
UR - http://www.scopus.com/inward/record.url?scp=84940840986&partnerID=8YFLogxK
U2 - 10.1186/s13104-015-1382-7
DO - 10.1186/s13104-015-1382-7
M3 - Article
C2 - 26346608
VL - 8
SP - 422
JO - BMC Research Notes
JF - BMC Research Notes
IS - 1
M1 - 422
ER -