TY - JOUR
T1 - Towards large-scale sample annotation in gene expression repositories
AU - Pitzer, Erik
AU - Lacson, Ronilda
AU - Hinske, Christian
AU - Galante, Pedro
AU - Kim, Jihoon
AU - Ohno-Machado, Lucila
N1 - Funding Information:
The students performing the annotation were Pierre Cornell, Karrie Du, Evelyn Pitzer, Lindy Su, and Anthony Villanova. This research was funded by grant FAS0703850 from the Komen Foundation and D43TW007015 from the Fogarty International Center, NIH.
PY - 2009/9/17
Y1 - 2009/9/17
N2 - Background: Large repositories of biomedical research data are most useful to translational researchers if their data can be aggregated for efficient queries and analyses. However, inconsistent or non-existent annotations describing important sample details such as name of tissue or cell line, histopathological type, and subject characteristics like demographics, treatment, and survival are seldom present in data repositories, making it difficult to aggregate data. Results: We created a flexible software tool that allows efficient annotation of samples using a controlled vocabulary, and report on its use for the annotation of over 12,500 samples. Conclusion: While the amount of data is very large and seemingly poorly annotated, a lot of information is still within reach. Consistent tool-based re-annotation enables many new possibilities for large scale interpretation and analyses that would otherwise be impossible.
AB - Background: Large repositories of biomedical research data are most useful to translational researchers if their data can be aggregated for efficient queries and analyses. However, inconsistent or non-existent annotations describing important sample details such as name of tissue or cell line, histopathological type, and subject characteristics like demographics, treatment, and survival are seldom present in data repositories, making it difficult to aggregate data. Results: We created a flexible software tool that allows efficient annotation of samples using a controlled vocabulary, and report on its use for the annotation of over 12,500 samples. Conclusion: While the amount of data is very large and seemingly poorly annotated, a lot of information is still within reach. Consistent tool-based re-annotation enables many new possibilities for large scale interpretation and analyses that would otherwise be impossible.
KW - Computational Biology/methods
KW - Databases, Genetic
KW - Gene Expression Profiling/methods
KW - Software
KW - Vocabulary, Controlled
UR - http://www.scopus.com/inward/record.url?scp=70349884362&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-10-S9-S9
DO - 10.1186/1471-2105-10-S9-S9
M3 - Article
C2 - 19761579
VL - 10
SP - S9
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - SUPPL. 9
M1 - S9
ER -