TY - GEN
T1 - Text mining for hypotheses and results in translational medicine studies
AU - Tsai, Terry H.
AU - Kasch, Niels
AU - Pfeifer, Craig
AU - Oates, Tim
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2015/1/26
Y1 - 2015/1/26
AB - Most common and complex diseases, such as diabetes and cancer, are influenced at some level by variation in the genome. To truly address the goal of translational research, genetic variation must be taken into consideration. Research done in public health genetics, specifically in the area of single nucleotide polymorphisms (SNPs), is the first step to understanding human genetic variation. In addition, novel methods are needed to represent and to conduct text mining over textual genotypic data sources. In this paper, we describe the development and evaluation, in the context of a genetic study, of a translational-informatics method that supports both machine-learning text mining (e.g., conditional random fields) and automated inference for identifying key concepts (e.g., hypotheses and results). After scaling for inter-annotator agreement, our adjusted overall precision was 64%, with a range of 48% to 80%. While other biological text mining systems have focused on named-entity recognition, the development of tools for genetic studies focusing on hypotheses and results has been relatively rare.
KW - Biomedical informatics
KW - Gene-environment interaction studies
KW - Natural language processing
KW - Text mining
KW - Translational informatics
UR - http://www.scopus.com/inward/record.url?scp=84936884510&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84936884510&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2014.39
DO - 10.1109/ICDMW.2014.39
M3 - Conference contribution
AN - SCOPUS:84936884510
T3 - IEEE International Conference on Data Mining Workshops, ICDMW
SP - 127
EP - 132
BT - Proceedings - 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
A2 - Zhou, Zhi-Hua
A2 - Wang, Wei
A2 - Kumar, Ravi
A2 - Toivonen, Hannu
A2 - Pei, Jian
A2 - Huang, Joshua Zhexue
A2 - Wu, Xindong
PB - IEEE Computer Society
T2 - 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
Y2 - 14 December 2014
ER -