Text mining for hypotheses and results in translational medicine studies

Terry H. Tsai; Niels Kasch; Craig Pfeifer; Tim Oates

doi:10.1109/ICDMW.2014.39

School of Medicine

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Most common and complex diseases, such as diabetes and cancer, are influenced at some level by variation in the genome. To truly address the goal of translational research, genetic variation must be taken into consideration. Research in public health genetics, specifically in the area of single nucleotide polymorphisms (SNPs), is the first step toward understanding human genetic variation. In addition, novel methods are needed to represent and to conduct text mining over textual genotypic data sources. In this paper, we describe the development and evaluation, in the context of a genetic study, of a translational-informatics method that supports both machine-learning text mining (e.g., conditional random fields) and automated inference for identifying key concepts (e.g., hypotheses and results). After scaling for inter-annotator agreement, our adjusted overall precision was 64%, with a range of 48% to 80%. While other biological text mining systems have focused on named-entity recognition, tools for genetic studies that focus on hypotheses and results have been relatively rare.

Original language	English (US)
Title of host publication	Proceedings - 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
Editors	Zhi-Hua Zhou, Wei Wang, Ravi Kumar, Hannu Toivonen, Jian Pei, Joshua Zhexue Huang, Xindong Wu
Publisher	IEEE Computer Society
Pages	127-132
Number of pages	6
Edition	January
ISBN (Electronic)	9781479942749
DOIs	https://doi.org/10.1109/ICDMW.2014.39
State	Published - Jan 26 2015
Event	14th IEEE International Conference on Data Mining Workshops, ICDMW 2014 - Shenzhen, China Duration: Dec 14 2014 → …

Publication series

Name	IEEE International Conference on Data Mining Workshops, ICDMW
Number	January
Volume	2015-January
ISSN (Print)	2375-9232
ISSN (Electronic)	2375-9259

Conference

Conference	14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
Country/Territory	China
City	Shenzhen
Period	12/14/14 → …

Keywords

Biomedical informatics
Gene-environment interaction studies
Natural language processing
Text mining
Translational informatics

ASJC Scopus subject areas

Computer Science Applications
Software

Cite this

Tsai, T. H., Kasch, N., Pfeifer, C., & Oates, T. (2015). Text mining for hypotheses and results in translational medicine studies. In Z.-H. Zhou, W. Wang, R. Kumar, H. Toivonen, J. Pei, J. Zhexue Huang, & X. Wu (Eds.), Proceedings - 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014 (January ed., pp. 127-132). Article 7022589 (IEEE International Conference on Data Mining Workshops, ICDMW; Vol. 2015-January, No. January). IEEE Computer Society. https://doi.org/10.1109/ICDMW.2014.39
