Metaprotein expression modeling for label-free quantitative proteomics

Joseph E. Lucas; J. W. Thompson; Laura G. Dubois; Jeanette McCarthy; Hans Tillmann; Alexander Thompson; Norah Shire; Ron Hendrickson; Francisco Dieguez; Phyllis Goldman; Kathleen Schwarz; Keyur Patel; John McHutchison; M. A. Moseley

doi:10.1186/1471-2105-13-74

School of Medicine

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Background: Label-free quantitative proteomics holds a great deal of promise for the future study of both medicine and biology. However, the data generated is extremely intricate in its correlation structure, and its proper analysis is complex. There are issues with missing identifications. There are high levels of correlation between many, but not all, of the peptides derived from the same protein. Additionally, there may be systematic shifts in the sensitivity of the machine between experiments or even through time within the duration of a single experiment.

Results: We describe a hierarchical model for analyzing unbiased, label-free proteomics data which utilizes the covariance of peptide expression across samples as well as MS/MS-based identifications to group peptides, a strategy we call metaprotein expression modeling. Our metaprotein model acknowledges the possibility of misidentifications, post-translational modifications and systematic differences between samples due to changes in instrument sensitivity or differences in total protein concentration. In addition, our approach allows us to validate findings from unbiased, label-free proteomics experiments with further unbiased, label-free proteomics experiments. Finally, we demonstrate the clinical/translational utility of the model for building predictors capable of differentiating biological phenotypes as well as for validating those findings in the context of three novel cohorts of patients with Hepatitis C.

Conclusions: Mass-spectrometry proteomics is quickly becoming a powerful tool for studying biological and translational questions. Making use of all of the information contained in a particular set of data will be critical to the success of those endeavors. Our proposed model represents an advance in the ability of statistical models of proteomic data to identify and utilize correlation between features. This allows validation of predictors without translation to targeted assays in addition to informing the choice of targets when it is appropriate to generate those assays.

Original language	English (US)
Article number	74
Journal	BMC Bioinformatics
Volume	13
Issue number	1
DOIs	https://doi.org/10.1186/1471-2105-13-74
State	Published - May 4 2012

Keywords

Factor
Hepatitis
MRM
Open platform
Proteomics
SRM
Statistical model
Statistics

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Access to Document

10.1186/1471-2105-13-74

Cite this

@article{a129f29f11084179a4efff650df668e1,

title = "Metaprotein expression modeling for label-free quantitative proteomics",

abstract = "Background: Label-free quantitative proteomics holds a great deal of promise for the future study of both medicine and biology. However, the data generated is extremely intricate in its correlation structure, and its proper analysis is complex. There are issues with missing identifications. There are high levels of correlation between many, but not all, of the peptides derived from the same protein. Additionally, there may be systematic shifts in the sensitivity of the machine between experiments or even through time within the duration of a single experiment. Results: We describe a hierarchical model for analyzing unbiased, label-free proteomics data which utilizes the covariance of peptide expression across samples as well as MS/MS-based identifications to group peptides, a strategy we call metaprotein expression modeling. Our metaprotein model acknowledges the possibility of misidentifications, post-translational modifications and systematic differences between samples due to changes in instrument sensitivity or differences in total protein concentration. In addition, our approach allows us to validate findings from unbiased, label-free proteomics experiments with further unbiased, label-free proteomics experiments. Finally, we demonstrate the clinical/translational utility of the model for building predictors capable of differentiating biological phenotypes as well as for validating those findings in the context of three novel cohorts of patients with Hepatitis C. Conclusions: Mass-spectrometry proteomics is quickly becoming a powerful tool for studying biological and translational questions. Making use of all of the information contained in a particular set of data will be critical to the success of those endeavors. Our proposed model represents an advance in the ability of statistical models of proteomic data to identify and utilize correlation between features. This allows validation of predictors without translation to targeted assays in addition to informing the choice of targets when it is appropriate to generate those assays.",

keywords = "Factor, Hepatitis, MRM, Open platform, Proteomics, SRM, Statistical model, Statistics",

author = "Lucas, {Joseph E.} and Thompson, {J. W.} and Dubois, {Laura G.} and Jeanette McCarthy and Hans Tillmann and Alexander Thompson and Norah Shire and Ron Hendrickson and Francisco Dieguez and Phyllis Goldman and Kathleen Schwarz and Keyur Patel and John McHutchison and Moseley, {M. A.}",

note = "Funding Information: Supported in part by Duke University{\textquoteright}s CTSA grant 1 UL1 RR024128-01 from NCRR/NIH. Supported in part by a gift from David H. Murdock. We gratefully acknowledge Waters Corporation and Rosetta Biosoftware, Inc for hardware and software support for the data presented in this manuscript. In addition, we would like to acknowledge the PEDS C Clinical Research Network, the NIDDK grant U01-DK-067767 and Roche Pharmaceuticals, Inc for the collection of the pediatric HCV samples.",

year = "2012",

month = may,

day = "4",

doi = "10.1186/1471-2105-13-74",

language = "English (US)",

volume = "13",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - Metaprotein expression modeling for label-free quantitative proteomics

AU - Lucas, Joseph E.

AU - Thompson, J. W.

AU - Dubois, Laura G.

AU - McCarthy, Jeanette

AU - Tillmann, Hans

AU - Thompson, Alexander

AU - Shire, Norah

AU - Hendrickson, Ron

AU - Dieguez, Francisco

AU - Goldman, Phyllis

AU - Schwarz, Kathleen

AU - Patel, Keyur

AU - McHutchison, John

AU - Moseley, M. A.

N1 - Funding Information: Supported in part by Duke University’s CTSA grant 1 UL1 RR024128-01 from NCRR/NIH. Supported in part by a gift from David H. Murdock. We gratefully acknowledge Waters Corporation and Rosetta Biosoftware, Inc for hardware and software support for the data presented in this manuscript. In addition, we would like to acknowledge the PEDS C Clinical Research Network, the NIDDK grant U01-DK-067767 and Roche Pharmaceuticals, Inc for the collection of the pediatric HCV samples.

PY - 2012/5/4

Y1 - 2012/5/4

AB - Background: Label-free quantitative proteomics holds a great deal of promise for the future study of both medicine and biology. However, the data generated is extremely intricate in its correlation structure, and its proper analysis is complex. There are issues with missing identifications. There are high levels of correlation between many, but not all, of the peptides derived from the same protein. Additionally, there may be systematic shifts in the sensitivity of the machine between experiments or even through time within the duration of a single experiment. Results: We describe a hierarchical model for analyzing unbiased, label-free proteomics data which utilizes the covariance of peptide expression across samples as well as MS/MS-based identifications to group peptides, a strategy we call metaprotein expression modeling. Our metaprotein model acknowledges the possibility of misidentifications, post-translational modifications and systematic differences between samples due to changes in instrument sensitivity or differences in total protein concentration. In addition, our approach allows us to validate findings from unbiased, label-free proteomics experiments with further unbiased, label-free proteomics experiments. Finally, we demonstrate the clinical/translational utility of the model for building predictors capable of differentiating biological phenotypes as well as for validating those findings in the context of three novel cohorts of patients with Hepatitis C. Conclusions: Mass-spectrometry proteomics is quickly becoming a powerful tool for studying biological and translational questions. Making use of all of the information contained in a particular set of data will be critical to the success of those endeavors. Our proposed model represents an advance in the ability of statistical models of proteomic data to identify and utilize correlation between features. This allows validation of predictors without translation to targeted assays in addition to informing the choice of targets when it is appropriate to generate those assays.

KW - Factor

KW - Hepatitis

KW - MRM

KW - Open platform

KW - Proteomics

KW - SRM

KW - Statistical model

KW - Statistics

UR - http://www.scopus.com/inward/record.url?scp=84860543314&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860543314&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-13-74

DO - 10.1186/1471-2105-13-74

M3 - Article

C2 - 22559859

AN - SCOPUS:84860543314

SN - 1471-2105

VL - 13

JO - BMC Bioinformatics

JF - BMC Bioinformatics

IS - 1

M1 - 74

ER -
