High Dimensional Semiparametric Scale-Invariant Principal Component Analysis

Fang Han, Han Liu

Research output: Contribution to journal › Article

Abstract

We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust to outliers and data contamination; (iii) It is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world data sets.
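The scale-invariance claim in the abstract can be illustrated with a small sketch. It assumes a rank-based correlation estimate (Spearman's rho followed by the sine transform 2 sin(πρ/6), a standard choice for Gaussian copula models); the abstract itself does not name the estimator, and the sparsity step needed for feature selection in high dimensions is left out. The function names `rank_correlation_matrix` and `leading_components` are hypothetical and not taken from the paper.

```python
import numpy as np


def rank_correlation_matrix(x):
    """Rank-based estimate of the latent correlation matrix.

    Assumption: Spearman's rho with the sine transform 2*sin(pi/6 * rho).
    Assumes continuous data, so ties do not occur.
    """
    # Double argsort gives the within-column rank of every observation.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    rho = np.corrcoef(ranks, rowvar=False)  # Spearman's rho
    r = 2.0 * np.sin(np.pi / 6.0 * rho)
    np.fill_diagonal(r, 1.0)
    return r


def leading_components(x, k=1):
    """Top-k eigenvalues and eigenvectors of the rank-based matrix."""
    r = rank_correlation_matrix(x)
    eigvals, eigvecs = np.linalg.eigh(r)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    return eigvals[top], eigvecs[:, top]


# Latent Gaussian data with two correlated coordinates.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
z = rng.multivariate_normal(np.zeros(3), cov, size=500)

# Observed data: each coordinate passes through a different
# strictly increasing transformation (including a rescaling).
x = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3, 1000.0 * z[:, 2]])

vals_z, vecs_z = leading_components(z)
vals_x, vecs_x = leading_components(x)
```

Because ranks do not change under strictly increasing marginal transformations, `vecs_z` and `vecs_x` coincide here, whereas ordinary PCA on `x` would be dominated by the rescaled third column.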

Original language: English (US)
Article number: 6747357
Pages (from-to): 2016-2032
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 36
Issue number: 10
State: Published - Oct 2014

Keywords

  • High dimensional statistics
  • Nonparanormal distribution
  • Principal component analysis
  • Robust statistics

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
