High Dimensional Semiparametric Scale-Invariant Principal Component Analysis

Fang Han, Han Liu

Research output: Contribution to journal › Article

Abstract

We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust to outliers and data contamination; (iii) It is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world data sets.
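The following is a minimal, hypothetical Python sketch (not the authors' code) of the idea described in the abstract: because the data are assumed Gaussian only after unspecified monotone marginal transformations, a rank-based statistic such as Kendall's tau can estimate the latent correlation matrix in a scale-invariant, outlier-robust way, after which a sparse leading eigenvector can be extracted. The threshold, iteration count, and helper names below are illustrative assumptions.

```python
# Hypothetical COCA-style sketch: rank-based latent correlation + sparse PCA.
import numpy as np
from scipy.stats import kendalltau


def rank_correlation_matrix(X):
    """Estimate the latent Gaussian correlation via sin(pi/2 * Kendall's tau),
    which is invariant to monotone marginal transformations."""
    n, d = X.shape
    R = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau, _ = kendalltau(X[:, j], X[:, k])
            R[j, k] = R[k, j] = np.sin(np.pi / 2 * tau)
    return R


def sparse_leading_eigenvector(R, k_nonzero=5, n_iter=100):
    """Truncated power iteration: keep only the k largest-magnitude entries
    of the iterate at each step, then renormalize."""
    d = R.shape[0]
    v = np.ones(d) / np.sqrt(d)
    for _ in range(n_iter):
        v = R @ v
        keep = np.argsort(np.abs(v))[-k_nonzero:]
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        v[~mask] = 0.0
        v /= np.linalg.norm(v)
    return v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 20
    # Latent Gaussian data with a sparse leading direction, pushed through a
    # monotone marginal transform (exp) that a rank-based method ignores.
    u = np.zeros(d)
    u[:5] = 1 / np.sqrt(5)
    Z = rng.standard_normal((n, d)) + 2.0 * rng.standard_normal((n, 1)) * u
    X = np.exp(Z)
    R_hat = rank_correlation_matrix(X)
    v_hat = sparse_leading_eigenvector(R_hat, k_nonzero=5)
    print("estimated support:", np.nonzero(v_hat)[0])
```

Under these assumptions, the estimated support should recover the first five coordinates despite the nonlinear marginal transformation, illustrating the scale invariance and robustness claims.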

Original language: English (US)
Article number: 6747357
Pages (from-to): 2016-2032
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 36
Issue number: 10
DOI: 10.1109/TPAMI.2014.2307886
State: Published - Oct 1 2014

Keywords

  • High dimensional statistics
  • Nonparanormal distribution
  • Principal component analysis
  • Robust statistics

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Software
  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

High Dimensional Semiparametric Scale-Invariant Principal Component Analysis. / Han, Fang; Liu, Han.

In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 36, No. 10, 6747357, 01.10.2014, p. 2016-2032.

Research output: Contribution to journal › Article

@article{f16505109ef74a5081309129338d5771,
title = "High Dimensional Semiparametric Scale-Invariant Principal Component Analysis",
abstract = "We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust to outliers and data contamination; (iii) It is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world data sets.",
keywords = "High dimensional statistics, Nonparanormal distribution, Principal component analysis, Robust statistics",
author = "Fang Han and Han Liu",
year = "2014",
month = "10",
day = "1",
doi = "10.1109/TPAMI.2014.2307886",
language = "English (US)",
volume = "36",
pages = "2016--2032",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "IEEE Computer Society",
number = "10",

}

TY - JOUR

T1 - High Dimensional Semiparametric Scale-Invariant Principal Component Analysis

AU - Han, Fang

AU - Liu, Han

PY - 2014/10/1

Y1 - 2014/10/1

N2 - We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust to outliers and data contamination; (iii) It is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world data sets.

AB - We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust to outliers and data contamination; (iii) It is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world data sets.

KW - High dimensional statistics

KW - Nonparanormal distribution

KW - Principal component analysis

KW - Robust statistics

UR - http://www.scopus.com/inward/record.url?scp=84960128692&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84960128692&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2014.2307886

DO - 10.1109/TPAMI.2014.2307886

M3 - Article

C2 - 26352632

AN - SCOPUS:84960128692

VL - 36

SP - 2016

EP - 2032

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

SN - 0162-8828

IS - 10

M1 - 6747357

ER -