Quantitative analysis of literary styles

Roger Peng, Nicolas W. Hengartner

Research output: Contribution to journal › Article

Abstract

Writers are often viewed as having an inherent style that can serve as a literary fingerprint. By quantifying relevant features related to literary style, one may hope to classify written works and even attribute authorship to newly discovered texts. Beyond its intrinsic interest, the study of literary styles presents the opportunity to introduce and motivate many standard multivariate statistical techniques. Today the statistical analysis of literary styles is made much simpler by the wealth of real data readily available from the Internet. This article presents an overview and brief history of the analysis of literary styles. In addition we use canonical discriminant analysis and principal component analysis to identify structure in the data and distinguish authorship.

Original language: English (US)
Pages (from-to): 175-185
Number of pages: 11
Journal: American Statistician
Volume: 56
Issue number: 3
DOI: 10.1198/000313002100
State: Published - Aug 2002
Externally published: Yes

Keywords

  • Authorship
  • Canonical discriminant analysis
  • Data visualization
  • Function words
  • High-dimensional data
  • Principal component analysis

ASJC Scopus subject areas

  • Mathematics (all)
  • Statistics and Probability

Cite this

Quantitative analysis of literary styles. / Peng, Roger; Hengartner, Nicolas W.

In: American Statistician, Vol. 56, No. 3, 08.2002, p. 175-185.

@article{990ecf0b8b1444968e5603b5af885f7a,
title = "Quantitative analysis of literary styles",
abstract = "Writers are often viewed as having an inherent style that can serve as a literary fingerprint. By quantifying relevant features related to literary style, one may hope to classify written works and even attribute authorship to newly discovered texts. Beyond its intrinsic interest, the study of literary styles presents the opportunity to introduce and motivate many standard multivariate statistical techniques. Today the statistical analysis of literary styles is made much simpler by the wealth of real data readily available from the Internet. This article presents an overview and brief history of the analysis of literary styles. In addition we use canonical discriminant analysis and principal component analysis to identify structure in the data and distinguish authorship.",
keywords = "Authorship, Canonical discriminant analysis, Data visualization, Function words, High-dimensional data, Principal component analysis",
author = "Peng, Roger and Hengartner, {Nicolas W.}",
year = "2002",
month = "8",
doi = "10.1198/000313002100",
language = "English (US)",
volume = "56",
pages = "175--185",
journal = "American Statistician",
issn = "0003-1305",
publisher = "American Statistical Association",
number = "3",

}

TY - JOUR

T1 - Quantitative analysis of literary styles

AU - Peng, Roger

AU - Hengartner, Nicolas W.

PY - 2002/8

Y1 - 2002/8

N2 - Writers are often viewed as having an inherent style that can serve as a literary fingerprint. By quantifying relevant features related to literary style, one may hope to classify written works and even attribute authorship to newly discovered texts. Beyond its intrinsic interest, the study of literary styles presents the opportunity to introduce and motivate many standard multivariate statistical techniques. Today the statistical analysis of literary styles is made much simpler by the wealth of real data readily available from the Internet. This article presents an overview and brief history of the analysis of literary styles. In addition we use canonical discriminant analysis and principal component analysis to identify structure in the data and distinguish authorship.

AB - Writers are often viewed as having an inherent style that can serve as a literary fingerprint. By quantifying relevant features related to literary style, one may hope to classify written works and even attribute authorship to newly discovered texts. Beyond its intrinsic interest, the study of literary styles presents the opportunity to introduce and motivate many standard multivariate statistical techniques. Today the statistical analysis of literary styles is made much simpler by the wealth of real data readily available from the Internet. This article presents an overview and brief history of the analysis of literary styles. In addition we use canonical discriminant analysis and principal component analysis to identify structure in the data and distinguish authorship.

KW - Authorship

KW - Canonical discriminant analysis

KW - Data visualization

KW - Function words

KW - High-dimensional data

KW - Principal component analysis

UR - http://www.scopus.com/inward/record.url?scp=0036678352&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036678352&partnerID=8YFLogxK

U2 - 10.1198/000313002100

DO - 10.1198/000313002100

M3 - Article

AN - SCOPUS:0036678352

VL - 56

SP - 175

EP - 185

JO - American Statistician

JF - American Statistician

SN - 0003-1305

IS - 3

ER -