Classifying web videos using a global video descriptor

Berkan Solmaz, Shayan Modiri Assari, Mubarak Shah

Research output: Contribution to journal › Article

Abstract

Computing descriptors for videos is a crucial task in computer vision. In this paper, we propose a global video descriptor for classification of videos. Our method bypasses the detection of interest points, the extraction of local video descriptors, and the quantization of descriptors into a codebook; it represents each video sequence as a single feature vector. Our global descriptor is computed by applying a bank of 3-D spatio-temporal filters to the frequency spectrum of a video sequence; hence, it integrates information about both motion and scene structure. We tested our approach on three datasets, KTH (Schuldt et al.; Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), vol. 3, pp. 32-36, 2004), UCF50 (http://vision.eecs.ucf.edu/datasetsActions.html) and HMDB51 (Kuehne et al.; HMDB: a large video database for human motion recognition, 2011), and obtained promising results that demonstrate the robustness and discriminative power of our global video descriptor for classifying videos of various actions. In addition, the combination of our global descriptor and a local descriptor resulted in the highest classification accuracies on the UCF50 and HMDB51 datasets.
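
As a rough illustration of the idea described in the abstract (a filter bank applied to the 3-D frequency spectrum of a clip, producing one feature vector per video), a minimal NumPy sketch follows. The Gaussian band-pass filters, their centre frequencies, the `sigma` value, the energy pooling, and the function names are all assumptions made for illustration; the abstract does not specify the filter design, and this should not be read as the authors' implementation.

```python
import numpy as np


def gaussian_filter_bank(shape, centers, sigma=0.08):
    """Build 3-D Gaussian band-pass filters on the (t, y, x) frequency grid.

    `centers` holds hypothetical (ft, fy, fx) centre frequencies in
    cycles per sample; they are illustrative, not taken from the paper.
    """
    ft, fy, fx = np.meshgrid(
        np.fft.fftfreq(shape[0]),
        np.fft.fftfreq(shape[1]),
        np.fft.fftfreq(shape[2]),
        indexing="ij",
    )
    bank = []
    for ct, cy, cx in centers:
        dist2 = (ft - ct) ** 2 + (fy - cy) ** 2 + (fx - cx) ** 2
        bank.append(np.exp(-dist2 / (2.0 * sigma ** 2)))
    return np.stack(bank)


def global_descriptor(video, bank):
    """Map a (frames, height, width) clip to one L2-normalised feature vector."""
    spectrum = np.fft.fftn(video - video.mean())  # 3-D frequency spectrum
    power = np.abs(spectrum) ** 2
    # One response per filter: spectral energy weighted by that filter.
    feats = (bank * power).sum(axis=(1, 2, 3))
    return feats / (np.linalg.norm(feats) + 1e-12)


# Toy clip: a vertical-bar grating drifting horizontally, one pixel per frame.
T, H, W = 16, 32, 32
t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
clip = np.sin(2 * np.pi * (x - t) / 8.0)

# Assumed centre frequencies: static grating, drifting grating, horizontal bars.
centers = [(0.0, 0.0, 0.125), (-0.125, 0.0, 0.125), (0.0, 0.125, 0.0)]
bank = gaussian_filter_bank(clip.shape, centers)
descriptor = global_descriptor(clip, bank)
print(descriptor.shape)
```

In this toy setup the filter tuned to the drifting grating (the second one) receives the largest response, which hints at how a spectrum-based descriptor can capture motion and spatial structure together without detecting interest points. A practical descriptor would likely need many more filters and some form of dimensionality reduction, neither of which is covered here.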

Original language: English (US)
Pages (from-to): 1473-1485
Number of pages: 13
Journal: Machine Vision and Applications
Volume: 24
Issue number: 7
DOI: 10.1007/s00138-012-0449-x
State: Published - Jan 1 2013
Externally published: Yes

Fingerprint

Computer vision
Pattern recognition

Keywords

  • Action recognition
  • Frequency spectrum
  • Spatio-temporal analysis
  • Video descriptors

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Classifying web videos using a global video descriptor. / Solmaz, Berkan; Assari, Shayan Modiri; Shah, Mubarak.

In: Machine Vision and Applications, Vol. 24, No. 7, 01.01.2013, p. 1473-1485.

Solmaz, Berkan ; Assari, Shayan Modiri ; Shah, Mubarak. / Classifying web videos using a global video descriptor. In: Machine Vision and Applications. 2013 ; Vol. 24, No. 7. pp. 1473-1485.
@article{d3c429b9d4474a63892882117579cdec,
title = "Classifying web videos using a global video descriptor",
abstract = "Computing descriptors for videos is a crucial task in computer vision. In this paper, we propose a global video descriptor for classification of videos. Our method bypasses the detection of interest points, the extraction of local video descriptors, and the quantization of descriptors into a codebook; it represents each video sequence as a single feature vector. Our global descriptor is computed by applying a bank of 3-D spatio-temporal filters to the frequency spectrum of a video sequence; hence, it integrates information about both motion and scene structure. We tested our approach on three datasets, KTH (Schuldt et al.; Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), vol. 3, pp. 32-36, 2004), UCF50 (http://vision.eecs.ucf.edu/datasetsActions.html) and HMDB51 (Kuehne et al.; HMDB: a large video database for human motion recognition, 2011), and obtained promising results that demonstrate the robustness and discriminative power of our global video descriptor for classifying videos of various actions. In addition, the combination of our global descriptor and a local descriptor resulted in the highest classification accuracies on the UCF50 and HMDB51 datasets.",
keywords = "Action recognition, Frequency spectrum, Spatio-temporal analysis, Video descriptors",
author = "Berkan Solmaz and Assari, {Shayan Modiri} and Mubarak Shah",
year = "2013",
month = "1",
day = "1",
doi = "10.1007/s00138-012-0449-x",
language = "English (US)",
volume = "24",
pages = "1473--1485",
journal = "Machine Vision and Applications",
issn = "0932-8092",
publisher = "Springer Verlag",
number = "7",

}

TY - JOUR

T1 - Classifying web videos using a global video descriptor

AU - Solmaz, Berkan

AU - Assari, Shayan Modiri

AU - Shah, Mubarak

PY - 2013/1/1

Y1 - 2013/1/1

AB - Computing descriptors for videos is a crucial task in computer vision. In this paper, we propose a global video descriptor for classification of videos. Our method bypasses the detection of interest points, the extraction of local video descriptors, and the quantization of descriptors into a codebook; it represents each video sequence as a single feature vector. Our global descriptor is computed by applying a bank of 3-D spatio-temporal filters to the frequency spectrum of a video sequence; hence, it integrates information about both motion and scene structure. We tested our approach on three datasets, KTH (Schuldt et al.; Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), vol. 3, pp. 32-36, 2004), UCF50 (http://vision.eecs.ucf.edu/datasetsActions.html) and HMDB51 (Kuehne et al.; HMDB: a large video database for human motion recognition, 2011), and obtained promising results that demonstrate the robustness and discriminative power of our global video descriptor for classifying videos of various actions. In addition, the combination of our global descriptor and a local descriptor resulted in the highest classification accuracies on the UCF50 and HMDB51 datasets.

KW - Action recognition

KW - Frequency spectrum

KW - Spatio-temporal analysis

KW - Video descriptors

UR - http://www.scopus.com/inward/record.url?scp=84885330892&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84885330892&partnerID=8YFLogxK

U2 - 10.1007/s00138-012-0449-x

DO - 10.1007/s00138-012-0449-x

M3 - Article

AN - SCOPUS:84885330892

VL - 24

SP - 1473

EP - 1485

JO - Machine Vision and Applications

JF - Machine Vision and Applications

SN - 0932-8092

IS - 7

ER -