MRCNN: A stateful Fast R-CNN: Using temporal consistency in R-CNN for video object localization and classification

Philippe Burlina

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

Deep convolutional neural networks (DCNNs) perform on par with or better than humans for image classification. Efforts have therefore shifted to more challenging tasks such as object detection and classification in images, video, or RGBD. Recently developed region CNNs (R-CNNs) such as Fast R-CNN [7] address this detection task for images. This paper, in contrast, is concerned with video and focuses on resource-limited systems. Newly proposed methods accelerate R-CNN by sharing convolutional layers for proposal generation, location regression, and labeling [12][13][19][25]. When applied to video, however, these approaches are stateless: they process each image individually. This suggests an alternate route: make R-CNN stateful and exploit temporal consistency. We extend Fast R-CNN to employ recursive Bayesian filtering and to perform proposal propagation and reuse. We couple multi-target proposal/detection tracking (MTT) with R-CNN and perform detection-to-track association. We call this approach MRCNN, short for MTT + R-CNN. In MRCNN, region proposals that are vetted via classification and regression in R-CNN are treated as observations in MTT and propagated using assumed kinematics. Actual proposal generation (e.g., via Selective Search) need only be performed sporadically and/or periodically, and is replaced at all other times by MTT proposal predictions. Preliminary results show that MRCNN can economize on both proposal and classification computations, yielding up to a 10- to 30-fold decrease in the number of proposals generated, about one order of magnitude savings in proposal computation time, and nearly one order of magnitude improvement in overall computation time, for comparable localization and classification performance. The method can additionally be beneficial for false alarm abatement.
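The control loop described in the abstract (predict proposals from existing tracks on most frames, regenerate them only sporadically, and vet every proposal with the detector) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the constant-velocity Kalman model is one possible reading of the "assumed kinematics", the greedy centroid matching is a simple stand-in for the paper's detection-to-track association step, and propose/detect are hypothetical callables representing Selective Search and Fast R-CNN, respectively.

import numpy as np

class BoxTrack:
    """Constant-velocity Kalman filter over a box state [cx, cy, w, h, vx, vy]."""
    F = np.eye(6)
    F[0, 4] = F[1, 5] = 1.0   # positions integrate velocities each frame
    H = np.eye(4, 6)          # we observe [cx, cy, w, h] only

    def __init__(self, box, q=1.0, r=4.0):
        self.x = np.concatenate([np.asarray(box, float), np.zeros(2)])  # mean
        self.P = np.eye(6) * 10.0   # covariance
        self.Q = np.eye(6) * q      # process noise
        self.R = np.eye(4) * r      # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]           # predicted box, reusable as a proposal

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P

def mrcnn_video(frames, propose, detect, refresh_every=10, gate=50.0):
    """propose(frame) -> (N, 4) candidate boxes (e.g. Selective Search);
    detect(frame, boxes) -> (M, 4) boxes vetted by the detector
    (e.g. Fast R-CNN classification + bounding-box regression)."""
    tracks, results = [], []
    for t, frame in enumerate(frames):
        # Kalman prediction step for every live track.
        predicted = np.array([tr.predict() for tr in tracks]).reshape(-1, 4)
        if t % refresh_every == 0:
            proposals = propose(frame)   # expensive; run only sporadically
        else:
            proposals = predicted        # reuse MTT predictions as proposals
        detections = detect(frame, proposals)
        # Greedy nearest-centroid detection-to-track association.
        unmatched = list(range(len(detections)))
        for tr in tracks:
            if not unmatched:
                break
            dists = [np.linalg.norm(np.asarray(detections[i][:2]) - tr.x[:2])
                     for i in unmatched]
            j = int(np.argmin(dists))
            if dists[j] < gate:
                tr.update(detections[unmatched.pop(j)])
        tracks += [BoxTrack(detections[i]) for i in unmatched]  # spawn new tracks
        results.append(detections)
    return results

A real implementation would also prune stale tracks and gate matches by IoU rather than centroid distance; the sketch keeps only the structure the abstract claims savings from: proposal generation is amortized across frames, while the per-frame cost reduces to a detector pass over a much smaller, tracked proposal set.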

Original language: English (US)
Title of host publication: 2016 23rd International Conference on Pattern Recognition, ICPR 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3518-3523
Number of pages: 6
ISBN (Electronic): 9781509048472
DOI: 10.1109/ICPR.2016.7900179
State: Published - Apr 13 2017
Event: 23rd International Conference on Pattern Recognition, ICPR 2016, Cancun, Mexico
Duration: Dec 4 2016 - Dec 8 2016

Keywords

  • ConvNets
  • Deep Learning
  • Fast R-CNN
  • Region CNNs in Video
  • Region Proposals

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Burlina, P. (2017). MRCNN: A stateful Fast R-CNN: Using temporal consistency in R-CNN for video object localization and classification. In 2016 23rd International Conference on Pattern Recognition, ICPR 2016 (pp. 3518-3523). [7900179] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPR.2016.7900179
