Deep learning-based fine-grained car make/model classification for visual surveillance

Erhan Gundogdu; Enes Sinan Parıldı; Berkan Solmaz; Veysel Yücesoy; Aykut Koç

doi:10.1117/12.2278862


Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Fine-grained object recognition is a computer vision problem that has recently been addressed using deep Convolutional Neural Networks (CNNs). However, the main disadvantage of classification methods that rely on deep CNN models is their need for a considerably large amount of data. In addition, relatively little annotated data is available for real-world applications such as the recognition of car models in a traffic surveillance system. To this end, we concentrate on the classification of fine-grained car makes and/or models for visual surveillance scenarios with the help of two different domains. First, a large-scale dataset of approximately 900K images is constructed from a website that contains fine-grained car models, and a state-of-the-art CNN model is trained on this dataset using its labels. The second domain is the set of images collected from a camera integrated into a traffic surveillance system. These images, numbering over 260K, are gathered by a dedicated license plate detection method running on top of a motion detection algorithm, and an image region of appropriately selected size is cropped from the region of interest given by the detected license plate location. These images and their labels, covering more than 30 classes, are used to fine-tune the CNN model that was already trained on the large-scale dataset. To fine-tune the network, the last two fully-connected layers are randomly initialized and the remaining layers are fine-tuned on the second dataset. In this way, a model learned on a large dataset is successfully transferred to a smaller one, exploiting both the limited annotated data from the traffic domain and a large-scale dataset with available annotations. Our experimental results on both the validation dataset and in the real field show that the proposed methodology performs favorably compared with training the CNN model from scratch.
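The transfer scheme described in the abstract (train a CNN on the large web-collected dataset, randomly re-initialize its last two fully-connected layers, then update all layers on the smaller surveillance dataset) might be sketched in PyTorch roughly as follows. This is not the authors' code: the network `SmallCarNet`, the helpers `prepare_for_fine_tuning` and `fine_tune_step`, the layer sizes, the source class count, the use of 31 target classes to stand in for "more than 30", and plain SGD with the shown learning rate are all hypothetical choices, since the abstract names neither the architecture nor the training hyperparameters.

```python
import copy

import torch
import torch.nn.functional as F
from torch import nn


class SmallCarNet(nn.Module):
    """Hypothetical stand-in for the unnamed "state-of-the-art CNN".

    A tiny convolutional feature extractor followed by three
    fully-connected layers (fc1, fc2, fc3).
    """

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
        )
        self.fc1 = nn.Linear(16 * 4 * 4, 128)
        self.fc2 = nn.Linear(128, 128)
        self.fc3 = nn.Linear(128, num_classes)  # classifier layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)


def prepare_for_fine_tuning(source: SmallCarNet, num_target_classes: int) -> SmallCarNet:
    """Copy the source-trained model and randomly re-initialise its last two FC layers."""
    model = copy.deepcopy(source)  # the source model itself is left untouched
    # New nn.Linear modules receive PyTorch's default random initialisation.
    model.fc2 = nn.Linear(model.fc2.in_features, model.fc2.out_features)
    model.fc3 = nn.Linear(model.fc3.in_features, num_target_classes)
    return model


def fine_tune_step(model: nn.Module, images: torch.Tensor, labels: torch.Tensor,
                   optimizer: torch.optim.Optimizer) -> float:
    """One gradient step that updates all layers, transferred and new alike."""
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    source = SmallCarNet(num_classes=100)  # hypothetical web-dataset class count
    # (training `source` on the large web-collected dataset is omitted here)
    target = prepare_for_fine_tuning(source, num_target_classes=31)  # hypothetical: "more than 30"
    optimizer = torch.optim.SGD(target.parameters(), lr=1e-3, momentum=0.9)
    images = torch.randn(8, 3, 64, 64)  # stand-in for cropped surveillance images
    labels = torch.randint(0, 31, (8,))
    print(fine_tune_step(target, images, labels, optimizer))
```

A common variant, which the abstract neither confirms nor rules out, gives the transferred layers a smaller learning rate than the freshly initialized ones by passing separate parameter groups to the optimizer.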

Original language	English (US)
Title of host publication	Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies
Editors	Yitzhak Yitzhaky, Robert James Stokes, Henri Bouma, Felicity Carlysle-Davies
Publisher	SPIE
ISBN (Electronic)	9781510613461
DOIs	https://doi.org/10.1117/12.2278862
State	Published - 2017
Externally published	Yes
Event	Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies 2017 - Warsaw, Poland Duration: Sep 11 2017 → Sep 12 2017

Publication series

Name	Proceedings of SPIE - The International Society for Optical Engineering
Volume	10441
ISSN (Print)	0277-786X
ISSN (Electronic)	1996-756X

Conference

Conference	Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies 2017
Country/Territory	Poland
City	Warsaw
Period	9/11/17 → 9/12/17

Keywords

Deep convolutional neural networks
Fine-tuning
Fine-grained object recognition
traffic surveillance

ASJC Scopus subject areas

Electronic, Optical and Magnetic Materials
Condensed Matter Physics
Computer Science Applications
Applied Mathematics
Electrical and Electronic Engineering

Access to Document

10.1117/12.2278862

Cite this

Gundogdu, E., Parıldı, E. S., Solmaz, B., Yücesoy, V., & Koç, A. (2017). Deep learning-based fine-grained car make/model classification for visual surveillance. In Y. Yitzhaky, R. J. Stokes, H. Bouma, & F. Carlysle-Davies (Eds.), Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies Article 104410J (Proceedings of SPIE - The International Society for Optical Engineering; Vol. 10441). SPIE. https://doi.org/10.1117/12.2278862


@inproceedings{fe7208d7f61c46258ac161ad6f439a72,

title = "Deep learning-based fine-grained car make/model classification for visual surveillance",

abstract = "Fine-grained object recognition is a potential computer vision problem that has been recently addressed by utilizing deep Convolutional Neural Networks (CNNs). Nevertheless, the main disadvantage of classification methods relying on deep CNN models is the need for considerably large amount of data. In addition, there exists relatively less amount of annotated data for a real world application, such as the recognition of car models in a traffic surveillance system. To this end, we mainly concentrate on the classification of fine-grained car make and/or models for visual scenarios by the help of two different domains. First, a large-scale dataset including approximately 900K images is constructed from a website which includes fine-grained car models. According to their labels, a state-of-the-art CNN model is trained on the constructed dataset. The second domain that is dealt with is the set of images collected from a camera integrated to a traffic surveillance system. These images, which are over 260K, are gathered by a special license plate detection method on top of a motion detection algorithm. An appropriately selected size of the image is cropped from the region of interest provided by the detected license plate location. These sets of images and their provided labels for more than 30 classes are employed to fine-tune the CNN model which is already trained on the large scale dataset described above. To fine-tune the network, the last two fully-connected layers are randomly initialized and the remaining layers are fine-tuned in the second dataset. In this work, the transfer of a learned model on a large dataset to a smaller one has been successfully performed by utilizing both the limited annotated data of the traffic field and a large scale dataset with available annotations. 
Our experimental results both in the validation dataset and the real field show that the proposed methodology performs favorably against the training of the CNN model from scratch.",

keywords = "Deep convolutional neural networks, Fine-tuning, Fine-grained object recognition, traffic surveillance",

author = "Erhan Gundogdu and Par{\i}ld{\i}, {Enes Sinan} and Berkan Solmaz and Veysel Y{\"u}cesoy and Aykut Ko{\c c}",

note = "Publisher Copyright: {\textcopyright} COPYRIGHT SPIE. Downloading of the abstract is permitted for personal use only.; Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies 2017 ; Conference date: 11-09-2017 Through 12-09-2017",

year = "2017",

doi = "10.1117/12.2278862",

language = "English (US)",

series = "Proceedings of SPIE - The International Society for Optical Engineering",

publisher = "SPIE",

editor = "Yitzhak Yitzhaky and Stokes, {Robert James} and Henri Bouma and Felicity Carlysle-Davies",

booktitle = "Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies",

}

TY - GEN

T1 - Deep learning-based fine-grained car make/model classification for visual surveillance

AU - Gundogdu, Erhan

AU - Parıldı, Enes Sinan

AU - Solmaz, Berkan

AU - Yücesoy, Veysel

AU - Koç, Aykut

N1 - Publisher Copyright: © COPYRIGHT SPIE. Downloading of the abstract is permitted for personal use only.

PY - 2017

Y1 - 2017

AB - Fine-grained object recognition is a potential computer vision problem that has been recently addressed by utilizing deep Convolutional Neural Networks (CNNs). Nevertheless, the main disadvantage of classification methods relying on deep CNN models is the need for considerably large amount of data. In addition, there exists relatively less amount of annotated data for a real world application, such as the recognition of car models in a traffic surveillance system. To this end, we mainly concentrate on the classification of fine-grained car make and/or models for visual scenarios by the help of two different domains. First, a large-scale dataset including approximately 900K images is constructed from a website which includes fine-grained car models. According to their labels, a state-of-the-art CNN model is trained on the constructed dataset. The second domain that is dealt with is the set of images collected from a camera integrated to a traffic surveillance system. These images, which are over 260K, are gathered by a special license plate detection method on top of a motion detection algorithm. An appropriately selected size of the image is cropped from the region of interest provided by the detected license plate location. These sets of images and their provided labels for more than 30 classes are employed to fine-tune the CNN model which is already trained on the large scale dataset described above. To fine-tune the network, the last two fully-connected layers are randomly initialized and the remaining layers are fine-tuned in the second dataset. In this work, the transfer of a learned model on a large dataset to a smaller one has been successfully performed by utilizing both the limited annotated data of the traffic field and a large scale dataset with available annotations. Our experimental results both in the validation dataset and the real field show that the proposed methodology performs favorably against the training of the CNN model from scratch.

KW - Deep convolutional neural networks

KW - Fine-tuning

KW - Fine-grained object recognition

KW - traffic surveillance

UR - http://www.scopus.com/inward/record.url?scp=85038426240&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85038426240&partnerID=8YFLogxK

U2 - 10.1117/12.2278862

DO - 10.1117/12.2278862

M3 - Conference contribution

AN - SCOPUS:85038426240

T3 - Proceedings of SPIE - The International Society for Optical Engineering

BT - Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies

A2 - Yitzhaky, Yitzhak

A2 - Stokes, Robert James

A2 - Bouma, Henri

A2 - Carlysle-Davies, Felicity

PB - SPIE

T2 - Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies 2017

Y2 - 11 September 2017 through 12 September 2017

ER -
