In this paper we provide an overview of audiovisual saliency map models. In the simplest model, the location of the auditory source is modeled as a Gaussian, and several methods are used to combine the auditory and visual information. We then present experimental results that apply simple audio-visual integration models to cognitive scene analysis. We validate these simple audio-visual saliency models with a hardware convolutional network architecture and with real data recorded from moving audio-visual objects. The latter system was developed in Torch, the Lua-based framework, by extending the attention.lua (code) and attention.ui (GUI) files that implement Culurciello's visual attention model.
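The sketch below shows one way the "simplest model" could look. The auditory source location becomes a 2-D Gaussian saliency map, and that map is fused with a visual saliency map. It is an illustrative assumption, not the authors' Torch/Lua implementation, and it does not reproduce attention.lua. The function names, the isotropic Gaussian with a single width sigma, the max-normalization step, and the three fusion rules (weighted sum, pixel-wise max, pixel-wise product) are all hypothetical stand-ins for the "different methods of combining" mentioned above.

```python
import math

def gaussian_audio_map(width, height, cx, cy, sigma):
    """Hypothetical audio saliency map: an isotropic 2-D Gaussian (peak 1.0)
    centred on the estimated sound-source location (cx, cy). Indexed m[y][x]."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def normalize(m):
    """Scale a map so its maximum is 1.0 (assumed step); all-zero maps are copied unchanged."""
    peak = max(max(row) for row in m)
    if peak == 0:
        return [row[:] for row in m]
    return [[v / peak for v in row] for row in m]

def combine(visual, audio, method="sum", w_audio=0.5):
    """Fuse visual and audio saliency maps pixel by pixel.
    The rules 'sum' (weighted average), 'max' and 'product' are assumed examples."""
    ops = {
        "sum": lambda v, a: (1.0 - w_audio) * v + w_audio * a,
        "max": max,
        "product": lambda v, a: v * a,
    }
    if method not in ops:
        raise ValueError("unknown fusion method: %r" % method)
    f = ops[method]
    v_map, a_map = normalize(visual), normalize(audio)
    return [[f(v, a) for v, a in zip(v_row, a_row)]
            for v_row, a_row in zip(v_map, a_map)]

# Usage: a visual hotspot in one corner, a sound source in the opposite corner.
visual = [[0.0] * 5 for _ in range(5)]
visual[0][0] = 2.0
audio = gaussian_audio_map(5, 5, cx=4, cy=4, sigma=1.0)
fused = combine(visual, audio, method="sum")
```

With the weighted sum, each modality can raise saliency on its own. The product rule instead keeps only locations that are salient in both, so a silent visual object or an unseen sound is suppressed.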