Analyzing and exploiting NARX recurrent neural networks for long-term dependencies

Robert DiPietro, Christian Rupprecht, Nassir Navab, Gregory D. Hager

Research output: Contribution to conference › Paper

Abstract

Recurrent neural networks (RNNs) have achieved state-of-the-art performance on many diverse tasks, from machine translation to surgical activity recognition, yet training RNNs to capture long-term dependencies remains difficult. To date, the vast majority of successful RNN architectures alleviate this problem using nearly-additive connections between states, as introduced by long short-term memory (LSTM). We take an orthogonal approach and introduce MIST RNNs, a NARX RNN architecture that allows direct connections from the very distant past. We show that MIST RNNs 1) exhibit superior vanishing-gradient properties in comparison to LSTM and previously-proposed NARX RNNs; 2) are far more efficient than previously-proposed NARX RNN architectures, requiring even fewer computations than LSTM; and 3) improve performance substantially over LSTM and Clockwork RNNs on tasks requiring very long-term dependencies.
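
The central architectural idea described in the abstract, direct connections to hidden states from many steps in the past rather than purely step-by-step recurrence, can be illustrated with a small sketch. The code below is a hypothetical NumPy illustration assuming exponentially spaced delays and softmax mixing over the delayed states; the variable names, sizes, and exact gating scheme are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch of a NARX-style recurrent update with exponentially
# spaced delays, in the spirit of MIST RNNs. All names, dimensions, and the
# mixing scheme are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 8, 16
delays = [1, 2, 4, 8]  # direct connections to states 1, 2, 4, and 8 steps back

# Parameters (randomly initialized for illustration only).
W_x = rng.standard_normal((hidden_size, input_size)) * 0.1
W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1
W_a = rng.standard_normal((len(delays), hidden_size)) * 0.1  # mixing weights over delays

def step(x_t, history):
    """One recurrent step: mix delayed hidden states, then apply a tanh update."""
    # Gather the delayed hidden states (zeros before the start of the sequence).
    delayed = np.stack([history[-d] if len(history) >= d else np.zeros(hidden_size)
                        for d in delays])                      # (n_delays, hidden)
    # Softmax weights over delays, conditioned on the most recent state.
    h_prev = history[-1] if history else np.zeros(hidden_size)
    scores = W_a @ h_prev                                      # (n_delays,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    mixed = weights @ delayed                                  # (hidden,)
    # Because `mixed` draws directly on states many steps back, gradient paths
    # to the distant past are short, rather than growing linearly with time.
    return np.tanh(W_x @ x_t + W_h @ mixed)

# Run over a toy sequence.
history = []
for t in range(20):
    x_t = rng.standard_normal(input_size)
    history.append(step(x_t, history))

print(history[-1].shape)  # (16,)
```

With delays spaced exponentially as above, a state only n steps in the past can be reached through on the order of log(n) connections, which is the intuition behind the improved vanishing-gradient behavior claimed in the abstract.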

Original language: English (US)
State: Published - Jan 1 2018
Event: 6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada
Duration: Apr 30 2018 - May 3 2018

Conference

Conference: 6th International Conference on Learning Representations, ICLR 2018
Country: Canada
City: Vancouver
Period: 4/30/18 - 5/3/18

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics


Cite this

DiPietro, R., Rupprecht, C., Navab, N., & Hager, G. D. (2018). Analyzing and exploiting NARX recurrent neural networks for long-term dependencies. Paper presented at the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada.