Learning Representations of Endoscopic Videos to Detect Tool Presence Without Supervision

David Z. Li, Masaru Ishii, Russell H. Taylor, Gregory D. Hager, Ayushi Sinha

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this work, we explore whether it is possible to learn representations of endoscopic video frames to perform tasks such as identifying surgical tool presence without supervision. We use a maximum mean discrepancy (MMD) variational autoencoder (VAE) to learn low-dimensional latent representations of endoscopic videos and manipulate these representations to distinguish frames containing tools from those without tools. We use three different methods to manipulate these latent representations in order to predict tool presence in each frame. Our fully unsupervised methods can identify whether endoscopic video frames contain tools with average precision of 71.56, 73.93, and 76.18, respectively, comparable to supervised methods. Our code is available at https://github.com/zdavidli/tool-presence/.
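As a rough illustration of the maximum mean discrepancy term named in the abstract, the sketch below computes a biased estimate of the squared MMD between two sample sets using a Gaussian (RBF) kernel. In an MMD-regularized VAE this quantity is typically evaluated between encoded latent codes and samples drawn from the prior. This is not taken from the authors' repository: the function names `rbf_kernel` and `mmd2`, the `bandwidth` parameter, and the choice of kernel are assumptions made for the example.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    # Pairwise squared Euclidean distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2ab.
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

Under these assumptions, `mmd2(latent_codes, prior_samples)` would be close to zero when the two sets follow the same distribution and larger when they differ, which is the property an MMD penalty relies on.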

Original language: English (US)
Title of host publication: Multimodal Learning for Clinical Decision Support and Clinical Image-Based Procedures - 10th International Workshop, ML-CDS 2020, and 9th International Workshop, CLIP 2020, Held in Conjunction with MICCAI 2020, Proceedings
Editors: Tanveer Syeda-Mahmood, Klaus Drechsler, Hayit Greenspan, Anant Madabhushi, Alexandros Karargyris, Cristina Oyarzun Laura, Stefan Wesarg, Marius George Linguraru, Raj Shekhar, Marius Erdt, Miguel Ángel González Ballester
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 54-63
Number of pages: 10
ISBN (Print): 9783030609450
State: Published - 2020
Event: 10th International Workshop on Multimodal Learning for Clinical Decision Support, ML-CDS 2020, and the 9th International Workshop on Clinical Image-Based Procedures, CLIP 2020, held in conjunction with the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2020 - Lima, Peru
Duration: Oct 4 2020 to Oct 8 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12445 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Workshop on Multimodal Learning for Clinical Decision Support, ML-CDS 2020, and the 9th International Workshop on Clinical Image-Based Procedures, CLIP 2020, held in conjunction with the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2020
Country: Peru
City: Lima
Period: 10/4/20 to 10/8/20

Keywords

  • Endoscopic video
  • Maximum mean discrepancy
  • Representation learning
  • Tool presence
  • Variational autoencoder

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
