Genotyping in the cloud with Crossbow

James Gurtowski, Michael C. Schatz, Ben Langmead

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework. Hadoop allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations respectively, and have demonstrated capabilities within Crossbow of analyzing approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit will demonstrate the use of Crossbow for identifying variations in three different operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service.
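The division of labor described in the abstract (alignment in the map phase, SNP calling in the reduce phase) can be illustrated with a minimal, single-process sketch. This is not Crossbow's code and does not use Hadoop, Bowtie, or SOAPsnp: the toy reference, the naive mismatch-counting aligner, the majority-vote caller, and every name below (`map_align`, `shuffle`, `reduce_call_snps`, `run`, `PARTITION_SIZE`) are hypothetical stand-ins chosen only to show the shape of the data flow.

```python
from collections import Counter, defaultdict

# Toy reference sequence and bin size; both are illustrative assumptions.
REFERENCE = "ACGTACGTTAGCCGATAGCTAGGCTA"
PARTITION_SIZE = 10  # bases per genomic partition


def map_align(read, max_mismatches=1):
    """Map step: place a read at the first offset with few enough mismatches.

    A naive stand-in for a real aligner. Yields at most one
    (partition, (position, read)) pair, keyed by the read's start position.
    """
    for pos in range(len(REFERENCE) - len(read) + 1):
        window = REFERENCE[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            yield pos // PARTITION_SIZE, (pos, read)
            return


def shuffle(pairs):
    """Group mapped records by partition key, as a framework's shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_call_snps(alignments, min_depth=2):
    """Reduce step: pile up bases per position and report majority
    non-reference bases seen at least min_depth times."""
    pileup = defaultdict(Counter)
    for pos, read in alignments:
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    calls = []
    for pos in sorted(pileup):
        base, count = pileup[pos].most_common(1)[0]
        if base != REFERENCE[pos] and count >= min_depth:
            calls.append((pos, REFERENCE[pos], base))
    return calls


def run(reads):
    """Chain map, shuffle, and reduce in one process."""
    mapped = (pair for read in reads for pair in map_align(read))
    grouped = shuffle(mapped)
    snps = []
    for key in sorted(grouped):
        snps.extend(reduce_call_snps(grouped[key]))
    return snps
```

With the three overlapping toy reads `"GCCTATAG"`, `"CCTATAGC"`, and `"CTATAGCT"`, `run` reports a single G-to-T call at zero-based position 13. On a real cluster each map and reduce call would run as a separate task on a different node; the sketch also ignores reads that straddle partition boundaries, base qualities, and diploid genotype models.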

Original language: English (US)
Article number: 15.3
Journal: Current Protocols in Bioinformatics
Issue number: SUPPL.39
State: Published - Sep 2012
Externally published: Yes

Keywords

  • Cloud computing
  • Hadoop
  • Read alignment
  • SNP calling
  • Short reads
  • Software package

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
