Motif discovery and motif finding from genome-mapped DNase footprint data

Ivan V. Kulakovskiy, Alexander V. Favorov, Vsevolod J. Makeev

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: Footprint data is an important source of information on transcription factor recognition motifs. However, a footprinting fragment may contain no sequence similar to known protein recognition sites. Inspecting the genome fragments nearby can help to identify the missing site positions.

Results: Genome fragments containing footprints were supplied to a pipeline that constructed a position weight matrix (PWM) for different motif lengths and selected the optimal PWM. Fragments were aligned with the SeSiMCMC sampler and a new heuristic algorithm, Bigfoot. Footprints with missing hits were found for ∼50% of factors. Adding only 2 bp on both sides of a footprinting fragment recovered most hits. We automatically constructed motifs for 41 Drosophila factors. At the same false positive rate, the new motifs recognize footprints with greater sensitivity than existing models. We also discuss possible overfitting of the constructed motifs.
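The idea of recovering a missing hit by extending a footprint by 2 bp on each side can be illustrated with a minimal Python sketch. It is not the authors' pipeline and implements neither SeSiMCMC nor Bigfoot: the aligned sites, the genome string, the footprint coordinates, the pseudocount, the uniform background and the helper names (`build_pwm`, `best_hit`, `extend`) are all hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length, already aligned sites.

    The pseudocount and the uniform background are assumptions of this sketch.
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm

def best_hit(pwm, seq):
    """Return (score, offset) of the best-scoring window, or None if seq is too short."""
    width = len(pwm)
    if len(seq) < width:
        return None
    return max(
        (sum(col[seq[i + j]] for j, col in enumerate(pwm)), i)
        for i in range(len(seq) - width + 1)
    )

def extend(genome, start, end, flank=2):
    """Return genome[start:end] widened by `flank` bp on both sides, clipped to the genome."""
    return genome[max(0, start - flank):min(len(genome), end + flank)]

# Hypothetical toy data: four aligned sites and a short genome fragment.
sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"]
pwm = build_pwm(sites)

genome = "CCGGTGACTCAGGCC"
fp_start, fp_end = 6, 13  # mapped footprint that cuts off the first 2 bp of the site

bare = best_hit(pwm, genome[fp_start:fp_end])
extended = best_hit(pwm, extend(genome, fp_start, fp_end, flank=2))

print("footprint only:  score=%.2f offset=%d" % bare)
print("with 2 bp flank: score=%.2f offset=%d" % extended)
```

In this toy case the mapped footprint truncates the site, so the best window inside the footprint scores poorly, while the extended fragment contains the full site. A real analysis would also scan the reverse strand and choose a score threshold from a target false positive rate, which this sketch omits.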

Original language: English (US)
Pages (from-to): 2318-2325
Number of pages: 8
Journal: Bioinformatics
Volume: 25
Issue number: 18
State: Published - Sep 2009

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
