Modeling splice sites with Bayes networks

Cai Deyou, Arthur Delcher, Kao Ben, Simon Kasif

Research output: Contribution to journal › Article › peer-review

74 Scopus citations


Motivation: The main goal of this paper is to develop accurate probabilistic models for important functional regions in DNA sequences (e.g. splice junctions that signal the beginning and end of transcription in human DNA). These methods can subsequently be used to improve the performance of gene-finding systems. The models built here attempt to capture long-distance dependencies between non-adjacent bases.

Results: An efficient modeling method is described which models biological data more accurately than a first-order Markov model without increasing the number of parameters. Intuitively, a small number of parameters helps a learning system to avoid overfitting. Several experiments with the model are presented, which show a small improvement in average accuracy compared with a simple Markov model. These experiments suggest that single long-distance dependencies do not help the recognition problem, thus confirming several previous studies which have used more heuristic modeling techniques.

Availability: This software is available for download and as a web resource at Contact:

Original language: English (US)
Pages (from-to): 152-158
Number of pages: 7
Issue number: 2
State: Published - Feb 2000
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

