PRG Research Report RR-02-01

Programming Research Group Research Report RR-02-01

Improving the sensitivity of multiple-sequence alignments by incorporating prior knowledge

January 2002, 20pp.

Abstract

In this paper, we present efficient modifications to the well-established progressive alignment algorithm for biological sequences. These modifications are designed to allow the user to incorporate prior knowledge about the sequences and so greatly improve the sensitivity of the resulting alignments. The first modification increases the probability that certain biologically distinguishable structures are preserved during the alignment process. The second modification increases the probability that specified sequence segments will align with each other.

We have implemented both of these modifications in an interactive multiple-sequence alignment tool (IMSA). IMSA takes a two-stage approach to the alignment process. The initial, pre-processing stage takes as input sets of sequence segments defined on DNA, RNA or protein sequences. These sets of segments represent biologically distinguishable features, which could be derived from known homologies or from known structural or functional elements. The sequences to be aligned are efficiently annotated using this additional information. In the second stage, the program computes an alignment that is adjusted to take account of this annotation.
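The two-stage idea can be illustrated with a minimal sketch. It is not IMSA's implementation, which the abstract does not specify; it assumes a plain pairwise Needleman-Wunsch score in place of full progressive alignment, and the names annotate, align_score, bonus and break_penalty are hypothetical. Stage one labels each position with the feature segment that covers it. Stage two rewards aligning positions that carry the same label and adds an extra cost for placing a gap opposite a labelled position.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # half-open interval [start, end) on one sequence


def annotate(length: int, features: Dict[str, List[Segment]]) -> List[str]:
    """Stage 1 (assumed form): label each position with its feature name, or ''."""
    labels = [""] * length
    for name, segments in features.items():
        for start, end in segments:
            for i in range(start, end):
                labels[i] = name
    return labels


def align_score(a: str, b: str, la: List[str], lb: List[str],
                match: int = 1, mismatch: int = -1, gap: int = -2,
                bonus: int = 2, break_penalty: int = 2) -> int:
    """Stage 2 (assumed form): global alignment score adjusted by annotation."""

    def gap_cost(label: str) -> int:
        # Gapping a labelled position costs more, which discourages
        # breaking up an annotated feature.
        return gap - break_penalty if label else gap

    def pair_score(i: int, j: int) -> int:
        s = match if a[i] == b[j] else mismatch
        # Positions sharing the same non-empty label are encouraged to align.
        if la[i] and la[i] == lb[j]:
            s += bonus
        return s

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_cost(la[i - 1])
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap_cost(lb[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1),
                score[i - 1][j] + gap_cost(la[i - 1]),
                score[i][j - 1] + gap_cost(lb[j - 1]),
            )
    return score[n][m]


if __name__ == "__main__":
    seq_a, seq_b = "ACGTAC", "ACTAC"
    labels_a = annotate(len(seq_a), {"motif": [(3, 6)]})
    labels_b = annotate(len(seq_b), {"motif": [(2, 5)]})
    print(align_score(seq_a, seq_b, labels_a, labels_b))
```

In this sketch the parameters bonus and break_penalty control how strongly the prior knowledge influences the result; setting both to zero recovers the unannotated score.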

This paper is available as a gzipped PostScript file (416668 bytes).