Asymptotic Properties of Sequence Alignments
Sequence alignment is an important technique in several application areas in which the similarity of two or more strings (or signals) has to be assessed, such as genomics, speech recognition, and language processing. To understand the significance of similarity scores under a given alignment method, one would like to understand the distribution of the scores that random sequences attain under the same method. We are investigating several properties of these distributions using a boosting technique that allows one to infer information about longer random sequences from distributional information about shorter sequences. The techniques involved include large deviations theory, convex analysis, nonlinear optimization, and Monte Carlo simulation.
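As a rough illustration of the Monte Carlo ingredient only, the sketch below samples pairs of random sequences and records their alignment scores. It rests on assumptions that are not part of the project description: the score is taken to be the length of the longest common subsequence (one simple alignment score among many), letters are drawn independently and uniformly from a four-letter alphabet, and the names `lcs_length`, `simulate_scores`, and `mean_and_variance` are hypothetical. It does not implement the boosting technique.

```python
import random


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best alignment of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip a letter in one sequence or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simulate_scores(n, alphabet="ACGT", trials=200, seed=0):
    """Sample LCS scores for `trials` pairs of i.i.d. uniform sequences of length n.

    The uniform i.i.d. model and the default alphabet are illustrative assumptions.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        a = rng.choices(alphabet, k=n)
        b = rng.choices(alphabet, k=n)
        scores.append(lcs_length(a, b))
    return scores


def mean_and_variance(scores):
    """Sample mean and unbiased sample variance of a list of scores."""
    m = sum(scores) / len(scores)
    v = sum((s - m) ** 2 for s in scores) / (len(scores) - 1)
    return m, v


if __name__ == "__main__":
    for n in (25, 50, 100):
        m, v = mean_and_variance(simulate_scores(n))
        # m / n is an empirical estimate of the normalized mean score at length n.
        print(n, round(m / n, 3), round(v, 3))
```

Running the script prints, for each length, the empirical mean score divided by the length and the sample variance, which is one simple way to see how the score distribution changes as the sequences grow.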