University of Oxford, Department of Computer Science

Bioinformatics Group - Software

This page contains links to software packages for various bioinformatics tasks written by members of the Bioinformatics group and students.


IMSA

"Improving the Sensitivity of Multiple-Sequence Alignments by Incorporating Prior Knowledge",
Sumedha Gunewardena and Peter Jeavons,
Oxford University Computing Laboratory, Technical Report PRG-RR-02-01, January 2002.

Supplementary information about this article is available on a separate web page.

IMSA is a multiple sequence alignment tool that accepts prior knowledge from the user: sets of sequences known to be homologous, or sequences known to have particular structural or functional properties. The program annotates the input sequences with this knowledge and uses the annotations to guide the alignment. It aims to capture two biologically reasonable conjectures that can substantially improve the sensitivity of the alignments. The first is that certain biologically distinguishable structures should be preserved during the alignment process. The second is that residues in certain distinguishable segments of sequence should be aligned with each other with higher probability than the substitution matrix alone would specify.

The multiple sequence alignment algorithm used in IMSA is modified from a standard iterative pairwise alignment algorithm. The input sequences are annotated with what we call 'sequence tags', an efficient and robust method for tagging biological sequences that was developed for this application.
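The second conjecture above can be illustrated with a minimal sketch. It is not IMSA's modified alignment equations (those are given in the technical report); it is a plain global pairwise alignment score in which two residues carrying the same tag receive a bonus on top of the usual substitution score. The function name, the tag representation (one label or `None` per residue) and all scoring values are assumptions chosen for illustration.

```python
def tagged_alignment_score(a, b, tags_a, tags_b,
                           match=1, mismatch=-1, gap=-2, tag_bonus=2):
    """Global alignment score of a and b with a bonus for same-tag pairs.

    tags_a and tags_b hold one label (or None) per residue. All names
    and scoring values here are hypothetical, not taken from IMSA.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty sequence costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Residues sharing a tag are favoured over the plain substitution score.
            if tags_a[i - 1] is not None and tags_a[i - 1] == tags_b[j - 1]:
                s += tag_bonus
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    return score[n][m]
```

With untagged input the function reduces to an ordinary global alignment score; tagging a segment in both sequences raises the score of alignments that pair the tagged residues with each other.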

IMSA is written in ANSI C. You are free to incorporate the modified alignment equations used in IMSA and the implementation of sequence tags in third-party code, provided that you cite the above reference. You may download IMSA version 1.00 from here. Online documentation for IMSA is available here.

To help us keep track of how many people use IMSA, we would greatly appreciate hearing from you if you make any use of the program. Any comments, suggestions, or bug reports may also be mailed to the author.


PromView

by Peter Jeavons

PromView is a viewer for DNA sequences.

PromView is written in Java. The class files are available for download here. To run PromView you need the Java 2 Runtime Environment (version 1.3 or later), which can be downloaded here. Once the Java runtime environment is installed, you can run PromView with the command "java -jar PromView.jar".

This program is still under development. To help us keep track of how many people use PromView, we would greatly appreciate hearing from you if you make any use of the program. Any comments, suggestions, or bug reports may also be mailed to the author.


ConsensusMaker

by Peter Jeavons

ConsensusMaker is a simple alignment tool for sequences that allows the user to build up a consensus sequence from a collection of input sequences.
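As a rough illustration of what a consensus sequence is, the sketch below takes sequences that are already aligned to the same length and picks the most common character in each column. This page does not describe how ConsensusMaker builds its consensus, so the majority-vote rule, the function name and the use of "N" for tied columns are all assumptions.

```python
from collections import Counter


def majority_consensus(aligned, ambiguous="N"):
    """Column-wise majority consensus of equal-length aligned sequences.

    Hypothetical helper for illustration; not ConsensusMaker's method.
    Columns with a tie for the most common character yield `ambiguous`.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(column).most_common()
        top_char, top_count = counts[0]
        # A tie exists when the runner-up occurs as often as the leader.
        tied = len(counts) > 1 and counts[1][1] == top_count
        consensus.append(ambiguous if tied else top_char)
    return "".join(consensus)
```

For example, `majority_consensus(["ACGT", "ACGA", "ACTT"])` returns `"ACGT"`, since G and T each win their column by two votes to one.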

ConsensusMaker is written in Java and will run as an applet in a Java-enabled browser. Alternatively, the class files are available for download here. To run ConsensusMaker you need the Java 2 Runtime Environment (version 1.3 or later), which can be downloaded here. Once the Java runtime environment is installed, you can run ConsensusMaker with the command "java -jar ConsensusMaker.jar".

To help us keep track of how many people use ConsensusMaker, we would greatly appreciate hearing from you if you make any use of the program. Any comments, suggestions, or bug reports may also be mailed to the author.


TScan

"Finding Transcription Factor Binding Sites in DNA Sequences: A Template Based Approach",
Sumedha Gunewardena and Peter Jeavons,
Oxford University Computing Laboratory, Research Report PRG-RR-03-21, August 2003.

A problem faced by many algorithms for finding transcription factor binding sites is that the number of false positive hits rises as the sensitivity of prediction is increased. A main contributing factor is the short and degenerate nature of these sites, which results in a low signal-to-noise ratio. To counter this problem, one needs to look beyond the assumption that the bases of a site are independent of one another.

TScan is a software package written for discriminating motif patterns in genomic sequences. It was primarily written to identify transcription factor binding sites in DNA sequences. TScan is based on templates designed to capture, for discrimination, not only the vertical consensus but also the correlations between each base and the other bases of the site.
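To see why correlations between positions can carry information that a per-column consensus misses, the sketch below measures the mutual information between two columns of a set of known sites. It is not TScan's template model, which is described in the research report; the function name and the toy sites are assumptions made for illustration.

```python
import math
from collections import Counter


def column_mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of aligned sites.

    Hypothetical helper for illustration; not part of TScan.
    """
    n = len(sites)
    p_i = Counter(site[i] for site in sites)               # base counts in column i
    p_j = Counter(site[j] for site in sites)               # base counts in column j
    p_ij = Counter((site[i], site[j]) for site in sites)   # joint counts
    mi = 0.0
    for (x, y), count in p_ij.items():
        joint = count / n
        independent = (p_i[x] / n) * (p_j[y] / n)
        mi += joint * math.log2(joint / independent)
    return mi
```

In the toy set `["AT", "AT", "TA", "TA"]` each column is half A and half T, so a model that treats the columns independently cannot tell "AT" from "AA", yet the two columns share one full bit of mutual information. In `["AA", "AT", "TA", "TT"]` the columns have the same base frequencies but the mutual information is zero.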

A prototype version of TScan has been written in Matlab and can be downloaded from here. Online documentation for TScan is available here.

This version of TScan has been written only to evaluate the performance of our template models.

To help us keep track of how many people use TScan, we would greatly appreciate hearing from you if you make any use of the program. Any comments, suggestions, or bug reports may also be mailed to the author.


NOSA

by Francis Tsang

An optimal sequence alignment is not necessarily the biologically "correct" sequence alignment. In particular, when two sequences are evolutionarily distant, their optimal alignments may fail to identify the essential biological phenomena that should be captured. On the other hand, a set of alignments whose scores are close to the optimum may reveal useful information that is missing from the optimal ones. This has led to the development of algorithms that produce both optimal and near-optimal sequence alignments. Because the number of near-optimal alignments grows exponentially, it is impractical to enumerate all of them.

Near-Optimal Sequence Aligner (NOSA) is a program written in Java that produces optimal and near-optimal sequence alignments and allows all of them to be shown in the same graph. The algorithm used in NOSA is modified from the one proposed by Naor and Brutlag [Naor & Brutlag 1994 (J. Comp. Bio., 1:349-366)].

We introduce three new ideas. Firstly, we quantify the significance of every residue pair in all optimal and near-optimal sequence alignments. Secondly, NOSA allows a user to specify how some part of one sequence must align against some part of the other sequence; such pre-aligned regions are kept intact during the alignment process. Finally, as the algorithm proposed by Naor and Brutlag was based on a simple scoring scheme, we extend the algorithm to cover the affine gap weight model, in which the score of a gap is computed as an affine function of the gap length.
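One common way to find the residue pairs that occur in some near-optimal alignment, without enumerating the alignments themselves, is to combine a forward and a backward dynamic programming pass. The sketch below assumes that approach; it is not NOSA's code, it uses a simple linear gap penalty rather than the affine model, and the function name, scoring values and `delta` threshold are hypothetical.

```python
def near_optimal_pairs(a, b, delta, match=1, mismatch=-1, gap=-1):
    """Return (optimum, pairs): the optimal global score and the set of
    1-based residue pairs (i, j) that lie on at least one alignment
    scoring within `delta` of the optimum. Illustrative only."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # fwd[i][j]: best score for aligning a[:i] with b[:j].
    fwd = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        fwd[i][0] = i * gap
    for j in range(1, m + 1):
        fwd[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            fwd[i][j] = max(fwd[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                            fwd[i - 1][j] + gap,
                            fwd[i][j - 1] + gap)

    # bwd[i][j]: best score for aligning a[i:] with b[j:].
    bwd = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        bwd[i][m] = (n - i) * gap
    for j in range(m - 1, -1, -1):
        bwd[n][j] = (m - j) * gap
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            bwd[i][j] = max(bwd[i + 1][j + 1] + sub(a[i], b[j]),
                            bwd[i + 1][j] + gap,
                            bwd[i][j + 1] + gap)

    optimum = fwd[n][m]
    pairs = set()
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Best score of any alignment forced to pair a[i-1] with b[j-1].
            through = fwd[i - 1][j - 1] + sub(a[i - 1], b[j - 1]) + bwd[i][j]
            if through >= optimum - delta:
                pairs.add((i, j))
    return optimum, pairs
```

With `delta=0` only pairs on optimal alignments are reported; increasing `delta` admits pairs from progressively lower-scoring alignments. The gap between `optimum` and each pair's `through` value is one possible, assumed way of ranking how strongly a pair is supported.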

The Java source code for NOSA can be downloaded from here. A set of example data files can be downloaded from here.

To help us keep track of how many people use NOSA, we would greatly appreciate hearing from you if you make any use of the program. Any comments, suggestions, or bug reports may also be mailed to the author.