Gene Prediction based on Comparative Genomics
Sequence comparison between the genomes of different species has recently gained importance as a way to locate functional domains conserved through evolution, protein-coding regions among them. New bioinformatics methods have been developed to infer protein-coding genes from sequence comparisons between the genomes of two species (Batzoglou et al., 2000; Bafna and Huson, 2000; Wiehe et al., 2001; Korf et al., 2001; Novichkov et al., 2001), and they appear to yield highly accurate predictions. The rationale is that functional regions are more conserved than non-functional ones across the DNA sequences of genomes from different species (see figure below).

We are developing a method to predict genes in the human genome that combines three sources of information: sequence signals potentially involved in gene specification (essentially splice sites and start codons), the bias that protein coding induces in the nucleotide composition of the DNA sequence, and sequence similarity to the mouse genome. Unlike previously described methods, this method does not require fully assembled syntenic regions of the mouse genome, and it can be used with fragmentary mouse data at any level of coverage. A preliminary version of this program is being used by the Mouse Genome Sequencing Consortium.
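The idea of combining the three sources of evidence can be illustrated with a minimal Python sketch. It is not the program described above: the `Exon` class, the additive scoring formula, the `weight` parameter and the overlap-prorated similarity score are all assumptions made for illustration. Candidate exons carry hypothetical signal and coding-bias scores, and similarity hits are given as independent intervals on the human sequence, so no assembled mouse region is needed and an empty hit list simply falls back to the ab initio scores.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Exon:
    """Candidate exon on the human sequence (1-based, inclusive coordinates)."""
    start: int
    end: int
    signal_score: float  # hypothetical score for splice sites / start codon
    coding_score: float  # hypothetical score for coding composition bias


# A similarity hit on human coordinates: (start, end, score).
# Hits are independent fragments, e.g. from unassembled mouse reads.
Hit = Tuple[int, int, float]


def similarity_score(exon: Exon, hits: List[Hit]) -> float:
    """Sum hit scores, each prorated by the fraction of the hit inside the exon."""
    total = 0.0
    for h_start, h_end, h_score in hits:
        overlap = min(exon.end, h_end) - max(exon.start, h_start) + 1
        if overlap > 0:
            total += h_score * overlap / (h_end - h_start + 1)
    return total


def combined_score(exon: Exon, hits: List[Hit], weight: float = 0.5) -> float:
    """Assumed additive model: signals + coding bias + weighted similarity."""
    return exon.signal_score + exon.coding_score + weight * similarity_score(exon, hits)


def rank_exons(exons: List[Exon], hits: List[Hit], weight: float = 0.5) -> List[Exon]:
    """Return candidate exons sorted from highest to lowest combined score."""
    return sorted(exons, key=lambda e: combined_score(e, hits, weight), reverse=True)


if __name__ == "__main__":
    exons = [Exon(100, 199, 1.0, 2.0), Exon(300, 399, 1.5, 2.0)]
    hits = [(150, 249, 10.0)]  # one fragmentary hit overlapping the first exon
    for exon in rank_exons(exons, hits):
        print(exon.start, exon.end, combined_score(exon, hits))
```

In this toy input the second exon has the higher ab initio score, but the single hit overlapping the first exon is enough to move it to the top of the ranking. A real gene finder would additionally assemble compatible exons into gene structures, which this sketch does not attempt.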
Relevant publications
- R. Guigó, E.T. Dermitzakis, P. Agarwal, C.P. Ponting, G. Parra, A. Reymond, J.F. Abril, E. Keibler, R. Lyle, C. Ucla, S.E. Antonarakis and M.R. Brent. "Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes." PNAS 100(3):1140-1145 (2003)
- G. Parra, P. Agarwal, J.F. Abril, T. Wiehe, J.W. Fickett and R. Guigó. "Comparative gene prediction in human and mouse." Genome Research 13(1):108-117 (2003)
- Mouse Genome Sequencing Consortium (including J.F. Abril, G. Parra and R. Guigó). "Initial sequencing and comparative analysis of the mouse genome." Nature 420(6915):520-562 (2002)
- M.J. Betts, R. Guigó, P. Agarwal and R.B. Russell. "Exon structure conservation despite low sequence similarity: a relic of dramatic events in evolution?" EMBO Journal 20(19):5354-5360 (2001)
- T. Wiehe, S. Gebauer-Jung, T. Mitchell-Olds and R. Guigó. "SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments." Genome Research 11(9):1574-1583 (2001)
- T. Wiehe, R. Guigó and W. Miller. "Genome Sequence Comparisons: Hurdles in the Fast Lane to Functional Genomics." Briefings in Bioinformatics 1(4):381-388 (2000)