Datasets

Relevant information on datasets and similar resources

BMC Bioinformatics: Multiple Non-Collinear TF-map Alignments of Promoter Regions

Datasets of human-mouse-chicken-zebrafish orthologous gene regions used to train and optimize the parameters of the multiple TF-map alignment, together with the results. The collection includes characterized real promoters and enhancers as well as artificial non-collinear examples.

U12DB: a database of orthologous U12-type spliceosomal introns

Database of clusters of orthologous U12 introns from 18 animal, 1 plant and 1 fungal species.

PLoS Computational Biology (2006): Transcription Factor Map Alignment of Promoter Regions

Dataset of the 40 human-mouse gene promoter pairs that were used to optimize the parameters of the TF-map alignment. Dataset of different orthologous genomic regions for these genes. Dataset and results of the TF-map alignment on the 5333 CISRED human co-expressed genes.

Genome Biology (2006): EGASP: The human ENCODE Genome Annotation Assessment Project

Different evaluation programs were used to compare the accuracy of the gene predictions submitted to the GENCODE EGASP'05 workshop, held at the Sanger Center on May 6-7, 2005. The results of those evaluations are provided here, along with a discussion of the different methods used to calculate the accuracy of each approach at three levels of gene structure: nucleotide, exon, and transcript/gene.
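To illustrate what accuracy "at different levels" means, here is a minimal Python sketch of nucleotide-level and exon-level sensitivity (Sn = TP/(TP+FN)) and specificity (Sp = TP/(TP+FP), the definition commonly used in gene-prediction evaluation). It assumes exons are given as (start, end) inclusive coordinates on a single strand, and counts a predicted exon as correct only when both boundaries match exactly. The function names are hypothetical; this is not the evaluation software used for EGASP, and the transcript/gene level is not covered.

```python
def nucleotide_set(exons):
    """Return the set of positions covered by (start, end) inclusive exons."""
    covered = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def nucleotide_accuracy(real, predicted):
    """Nucleotide-level (Sn, Sp): overlap of real and predicted coding positions."""
    real_nt = nucleotide_set(real)
    pred_nt = nucleotide_set(predicted)
    tp = len(real_nt & pred_nt)  # positions both annotated and predicted as coding
    sn = tp / len(real_nt) if real_nt else 0.0
    sp = tp / len(pred_nt) if pred_nt else 0.0
    return sn, sp


def exon_accuracy(real, predicted):
    """Exon-level (Sn, Sp): an exon is correct only if both boundaries match exactly."""
    real_set = set(real)
    pred_set = set(predicted)
    correct = len(real_set & pred_set)
    sn = correct / len(real_set) if real_set else 0.0
    sp = correct / len(pred_set) if pred_set else 0.0
    return sn, sp


if __name__ == "__main__":
    real = [(100, 199), (300, 399)]
    predicted = [(100, 199), (320, 399)]  # second exon has a wrong start boundary
    print(nucleotide_accuracy(real, predicted))  # (0.9, 1.0)
    print(exon_accuracy(real, predicted))        # (0.5, 0.5)
```

The example shows why the levels differ: a single misplaced splice site barely affects nucleotide-level accuracy but halves exon-level accuracy.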

Nucleic Acids Research (2005): Comparative gene finding in chicken indicates that we are closing in on the set of multi-exonic widely expressed human genes

Datasets of the 311 putative novel human genes found using the comparative gene predictor SGP2 and the chicken genome sequence, the subset of the 50 most promising predictions tested by RT-PCR, and the GenBank accessions of the six RT-PCR-positive predictions.

Genome Research (2005): Comparison of Splice Sites in Mammals and Chicken

Datasets for the comparative analysis of splice site sequences in a large collection of human, mouse, rat and chicken introns. The analyses performed on these datasets focused on the conservation of orthologous splice sites, the evolution of the two major intron classes (U2 and U12), and subtype switching within those classes.

Bioinformatics (2004): Splice site identification by idlBNs

Datasets of human splice sites from RefSeq-hg15 (ACCDON), internal exons from the Burset and Guigó and Rogic et al. human gene sets (BGROIEXONS) and splice, start and stop sites from RefSeq-hg16 not present in the Burset and Guigó and Rogic et al. human gene sets (NOBGRORS).

Science (2003): Selenoprotein gene prediction in Human

All the programs and data used to identify selenoproteins in the human genome. Seven novel selenoprotein genes were found by combining SECIS element prediction and gene prediction with comparative genomics approaches. We believe the human selenoproteome consists of 17 selenoprotein families (15kDa, DI, GPX, SelH, SelI, SelK, SelM, SelN, SelO, SelP, SelR, SelS, SelT, SelV, SelW, SPS2 and TR) and, in addition, two Cys-containing homologs (MsrA and SelU), which are selenoproteins in other organisms.

PNAS (2003): Comparison of human and mouse genomes followed by experimental validation

This site describes all the programs and data presented in Guigó et al., PNAS 2003. In that paper we estimated that nearly a thousand novel human genes that do not overlap known proteins can be verified experimentally. The method is based on comparing the human and mouse genomes to improve the resulting gene predictions, followed by a filtering step, after which a sample of mouse predictions was tested by RT-PCR amplification and direct sequencing.

Genome Research (2003): Comparative Gene Prediction in Human and Mouse

Supplementary materials for the SGP2 paper are available from this section. SGP2 is a gene prediction program that combines ab initio gene prediction with TBLASTX searches between two genome sequences to provide both sensitive and specific gene predictions.

Genome Research (2000): Gene Prediction Programs Evaluation in Large DNA Sequences

Given the absence of experimentally verified large genomic data sets, we constructed a semi-artificial test set comprising a number of short single-gene genomic sequences separated by randomly generated intergenic regions, in order to analyze the accuracy of gene-prediction programs.
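The general idea of embedding real single-gene sequences in random intergenic DNA can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure used to build the published test set: the spacers are drawn uniformly from A/C/G/T with a hypothetical length range (`min_gap`, `max_gap`), whereas the original set may have used a different base composition and length distribution. The function names are hypothetical.

```python
import random


def random_intergenic(length, rng):
    """Random DNA spacer; assumes equal base frequencies (a simplification)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_semi_artificial(genes, min_gap, max_gap, seed=0):
    """Concatenate single-gene sequences separated by random intergenic spacers.

    Returns the assembled sequence and the 0-based, half-open (start, end)
    coordinates of each gene, so that predictions can be scored afterwards.
    """
    rng = random.Random(seed)  # seeded for a reproducible test set
    parts = []
    coords = []
    pos = 0
    for gene in genes:
        spacer = random_intergenic(rng.randint(min_gap, max_gap), rng)
        parts.append(spacer)
        pos += len(spacer)
        parts.append(gene)
        coords.append((pos, pos + len(gene)))
        pos += len(gene)
    # Trailing spacer so the last gene does not sit at the sequence end
    parts.append(random_intergenic(rng.randint(min_gap, max_gap), rng))
    return "".join(parts), coords


if __name__ == "__main__":
    seq, coords = build_semi_artificial(["ATGAAATAG", "ATGCCCTGA"], 50, 100)
    for start, end in coords:
        print(seq[start:end])  # prints each embedded gene unchanged
```

Keeping the gene coordinates alongside the sequence is what makes such a set useful for evaluation: the true annotation is known exactly, even though the intergenic context is artificial.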

Genome Research (2000): geneid in Drosophila melanogaster

A set of training sequences (exons/introns) and the resulting parameters required to run geneid on the Drosophila melanogaster genome.

Genomics (1996): Evaluation of gene structure prediction programs

A number of computer programs for the prediction of gene structure in genomic DNA sequences are analyzed. The programs are tested on a large set of vertebrate sequences.

EMBO reports (2001): Selenoprotein gene prediction in the Fly

This site describes all the programs and data used to predict selenoproteins in the Drosophila melanogaster genome. Two novel selenoprotein families (SelK and SelH, previously named SelG and SelM) were found by combining gene prediction and SECIS element prediction. In addition, the fly genome is known to contain the SPS2 selenoprotein.
