Gene Prediction/Genome Annotation

We are continually developing geneid, an ab initio gene prediction program, and SGP2, a comparative gene finder. We currently have more than fifty parameter files for gene prediction in a wide range of species spanning all kingdoms of life.

During 2011/12 we continued our collaboration with the Baylor College of Medicine (Houston, USA) on the annotation of several Hymenoptera species (B. terrestris, B. impatiens and A. mellifera). We generated geneid and SGP2 parameter files for these species and obtained predictions that are being incorporated into the annotation pipelines for these organisms.

In the course of 2010-2012, as members of the INB (“Instituto Nacional de Bioinformatica”), we also continued or concluded our participation in three large-scale genome annotation projects:

1- Annotation of the Cucumis melo (melon) genome. As members of the Melonomics consortium we developed melon-specific geneid and SGP2 parameter files, which were used in the final pipeline to annotate the melon genome. Members of the consortium have recently published a PNAS article summarizing the work done on this first-ever Spanish sequencing/annotation project (“The genome of melon (Cucumis melo L.)”).

2- Sequencing and Annotation of the Common Bean Genome: Maximization of the Latin American/Iberian natural resources (PhasIbeam consortium). In this ongoing project we have so far developed a preliminary set of geneid and SGP2 parameter files, which appear to predict genes in this species with high accuracy. We are also involved in coordinating the project and have helped set up a Google group and a wiki to organize and centralize this multi-national effort.

3- Assembly, annotation and comparative analysis of Iberian and European lynx genomes. We are continuing our participation in the sequencing/annotation effort for the Iberian lynx, a threatened species. In the past year we generated transcriptomes for several tissues of one lynx individual. We also performed a preliminary annotation of these transcriptomes by aligning them to annotated cat proteins, focusing on immune-system proteins of the Major Histocompatibility Complex (MHC) classes.

Another genome annotation project in which we collaborated for several years (“Annotation of the Solanum lycopersicum (tomato) genome”) was completed in late 2010. An article on the outcome of this project, which includes results from our analyses and on which we are contributing authors, was published recently in the journal Nature.