Services and software
These are the different applications created and maintained by our group.
AcE
AcE is a program to aid gene prediction accuracy evaluation. It uses GFF format to make it easy to convert gene prediction results into an analyzable format. Novel features include isoform accuracy evaluation from either the annotated gene or gene prediction perspective or both at the same time. Masking of genomic sequence which has unknown features allows gene predictions in annotated regions to be analyzed in a genomic context. Test sets, such as an artificial sequence test set or genomic context test set, can be generated by selecting specified annotated sequences from a master set.
Our version is no longer maintained; the latest AcE version is now available at Bioinformatics.org. You can contact William S. Hayes from this link.
AStalavista
AStalavista, the Alternative Splicing Transcriptional Landscape Visualization Tool and more, retrieves all alternative splicing events from generic transcript annotations.
geneid
geneid is a program to predict genes along a DNA sequence in a large set of organisms. While its accuracy compares favorably to that of other existing tools, geneid is more efficient in terms of speed and memory usage, and it offers some rudimentary support for integrating predictions from multiple sources.
You will also find whole genome annotations for different species obtained with geneid in our "Gene Predictions" web pages.
gff2aplot
Visualizing pair-wise alignments with annotated axes from GFF files.
We are proud to announce the new version of gff2aplot, which has been re-implemented in Perl. Visit the download section of the program's web page to obtain v2.0. You can obtain the full distribution tarball from there.
Although the "gff2aplot User's Manual" is not finished yet, you can start using the program, as we have written several HTML tutorials that introduce its use. We hope you will enjoy them.
A snapshots web page is also available, listing a few examples of what can be done with gff2aplot.
gff2ps
Obtaining plots to compare genomic sequences and/or sources from GFF files. The latest available version is 0.98. Get the PostScript version of "gff2ps Users Manual" (v0.96). A Web Server is also available at Institut Pasteur thanks to Catherine Letondal. A new section has been created: HTML HOWTOs for gff2ps. The first two HOWTOs have been added: "Comparing sources with gff2ps" and "Visualizing PostScript output from gff2ps". We hope you will find them useful.
gff2ps was used to obtain the six chromosome arm plots (X, 2L, 2R, 3L, 3R and 4) appearing in the "Coding content of the fly genome" genome map (figure 4), included as a poster in "The Genome Sequence of Drosophila melanogaster" [Adams et al. Science 287(5461):2185-2195(2000)].
We have produced the map of the Human Genome with gff2ps. The 22 autosomes and the X and Y chromosomes were displayed in a large poster appearing as figure 1 of "The Sequence of the Human Genome" [Venter et al. Science 291(5507):1304-1351 (2001)]. The single chromosome pictures can be accessed from here to visualize the web version of the "Annotation of the Celera Human Genome Assembly" poster.
gff2ps has achieved another genome landmark. The mosquito genome annotation for five chromosome arms (2L, 2R, 3L, 3R and X) has been summarized in a two-sided, five-page foldout included as figure 1 of "The Genome Sequence of the Malaria Mosquito Anopheles gambiae" [Holt et al. Science 298(5591):129-149 (2002)], available from the "Annotation of the Anopheles gambiae genome sequence" web page.
meta
meta is a program to produce and align the TF-maps of two gene promoter regions. meta is very useful for characterizing promoter regions from orthologous genes, or from co-regulated genes in microarrays, as it significantly improves the signal-to-noise ratio while still detecting the real functional sites.
mmeta
mmeta is a program to produce and align the TF-maps of multiple promoter regions. mmeta is a powerful way to characterize promoter regions from multiple orthologous genes, or from co-regulated genes in microarrays, as it significantly improves the signal-to-noise ratio while still detecting the real functional sites.
overlap
overlap is a program that computes the overlap between two sets of genomic features. More precisely, it takes two GFF files of genomic features as input and, for each feature of the first set, reports whether it is overlapped by a feature of the second set. This is the basic mode; more detailed information can also be retrieved. Please contact Sarah Djebali (sarah dot djebali at crg dot es) with any questions.
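The basic mode described above can be illustrated with a minimal Python sketch. This is not the overlap program's own code or interface: the names `Feature`, `parse_gff_line`, `is_overlapped` and `basic_overlap` are hypothetical, strand is ignored, and coordinates are assumed to follow the GFF convention of 1-based inclusive start and end positions.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """A genomic feature reduced to the fields needed for an overlap test."""
    seqname: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int    # inclusive


def parse_gff_line(line: str) -> Feature:
    """Read seqname, start and end from one tab-separated GFF line."""
    fields = line.rstrip("\n").split("\t")
    return Feature(fields[0], int(fields[3]), int(fields[4]))


def is_overlapped(feature: Feature, others: List[Feature]) -> bool:
    """True if any feature in `others` shares at least one base with `feature`."""
    return any(
        feature.seqname == other.seqname
        and feature.start <= other.end
        and other.start <= feature.end
        for other in others
    )


def basic_overlap(first: List[Feature], second: List[Feature]) -> List[bool]:
    """For each feature of the first set, say whether the second set overlaps it."""
    return [is_overlapped(feature, second) for feature in first]
```

This sketch compares every pair of features, which is fine for small sets; for whole-genome annotations, sorting both sets by position and sweeping through them would be the usual choice.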
PATRONUS
PATRONUS (from "PATtern Recognition by Optimized Numerical Universal Scoring") is a program designed to compute very quickly the exact probability of observing a given number of occurrences of a simple motif (that is, a continuous word without gaps) in a sequence. Its intended scope is the analysis of very long biological sequences, like chromosomes or whole genomes of complex organisms. The probability is computed on the basis of the Markovian statistics of order m for the sequence, that is, the recorded number of occurrences of all submotifs of length m + 1 in the sequence. Contrary to what many people believe, computing such a probability for a generic motif is a computationally demanding task, mainly because motifs can overlap in non-trivial ways.
A detailed description of both the PATRONUS algorithm and its excellent performance can be found here.
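The two quantities mentioned in the description, the order-m statistics of a sequence and the number of occurrences of a motif, can be illustrated with a short Python sketch. This is not the PATRONUS algorithm and does not compute any probability; the function names `markov_counts` and `count_occurrences` are hypothetical and only show what is being counted, including the overlapping occurrences that make the probability hard to compute.

```python
from collections import Counter


def markov_counts(sequence: str, m: int) -> Counter:
    """Count every word of length m + 1 in the sequence, using overlapping windows."""
    k = m + 1
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_occurrences(sequence: str, motif: str) -> int:
    """Count occurrences of a gapless motif, including overlapping ones."""
    count = 0
    position = sequence.find(motif)
    while position != -1:
        count += 1
        # Restart one base further on so that overlapping matches are kept.
        position = sequence.find(motif, position + 1)
    return count
```

For example, the motif AA occurs three times in AAAA when overlaps are counted, whereas a non-overlapping scan would report only two.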
project
project is a program that projects genomic features onto their sequences. Please contact Sarah Djebali (sarah dot djebali at crg dot es) with any questions.
SECISaln
SECISaln will predict a SECIS element in the query sequence, split it into its constituent parts and align these against a precompiled database of eukaryotic SECIS elements. The user can choose whether the database sequences are sorted by protein family or by species, thereby offering the possibility of comparing the submitted sequence to other, known SECISes. In addition, SECISaln returns a graphical image of the predicted structure of the user-submitted sequence as well as a multiple structural alignment of all SECIS elements of that type already present in the database.
Selenoprofiles
Selenoproteins are a group of proteins that contain selenocysteine (Sec), a rare amino acid inserted co-translationally into the protein chain. The Sec codon is UGA, which is normally a stop codon. In selenoproteins, UGA is recoded to Sec in the presence of specific signals on selenoprotein gene transcripts. Because of the dual role of the UGA codon, gene prediction programs fail to predict selenoproteins correctly. Selenoprofiles is a homology-based in silico tool that scans genomes for members of the known selenoprotein families, thus finding both selenoproteins and cysteine homologues. Selenoprofiles is written in Python, and it internally runs psitblastn, exonerate, genewise and SECISearch. Selenoprofiles can be used with no human intervention at all, although we recommend a manual (and expert) revision of results. Currently, Selenoprofiles is tuned to search only eukaryotic sequences, and includes profile alignments for each known eukaryotic selenoprotein and selenocysteine-related family. You can request an analysis with Selenoprofiles by sending an email to marco.mariotti at crg.es, or you can install the program yourself by following the instructions below.
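The effect of the dual role of UGA on gene prediction can be illustrated with a small Python sketch. It is not part of Selenoprofiles: the function `first_stop` and its `recode_tga` flag are hypothetical, and the coding sequence is assumed to be given as DNA, where UGA appears as TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds: str, recode_tga: bool = False):
    """Return the 0-based codon index of the first in-frame stop, or None.

    With recode_tga=True, TGA (UGA in the transcript) is read as
    selenocysteine instead of a stop, as happens in selenoproteins.
    """
    stops = STOP_CODONS - {"TGA"} if recode_tga else STOP_CODONS
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3].upper() in stops:
            return i // 3
    return None
```

A predictor that treats every TGA as a stop ends the coding sequence at the Sec codon, while reading TGA as Sec lets the frame continue to the true stop further downstream.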
The Selenoprofiles predictions on Ensembl genomes are available through a DAS server at http://genome.crg.cat:9000/das
A graphical summary of said predictions can be downloaded here: http://genome.crg.es/datasets/selenoprofiles2010/results_ensembl52.png
Contact: marco.mariotti@crg.es
INSTALLATION
The first step to install Selenoprofiles is to extract all files in this tarball to what will be your Selenoprofiles installation directory. Then, all external programs that Selenoprofiles uses must be installed manually. Selenoprofiles looks for executables in the folder defined by the "-bin_folder" option, which by default is the installation directory, so after installing the following programs you have to link their executables there. For Selenoprofiles to work, these executables must be present (program packages are indicated in parentheses):
blastall (from ncbi blast package 2.2.22)
blastpgp (from ncbi blast package 2.2.22)
formatdb (from ncbi blast package 2.2.22)
exonerate (from exonerate version 2.2.0)
fastafetch (from exonerate version 2.2.0)
fastaindex (from exonerate version 2.2.0)
fastasubseq (from exonerate version 2.2.0)
fastalength (from exonerate version 2.2.0)
genewise (from Wise2 package)
For genewise, an extra step is needed to make it possible to use a custom scoring matrix that treats UGA as selenocysteine. Once you have finished the genewise installation, copy the whole wisecfg directory to a new location (a directory called wisecfg inside the Selenoprofiles installation directory should be fine). Then replace two files (codon.table and BLOSUM62sel.bla) with the files found in the Selenoprofiles package.
When all programs are ready to use, prepare the selenoprofiles.config file. In the file you downloaded, you will find a number of INSTALL_DIR occurrences. If everything worked, you now just have to replace these occurrences with your Selenoprofiles installation path, and you can then launch selenoprofiles.py
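The last two installation steps can be sketched in Python. This script is not shipped with Selenoprofiles: the function names `missing_executables` and `fill_install_dir` are hypothetical, the executable names are those listed above, and the config file is assumed to be plain text in which INSTALL_DIR can be replaced literally.

```python
import os

# Executables that the installation instructions require in the bin folder.
REQUIRED_EXECUTABLES = [
    "blastall", "blastpgp", "formatdb",
    "exonerate", "fastafetch", "fastaindex", "fastasubseq", "fastalength",
    "genewise",
]


def missing_executables(bin_folder: str) -> list:
    """Return the required executables that are absent or not executable."""
    return [
        name for name in REQUIRED_EXECUTABLES
        if not os.access(os.path.join(bin_folder, name), os.X_OK)
    ]


def fill_install_dir(config_path: str, install_dir: str) -> int:
    """Replace every INSTALL_DIR in the config file; return how many were replaced."""
    with open(config_path) as handle:
        text = handle.read()
    with open(config_path, "w") as handle:
        handle.write(text.replace("INSTALL_DIR", install_dir))
    return text.count("INSTALL_DIR")
```

Running `missing_executables` on the installation directory before the first launch gives a quick list of the links that still need to be created.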
MANUAL
You can download the latest version of the Selenoprofiles manual here: http://genome.crg.es/~mmariotti/selenoprofiles_manual.txt
sgp2
sgp2 is a program to predict genes by comparing anonymous genomic sequences from different species. It combines tblastx, a sequence similarity search program, with geneid, an ab initio gene prediction program.
You will also find whole genome annotations for different species obtained with sgp2 in our "Gene Predictions" web pages.
SymCurv
SymCurv is a computational ab initio method for nucleosome positioning prediction. It is based on a structural property of natural nucleosome-forming sequences: they are symmetrically curved around a local minimum of curvature. The method takes the primary DNA sequence as input and calculates the expected curvature, from which it deduces possible centers of nucleosomal sequences by imposing symmetry constraints. SymCurv's performance is comparable to that of existing tools, but it offers the additional advantage of predicting nucleosome positions under two assumed states (stationary and dynamic), providing insight into the remodelling potential of nucleosomes with a possible regulatory function.
The Flux Capacitor
The Flux Capacitor predicts abundances for transcript molecules and alternative splicing events from RNAseq experiments. Additionally, there is a simulation pipeline capable of simulating whole transcriptome sequencing experiments.
The GEM (GEnome Multi-tool) Library
The GEM (GEnome Multi-tool) Library is a set of highly optimized tools for indexing and querying huge genomes and files. Provided so far are a very fast exhaustive mapper (the GEM mapper), an unconstrained split mapper (the GEM split mapper), and a very fast program to compute genome mappability (the GEM mappability).













