Selenoprofiles2

Selenoproteins are a group of proteins that contain selenocysteine (Sec), a rare amino acid inserted co-translationally into the protein chain. The Sec codon is UGA, which normally acts as a stop codon. In selenoproteins, UGA is recoded to Sec in the presence of specific signals on the selenoprotein gene transcripts. Because of this dual role of the UGA codon, standard gene prediction programs fail to predict selenoproteins correctly. Selenoprofiles is a homology-based in silico tool that scans genomes for members of the known selenoprotein families, finding both selenoproteins and their cysteine homologues. Selenoprofiles is written in Python and internally runs psitblastn, exonerate, genewise and SECISearch.
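The dual role of UGA can be illustrated with a small translation sketch. It is only an illustration of the problem, not part of selenoprofiles: the function name `translate` and the `recode_uga` switch are hypothetical, and in a real cell recoding depends on signals in the transcript rather than on a flag. Sec is written with its standard one-letter code, U.

```python
# Standard genetic code in RNA alphabet; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna, recode_uga=False):
    """Translate an RNA coding sequence, reading from its first base.

    With recode_uga=False, UGA terminates translation like any stop codon.
    With recode_uga=True (hypothetical switch), every UGA is read as Sec ('U').
    """
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        codon = rna[pos:pos + 3]
        if codon == "UGA" and recode_uga:
            protein.append("U")  # selenocysteine
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # stop codon: translation ends here
        protein.append(residue)
    return "".join(protein)


toy_transcript = "AUGGGCUGAUGCAAAUAA"  # made-up sequence with an in-frame UGA
print(translate(toy_transcript))                   # MG
print(translate(toy_transcript, recode_uga=True))  # MGUCK
```

A predictor that always treats UGA as a stop reports the truncated product, which is why a dedicated, homology-based search is needed for these genes.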

Selenoprofiles is tuned to search for selenoprotein genes, and comes out of the box with profile alignments for each known selenoprotein and selenocysteine-related family (Note: profiles will be released soon. The current release contains only the program and a single example profile).

Selenoprofiles can be used to search for any protein family (including non-selenoprotein families), given an input profile alignment. The pipeline combines standard gene prediction tools to provide a clean and fast way to scan genomes for protein families, and offers a wide repertoire of output formats, which can also be extended by the user. The program allows for a deep level of customization and provides many built-in methods to filter spurious hits.

NOTE: This page describes selenoprofiles version 2.2. A newer version of this program is available here. Version 1 is no longer maintained.

This version features major improvements over the previous ones, such as:

  • improved workflow control
  • predictions by blast can be output, allowing the use of selenoprofiles on bacterial genomes (exonerate and genewise are eukaryote-specific)
  • lazy computing implemented
  • pre-clustering of the profile alignment: multiple blast searches are run if the profile is highly variable
  • an SQLite database is used to store results, making it possible to search for a large number of families without producing an enormous number of files, since they can be deleted at the end of the computation
  • improved customization of the options used with the slave programs, which can potentially be different for each profile
  • improved filtering of results: all filtering procedures are defined as pieces of Python code that are run internally by selenoprofiles. Several methods useful for filtering are provided, and filtering can be customized for each family
  • intra-family and inter-family redundancy of results is removed
  • tag blast and gene ontology extensions implemented for filtering (see manual)
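The idea of filters defined as pieces of Python code, customizable per family, can be sketched as follows. This is a minimal illustration under stated assumptions, not the selenoprofiles API: the `Hit` class, its attributes, the filter strings, the thresholds and the family names used as keys are all hypothetical. The actual syntax and the provided filtering methods are described in the manual.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for a candidate gene prediction."""
    family: str
    evalue: float
    coverage: float  # fraction of the profile covered by the prediction


# Hypothetical filters, written as Python expressions over the variable `x`.
DEFAULT_FILTER = "x.evalue < 1e-5"
FAMILY_FILTERS = {
    "GPx": "x.evalue < 1e-10 and x.coverage > 0.5",  # stricter, family-specific
}


def passes_filter(hit):
    """Evaluate the filter expression of the hit's family against the hit."""
    expression = FAMILY_FILTERS.get(hit.family, DEFAULT_FILTER)
    # eval executes code: only use it on trusted configuration.
    return bool(eval(expression, {"__builtins__": {}}, {"x": hit}))


hits = [
    Hit("GPx", 1e-20, 0.9),   # passes the GPx filter
    Hit("GPx", 1e-8, 0.9),    # rejected: e-value too high for GPx
    Hit("SelK", 1e-8, 0.3),   # no specific filter, passes the default
]
kept = [h for h in hits if passes_filter(h)]
print([h.family for h in kept])
```

Keeping filters as expressions rather than hard-coded functions is what lets each family carry its own criteria in configuration, at the cost of having to trust whoever writes that configuration.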

Tools for graphical representation of selenoprofiles results are under development and will be released in the next few months.

MANUAL

Download the latest version of the selenoprofiles manual here: http://genome.crg.es/~mmariotti/selenoprofiles_manual.2.2.pdf

INSTALLATION:

For selenoprofiles to work, all the slave programs that it uses (blastall, exonerate, genewise) must already be installed on your machine. You will also need some external Python modules if you want to use all of its functionality. These additional modules are needed to scan genomes for selenoproteins, but may be omitted if you only want to scan for your own protein family of interest. You can find help with installing the slave programs and the additional Python modules on this page.
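Before installing, it can be useful to check whether the slave programs are reachable from your PATH. The following is a small sketch, not part of the selenoprofiles distribution; it assumes the executables carry the names listed above, and the helper name `find_missing` is made up for this example.

```python
import shutil

# Executable names as listed on this page; adjust if your installation differs.
REQUIRED_PROGRAMS = ["blastall", "exonerate", "genewise"]


def find_missing(programs, which=shutil.which):
    """Return the programs that cannot be found on the PATH."""
    return [name for name in programs if which(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PROGRAMS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All slave programs were found.")
```

A program reported as missing may still be installed in a location outside your PATH, so treat the output as a first check only.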

To install selenoprofiles, download this tarball. Then, follow the instructions in the README.

CITATION:

Selenoprofiles was published in Bioinformatics. To read the article or access the online data, check this page. Please cite:

Mariotti M, Guigo R - Selenoprofiles: profile-based scanning of eukaryotic genome sequences for selenoprotein genes. Bioinformatics. 2010 Nov 1;26(21):2656-63. Epub 2010 Sep 21

 

Contact: marco.mariotti@crg.eu