U12DB: a database of orthologous U12-type spliceosomal introns


U12DB: The U12 Intron Database




ABSTRACT: U12-type introns are spliced by the U12-dependent spliceosome and are present in the genomes of many higher eukaryotic lineages, including plants, chordates and some invertebrates. Investigations into the evolution and mechanism of U12-dependent splicing would be facilitated by access to a catalog of such introns. However, owing to their relatively recent discovery and to a systematic bias against the recognition of non-canonical splice sites in general, introns defined by U12-type splice sites are under-represented in genome annotations. Such under-representation compounds the already difficult problem of determining gene structures. It also impedes attempts to study these introns genome-wide or phylum-wide. The resource described here, the U12 Intron Database (U12DB), aims to catalog the U12 introns of completely sequenced eukaryotic genomes and to associate orthologous introns with each other.

The U12-dependent spliceosome. Two pathways for the removal of eukaryotic spliceosomal introns exist: a major pathway that depends on the main U2 snRNA-containing spliceosome and a minor pathway that depends on the low-abundance U12 snRNA-containing spliceosome. The two spliceosomes share only one snRNA, U5, but have many protein components in common. They are distinguished mainly by the splice signal sequences in the pre-mRNA to which they bind. The U12 consensus sequences for the donor site, RTATCCTTT, and the branch point, TTCCTTRAY (R = A or G; Y = C or T), are highly conserved and distinct from the U2 consensus sequences. The two spliceosomes also differ in the order of spliceosome assembly: the U11 and U12 snRNPs form a di-snRNP complex that recognizes the donor site and branch point simultaneously, whereas U1 and U2 recognize these sites independently before associating.
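As a rough illustration of how the U12 signals described above can be recognized, the following Python sketch scores a candidate intron against the two IUPAC consensus strings quoted in that paragraph. It assumes that the donor consensus covers the first nine nucleotides of the intron and, as a simplification, searches for the branch point only in the last 40 nucleotides of the intron. The function names and the window size are hypothetical; this is not the scoring used to build U12DB.

```python
# IUPAC codes needed for the U12 consensus sequences (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}

U12_DONOR = "RTATCCTTT"   # assumed to cover the first 9 nt of the intron
U12_BRANCH = "TTCCTTRAY"  # branch-point consensus


def consensus_score(seq: str, consensus: str) -> int:
    """Count positions where seq is compatible with an IUPAC consensus of equal length."""
    if len(seq) != len(consensus):
        raise ValueError("sequence and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(seq.upper(), consensus))


def donor_score(intron: str) -> int:
    """Score the 5' end of the intron against the U12 donor consensus."""
    return consensus_score(intron[: len(U12_DONOR)], U12_DONOR)


def best_branch_point(intron: str, window: int = 40) -> tuple[int, int]:
    """Return (score, offset) of the best branch-point match in the intron's 3'-terminal window.

    The 40-nt window is a hypothetical simplification; offset is 0-based from the intron start.
    """
    k = len(U12_BRANCH)
    best = (-1, -1)
    for i in range(max(0, len(intron) - window), len(intron) - k + 1):
        score = consensus_score(intron[i : i + k], U12_BRANCH)
        if score > best[0]:  # keep the first (most upstream) of equally good matches
            best = (score, i)
    return best
```

For example, for an intron made of ATATCCTTT, 30 nt of GC repeats, TTCCTTAAC and 13 further nucleotides, donor_score returns 9 (a perfect match) and best_branch_point returns (9, 39). A real classifier would use position weight matrices rather than exact consensus agreement, since individual U12 introns deviate from these consensus strings.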

Genome-wide scans. Computational scans for U12 introns have previously been performed for human (Levine and Durbin, 2001) and Arabidopsis (Zhu and Brendel, 2003). Both scans used a similar methodology: predicting introns and confirming them by alignment to expressed sequences. We extended this approach to 20 genomes using spliced alignment of the sequence flanking known introns or transcript-confirmed intron predictions to the genomic sequence of orthologous genes. Details can be found in a forthcoming article in the Nucleic Acids Research database issue.
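The flank-based search described above starts from the exonic sequence on either side of a known or confirmed intron. A minimal Python sketch of that first step is shown below. It assumes 0-based, half-open intron coordinates and an arbitrary 50-nt flank on each side (both hypothetical choices, not documented U12DB parameters), joins the two flanks into an intronless query, and records where the intron was excised. The spliced alignment of that query to the orthologous genomic sequence would then be performed by a dedicated aligner such as exonerate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlankQuery:
    sequence: str  # upstream flank + downstream flank, with the intron removed
    junction: int  # offset in `sequence` at which the intron was excised


def flanking_query(genome: str, intron_start: int, intron_end: int, flank: int = 50) -> FlankQuery:
    """Build an intronless query from the exonic sequence flanking one intron.

    Coordinates are 0-based and half-open, so genome[intron_start:intron_end] is the intron.
    Flanks are truncated at the ends of `genome`. The 50-nt default is a hypothetical choice.
    """
    if not 0 <= intron_start < intron_end <= len(genome):
        raise ValueError("intron coordinates out of range")
    upstream = genome[max(0, intron_start - flank) : intron_start]
    downstream = genome[intron_end : intron_end + flank]
    return FlankQuery(upstream + downstream, len(upstream))
```

Keeping the junction offset makes it possible to check, after alignment, whether the orthologous gene has an intron at the corresponding position.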

U12DB Web Query



Query text

You may search by U12DB intron ID or intron cluster ID, or by Ensembl gene ID, gene name, or gene description, using the following fields:

  • Intron ID
  • Intron Cluster ID
  • Ensembl Gene ID
  • Gene Name
  • Gene Description

Note on pattern matching: Using the drop-down list, one may specify whether to return only exact matches to the query or records that begin with or contain the query text. Queries are not case sensitive.

If nothing is entered, or only the single wildcard character ( * ) is typed in the text field, the corresponding database field will not be queried and results will be limited only by the other criteria given. When combined with other text and the begins with or contains options, however, the wildcard ( * ) matches any string and the question mark ( ? ) matches any single character. The SQL wildcards ('%' and '_') are also recognized, which may cause unexpected results with certain queries. For example, because '_' matches any single character, the query NP_* with the contains option retrieves not only the expected RefSeq entries starting with NP_ but also introns in the genes TNPO1, TNPO2, Rnpc3, etc.
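The following Python sketch shows one plausible way such form input could be turned into SQL conditions, and why NP_* also matches TNPO1. The function names and the exact translation rules (* to %, ? to _, with % added according to the begins with or contains option) are assumptions for illustration, not the site's actual code; like_match emulates a case-insensitive SQL LIKE with a regular expression.

```python
import re


def build_condition(text: str, mode: str):
    """Translate form input into a (SQL operator, pattern) pair.

    mode is one of "exact", "begins", "contains" (hypothetical names for the drop-down options).
    Returns None when the field should not be queried at all.
    """
    text = text.strip()
    if text in ("", "*"):
        return None  # empty field or a lone '*': this field is not queried
    if mode == "exact":
        return ("=", text)
    pattern = text.replace("*", "%").replace("?", "_")
    if mode == "begins":
        return ("LIKE", pattern + "%")
    if mode == "contains":
        return ("LIKE", "%" + pattern + "%")
    raise ValueError(f"unknown mode: {mode}")


def like_match(value: str, pattern: str) -> bool:
    """Emulate case-insensitive SQL LIKE: '%' matches any string, '_' any single character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None
```

With the contains option, NP_* becomes the pattern %NP_%%, in which the underscore matches the O of TNPO1 and the C of Rnpc3. Choosing begins with instead yields NP_%%, which no longer matches TNPO1, although the underscore still matches any single character after NP.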

Other options

Results can be limited by species and/or intron type or subtype.

If the intron cluster radio button is selected, all members of every orthologous intron cluster to which an intron matching the original query criteria belongs will be returned.
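Conceptually, the cluster option is a two-step lookup: find the clusters of the matching introns, then return all of their members. A minimal in-memory Python sketch follows; the intron and cluster IDs are hypothetical rather than real U12DB identifiers, and returning an unclustered hit on its own is an assumption.

```python
from collections import defaultdict


def expand_to_clusters(hits, cluster_of):
    """Return every intron that shares an orthologous cluster with at least one hit.

    hits: iterable of intron IDs matching the original query.
    cluster_of: dict mapping intron ID -> cluster ID; unclustered introns are absent.
    """
    members = defaultdict(set)
    for intron_id, cluster_id in cluster_of.items():
        members[cluster_id].add(intron_id)

    result = set()
    for intron_id in hits:
        cluster_id = cluster_of.get(intron_id)
        if cluster_id is None:
            result.add(intron_id)  # assumption: an unclustered hit is returned by itself
        else:
            result |= members[cluster_id]
    return result
```

In a relational database the same effect would typically be achieved with a join or subquery on the cluster ID column.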

Lost introns are not introns per se; they are records indicating that exonerate obtained an ungapped (intronless) alignment to the orthologous gene. These records may be displayed by checking the lost checkbox.
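To make the idea of a lost intron concrete, the sketch below asks whether the exonic sequence flanking an intron can be placed on an orthologous genomic sequence without any gap, that is, whether the upstream and downstream flanks occur contiguously, allowing a few mismatches. This brute-force Hamming-distance scan is a stand-in chosen for illustration, not exonerate's alignment model, and the max_mismatches threshold is a hypothetical parameter.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def intron_looks_lost(upstream: str, downstream: str, ortholog: str, max_mismatches: int = 2) -> bool:
    """True if upstream+downstream aligns ungapped (no intron) somewhere in `ortholog`."""
    query = (upstream + downstream).upper()
    target = ortholog.upper()
    k = len(query)
    # Slide the intronless query along the target; an ungapped hit means no intron separates the flanks.
    return any(
        hamming(query, target[i : i + k]) <= max_mismatches
        for i in range(len(target) - k + 1)
    )
```

If an intron is still present in the ortholog, the two flanks are separated by intronic sequence, so no window aligns within the mismatch threshold and the function returns False.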

Results may also be limited to introns for which there is evidence of involvement in an alternative splicing event by checking the altsplice checkbox. The following criteria were applied to determine whether an intron is involved in an alternative event:

  • the intron must be present in the AltSplice Human Release 2 intron database
  • it must be present in an alternatively spliced gene
  • it must be present in at least one splice form
  • it must be modified (different junctions) or absent in at least one other transcript which spans the coordinates of the original intron
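The four criteria can be expressed as a single predicate. The Python sketch below uses a hypothetical data model: each transcript is represented by its genomic span and its set of intron coordinates, and the two database-level facts (membership in the AltSplice intron database and whether the gene is alternatively spliced) are passed in as precomputed booleans. An intron counts as modified or absent in a transcript when that transcript spans the intron's coordinates but does not contain an intron with exactly the same junctions.

```python
from typing import FrozenSet, NamedTuple, Sequence, Tuple

Intron = Tuple[int, int]  # (start, end) genomic coordinates of the intron


class Transcript(NamedTuple):
    start: int
    end: int
    introns: FrozenSet[Intron]


def involved_in_alt_splicing(
    intron: Intron,
    transcripts: Sequence[Transcript],
    in_altsplice_db: bool,
    gene_is_alt_spliced: bool,
) -> bool:
    """Apply the four altsplice criteria to one intron."""
    # Criteria 1 and 2: database membership and gene status, supplied by the caller.
    if not (in_altsplice_db and gene_is_alt_spliced):
        return False
    # Criterion 3: the intron appears in at least one splice form.
    if not any(intron in t.introns for t in transcripts):
        return False
    # Criterion 4: another transcript spans the intron but lacks these exact junctions.
    i_start, i_end = intron
    return any(
        t.start <= i_start and i_end <= t.end and intron not in t.introns
        for t in transcripts
    )
```

A transcript that contains the intron can never satisfy the last test, so criterion 4 automatically refers to some other transcript.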

One may also restrict results according to whether an intron belongs to an orthologous intron cluster with evidence for a type switch or a subtype switch.
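Assuming that a type switch means that one orthologous cluster contains both U12-type and U2-type introns, and that a subtype switch means that introns of the same type within a cluster carry different subtypes (for example AT-AC and GT-AG U12 introns), both flags can be derived from the cluster members' annotations as in this hypothetical Python sketch. The (type, subtype) labels are illustrative and not necessarily the values stored in U12DB.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def cluster_switches(members: Iterable[Tuple[str, str]]) -> Tuple[bool, bool]:
    """Return (type_switch, subtype_switch) for one orthologous intron cluster.

    members: (intron_type, subtype) pairs, e.g. ("U12", "AT-AC") or ("U2", "GT-AG").
    """
    subtypes_by_type = defaultdict(set)
    for intron_type, subtype in members:
        subtypes_by_type[intron_type].add(subtype)
    # More than one intron type in the cluster: U12 <-> U2 type switch.
    type_switch = len(subtypes_by_type) > 1
    # Within any one type, more than one subtype: subtype switch.
    subtype_switch = any(len(subtypes) > 1 for subtypes in subtypes_by_type.values())
    return type_switch, subtype_switch
```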


GFF or FASTA text output for introns may be obtained by checking the corresponding checkbox, and splice signal sequences may be obtained in table format.
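For readers unfamiliar with the two formats, the Python sketch below writes one intron as a GFF3 feature line and an intron sequence as a FASTA record. The source tag, the intron_type attribute, the example ID, and the 60-column line width are illustrative assumptions; the columns and attributes of U12DB's own GFF output may differ.

```python
VALID_STRANDS = {"+", "-", ".", "?"}


def intron_to_gff3(seqid, start, end, strand, intron_id, intron_type, source="U12DB"):
    """Format one intron as a tab-separated GFF3 line (1-based, inclusive coordinates)."""
    if start < 1 or start > end:
        raise ValueError("GFF3 requires 1 <= start <= end")
    if strand not in VALID_STRANDS:
        raise ValueError(f"invalid strand: {strand!r}")
    attributes = f"ID={intron_id};intron_type={intron_type}"  # intron_type is a hypothetical tag
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
    return "\t".join([seqid, source, "intron", str(start), str(end), ".", strand, ".", attributes])


def to_fasta(header, sequence, width=60):
    """Format a sequence as a FASTA record with lines of at most `width` residues."""
    lines = [">" + header]
    lines += [sequence[i : i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)
```

Score and phase are written as "." because they do not apply to an intron feature.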


U12DB MySQL Dump

The U12DB distribution consists of a single MySQL dump, which contains all of the SQL commands necessary to create and populate the tables of the database. To install, create a MySQL database called u12db and then source the dump file.
All of the files can be obtained from our ftp server:

If you encounter problems...

If you encounter problems using the U12DB, or have suggestions on how to improve it, please send an e-mail to talioto@imim.es.

Authors and Acknowledgements

The U12DB was developed and is actively maintained by Tyler Alioto.