Starcode is DNA sequence clustering software. Sequences are clustered by finding all pairs within a given Levenshtein distance. Typically, a file containing a set of related DNA sequences is passed as input, together with a parameter specifying the maximum clustering distance. Starcode aligns the sequence pairs, computes the distance between them and prints one line per cluster containing the canonical DNA sequence, the sequence count and the list of sequences that belong to the cluster.
Starcode has many applications in biology, including DNA/RNA motif recovery, barcode clustering and sequencing error recovery.
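The idea of clustering by Levenshtein distance can be illustrated with a minimal Python sketch. This is not Starcode's actual algorithm or interface: the function names `levenshtein` and `cluster`, the greedy strategy (most abundant sequences become cluster centres first) and the tab-separated output are assumptions made for illustration only.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cluster(reads, max_dist):
    """Greedy clustering (illustrative assumption, not Starcode's method).

    Sequences are visited from most to least abundant; each one joins the
    first existing cluster whose canonical sequence is within max_dist,
    otherwise it starts a new cluster.
    """
    counts = Counter(reads)
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    clusters = []
    for seq in ordered:
        for c in clusters:
            if levenshtein(seq, c["canonical"]) <= max_dist:
                c["count"] += counts[seq]
                c["members"].append(seq)
                break
        else:
            clusters.append({"canonical": seq, "count": counts[seq], "members": [seq]})
    return clusters


reads = ["ACGT", "ACGT", "ACGT", "ACGA", "TTTT", "TTTT", "TTTA"]
for c in cluster(reads, max_dist=1):
    # canonical sequence, sequence count, member sequences
    print(c["canonical"], c["count"], ",".join(c["members"]), sep="\t")
```

With these toy reads and a maximum distance of 1, `ACGA` is absorbed into the `ACGT` cluster and `TTTA` into the `TTTT` cluster. The all-against-all comparison in this sketch is quadratic in the number of distinct sequences, so it is only suitable for small inputs.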
sQTLseekeR is an R package to detect splicing QTLs (sQTLs), which are variants associated with changes in the splicing pattern of a gene. Here, splicing patterns are modeled by the relative expression of the transcripts of a gene.
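To make "relative expression of the transcripts of a gene" concrete, the following Python sketch converts per-transcript expression values for one gene in one sample into fractions that sum to 1. It is not part of sQTLseekeR (which is an R package); the function name `splicing_ratios` and the transcript identifiers are hypothetical.

```python
def splicing_ratios(transcript_expr):
    """Return each transcript's share of the gene's total expression.

    transcript_expr maps a transcript id to its expression in one sample.
    Returns None when the gene is not expressed, since ratios are undefined.
    """
    total = sum(transcript_expr.values())
    if total == 0:
        return None
    return {tx: value / total for tx, value in transcript_expr.items()}


# Hypothetical gene with two transcripts in a single sample
print(splicing_ratios({"tx1": 30.0, "tx2": 10.0}))
```

A variant associated with a shift in these fractions between genotype groups, rather than in the total expression of the gene, is the kind of signal an sQTL analysis looks for.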
sgp2 is a program to predict genes by comparing anonymous genomic sequences from different species. It combines tblastx, a sequence similarity search program, with geneid, an ab initio gene prediction program.
Selenoprofiles is a homology-based gene finding tool which is suitable for selenoprotein prediction in large nucleotide databases, like genomes. Selenoproteins are a group of proteins that contain selenocysteine (Sec), a rare amino acid inserted co-translationally into the protein chain. The Sec codon is UGA, which is normally a stop codon. In selenoproteins, UGA is recoded to Sec in the presence of specific signals on selenoprotein gene transcripts. Due to the dual role of the UGA codon, selenoprotein prediction and annotation are difficult tasks and are left mostly to manual analysis, since there are no reliable "gold standard" programs for this purpose. Here we present a homology-based in silico tool to scan genomes for members of the known selenoprotein families: selenoprofiles. This pipeline has features that make it suitable for selenoprotein prediction, and is shown to correctly predict selenoproteins that are badly annotated in Ensembl. Selenoprofiles is a Python-built pipeline that internally runs psitblastn, exonerate, genewise and SECISearch.
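The dual role of UGA can be seen in a small translation sketch in Python. It is not taken from selenoprofiles: the function `translate` and its `recode_uga` flag are hypothetical, and recoding every in-frame UGA is a deliberate simplification, since real recoding depends on signals in the transcript that this sketch ignores.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq, recode_uga=False):
    """Translate a coding sequence in frame 0 up to the first stop codon.

    With recode_uga=True, every in-frame UGA (TGA in DNA) is read as
    selenocysteine ('U') instead of a stop. This is a simplification.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "TGA" and recode_uga:
            protein.append("U")
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


cds = "ATGGCCTGAGGTTAA"  # toy sequence: ATG GCC TGA GGT TAA
print(translate(cds))                   # UGA read as stop
print(translate(cds, recode_uga=True))  # UGA read as Sec
```

Read conventionally, the toy sequence yields a truncated product (`MA`), while recoding UGA extends it through the Sec residue (`MAUG`). A standard gene predictor makes the first reading, which is one reason selenoproteins tend to be misannotated.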
In this web server we provide public access to two new computational methods for selenoprotein identification and analysis. SECISearch3 replaces its predecessor SECISearch as a tool for prediction of eukaryotic SECIS elements. Seblastian is a new method for selenoprotein gene detection that uses SECISearch3 and then predicts selenoprotein sequences encoded upstream of SECIS elements. Seblastian is able to both identify known selenoproteins and predict new selenoproteins. This project is the result of a collaboration with Vadim Gladyshev's lab at Harvard.
SECISaln will predict a SECIS element in the query sequence, split it into its constituent parts and align these against a precompiled database of eukaryotic SECIS elements.