Nucleic Acids Research (2005): Comparative gene finding in chicken indicates that we are closing in on the set of multi-exonic widely expressed human genes

Comparative gene finding in chicken indicates
that we are closing in on the set of
multi-exonic widely expressed human genes



R. Castelo*, A. Reymond, C. Wyss, F. Câmara, G. Parra, S.E. Antonarakis, R. Guigó and E. Eyras


Nucleic Acids Research, 33(6):1935-1939, 2005



*To whom correspondence should be addressed.

Contents

On this site you can find the set of 311 putative novel human genes found using the comparative gene predictor SGP2 and the chicken genome sequence. You will also find the subset of the 50 most promising predictions, which were tested by RT-PCR, as well as the identifiers and GenBank accessions of the six positives.

 

Abstract

The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here we show, using comparative gene finding followed by experimental verification of exon pairs by RT-PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (1) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (2) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.

 

The data

The following files contain the amino acid sequence, DNA coding sequence and genomic coordinates of the 311 putative novel human genes:
 

hg16.311.putative.aa.fa (52K): amino acid sequences in FASTA format

hg16.311.putative.cds.fa (148K): DNA coding sequences in FASTA format

hg16.311.putative.gff (72K): genomic coordinates in GFF format
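As a rough illustration of how the FASTA and GFF files listed above could be loaded, here is a minimal Python sketch. It assumes conventional FASTA records (a `>` header line followed by sequence lines) and tab-separated GFF lines with at least eight columns; the exact header layout and GFF dialect of the files on this site are not stated here, so treat both as assumptions. The function names `read_fasta` and `read_gff` and the example records are hypothetical and are not taken from the data.

```python
from typing import Dict, Iterable, List, NamedTuple


class GffFeature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    strand: str
    group: str


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    Assumption: the identifier is the first whitespace-delimited
    token after '>' on each header line.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def read_gff(lines: Iterable[str]) -> List[GffFeature]:
    """Parse tab-separated GFF lines, skipping comments and short lines.

    Assumption: columns follow the usual GFF order (seqname, source,
    feature, start, end, score, strand, frame, optional group).
    """
    features: List[GffFeature] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue
        group = cols[8] if len(cols) > 8 else ""
        features.append(
            GffFeature(cols[0], cols[1], cols[2],
                       int(cols[3]), int(cols[4]), cols[6], group)
        )
    return features


# Made-up records for illustration only; they are not from the real files.
example_fasta = [">example_1 hypothetical", "MKT", "AAL", ">example_2", "MSS"]
example_gff = ["# comment", "chr1\tSGP2\tCDS\t100\t200\t.\t+\t0\texample_1"]

proteins = read_fasta(example_fasta)
features = read_gff(example_gff)
```

With the real files, the same functions could be called on an open file handle, for example `read_fasta(open("hg16.311.putative.aa.fa"))`.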

The file hg16.50.mostpromisingexonjunctions.tbl contains the identifiers and tested exon junctions of the 50 most promising genes, chosen according to the criteria described in the main article. The file consists of five columns: identifier, intron position, tested exon-exon junction position, upstream exon and downstream exon (the two exons forming the tested exon-exon junction). Of the 50 exon-exon junctions tested by RT-PCR, the following six were positive:

 

identifier    exon-exon junction position    GenBank accession
chr18_515     8                              AY947523
chr4_55       2                              AY947524
chr4_1746     2                              AY947525
chr5_400      3                              AY947526
chr15_51      1                              AY947527
chr22_143     2                              AY947528
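The five-column file hg16.50.mostpromisingexonjunctions.tbl described above could be read and cross-referenced with the six positives roughly as follows. This is only a sketch: it assumes the columns are whitespace-delimited and that no field contains internal whitespace, neither of which is stated on this page. The names `JunctionRow`, `read_junction_table` and `accession_for` are hypothetical, and the example rows contain made-up values in every column except the identifier.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class JunctionRow(NamedTuple):
    identifier: str
    intron_position: str
    junction_position: str
    upstream_exon: str
    downstream_exon: str


# Identifier -> GenBank accession for the six RT-PCR positives listed above.
POSITIVE_ACCESSIONS: Dict[str, str] = {
    "chr18_515": "AY947523",
    "chr4_55": "AY947524",
    "chr4_1746": "AY947525",
    "chr5_400": "AY947526",
    "chr15_51": "AY947527",
    "chr22_143": "AY947528",
}


def read_junction_table(lines: Iterable[str]) -> List[JunctionRow]:
    """Parse the five-column table, keeping every field as a string.

    Assumption: fields are separated by whitespace; lines that start
    with '#' or do not have exactly five fields are skipped.
    """
    rows: List[JunctionRow] = []
    for raw in lines:
        if raw.lstrip().startswith("#"):
            continue
        fields = raw.split()
        if len(fields) != 5:
            continue
        rows.append(JunctionRow(*fields))
    return rows


def accession_for(row: JunctionRow) -> Optional[str]:
    """Return the GenBank accession if the gene was RT-PCR positive."""
    return POSITIVE_ACCESSIONS.get(row.identifier)


# Made-up rows for illustration; only the identifier chr4_55 is real.
example_rows = read_junction_table([
    "chr4_55 1 2 EXON_A EXON_B",
    "chrX_0 3 4 EXON_C EXON_D",
    "malformed line",
])
confirmed = [row.identifier for row in example_rows if accession_for(row)]
```

Keeping all fields as strings avoids guessing at the numeric format of the position columns; they can be converted once the actual file layout has been checked.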