TOOLS FOR PREDICTION AND ANALYSIS OF PROTEIN-CODING GENE STRUCTURE

SpliceView: a computing system for splicing signal prediction





MOTIVATION

         Analysis of splice sites is a very important field of computational biology because of their key role in predicting the exon-intron structure of protein-coding genes. In addition, they are the most represented type of functional site in the nucleotide sequence databases.

RESULTS

         The program predicts splice signals using a classification approach (a set of consensuses). It rests on two main assumptions:
1) A high frequency of certain nucleotides at particular site positions reflects the functional importance of preserving the nucleotide at that position.
2) Nucleotides at different site positions are considered to be mutually dependent, thus forming a structure that might be recognised by some particle.
For this system, 9 bp sequences were taken for donor splice sites and 20 bp sequences for acceptor sites. No global characteristics were taken into account. For human donor signals, about 5% of true signals are lost (Accuracy = 95%), and about 15% of GT-containing sequences are predicted as true sites (Specificity = 85%). For human acceptor splice signals, about 7% of true signals are lost (Accuracy = 93%), and about 19% of AG-containing sequences are predicted as true sites (Specificity = 81%). The method of analysis is more successful for donor splice signals because of significant correlations in this site, which were revealed by R.M. Stephens and T.D. Schneider (J. Mol. Biol. 1992, 228, 1124-1136). For acceptor splice sites, a weight matrix approach combined with classification was applied. The resulting accuracy (Ac) and specificity (Sp) of prediction for donor and acceptor splice sites are shown in the Table:


                          Donor         Acceptor
Organism                  Ac     Sp     Ac     Sp
Human (vertebrates)       95%    85%    95%    85%
Caenorhabditis sp.        95%    85%    97%    93%
Arabidopsis sp.           95%    86%    94%    81%
Aspergillus sp.           97%    90%    97%    60%
Saccharomyces sp.         97%    98%    95%    85%
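
The consensus-set classification and the Ac/Sp figures reported above can be illustrated with a minimal sketch. The consensus patterns, the toy sequences and all function names are hypothetical and are not taken from SpliceView. Sp is computed here as 100% minus the share of non-site GT-containing sequences predicted as true sites, which is an assumed reading of the definition given in the text.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(consensus, seq):
    """True if seq fits the consensus at every position."""
    return len(consensus) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(consensus, seq)
    )


def predict_site(consensus_set, seq):
    """A candidate is called a site if it fits any consensus in the set."""
    return any(matches(c, seq) for c in consensus_set)


def evaluate(consensus_set, true_sites, decoys):
    """Return (Ac, Sp) in percent under the assumed definitions."""
    kept = sum(predict_site(consensus_set, s) for s in true_sites)
    false_hits = sum(predict_site(consensus_set, s) for s in decoys)
    accuracy = 100.0 * kept / len(true_sites)
    specificity = 100.0 * (1 - false_hits / len(decoys))
    return accuracy, specificity


# Hypothetical 9 bp donor consensuses (3 exon + 6 intron positions).
CONSENSUS_SET = ["NAGGTRAGT", "NNGGTAAGN"]
# Toy data only: invented "true" donor sites and GT-containing decoys.
TRUE_SITES = ["CAGGTAAGT", "AAGGTGAGT", "TCGGTAAGA", "CAGGTCTGT"]
DECOYS = ["TTTGTCCCA", "CAGGTAAGC", "GCAGTTTTC", "ACCGTGCGA"]

ac, sp = evaluate(CONSENSUS_SET, TRUE_SITES, DECOYS)
print(f"Ac = {ac:.0f}%, Sp = {sp:.0f}%")
```

Each consensus is matched as a whole pattern, so a set of several consensuses can represent combinations of nucleotides that occur together, which is one simple way to reflect the mutual dependence of positions assumed in point 2.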


The score of a potential splice site is estimated using a weight matrix, following Shapiro and Senapathy (Nucleic Acids Res. 1987, 15, 7155-7174).
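
A weight-matrix score of this kind might be sketched as follows. The aligned donor sequences are toy data, the function names are hypothetical, and the rescaling of the summed percentages to a 0-100 range is an assumption made for illustration. It is not claimed to reproduce the published Shapiro and Senapathy tables or the SpliceView implementation.

```python
from collections import Counter

BASES = "ACGT"


def build_frequency_matrix(sites):
    """Per-position base percentages for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        matrix.append({b: 100.0 * counts[b] / len(sites) for b in BASES})
    return matrix


def normalized_score(matrix, candidate):
    """Sum the percentages of the candidate's bases, rescaled to 0-100."""
    if len(candidate) != len(matrix):
        raise ValueError("candidate length must match the matrix")
    # Bases outside ACGT (e.g. N) contribute nothing to the sum.
    total = sum(col.get(base, 0.0) for col, base in zip(matrix, candidate))
    best = sum(max(col.values()) for col in matrix)
    worst = sum(min(col.values()) for col in matrix)
    if best == worst:
        raise ValueError("matrix carries no positional information")
    return 100.0 * (total - worst) / (best - worst)


# Toy aligned 9 bp donor sites (invented for illustration only).
TOY_DONORS = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
MATRIX = build_frequency_matrix(TOY_DONORS)

for candidate in ("CAGGTAAGT", "TTTGTTTTT"):
    print(candidate, normalized_score(MATRIX, candidate))
```

A candidate that carries the most frequent base at every position receives the maximum score, while candidates that deviate from the frequent bases score lower. A threshold on this score would then be needed to call a site, and that threshold is not specified here.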

AVAILABILITY

WEBGENE





REFERENCES

  • Rogozin I.B. and Milanesi L. Analysis of donor splice signals in different organisms. J. Mol. Evol. 1997, 45, 50-59.