An Accurate, Sensitive, and Scalable Method to Identify Functional Sites in Protein Structures

H. Yao, D. M. Kristensen, I. Mihalek, M. Sowa, C. Shaw, M. Kimmel, L. E. Kavraki, and O. Lichtarge, “An Accurate, Sensitive, and Scalable Method to Identify Functional Sites in Protein Structures,” Journal of Molecular Biology, vol. 326, no. 1, pp. 255–261, 2003.


Functional sites determine the activity and interactions of proteins and, as such, constitute the targets of most drugs. However, the exponential growth of sequence and structure data far exceeds the ability of experimental techniques to locate these sites and identify their key amino acids. To fill this gap, we developed the computational Evolutionary Trace (ET) method, which ranks the evolutionary importance of amino acids in protein sequences. Previous studies showed that the best-ranked residues form fewer and larger structural clusters than expected by chance and overlap with functional sites, but until now the significance of this overlap has remained qualitative. Here, we use 86 diverse protein structures, including 20 determined by the structural genomics initiative, to show that this overlap is a recurrent and statistically significant feature. An automated ET correctly identifies seven of ten functional sites by the least favorable statistical measure, and nine of ten by the most favorable one. These results quantitatively demonstrate that a large fraction of functional sites in the proteome may be accurately identified from sequence and structure. This should help focus structure–function studies, rational drug design, protein engineering, and functional annotation on the relevant regions of a protein.
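The abstract does not say which statistical measures were used to score the overlap between top-ranked residues and a functional site. As a rough illustration of what "statistically significant overlap" can mean, the sketch below uses a generic hypergeometric test, which is an assumption and not necessarily either measure used in the paper. It asks how likely it is that randomly choosing as many residues as ET ranked highly would hit at least the observed number of site residues. The function name `overlap_pvalue` and its parameters are hypothetical.

```python
from math import comb


def overlap_pvalue(n_residues: int, n_site: int, n_top: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability of an overlap at least this large.

    Illustrative assumption only: n_top residues are drawn at random, without
    replacement, from a protein of n_residues, of which n_site belong to the
    functional site. Returns P(overlap >= n_overlap) under that null model.
    """
    if not (0 <= n_site <= n_residues and 0 <= n_top <= n_residues and n_overlap >= 0):
        raise ValueError("counts must be non-negative and not exceed n_residues")
    # The overlap can never exceed the smaller of the two sets.
    upper = min(n_site, n_top)
    total = comb(n_residues, n_top)
    # Sum the probabilities of every overlap size from n_overlap up to the maximum.
    tail = sum(
        comb(n_site, i) * comb(n_residues - n_site, n_top - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total
```

For example, in a toy protein of 10 residues with a 5-residue site, picking 5 top-ranked residues that all fall in the site gives `overlap_pvalue(10, 5, 5, 5)`, which equals 1/252 (about 0.00397): such a complete overlap would rarely arise by chance. A smaller p-value indicates an overlap less consistent with random selection.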