B. Y. Chen, D. H. Bryant, A. E. Cruess, J. H. Bylund, V. Y. Fofanov, M. Kimmel, O. Lichtarge, and L. E. Kavraki, “Composite Motifs Integrating Multiple Protein Structures Increase Sensitivity for Function Prediction,” in Computational Systems Bioinformatics Conference (CSB2007), 2007, pp. 343–355.
The study of disease often hinges on the biological function of proteins, but determining protein function is a difficult experimental process. To minimize duplicated effort, algorithms for function prediction seek characteristics indicative of possible protein function. One approach is to identify substructural matches of geometric and chemical similarity between motifs representing known active sites and target protein structures with unknown function. In earlier work, statistically significant matches of certain effective motifs have identified functionally related active sites. Effective motifs must be carefully designed to maintain similarity to functionally related sites (sensitivity) and avoid incidental similarities to functionally unrelated protein geometry (specificity). Existing techniques design motifs using the geometry of a single protein structure. Poor selection of this structure can limit motif effectiveness if the selected functional site lacks similarity to functionally related sites. To address this problem, this paper presents composite motifs, which combine structures of functionally related active sites to potentially increase sensitivity. Our experimentation compares the effectiveness of composite motifs with simple motifs designed from single protein structures. On six distinct families of functionally related proteins, leave-one-out testing showed that composite motifs had sensitivity comparable to the most sensitive of all simple motifs and specificity comparable to the average simple motif. On our data set, we observed that composite motifs simultaneously capture variations in active site conformation, diminish the problem of selecting motif structures, and enable the fusion of protein structures from diverse data sources.