Mapping the Structural Landscape of Protein Families with Geometric Feature Vectors

D. H. Bryant, “Mapping the Structural Landscape of Protein Families with Geometric Feature Vectors,” Master's thesis, Rice University, Department of Computer Science, Houston, TX, 2009.

Abstract

This thesis describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to quantify the substructural variation within a protein family and, from that variation, to determine family-wide sub-group organization. The results include automatically determined sub-groups that can be linked to phylogenetic distance between family members, to segregation by ligation state, and to organization by ancestry among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative template for each sub-group determined by FASST and combines these templates into motif ensembles; a series of function prediction experiments shows that these ensembles improve on the function prediction power of existing templates. This work provides an unbiased, automated assessment of the structural variability of identified substructures across protein structure families, and a technique for exploring how substructural variation relates to protein function.
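The general idea of an all-against-all substructure comparison followed by one representative per sub-group can be illustrated with a small sketch. It assumes that each substructure is an equal-length array of 3D coordinates with known point-to-point correspondence, measures dissimilarity as RMSD after optimal rigid superposition (the Kabsch algorithm), groups substructures by single linkage at a fixed RMSD cutoff, and takes each group's medoid as its representative template. These choices, the cutoff value, and the function names (`kabsch_rmsd`, `pairwise_rmsd`, `threshold_groups`, `medoid`) are hypothetical illustrations, not the published FASST or MESH procedures.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) point sets with known correspondence,
    after optimal rigid superposition (Kabsch algorithm)."""
    P = P - P.mean(axis=0)  # center both sets at the origin
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def pairwise_rmsd(subs):
    """All-against-all dissimilarity matrix over the substructures."""
    n = len(subs)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = kabsch_rmsd(subs[i], subs[j])
    return M


def threshold_groups(M, cutoff):
    """Hypothetical grouping: single linkage at an RMSD cutoff (union-find)."""
    parent = list(range(len(M)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(M)):
        for j in range(i + 1, len(M)):
            if M[i, j] <= cutoff:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(M)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def medoid(M, members):
    """Member with the smallest summed RMSD to the rest of its group."""
    return min(members, key=lambda i: M[i, members].sum())


# Toy data: a base substructure, a rotated and translated rigid copy,
# and a uniformly scaled (non-rigid) variant.
base = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)  # 90 deg about z
subs = [base, base @ Rz.T + 3.0, 3.0 * base]

M = pairwise_rmsd(subs)
groups = threshold_groups(M, cutoff=0.5)
reps = [medoid(M, g) for g in groups]
print(groups, reps)  # [[0, 1], [2]] [0, 2]
```

The rigid copy superimposes onto the base with near-zero RMSD and joins its group, while the scaled variant cannot be matched by any rotation or translation (its RMSD to the base is 1.5) and forms its own group. A real analysis would also need to establish point correspondences between substructures and choose the cutoff or clustering criterion in a principled way.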

PDF preprint: http://kavrakilab.org//publications/bryant2009mapping-structural-landscape.pdf