Kavraki Lab


Proteins are flexible macromolecules, often changing shape/structure as needed to interact with other molecules. Structural fluctuations of proteins are related to their function. All the different structures that a protein assumes under physiological conditions contribute to its biological function. Therefore, to describe and understand the biological function of a protein, it is important to model all the different structures that a protein can assume at equilibrium conditions.

Experimental techniques such as X-ray crystallography, Nuclear Magnetic Resonance (NMR), and cryo-Electron Microscopy (cryoEM) offer a limited view into one or a few of the structures available to a protein at equilibrium. Traditionally, obtaining a thorough picture of all equilibrium structures of a protein has been the task of computational techniques. Currently, Molecular Dynamics (MD) and Monte Carlo (MC) simulations are limited in their ability to model structural fluctuations that occur on timescales beyond nanoseconds. Yet the fluctuations that matter for understanding and describing function may occur on much longer timescales, such as microseconds or even milliseconds.


Given an experimentally determined structure of a protein, model equilibrium fluctuations around this structure with no timescale limitations. The goal is to obtain an ensemble of structures that a protein assumes at equilibrium. The obtained ensemble needs to be representative of all the possible equilibrium structural fluctuations of a protein. We developed the following applications to address this problem:

Fragment Ensemble Method (FEM)

FEM obtains an ensemble of physical configurations of a particular fragment of a protein polypeptide chain. Our approach is complementary to current simulation techniques. We take the following course:

  • model proteins as articulated manipulators
  • sample manipulator configurations corresponding to protein structures
  • refine the energy of each obtained structure
  • weight each structure by its Boltzmann probability
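The final weighting step in the list above can be sketched as follows. This is a minimal illustration, not the lab's implementation: the function name `boltzmann_weights`, the temperature of 300 K, the energy unit (kcal/mol), and the example energies are all assumptions made for the sketch.

```python
import math

# Boltzmann constant in kcal/(mol*K); energies are assumed to be in kcal/mol.
K_B = 0.0019872041


def boltzmann_weights(energies, temperature=300.0):
    """Return normalized Boltzmann probabilities for a list of energies."""
    kt = K_B * temperature
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(factors)  # partition function restricted to the sampled ensemble
    return [f / z for f in factors]


# Hypothetical refined energies for three sampled structures.
energies = [-120.0, -119.5, -118.0]
weights = boltzmann_weights(energies)
```

The weights sum to one over the sampled structures, and lower-energy structures receive larger weights. Shifting by the minimum energy leaves the probabilities unchanged but avoids overflow for large negative energies.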

Consider one fragment of the polypeptide chain, from amino acid a to amino acid b. This fragment can correspond to a loop, for example. Modeling the equilibrium fluctuations of the loop requires finding configurations of the fragment in which amino acids a and b remain connected to the rest of the polypeptide chain. This is illustrated in the left panel of the following figure for the 12 amino acid loop of cytochrome inhibitor 2.

FEM represents the fragment at the backbone level as an open kinematic chain. Backbone conformations are then sampled in the same way as configurations of the kinematic chain. An inverse kinematics technique, Cyclic Coordinate Descent (CCD), is applied to each chain configuration to satisfy the end-point constraints. Finally, the side chains are put back on the backbone and optimal dihedral angles are sampled for the side chains. Energetic refinement of the entire fragment minimizes unfavorable interactions. Each resulting configuration is weighted by its Boltzmann probability, which measures the feasibility of the configuration at equilibrium.
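To give a feel for how CCD closes a chain onto an end-point constraint, here is a minimal sketch on a planar chain. It is a toy under stated assumptions: FEM works on three-dimensional protein backbones with dihedral degrees of freedom, whereas this example uses a 2D chain with revolute joints, and the names `forward` and `ccd`, the link lengths, and the target point are hypothetical.

```python
import math


def forward(angles, lengths):
    """Return the joint positions of a planar chain rooted at the origin.

    angles[i] is the rotation of link i relative to link i-1.
    """
    x = y = theta = 0.0
    points = [(x, y)]
    for angle, length in zip(angles, lengths):
        theta += angle
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        points.append((x, y))
    return points


def ccd(angles, lengths, target, tol=1e-6, max_sweeps=500):
    """Adjust one joint at a time so the chain end approaches the target."""
    angles = list(angles)
    for _ in range(max_sweeps):
        # Sweep from the last joint back to the first.
        for i in reversed(range(len(angles))):
            points = forward(angles, lengths)
            pivot, end = points[i], points[-1]
            to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            # Rotate joint i so the pivot-to-end direction points at the target.
            angles[i] += to_target - to_end
        end = forward(angles, lengths)[-1]
        if math.hypot(end[0] - target[0], end[1] - target[1]) < tol:
            break
    return angles


lengths = [1.0, 1.0, 1.0]        # hypothetical link lengths
start = [0.3, 0.3, 0.3]          # a sampled (open) configuration
target = (1.5, 1.0)              # required position of the chain end
solved = ccd(start, lengths, target)
final_end = forward(solved, lengths)[-1]
```

Each CCD update is a one-dimensional problem with a closed-form solution, which is what makes the method cheap enough to apply to every sampled configuration.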

Left: The fragment corresponding to the loop, in grey, needs to connect to the rest of the protein structure, in blue. This introduces spatial constraints on the end points of the fragment. Right: The main steps of FEM, described in the text, are illustrated.


We applied FEM to model equilibrium fluctuations of proteins such as cytochrome inhibitor 2 and variable surface antigen. The ensembles obtained for each are shown in the figures below.

Left: The obtained loop conformations are drawn transparent, superimposed over the lowest-energy structure generated, drawn opaque. The X-ray structure of the variable surface antigen is missing this loop because the loop's mobility causes disorder in the crystal. The heterogeneity of the obtained ensemble agrees well with the hypothesized high equilibrium mobility of the loop. Right: Measured fluctuations of the loop are compared to PONDR scores, which measure mobility given sequence information alone. The datasets are normalized for the comparison since they are of different magnitudes. The agreement is significant, even though the comparison is meant to be mostly qualitative.

Protein Ensemble Method (PEM)

PEM models equilibrium fluctuations of an entire protein. The main steps of PEM, illustrated below, are as follows. PEM first divides the protein polypeptide chain into consecutive fragments with significant overlap. This is illustrated below on the 123 amino acid sequence of alpha-lactalbumin.

Sliding a window of 30 amino acids over the sequence defines 19 fragments, where neighboring fragments overlap by 25 amino acids. For each fragment, FEM is applied to obtain an ensemble of low-energy fragment conformations. These are pictorially illustrated by the ensembles inside each window. The final step of PEM combines fluctuations of neighboring fragments to obtain equilibrium fluctuations of the entire chain.
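The fragment count can be checked with a short sketch. The function name `sliding_fragments` and the 0-based, half-open index convention are assumptions for illustration. With a window of 30 and an overlap of 25 the window advances 5 residues at a time, which gives 19 full windows on a 123-residue chain. The last full window ends at residue 120, so how the final three residues are covered is not stated in the text and is left open here.

```python
def sliding_fragments(chain_length, window=30, overlap=25):
    """Return (start, end) index pairs, 0-based and end-exclusive."""
    step = window - overlap  # residues by which the window advances
    return [
        (start, start + window)
        for start in range(0, chain_length - window + 1, step)
    ]


fragments = sliding_fragments(123)  # alpha-lactalbumin chain length
print(len(fragments))               # prints 19
print(fragments[0], fragments[-1])  # prints (0, 30) (90, 120)
```

A large overlap means every residue away from the chain ends appears in several fragments, which is what lets fluctuations of neighboring fragments be combined into fluctuations of the whole chain.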

Left: The main steps of PEM, described in the text, are illustrated. Right: PEM-measured RMSD values of the amino acids of the entire polypeptide chain of alpha-lactalbumin are shown. RMSD values of amino acids belonging to different fragments are colored differently.


We have applied PEM to model equilibrium fluctuations of entire proteins such as ubiquitin and protein G. For each protein, the obtained ensembles are compared to available NMR data that measure equilibrium fluctuations over a broad range of timescales, from picoseconds to milliseconds. In each case we obtain very high correlations with experimental data, as the figure below indicates.

Left: The obtained conformations for protein G are drawn transparent, superimposed over the experimentally available protein G native structure, drawn opaque. Right: Amide order parameters measured over the generated ensemble are compared to order parameters that quantify reorientations of the amide bond that occur on slow timescales. The agreement is high, with a Pearson correlation coefficient of 0.83.
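For readers unfamiliar with the quantities in this comparison, the sketch below shows one common way to compute them. The source does not state which formula was used, so this is an assumption: it uses the standard generalized order parameter S² = (3/2) Σᵢⱼ ⟨μᵢμⱼ⟩² − 1/2 over unit N-H bond vectors μ, and it assumes all structures have already been superimposed in a common frame. The function names and example vectors are hypothetical.

```python
import math


def order_parameter(vectors, weights=None):
    """Generalized order parameter S^2 of one bond vector over an ensemble."""
    n = len(vectors)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weights unless given (e.g. Boltzmann)
    units = []
    for v in vectors:
        norm = math.sqrt(sum(c * c for c in v))
        units.append(tuple(c / norm for c in v))
    total = 0.0
    for i in range(3):
        for j in range(3):
            avg = sum(w * u[i] * u[j] for w, u in zip(weights, units))
            total += avg * avg
    return 1.5 * total - 0.5


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rigid = order_parameter([(0.0, 0.0, 1.0)] * 4)                 # no motion
isotropic = order_parameter([(1, 0, 0), (0, 1, 0), (0, 0, 1)])  # fully mobile
```

A bond vector that never reorients gives S² = 1, and one spread equally over three orthogonal directions gives S² = 0. Computing S² per residue and passing the computed and experimental lists to `pearson` yields a correlation of the kind reported in the caption.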

Obtaining a good agreement with NMR data that span multiple timescales is highly non-trivial. NMR data such as methyl order parameters, scalar couplings, and residual dipolar couplings may report equilibrium fluctuations that occur on timescales as slow as milliseconds.

Multiscale Space Exploration (MuSe)

This method efficiently explores the high-dimensional conformational space of a protein using only knowledge of the protein’s amino-acid sequence. MuSe proceeds in two stages. The method first obtains a broad view of the entire conformational space at a coarse-grained level of detail. In the second stage, the exploration focuses on a few selected low-energy regions of the space. In its first stage, the method searches a coarse-grained conformational space, employing structural databases to assemble low-resolution structures. The method adopts the fragment-based assembly of protein conformations. However, it focuses on computing not just one structure but ensembles of native-like conformations that may be diverse.

The fragment-based assembly is employed in the context of a simulated annealing exploration, which uses a coarse-grained force field to guide the assembly process. During the first stage of the exploration, MuSe adds atomic detail on the fly to detect emerging energy minima that may be relevant in an all-atom view of the conformational space. This detail is then stripped off to continue exploring the coarse-grained space. Atomistic refinement and further analysis of the explored conformational space are conducted in the second stage, after MuSe obtains a broad view of the coarse-grained conformational space relevant for the native state. Low-dimensional embedding highlights energy minima that are further populated by the method in all-atom detail.
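The combination of fragment replacement and simulated annealing can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration: the three-residue fragment library, the quadratic toy energy, the geometric cooling schedule, and the function names are hypothetical, and none of them are taken from the method's actual force field or move set.

```python
import math
import random

FRAGMENT_LEN = 3
# Hypothetical library of backbone-angle fragments (degrees).
FRAGMENT_LIBRARY = [
    (-60.0, -60.0, -60.0),
    (-120.0, 130.0, -120.0),
    (60.0, 60.0, 60.0),
]


def toy_energy(state):
    """Toy coarse-grained energy: favors angles near -60 degrees."""
    return sum((a + 60.0) ** 2 for a in state) / 1000.0


def propose(state, rng):
    """Replace a random window of the chain with a library fragment."""
    start = rng.randrange(len(state) - FRAGMENT_LEN + 1)
    fragment = rng.choice(FRAGMENT_LIBRARY)
    return state[:start] + fragment + state[start + FRAGMENT_LEN:]


def anneal(state, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    current, e_cur = state, toy_energy(state)
    best, e_best = current, e_cur
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        cand = propose(current, rng)
        e_cand = toy_energy(cand)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability at the current temperature.
        if e_cand <= e_cur or rng.random() < math.exp(-(e_cand - e_cur) / t):
            current, e_cur = cand, e_cand
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best


initial = (-120.0,) * 12  # hypothetical 12-residue starting conformation
best, e_best = anneal(initial)
```

Keeping every accepted low-energy conformation, rather than only the single best one, is what would turn a search like this into an ensemble-generating procedure of the kind described above.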


MuSe has been applied to Calbindin, Calmodulin and Adenylate Kinase. The figure below shows the free energy landscape obtained for Calmodulin.

(a) The red-to-blue color spectrum in the 2D landscape obtained for Calmodulin denotes high-to-low energy values. The lowest energy minima are labeled A, B, and C. PDB structures are projected on the landscape: 1cfd in magenta, 1cll in blue, and 2f3y in green. (a1), (b1), and (c1), respectively, show ensembles corresponding to minima A, B, and C. Conformations are drawn transparent, superimposed over the lowest-energy ones, drawn opaque. (a2), (b2), and (c2) show the respective conformational ensembles obtained with PEM from each of the lowest-energy conformations.