PROTEAN-CR: Proteomics Toolkit for Ensemble Analysis in Cancer Research

Understanding protein–ligand interactions at the molecular level is fundamental to explaining the role of proteins in complex diseases such as cancer. For instance, there is growing interest in predicting the binding modes of peptide-based ligands (e.g., cyclic and phosphorylated peptides) that inhibit high-profile cancer targets or induce their targeted degradation. Another promising example is the identification of tumor-associated antigens for cancer immunotherapy. Both examples involve highly specific molecular interactions, offer opportunities for computer-aided design of better cancer treatments, and highlight the need for structural analysis in cancer research. They also require new methods that account for the flexibility and variability of the protein receptors involved. The objective of this project is to develop an integrated approach to the structural modeling and analysis of protein–ligand interactions in cancer research, to be implemented in the proteomics toolkit PROTEAN-CR. The toolkit will take a data-science approach to the problem, introducing methods for data acquisition and aggregation as well as algorithmic advances for handling receptor flexibility and for modeling driver mutations, drug-resistance polymorphisms, and post-translational modifications. PROTEAN-CR will streamline running structural analyses at scale while providing meaningful data analytics. The long-term goal of our research is to fully integrate three-dimensional structural information about proteins and ligands, and its analysis, into cancer research. The project targets a wide range of users, from experimentalists with little to no programming experience to advanced users who are comfortable scripting large-scale analyses and integrating the toolkit into their own computational pipelines.
The central hypothesis is that a unified, data-science-inspired approach can address major challenges in the structural analysis of protein–ligand interactions in cancer research at scale. The first aim will incorporate protein flexibility into docking studies for cancer research: dedicated workflows will generate ensembles of protein conformations (receptor flexibility), and new machine learning methods will be implemented to improve the scoring of protein–ligand complexes. The second aim will incorporate cancer variability into structural analysis, filling the gap between the available data on cancer variants and the structural analysis of ensembles of tumor-associated mutations and protein modifications. Finally, the third aim will focus on customization, interpretability, and scalability, deploying user-friendly methods to manage ensembles of protein–ligand complexes.
PROTEAN-CR will be developed around specific cancer-related projects and with a broad network of collaborators, so that the tool can be designed, implemented, and evolved according to the needs of the cancer research community.

More information available at

This work has been supported by grant NCI 1U01CA258512-01.

Related Publications

  1. R. Fasoulis, G. Paliouras, and L. E. Kavraki, “Graph representation learning for structural proteomics,” Emerging Topics in Life Sciences, Oct. 2021. ETLS20210225
  2. E. E. Litsa, P. Das, and L. E. Kavraki, “Machine learning models in the prediction of drug metabolism: challenges and future perspectives,” Expert Opinion on Drug Metabolism & Toxicology, vol. 0, no. 0, pp. 1–3, 2021. PMID: 34706606