PROTEAN-CR: Proteomics Toolkit for Ensemble Analysis in Cancer Research

Characterizing protein–ligand molecular interactions is fundamental to understanding the role of proteins in complex diseases such as cancer. For instance, there is growing interest in predicting the binding modes of peptide-based ligands (e.g., cyclic and phosphorylated peptides) to inhibit high-profile cancer targets or to induce their targeted degradation. Another promising example is the identification of tumor-associated antigens for cancer immunotherapy applications. Both examples involve very specific molecular interactions, provide opportunities for computer-aided design of better cancer treatments, and highlight the need for structural analyses in cancer research. They also require new methods that account for the flexibility and variability of the protein receptors involved in these molecular interactions.
The objective of this project is to develop an integrated approach to the structural modeling and analysis of protein–ligand interactions in cancer research, to be implemented in the proteomics toolkit PROTEAN-CR. The proposed toolkit will adopt a data-science approach to the problem by introducing methods for data acquisition and aggregation, as well as algorithmic advances for handling receptor flexibility and for modeling driver mutations, drug-resistance polymorphisms, and post-translational modifications. PROTEAN-CR will streamline running structural analyses at scale while providing meaningful data analytics. The long-term goal of our research is to fully integrate three-dimensional structural information about proteins and ligands, together with structural analysis, into cancer research. This project targets a wide range of users, from experimentalists with little to no programming experience to advanced users who are comfortable scripting large-scale analyses and integrating the toolkit with their own computational pipelines.
The central hypothesis is that a unified, data-science-inspired approach can address major challenges in the structural analysis of protein–ligand interactions in cancer research at scale. The first aim will incorporate protein flexibility into docking studies for cancer research. Specific workflows will be used to generate ensembles of protein conformations (receptor flexibility), and innovative machine learning methods will be implemented to improve the scoring of protein–ligand complexes. The second aim will focus on incorporating cancer variability into structural analysis. We aim to fill the gap between the available data on cancer variants and the structural analysis of ensembles of tumor-associated mutations and protein modifications. Finally, the third aim will focus on customization, interpretability, and scalability, with user-friendly methods deployed to manage ensembles of protein–ligand complexes.
PROTEAN-CR will be developed around specific cancer-related projects and with a broad network of collaborators, so that the tool can be designed, implemented, and evolved according to the needs of the cancer research community.

More information is available at https://reporter.nih.gov/search/DZaxB9c7-kWkwmA8MnbGVg/project-details/10188196

This work has been supported by grant NCI 1U01CA258512-01.

Related Publications

  1. R. Fasoulis, M. M. Rigo, G. Lizée, D. A. Antunes, and L. E. Kavraki, “APE-Gen2.0: Expanding Rapid Class I Peptide-Major Histocompatibility Complex Modeling to Post-Translational Modifications and Noncanonical Peptide Geometries,” Journal of Chemical Information and Modeling, vol. 64, no. 5, pp. 1730–1750, Mar. 2024.
  2. A. Conev, R. Fasoulis, S. Hall-Swan, R. Ferreira, and L. E. Kavraki, “HLAEquity: Examining biases in pan-allele peptide-HLA binding predictors,” iScience, vol. 27, no. 1, Jan. 2024.
  3. R. Fasoulis, M. M. Rigo, D. A. Antunes, G. Paliouras, and L. E. Kavraki, “Transfer learning improves pMHC kinetic stability and immunogenicity predictions,” ImmunoInformatics, vol. 13, p. 100030, 2024.
  4. A. Conev, M. M. Rigo, D. Devaurs, A. F. Fonseca, H. Kalavadwala, M. V. de Freitas, C. Clementi, G. Zanatta, D. A. Antunes, and L. E. Kavraki, “EnGens: a computational framework for generation and analysis of representative protein conformational ensembles,” Briefings in Bioinformatics, p. bbad242, Jul. 2023.
  5. S. Hall-Swan, J. Slone, M. M. Rigo, D. A. Antunes, G. Lizée, and L. E. Kavraki, “PepSim: T-cell cross-reactivity prediction via comparison of peptide sequence and peptide-HLA structure,” Frontiers in Immunology, vol. 14, 2023.
  6. E. E. Litsa, V. Chenthamarakshan, P. Das, and L. E. Kavraki, “An end-to-end deep learning framework for translating mass spectra to de-novo molecules,” Communications Chemistry, vol. 6, no. 1, p. 132, 2023.
  7. K. R. Jackson, D. A. Antunes, A. H. Talukder, A. R. Maleki, K. Amagai, A. Salmon, A. S. Katailiha, Y. Chiu, R. Fasoulis, M. M. Rigo, J. R. Abella, B. D. Melendez, F. Li, Y. Sun, H. M. Sonnemann, V. Belousov, F. Frenkel, S. Justesen, A. Makaju, Y. Liu, D. Horn, D. Lopez-Ferrer, A. F. Huhmer, P. Hwu, J. Roszik, D. Hawke, L. E. Kavraki, and G. Lizée, “Charge-based interactions through peptide position 4 drive diversity of antigen presentation by human leukocyte antigen class I molecules,” PNAS Nexus, vol. 1, no. 3, Aug. 2022.
  8. M. M. Rigo, R. Fasoulis, A. Conev, S. Hall-Swan, D. Amaral Antunes, and L. Kavraki, “SARS-Arena: Sequence and Structure-Guided Selection of Conserved Peptides from SARS-related Coronaviruses for Novel Vaccine Development,” Frontiers in Immunology, vol. 13, Jul. 2022.
  9. A. Conev, D. Devaurs, M. M. Rigo, D. A. Antunes, and L. E. Kavraki, “3pHLA-score improves structure-based peptide-HLA binding affinity prediction,” Scientific Reports, vol. 12, no. 1, Jun. 2022.
  10. R. F. Tarabini, M. M. Rigo, A. Faustino Fonseca, F. Rubin, R. Bellé, L. E. Kavraki, T. C. Ferreto, D. Amaral Antunes, and A. P. D. de Souza, “Large-Scale Structure-Based Screening of Potential T Cell Cross-Reactivities Involving Peptide-Targets From BCG Vaccine and SARS-CoV-2,” Frontiers in Immunology, vol. 12, Jan. 2022.
  11. R. Fasoulis, G. Paliouras, and L. E. Kavraki, “Graph representation learning for structural proteomics,” Emerging Topics in Life Sciences, Oct. 2021.
  12. E. E. Litsa, P. Das, and L. E. Kavraki, “Machine learning models in the prediction of drug metabolism: challenges and future perspectives,” Expert Opinion on Drug Metabolism & Toxicology, vol. 0, no. 0, pp. 1–3, 2021. PMID: 34706606
  13. S. Hall-Swan, D. Devaurs, M. M. Rigo, D. A. Antunes, L. E. Kavraki, and G. Zanatta, “DINC-COVID: A webserver for ensemble docking with flexible SARS-CoV-2 proteins,” Computers in Biology and Medicine, vol. 139, p. 104943, 2021.