Predicting protein contacts of CRISPR-Cas9 domains with factored attention

Supervisors

Suitable for

MSc in Advanced Computer Science
Computer Science, Part C
Computer Science, Part B

Abstract

Prerequisites:

  • Essential: Deep Learning in Healthcare
  • Desirable: Knowledge of how attention works


Abstract:

Transformer-based models like ESM-2 [1] and AlphaFold 2 [2] have revolutionized protein sequence modelling, structure prediction, and design by treating protein sequences as strings of amino acid tokens and learning the “grammar rules” of such sequences. But what are the underlying principles driving the success of such models?


This project aims to serve as a primer on this question by exploring the connection between factored attention [3,4,5,6], a simplified version of the multi-head attention mechanism [7] used in transformers, and generalized Potts models, which were traditionally used for unsupervised protein contact prediction. Specifically, the student will implement a single-layer factored attention model to extract protein contacts for a given CRISPR-Cas9 domain/interface of interest, and compare the quality of the model’s extracted contacts against contacts obtained from other approaches. Depending on progress, various extensions can also be explored.
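For concreteness, below is a minimal PyTorch sketch of the kind of model the project involves: a single-layer factored attention model in the spirit of [3,4], trained by pseudolikelihood on a multiple sequence alignment (MSA), with contacts read off the learned couplings via the Frobenius norm and the average-product correction standard in direct coupling analysis. All names and hyperparameters here (FactoredAttention, n_heads, d_head, the 21-state alphabet, the fit helper) are illustrative assumptions, not a prescribed implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FactoredAttention(nn.Module):
        """Single-layer factored attention over the columns of an MSA.

        The Potts-style coupling tensor J (L x L x A x A) is factored as
            J[i, j] = sum_h softmax(Q_h K_h^T / sqrt(d))[i, j] * W_V[h],
        where Q_h, K_h (L x d) depend only on position, not on sequence
        content, and W_V[h] (A x A) captures amino-acid interactions.
        """

        def __init__(self, L: int, A: int = 21, n_heads: int = 32, d_head: int = 64):
            super().__init__()
            self.Q = nn.Parameter(0.01 * torch.randn(n_heads, L, d_head))
            self.K = nn.Parameter(0.01 * torch.randn(n_heads, L, d_head))
            self.W_V = nn.Parameter(0.01 * torch.randn(n_heads, A, A))
            self.fields = nn.Parameter(torch.zeros(L, A))  # per-site biases
            self.d_head = d_head

        def couplings(self) -> torch.Tensor:
            # Position-position attention, independent of sequence content.
            logits = torch.einsum("hid,hjd->hij", self.Q, self.K) / self.d_head**0.5
            attn = torch.softmax(logits, dim=-1)               # (H, L, L)
            J = torch.einsum("hij,hab->ijab", attn, self.W_V)  # (L, L, A, A)
            # Mask self-couplings so site i never predicts itself.
            eye = torch.eye(J.shape[0], device=J.device)
            return J * (1.0 - eye)[:, :, None, None]

        def site_logits(self, x_onehot: torch.Tensor) -> torch.Tensor:
            # Pseudolikelihood: logits for each site given all other sites.
            # x_onehot: (N, L, A)  ->  logits: (N, L, A)
            J = self.couplings()
            return self.fields + torch.einsum("ijab,njb->nia", J, x_onehot)

        @torch.no_grad()
        def contact_map(self) -> torch.Tensor:
            # Frobenius norm of each coupling block, symmetrised, followed
            # by the average-product correction (APC) used in DCA pipelines.
            J = self.couplings()
            fn = J.pow(2).sum(dim=(-2, -1)).sqrt()             # (L, L)
            fn = 0.5 * (fn + fn.T)
            apc = fn.mean(dim=0, keepdim=True) * fn.mean(dim=1, keepdim=True) / fn.mean()
            return fn - apc

    # Usage sketch: `msa` is an integer-encoded alignment of shape (N, L).
    def fit(msa: torch.Tensor, steps: int = 500, A: int = 21) -> torch.Tensor:
        model = FactoredAttention(L=msa.shape[1], A=A)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        x = F.one_hot(msa, num_classes=A).float()
        for _ in range(steps):
            loss = F.cross_entropy(model.site_logits(x).reshape(-1, A),
                                   msa.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
        return model.contact_map()  # large entries ~ predicted contacts

Because the queries and keys in this sketch are free parameters that depend only on position rather than on the input tokens, the learned attention maps can be compared directly against Potts couplings, which is what makes the contact-extraction comparison at the heart of the project well-posed.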


[1] Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., ... & Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 1123-1130.

[2] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.

[3] Bhattacharya, N., Thomas, N., Rao, R., Dauparas, J., Koo, P. K., Baker, D., ... & Ovchinnikov, S. (2021). Interpreting Potts and transformer protein models through the lens of simplified attention. In Pacific Symposium on Biocomputing 2022 (pp. 34-45).

[4] Bhattacharya, N., Thomas, N., Rao, R., Dauparas, J., Koo, P. K., Baker, D., ... & Ovchinnikov, S. (2020). Single layers of attention suffice to predict protein contacts. bioRxiv, 2020-12.

[5] Caredda, F., & Pagnani, A. (2025). Direct coupling analysis and the attention mechanism. BMC Bioinformatics, 26(1), 41.

[6] Rende, R., Gerace, F., Laio, A., & Goldt, S. (2024). Mapping of attention mechanisms to a generalized Potts model. Physical Review Research, 6(2), 023057.

[7] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.