
Development of deep learning-based cleavage activity prediction models for genome editors

Supervisors

Suitable for

Computer Science, Part B

Abstract

Prerequisites:

  • Essential: Machine Learning and/or Deep Learning in Healthcare
  • Desirable: Prior knowledge or interest in molecular biology


Abstract:

Genome editors, i.e., programmable DNA-cutting enzymes, have revolutionized the field of gene therapy. This is reflected by the 2020 Nobel Prize in Chemistry and by the US Food and Drug Administration's approval in 2023 of Casgevy, the world's first approved CRISPR-based gene therapy. These editors consist of two components: a single guide RNA (sgRNA), which directs the editor to the target DNA site of interest, and the enzyme, which binds and cleaves the target site [1]. Broadly speaking, genome editors operate mechanistically in three steps: binding of the enzyme to the DNA, formation of the sgRNA-DNA heteroduplex, and DNA cleavage.


Because the sgRNA's spacer sequence and the target sequence are the primary factors affecting a genome editor's cleavage activity, this project addresses the cleavage activity prediction problem for a recently discovered genome editor. It does so by learning the function that maps a spacer-target pair to that editor's cleavage activity. More concretely, the student will apply established deep learning approaches [2] to high-throughput cleavage activity data available in the literature [3,4]. The goal is a prediction model that performs well on held-out test data.
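The following is a minimal sketch of the input and evaluation sides of this prediction problem, written in plain Python. It encodes an aligned spacer-target pair as a position-by-16 pairwise one-hot matrix, with one indicator per spacer-base/target-base combination. It then scores predicted activities against measured ones with Spearman rank correlation. Every choice here is an assumption for illustration and is not specified by the project description:

- the encoding scheme;
- the hypothetical helpers `encode_pair`, `mismatch_positions`, and `spearman`;
- the convention that spacer and target are both written 5' to 3' on the same strand;
- the use of Spearman correlation as the test metric.

The deep learning model itself is omitted.

```python
from typing import List, Sequence

BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def _normalize(seq: str) -> str:
    # Upper-case and map RNA uracil to thymine so spacers may be given as RNA or DNA.
    return seq.upper().replace("U", "T")


def encode_pair(spacer: str, target: str) -> List[List[int]]:
    """Pairwise one-hot encoding of an aligned spacer-target pair (hypothetical helper).

    Position i becomes a 16-dim indicator vector with a single 1 at index
    4 * index(spacer base) + index(target base). Assumes both sequences are
    written 5'->3' on the same strand, so a perfect match has spacer == target.
    Raises KeyError for bases outside ACGT (e.g. N).
    """
    spacer, target = _normalize(spacer), _normalize(target)
    if len(spacer) != len(target):
        raise ValueError("spacer and target must have the same length")
    matrix = []
    for s, t in zip(spacer, target):
        row = [0] * 16
        row[4 * BASE_INDEX[s] + BASE_INDEX[t]] = 1
        matrix.append(row)
    return matrix


def mismatch_positions(spacer: str, target: str) -> List[int]:
    """1-based positions at which the aligned spacer and target bases differ."""
    pairs = zip(_normalize(spacer), _normalize(target))
    return [i + 1 for i, (s, t) in enumerate(pairs) if s != t]


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = _average_ranks(y_true), _average_ranks(y_pred)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when either input is constant
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    spacer = "GACGCATAAAGATGAGACGC"  # illustrative 20-nt sequence, not from the cited data
    target = "GACGCATAAAGATGAGACGG"  # same sequence with a mismatch at position 20
    X = encode_pair(spacer, target)
    print(len(X), len(X[0]))  # 20 16
    print(mismatch_positions(spacer, target))  # [20]
    print(spearman([0.1, 0.5, 0.9, 0.3], [0.2, 0.4, 0.8, 0.1]))  # 0.8
```

Under these assumptions, a 20-nt spacer yields a 20 × 16 matrix. A model such as a 1D convolutional network could take that matrix as input. The pairwise encoding exposes both the identity and the position of each mismatch, and these can affect sgRNA-DNA heteroduplex formation. A rank-based metric is shown because it rewards correct ordering of target sites by activity; it is one reasonable option among several.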


[1] Jiang, F., & Doudna, J. A. (2017). CRISPR–Cas9 structures and mechanisms. Annual Review of Biophysics, 46, 505-529.

[2] Kim, N., Kim, H. K., Lee, S., Seo, J. H., Choi, J. W., Park, J., ... & Kim, H. H. (2020). Prediction of the sequence-specific cleavage activity of Cas9 variants. Nature Biotechnology, 38(11), 1328-1336.

[3] Sung, K., Jung, Y., Kim, N., Kim, Y. W., Kim, H. H., Kim, S. K., & Bae, S. (2025). A rational engineering strategy for structural dynamics modulation enables target specificity enhancement of the Cas9 nuclease. Nucleic Acids Research, 53(12), gkaf535.

[4] Crawford, K. D., Khan, A. G., Lopez, S. C., Goodarzi, H., & Shipman, S. L. (2025). High throughput variant libraries and machine learning yield design rules for retron gene editors. Nucleic Acids Research, 53(2), gkae1199.