
DeepEmbCas9: Cas9 coevolution and sgRNA structural information for CRISPR-Cas9 cleavage activity prediction

Jeffrey Mak and Peter Minary

Abstract

The development of CRISPR-Cas9 cleavage activity prediction tools hinges on data produced from high-throughput guide-target lentiviral library screens for different Cas9 variants. However, most such tools remain limited to predictions for one or a few Cas9 variants, making it difficult to quantify the effects of Cas9 residues on cleavage activity. To bridge this gap, we introduce four interpretable DeepEmbCas9 models for cleavage activity prediction across 40 type II-A and II-C Cas9 variants (DeepEmbCas9, DeepEmbCas9-MVE, DeepEnsEmbCas9 naive, and DeepEnsEmbCas9), which leverage protein and RNA language model embeddings to encode Cas9 and sgRNA, respectively. Among the four neural network models, DeepEnsEmbCas9 naive performed best in both in-distribution and out-of-distribution settings: it outperformed individual Cas9 cleavage activity prediction tools on 18 of 51 and 17 of 48 benchmark test sets, respectively, and performed comparably on the rest. Concerning uncertainty quantification, DeepEnsEmbCas9 yields quantile-calibrated uncertainty estimates with only a minimal performance drop relative to DeepEnsEmbCas9 naive. SHAP importance analysis of DeepEmbCas9 reaffirms the importance of Cas9 binding to the target PAM as a first step in Cas9 cleavage, and identifies the L2 linker and PLL-WED-PI as important Cas9 domains modulating DeepEmbCas9's predicted activity change when increased-fidelity and PAM-altering Cas9 mutations, respectively, are introduced. Our findings demonstrate the usefulness of protein language model embeddings in uncertainty-aware Cas9 cleavage activity prediction. More generally, the DeepEmbCas9 models serve as an initial step towards cleavage activity prediction modelling for the whole Cas9 protein family.

Competing Interest Statement
The authors have declared no competing interest.
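The abstract does not state how DeepEnsEmbCas9 aggregates its ensemble members or how quantile calibration was assessed. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes each member outputs a Gaussian mean and variance for a guide-target pair (as a mean-variance estimation network would), combines members by moment-matching the uniform mixture as in standard deep ensembles, and checks calibration by comparing each nominal quantile level p with the observed fraction of targets falling at or below the predicted p-quantile. The helper names `combine_ensemble` and `quantile_calibration` are hypothetical.

```python
from statistics import NormalDist


def combine_ensemble(means, variances):
    """Moment-match a uniform mixture of Gaussians N(means[i], variances[i]).

    Hypothetical helper: assumes each ensemble member predicts a mean and a
    variance for the same input, as a mean-variance estimation (MVE) network would.
    """
    m = len(means)
    mu = sum(means) / m
    # Mixture variance = mean(member variance + member mean^2) - mixture mean^2,
    # i.e. the average predicted variance plus the spread of the member means.
    second_moment = sum(v + u * u for u, v in zip(means, variances)) / m
    var = max(second_moment - mu * mu, 0.0)  # guard against tiny negative rounding
    return mu, var


def quantile_calibration(mus, variances, targets, levels):
    """For each level p, return the fraction of targets at or below the predicted p-quantile.

    For a quantile-calibrated model, each returned frequency should be close to p.
    """
    freqs = []
    for p in levels:
        hits = 0
        for mu, var, y in zip(mus, variances, targets):
            # Gaussian p-quantile of the predictive distribution; requires var > 0.
            q = NormalDist(mu, var ** 0.5).inv_cdf(p)
            if y <= q:
                hits += 1
        freqs.append(hits / len(targets))
    return freqs


# Usage: combine two members' predictions for one guide-target pair.
mu, var = combine_ensemble([0.0, 2.0], [1.0, 1.0])
print(mu, var)  # 1.0 2.0

# Toy calibration check on four illustrative (made-up) predictions and targets.
freqs = quantile_calibration([0.0] * 4, [1.0] * 4, [-2.0, -0.5, 0.5, 2.0], [0.5, 0.975])
print(freqs)  # [0.5, 0.75]
```

Moment matching keeps the combined prediction Gaussian, so a single mean and variance per guide-target pair can be reported; the disagreement between members widens the predicted variance even when each member is individually confident.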

Journal
bioRxiv
Publisher
Cold Spring Harbor Laboratory
Year
2025