Neural network genomics
Background. For more than 10 years, genome-wide association studies (GWAS) have revolutionized the study of human disorders and phenotypes. By measuring how the risk of a given disease changes with single-nucleotide polymorphisms (SNPs), GWAS can estimate how strongly each genetic variant, and by extension each gene, is associated with the disease under study. However, this SNP-disease association is calculated “one SNP at a time” and ignores gene-gene interactions, which are often crucial to the causation of a disorder. If the interaction between several genes (e.g. insulin and its receptor in type 2 diabetes, or APP and the alpha-, beta- and gamma-secretases in Alzheimer's disease) is what produces a given disorder, this interaction will not be detected by a GWAS analysis. This shortcoming may not matter for monogenic heritable disorders such as Huntington's disease, where a mutation in a single gene is enough to cause the disease. However, GWAS will likely not uncover the mechanisms of complex disorders, in which the disease emerges from the interaction of many genes. This is likely the source of the so-called “missing heritability” problem observed for most complex traits and diseases, where all the SNP variants taken together cannot account for most of the heritability of a given trait or disease [Manolio et al.]. In addition, it has been shown that complex traits such as height and BMI are clearly and strongly heritable [Cesarini and Visscher], yet GWAS cannot detect most of this heritability. In summary, GWAS detect simple, individual genetic factors, but not interactions between genetic factors.
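To make the “one SNP at a time” limitation concrete, the following Python sketch builds a hypothetical toy cohort in which disease status is determined entirely by an interaction between two SNPs. The XOR pattern, the 0/1 carrier coding and all function names are assumptions chosen for illustration (real epistasis is rarely this clean), and the per-SNP risk difference is a simplified stand-in for a single-SNP association test.

```python
import numpy as np


def simulate_xor_cohort(n_per_cell: int = 250):
    """Hypothetical toy cohort: two binary SNP carrier indicators whose
    interaction (XOR) alone determines disease status."""
    # Every genotype combination (0,0), (0,1), (1,0), (1,1) is equally frequent.
    snp_a = np.repeat([0, 0, 1, 1], n_per_cell)
    snp_b = np.repeat([0, 1, 0, 1], n_per_cell)
    # Disease occurs only when exactly one of the two variants is carried.
    disease = np.logical_xor(snp_a, snp_b).astype(int)
    return snp_a, snp_b, disease


def marginal_risk_difference(snp, disease):
    """One-SNP-at-a-time view: disease risk in carriers minus non-carriers."""
    return float(disease[snp == 1].mean() - disease[snp == 0].mean())


def joint_risk_table(snp_a, snp_b, disease):
    """Two-SNP view: disease risk for each of the four genotype combinations."""
    return {
        (i, j): float(disease[(snp_a == i) & (snp_b == j)].mean())
        for i in (0, 1)
        for j in (0, 1)
    }


snp_a, snp_b, disease = simulate_xor_cohort()
print(marginal_risk_difference(snp_a, disease))  # 0.0
print(marginal_risk_difference(snp_b, disease))  # 0.0
print(joint_risk_table(snp_a, snp_b, disease))
# {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
```

Neither SNP shows any association on its own, yet together they fully determine disease status, so a scan that tests each SNP separately would report nothing for this pair.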
Project. We propose to identify the interacting genetic factors behind Alzheimer's disease and other neurodegenerative disorders using neural networks, which are well suited to exploiting interactions among their input variables. While the linear models used in GWAS can identify only linear, univariate contributions of each variant to a disorder, neural networks can model how genes interact with each other to explain the disorder under study. Our laboratories already have the hardware and expertise required to build neural networks, and we have applied them in other research areas relevant to Alzheimer's disease. In addition, we have access to, and experience with, UK Biobank, which is the ideal dataset for this project. UK Biobank has collected genome-wide SNP array data, diagnoses and demographics for ~500,000 participants aged between 37 and 73, and more than 5,000 of them have Alzheimer's disease, Parkinson's disease or another neurodegenerative disorder.
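The contrast between additive linear models and neural networks can be illustrated on the same kind of hypothetical XOR data (toy values, not UK Biobank data). The sketch below fits an additive linear model (intercept plus one effect per SNP, no interaction term) by least squares and compares it with a one-hidden-layer ReLU network whose weights are set by hand, purely to show that a hidden layer can represent the interaction; in practice the weights would be learned from data. All names and weight values are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy data: two binary SNP carrier indicators; disease = XOR.
snp_a = np.repeat([0, 0, 1, 1], 250).astype(float)
snp_b = np.repeat([0, 1, 0, 1], 250).astype(float)
disease = np.logical_xor(snp_a, snp_b).astype(float)


def fit_additive_linear(x_a, x_b, y):
    """Least-squares fit of y ~ intercept + beta_a * x_a + beta_b * x_b
    (additive effects only, no interaction term)."""
    design = np.column_stack([np.ones_like(x_a), x_a, x_b])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef


def relu(z):
    return np.maximum(z, 0.0)


def xor_relu_network(x_a, x_b):
    """One hidden layer with two ReLU units; weights chosen by hand
    (illustrative only) so that the output equals XOR(x_a, x_b)."""
    inputs = np.column_stack([x_a, x_b])           # shape (n, 2)
    w_hidden = np.array([[1.0, 1.0], [1.0, 1.0]])  # both hidden units sum the two inputs
    b_hidden = np.array([0.0, -1.0])               # second unit is active only if both are carried
    w_out = np.array([1.0, -2.0])
    hidden = relu(inputs @ w_hidden + b_hidden)    # shape (n, 2)
    return hidden @ w_out


coef, linear_pred = fit_additive_linear(snp_a, snp_b, disease)
nn_pred = xor_relu_network(snp_a, snp_b)

linear_mse = float(np.mean((linear_pred - disease) ** 2))
nn_mse = float(np.mean((nn_pred - disease) ** 2))
print(np.round(coef, 6))  # intercept ≈ 0.5, both SNP effects ≈ 0
print(round(linear_mse, 6), nn_mse)  # 0.25 0.0
```

The additive model predicts the same risk of about 0.5 for every genotype, while the hidden layer reproduces the interaction exactly. A linear model could fit this pattern only if the `snp_a * snp_b` term were specified in advance; a hidden layer lets the network represent such terms without enumerating every candidate pair.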
We propose that the MSc student build a neural network to predict either diagnoses or disease-related endophenotypes (e.g. volumes of affected brain areas, cognitive scores) for each of these participants, using only the genome-wide SNP array data and relevant demographics. The student is free to use the extension of the feed-forward DNN developed in our lab, or to explore other feed-forward or recurrent (e.g. RNN, LSTM or GRU) alternatives. The DNN should be implemented in Python using Keras, Theano, TensorFlow or PyTorch.
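As a rough illustration of the proposed setup, the sketch below trains a small feed-forward network (one ReLU hidden layer, sigmoid output) to predict a binary diagnosis from SNP dosages plus age and sex, using plain NumPy so that every step is visible. It is not the lab's DNN: the synthetic data, the 0/1/2 dosage coding, the invented SNP-SNP interaction driving the labels, the layer size and the learning rate are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical synthetic data (a stand-in, NOT UK Biobank) ---
n_participants, n_snps = 1000, 50
# SNP array genotypes coded as minor-allele dosage: 0, 1 or 2 copies.
dosages = rng.integers(0, 3, size=(n_participants, n_snps)).astype(float)
age = rng.uniform(37, 73, size=n_participants)
sex = rng.integers(0, 2, size=n_participants).astype(float)
# Invented diagnosis model: an interaction between SNPs 0 and 1, plus an age effect.
carrier_both = (dosages[:, 0] > 0) & (dosages[:, 1] > 0)
true_logit = 1.5 * carrier_both + 0.05 * (age - 55.0) - 1.0
diagnosis = (rng.random(n_participants) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)


def build_inputs(dosages, age, sex):
    """Centre SNP dosages, standardize age, keep sex as a 0/1 indicator."""
    snp_part = dosages - dosages.mean(axis=0)
    age_z = (age - age.mean()) / age.std()
    return np.column_stack([snp_part, age_z, sex])


def init_params(n_inputs, n_hidden, rng):
    """He-style initialization for a single ReLU hidden layer."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_inputs), size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=n_hidden),
        "b2": 0.0,
    }


def forward(params, X):
    """Input -> ReLU hidden layer -> sigmoid output (probability of diagnosis)."""
    z1 = X @ params["W1"] + params["b1"]
    hidden = np.maximum(z1, 0.0)
    logits = hidden @ params["w2"] + params["b2"]
    prob = 1.0 / (1.0 + np.exp(-logits))
    return z1, hidden, prob


def bce_loss(prob, y, eps=1e-12):
    """Mean binary cross-entropy."""
    return float(-np.mean(y * np.log(prob + eps) + (1 - y) * np.log(1 - prob + eps)))


def train_step(params, X, y, lr):
    """One full-batch gradient-descent step with hand-written backpropagation."""
    z1, hidden, prob = forward(params, X)
    d_logits = (prob - y) / len(y)  # d(mean BCE)/d(logits) for a sigmoid output
    grad_w2 = hidden.T @ d_logits
    grad_b2 = d_logits.sum()
    d_z1 = np.outer(d_logits, params["w2"]) * (z1 > 0)  # back through the ReLU
    grad_W1 = X.T @ d_z1
    grad_b1 = d_z1.sum(axis=0)
    params["W1"] -= lr * grad_W1
    params["b1"] -= lr * grad_b1
    params["w2"] -= lr * grad_w2
    params["b2"] -= lr * grad_b2


X = build_inputs(dosages, age, sex)
params = init_params(X.shape[1], n_hidden=32, rng=rng)
initial_loss = bce_loss(forward(params, X)[2], diagnosis)
for _ in range(300):
    train_step(params, X, diagnosis, lr=0.1)
final_loss = bce_loss(forward(params, X)[2], diagnosis)
print(f"training BCE: {initial_loss:.3f} -> {final_loss:.3f}")
```

In the project itself, the same architecture would take a few lines in Keras or PyTorch and would be evaluated on held-out participants; the loss printed here is training loss only and says nothing about generalization. For a continuous endophenotype such as a brain volume, the sigmoid output and cross-entropy would be replaced by a linear output and mean squared error.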
T.A. Manolio et al., Finding the missing heritability of complex diseases, Nature. 461 (2009) 747–753. doi:10.1038/nature08494.
D. Cesarini, P.M. Visscher, Genetics and educational attainment, npj Sci. Learn. 2 (2017). doi:10.1038/s41539-017-0005-6.