Neural network genomics
AbstractBackground. For more than 10 years, GWAS studies have represented a revolution in the study of human disorders and human phenotypes. By measuring how your risk of suffering any given disease changes according to SNP mutations, GWAS studies can measure how relevant each gene is to the disease under study. However, this SNP-disease association is calculated “one SNP at a time”, and ignores all gene-gene interactions that are often crucially important to the causation of the disorder. If the interactions between a number of genes (e.g. insulin and its receptor in type 2 diabetes; or APP and alpha-, beta- and gamma- secretase in Alzheimer…) is what produces a given disorder, this interaction will not be detected in a GWAS analysis. This shortcoming may not be a problem in monogenetic hereditable disorders, such as Huntington disease, where mutations in a single gene by itself are enough for causing the disease. However, GWAS studies will likely not uncover the mechanisms of complex disorders, where the disease emerges from the interaction of a number of genes. This is likely the source of the so called “missing hereditability” problem observed in most complex traits or diseases, where all the SNP variations taken together cannot account for most of the hereditability of a given trait or disease . In addition, it has been demonstrated that complex traits such as height and BMI are clearly and strongly hereditable , but GWAS studies simply cannot detect most of this hereditability. In summary, GWAS analyses detect simple individual genetic factors, but not interactions between genetic factors.
Project. In this project, we propose to identify the interacting genetic factors behind Alzheimer’s disease and other neurodegenerations with neural networks, which are known for exploiting the interactions present in the to-be analysed data. While the linear models used in GWAS studies are able to identify only linear and monovariated contributions of each gene to a disorder, neural networks can analyse how genes interact with each other to explain the studied disorder. This versatility is what has allowed neural networks to find such widespread use in industry in the last decade, where they are revolutionizing image, sound and language analysis [3–5]. In our laboratories we already have the hardware and the expertise required to build neural networks, and we have used them in other research areas relevant to Alzheimer’s disease. In addition, we have access and experience using UK Biobank, which is the ideal dataset to implement this project. In UK Biobank they have measured wide array SNPs, all disorders and demographics in ~500,000 participants between the ages of 37 and 73, and more than 5000 of them suffer from Alzheimer’s disease, Parkinson’s disease or other neurodegenerations.
We propose the MSc student to build a neural network to predict either diagnoses or disease-related endophenotypes (i.e. brain volumes of affected areas, cognitive scores…) of each one of these participants, using only the information present in the wide array SNPs and relevant demographics. The student is free to use the extension of the feed forward DNN developed in our lab, or to explore other feed forwards or recurrent (e.g. RNN, LSTM or GRU) alternatives. The DNN should be implemented in Python’s Keras (https://keras.io/), Theano (http://deeplearning.net/software/theano/), Tensorflow (https://www.tensorflow.org/), or PyTorch (http://pytorch.org/).