Genomic analysis using machine learning and large-scale data management techniques
Supervisor
Suitable for
Abstract
We will investigate novel approaches to risk analysis (estimating the likelihood that a patient has a given medical condition) through statistical analysis of a variety of genomics data sources. The project will use new data management infrastructure, namely a query language for nested data, together with the Spark framework and some basic statistics and machine learning algorithms.

No background in genomics or statistics is necessary. However, the project does require knowledge of the basics of data management (e.g. an undergraduate database course or some experience with SQL) and good programming skills.