Skip to main content

Genomic analysis using machine learning and large scale data management techniques


Suitable for

MSc in Computer Science


We will investigate novel risk analysis -- likelihood
of a patient having some medical condition -- using statistical analysis of
a variety of genomics data sources. This will make use of some new infrastructure for
data management -- a query language for nested data -- along with the use
of the SPARK framework, coupled with some basic statistics
and machine learning algorithms. No background in genomics or statistics
is necessary, but the project
does require knowledge of the basics of data management (e.g. undergrad database
course or some experience with SQL) and good programming skills.