Genomic analysis using machine learning and large-scale data management techniques
Supervisor
Suitable for
Abstract
We will investigate novel methods of risk analysis, that is, estimating the likelihood that a patient has a given medical condition, using statistical analysis of a variety of genomics data sources. The project will make use of new data management infrastructure (a query language for nested data) together with the Spark framework, coupled with some basic statistics and machine learning algorithms. No background in genomics or statistics is necessary, but the project does require knowledge of the basics of data management (e.g. an undergraduate database course or some experience with SQL) and good programming skills.