Complex Object Querying and Data Science

Supervisor

Suitable for

MSc in Advanced Computer Science

Abstract

"We will look at query languages for transforming nested collections (collections that may themselves contain collections). Such languages can be useful for preparing large-scale feature data for machine learning algorithms. We have a basic implementation of such a language, built on top of the big-data framework Spark. The goal of the project is to extend the language with iteration. One goal will be to look at how to adapt processing techniques for nested data to support iteration. Another, closer to applications, is to use iteration to support additional steps of a data science pipeline, such as sampling."
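
As a rough illustration of what transforming a nested collection into feature data can mean, here is a minimal sketch in plain Python. It does not use the project's language or Spark. The data set, the field names, and the helper functions `flatten` and `total_spend` are all hypothetical and chosen only for illustration.

```python
# Hypothetical nested collection: each customer holds a collection of orders,
# and each order holds a collection of item prices.
customers = [
    {"name": "ann", "orders": [{"id": 1, "prices": [5.0, 7.5]},
                               {"id": 2, "prices": [2.5]}]},
    {"name": "bob", "orders": []},
]


def flatten(customers):
    """Unnest the collection into flat (name, order_id, price) rows."""
    return [
        (c["name"], o["id"], p)
        for c in customers
        for o in c["orders"]
        for p in o["prices"]
    ]


def total_spend(customers):
    """Derive one feature per customer: total spend across all orders.

    Customers with no orders are kept with a total of 0.0, which a plain
    flatten-then-group approach would silently drop.
    """
    totals = {c["name"]: 0.0 for c in customers}
    for name, _order_id, price in flatten(customers):
        totals[name] += price
    return totals


rows = flatten(customers)
features = total_spend(customers)
```

The customer with an empty inner collection shows one reason nested data needs care: flattening alone loses that customer, so the feature table has to be seeded from the outer collection.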