Enabling domain-specific operations in real-time systems
Supervisor
Suitable for
Abstract
Internet of Things (IoT) applications analyze data from large networks of sensor devices using both relational operations and domain-specific operations, such as machine learning and signal processing algorithms. To support these increasingly important scenarios, many data management systems integrate with numerical frameworks such as R. These solutions, however, incur significant performance penalties. Relational data processing engines and numerical tools operate on fundamentally different data models (the relational model and the array model), and moving data between them relies on expensive intercommunication mechanisms. In addition, none of these solutions supports efficient real-time analysis. In this project, we aim to reconcile these disparate data models and provide a common query language that supports both relational and array-based operations in real-time analysis. We plan to extend Heron, a popular stream processing system developed by Twitter and used for real-time analysis of tweets, with capabilities for efficient array-based processing. Heron is open source, so the project has the potential to make significant contributions to the open-source community.
Prerequisites: solid programming skills (Java/Scala preferable) and good knowledge of databases; some experience with parallel/distributed programming is desirable but not essential.