Skip to main content

ReStore: Reusing Results of MapReduce Jobs

Iman Elghandour ( Alexandria University and Université Libre de Bruxelles )

Performing data analytics on huge data sets have been very crucial for many enterprises. It was facilitated by the introduction of the MapReduce programming and execution model and other more recent distributed computing platforms. MapReduce users often have analysis tasks that are too complex to express as individual MapReduce jobs, and therefore they use high-level query languages such as Pig or Hive to express their complex tasks. The compilers of these languages translate queries into workflows of MapReduce jobs. In my talk, I will present ReStore, a system that manages the storage and reuse of intermediate results between MapReduce jobs in such workflows. At the end of the talk, I will briefly discuss related ideas for optimizing queries executed on other distributed computing platforms.

Share this: