Even Better Stream Fusion
- 14:00 18th February 2022 ( week 5, Hilary Term 2022 )Zoom
Note that this talk is at the earlier time of 2pm (14:00 UTC).
Stream processing is one of the key data processing modes, related to dataflow programming. It was dominant in the punch-card era, and is becoming prevalent again, in the era of huge data, ubiquitous sensors and distributed computing. Its characteristic is incremental, sequential processing with bounded buffering, which lets one handle possibly unbounded amount of data in limited space. Another characteristic is the ease of specifying it as a Xmas-lights diagram: if some further processing is needed, just plug in another segment.
Although the diagrams are easy to draw, they are difficult to implement with low latency and in low memory. This talk is about the key optimization: stream fusion, which is combining several simple processing steps into one complex step, reducing the amount of intermediary data and communication overhead. Specifically, we will talk about complete fusion: not just reduction but complete elimination. This is hard, especially for diagrams with "fat pipes" (flatmap) and "joins" (zip).
This talk introduces the ongoing work on strymonas, which is a high-performance code generation library (DSL) that converts a diagram-like specification into hand-written-like code -- with assured complete fusion. We describe the main ideas behind the complete fusion of diagrams with joins, and illustrate on the example of the software FM radio.
Oleg Kiselyov is an Assistant Professor at Tohoku University in Japan. He got interested in stream processing when automating scientific instruments (calorimeters and neuron activity recording) 35 years ago. In 1990s he wrote and maintained a C++ linear algebra library based on streams rather than arrays. Later on he wrote a streaming XML parser, still used in Scheme community, and designed Iteratees. His latest interest is generating fast stream processing code.