LaminarIR: Compile-Time Queues for Structured Streams (PLDI 2015 - Research Papers)

Who

Yousun Ko, Bernd Burgstaller, Bernhard Scholz

Track

PLDI 2015 Research Papers

When

Mon 15 Jun 2015, 16:00 - 16:25 (GMT-07:00), PLDI Main RED (Portland 256), Optimization session. Chair(s): Michelle Strout

Abstract

Stream programming languages employ FIFO (first-in, first-out) semantics to model data channels between producers and consumers. A FIFO data channel stores tokens in a buffer that is accessed indirectly via read and write pointers. This indirect token access decouples a producer's write operations from the consumer's read operations, which makes data flow implicit. For a compiler, indirect token access obscures data dependencies. This renders standard optimizations ineffective and degrades stream program performance.
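The following minimal C sketch makes the indirection concrete. It assumes a conventional run-time FIFO channel implemented as a fixed-capacity ring buffer. The names `fifo_t`, `fifo_push`, `fifo_pop`, `run_pipeline`, and the capacity are hypothetical and are not taken from StreamIt's runtime. The consumer reaches each token only through the `tail` index, so the producer-to-consumer data flow is hidden behind run-time pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run-time FIFO channel (illustrative, not StreamIt's runtime):
 * a ring buffer written through `head` and read through `tail`. */
#define FIFO_CAP 8

typedef struct {
    int buf[FIFO_CAP];
    size_t head;  /* write pointer: next slot the producer fills */
    size_t tail;  /* read pointer: next slot the consumer drains */
    size_t count; /* number of tokens currently buffered */
} fifo_t;

static void fifo_push(fifo_t *q, int token) {
    assert(q->count < FIFO_CAP);        /* producer must not overrun */
    q->buf[q->head] = token;
    q->head = (q->head + 1) % FIFO_CAP; /* wrap-around pointer update */
    q->count++;
}

static int fifo_pop(fifo_t *q) {
    assert(q->count > 0);               /* consumer must not underrun */
    int token = q->buf[q->tail];
    q->tail = (q->tail + 1) % FIFO_CAP;
    q->count--;
    return token;
}

/* Producer filter: pushes i*i for i in [0, n). */
static void producer(fifo_t *q, int n) {
    for (int i = 0; i < n; i++)
        fifo_push(q, i * i);
}

/* Consumer filter: pops n tokens and sums them. Which push produced each
 * popped value is determined by head/tail at run time, not visible to the
 * compiler as a direct def-use edge. */
static int consumer(fifo_t *q, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += fifo_pop(q);
    return sum;
}

/* Runs producer then consumer over one channel; requires n <= FIFO_CAP. */
int run_pipeline(int n) {
    fifo_t q = {{0}, 0, 0, 0};
    producer(&q, n);
    return consumer(&q, n);
}
```

For example, `run_pipeline(4)` pushes 0, 1, 4, and 9 through the buffer and returns their sum, 14.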

In this paper we propose a transformation for structured stream programming languages such as StreamIt. The transformation shifts FIFO buffer management from run time to compile time and eliminates splitters and joiners, whose task is to distribute and merge streams. To show the effectiveness of this lowering transformation, we have implemented a StreamIt-to-C compilation framework, together with a dedicated intermediate representation (IR) that facilitates the transformation. We report on the enabling effect of our IR on LLVM's optimizations. Measuring this effect required converting several standard StreamIt benchmarks from static to randomized inputs, which prevents partial results from already being computed at compile time. We conducted the experimental evaluation on the Intel i7-2600K, AMD Opteron 6378, Intel Xeon Phi 3120A, and ARM Cortex-A15 platforms. Our code generator reduces data communication by 35.9% on average and achieves platform-specific speed-ups between 3.73x and 4.98x over StreamIt. We reduce memory accesses by more than 60% and achieve energy savings of up to 93.6% on the Intel i7-2600K.
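The following hypothetical C sketch illustrates the idea of compile-time queues on a toy stream graph. The graph has a source emitting four tokens, a round-robin splitter feeding two filters, and a round-robin joiner feeding a summing sink. This is not LaminarIR syntax or the framework's generated code, and the graph, filter bodies, and the name `steady_state` are assumptions. The sketch relies on one observation: once a static steady-state schedule fixes which buffer slot every push and pop touches, each token can become a named local, and the splitter and joiner reduce to renaming.

```c
/* Hypothetical lowered steady-state iteration of a toy stream graph:
 *   source -> roundrobin(1,1) splitter -> {A: x*2, B: x+1}
 *          -> roundrobin(1,1) joiner -> sum sink
 * Every FIFO slot is resolved at compile time to a named local. */
int steady_state(int in0, int in1, int in2, int in3) {
    /* Source firing: tokens t0..t3 (formerly pushed onto a FIFO). */
    int t0 = in0, t1 = in1, t2 = in2, t3 = in3;

    /* Round-robin splitter: t0, t2 go to branch A and t1, t3 go to
     * branch B. No copy loop remains; each branch names its tokens. */
    int a0 = t0 * 2, a1 = t2 * 2;  /* branch A filter, fired twice */
    int b0 = t1 + 1, b1 = t3 + 1;  /* branch B filter, fired twice */

    /* Round-robin joiner: output order a0, b0, a1, b1 (pure renaming). */
    int j0 = a0, j1 = b0, j2 = a1, j3 = b1;

    /* Sink: data flow is now explicit def-use chains between locals. */
    return j0 + j1 + j2 + j3;
}
```

For example, `steady_state(1, 2, 3, 4)` returns 2 + 3 + 6 + 5 = 16. Because every value flows through plain locals, an optimizer could fold a call with constant arguments entirely to a constant. This illustrates why benchmarks with static inputs needed randomized inputs before the evaluation could measure anything meaningful.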

Yousun Ko

Yonsei University

Bernd Burgstaller

Yonsei University

Bernhard Scholz

The University of Sydney
