Title: XStream: Cross-core Spatial Streaming based MLC Prefetchers for Parallel Applications in CMPs
Speaker: Biswabandan Panda (IITM)
Details: Thu, 1 Jan, 2015 2:30 PM @ BSB 361
Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency.
Prefetching techniques in the literature are designed for multiple independent sequential
applications running on a multicore system. In contrast to multiple independent applications,
a single parallel application running on a multicore system exhibits different
behavior. In a parallel application, cores share and communicate data and code
among themselves, and the demand miss streams of different cores have much in common.
This gives an opportunity to predict the demand miss streams and communicate
the predicted streams from one core to another, which we refer to as cross-core stream
Our Contribution: We propose cross-core spatial streaming (XStream), a practical and storage-efficient cross-core prefetching technique. XStream detects and predicts cross-core spatial streams at the private mid-level caches (MLCs) and sends the predicted streams in advance to the MLC prefetchers of the predicted cores.
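To make the idea of a cross-core spatial stream concrete, the following is a toy software model, not the XStream hardware design. It assumes 64-byte blocks, 32-block spatial regions, and a single shared table; the class `ToyCrossCorePredictor` and its methods are hypothetical names introduced only for illustration. Unlike XStream, which sends streams in advance, this sketch reacts only when a second core misses on a region that another core has already touched.

```python
from collections import defaultdict

BLOCK_BYTES = 64      # assumed cache block size
REGION_BLOCKS = 32    # assumed number of blocks per spatial region


def region_of(addr):
    """Index of the spatial region containing addr."""
    return addr // (BLOCK_BYTES * REGION_BLOCKS)


def offset_of(addr):
    """Block offset of addr within its spatial region."""
    return (addr // BLOCK_BYTES) % REGION_BLOCKS


class ToyCrossCorePredictor:
    """Hypothetical shared table: region -> block offsets missed so far."""

    def __init__(self):
        self.patterns = defaultdict(set)  # region -> set of missed offsets
        self.first_core = {}              # region -> core that first missed on it

    def record_miss(self, core, addr):
        """Record a demand miss observed at a core's MLC."""
        region = region_of(addr)
        self.first_core.setdefault(region, core)
        self.patterns[region].add(offset_of(addr))

    def predict_for(self, core, addr):
        """Blocks to forward to `core` if a different core trained this region."""
        region = region_of(addr)
        if region not in self.first_core or self.first_core[region] == core:
            return []
        base = region * REGION_BLOCKS * BLOCK_BYTES
        trigger = offset_of(addr)
        return [base + off * BLOCK_BYTES
                for off in sorted(self.patterns[region]) if off != trigger]


predictor = ToyCrossCorePredictor()
for miss in (0x1000, 0x1040, 0x10C0):   # core 0 trains a region
    predictor.record_miss(0, miss)
forwarded = predictor.predict_for(1, 0x1000)  # core 1 misses on the same region
```

Here core 1 would receive the two other blocks that core 0 missed on in the same region, which is the commonality in miss streams that the abstract describes.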
Highlight of the Results: We compare the effectiveness of XStream with state-of-the-art spatial memory streaming. Experimental results show that, on average (geometric mean), the storage-efficient XStream reduces execution time relative to spatial memory streaming by 11.3% (as high as 24%) on 4-core systems and by 9% (as high as 29.09%) on 8-core systems. This performance improvement comes with a hardware overhead of a little more than 1/6th of a single MLC.
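For readers unfamiliar with geometric-mean reporting, here is a minimal sketch of how an average reduction in execution time could be computed. It assumes the mean is taken over per-benchmark execution times normalized to the baseline; the abstract does not spell out this detail. The function names and the input values are hypothetical placeholders, not data from the work.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def mean_reduction_percent(normalized_times):
    """Average reduction (%) given execution times normalized to a baseline.

    A normalized time of 0.8 means the run took 80% of the baseline time.
    """
    return (1.0 - geomean(normalized_times)) * 100.0


# Placeholder inputs for illustration only (not measured results).
example = mean_reduction_percent([0.5, 0.5])
```

The geometric mean is the usual choice for ratios such as normalized execution time, because it is less dominated by a single large ratio than the arithmetic mean.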
The results are part of a work that appeared at the 23rd International Conference on Parallel Architectures and Compilation Techniques (PACT), 2014. The detailed version of the work is available here.