|Title||:||PBC: Prefetched Blocks Compaction|
|Details||:||Fri, 17 Jul, 2015 3:30 PM @ BSB 361|
|Abstract||:||The last-level cache (LLC) is one of the critical resources of a multi-core system, so micro-architectural techniques that provide more cache hits at the LLC are key. Two such techniques are cache compression and hardware prefetching.
Cache compression techniques increase the effective cache capacity without increasing the cache size. They are based on the observation that a significant amount of the data accessed or generated by a program shares common patterns that can be represented (stored) using fewer bits. Hardware prefetching, on the other hand, hides the off-chip DRAM latency. Instead of waiting for a cache miss to fetch data from DRAM, a prefetcher trains itself to identify such misses ahead of time and prefetches the data into the cache, thereby converting the cache miss into a potential cache hit. One of the key features of the addresses generated by commonly used prefetchers (such as stride and stream prefetchers) is that they are correlated in space, owing to spatial locality.
We observe that the key features of prefetching can be used for compression, which can further increase the effective cache capacity. Our goal in this work is to design a mechanism that exploits the features of hardware prefetching to compress multiple prefetched blocks into a single block. We treat multiple contiguous prefetched cache blocks that share common higher-order address bits as a single cache block, exploiting the data patterns present across these blocks. We propose prefetched blocks compaction (PBC), a low-cost and practical technique that compacts multiple prefetched blocks into a single cache block called a compact block. We evaluate PBC across a wide variety of workloads and compare its performance to a baseline system with no compaction.
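
To make the idea concrete, the following is a minimal illustrative sketch and not the PBC mechanism presented in the talk. It assumes 64-byte blocks, groups of four contiguous blocks, and a simple base-plus-delta encoding; the names `group_prefetches` and `try_compact` and all constants are hypothetical. It groups prefetched block addresses by their shared higher-order address bits, then checks whether the data of one group is regular enough to fit in a single block.

```python
BLOCK_SIZE = 64     # bytes per cache block (assumed)
GROUP_BLOCKS = 4    # contiguous blocks per compaction candidate (assumed)
WORD_SIZE = 8       # bytes per word used for delta encoding (assumed)


def group_key(addr):
    """Higher-order address bits shared by GROUP_BLOCKS contiguous blocks."""
    return addr // (BLOCK_SIZE * GROUP_BLOCKS)


def group_prefetches(addrs):
    """Bucket prefetched addresses by their shared higher-order bits."""
    groups = {}
    for addr in addrs:
        block_addr = addr - addr % BLOCK_SIZE  # align to block boundary
        groups.setdefault(group_key(addr), set()).add(block_addr)
    return {key: sorted(blocks) for key, blocks in groups.items()}


def try_compact(blocks):
    """Return (base, deltas) if all words of `blocks` fit in one block.

    Hypothetical encoding: one WORD_SIZE-byte base value plus one signed
    1-byte delta per word. Returns None when the data does not fit.
    """
    words = [
        int.from_bytes(data[i:i + WORD_SIZE], "little")
        for data in blocks
        for i in range(0, len(data), WORD_SIZE)
    ]
    if not words:
        return None
    base = words[0]
    deltas = [w - base for w in words]
    if any(d < -128 or d > 127 for d in deltas):
        return None  # some word is too far from the base
    if WORD_SIZE + len(deltas) > BLOCK_SIZE:
        return None  # encoded form would exceed one cache block
    return base, deltas


def make_block(start):
    """Build a 64-byte block holding eight consecutive 8-byte integers."""
    return b"".join((start + i).to_bytes(WORD_SIZE, "little") for i in range(8))


# Addresses as a stride prefetcher might generate them (stride = 64 bytes).
prefetched = [0x1000, 0x1040, 0x1080, 0x10C0, 0x1100]
groups = group_prefetches(prefetched)

# Four blocks of slowly increasing values compact into base + 32 deltas.
regular = [make_block(1000 + 8 * j) for j in range(GROUP_BLOCKS)]
compact = try_compact(regular)
```

Here the first four addresses fall into one group because they share the same higher-order bits, while `0x1100` starts a new group. With the assumed encoding, the four regular blocks need 8 bytes for the base and 32 bytes for the deltas, which fits within one 64-byte block.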