Smarter caching boosts chip performance
Computer scientists in the US have developed a set of caching strategies for multicore chips that, in simulations, significantly improved chip performance while also reducing energy consumption.
The approach, developed by a team from MIT, is said to enable a 15% reduction in execution time and energy savings of 25%.
Typically, the caches on multicore chips are arranged in a hierarchy. Each core has its own private cache, while all the cores share the so-called last-level cache, or LLC.
Chips' caching protocols usually adhere to the simple principle of spatiotemporal locality, in which every requested data item gets stored, along with those immediately adjacent to it, in the private cache. If the data then falls idle, it is squeezed out by more recently requested data, falling down through the hierarchy until it is requested again.
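The conventional behaviour described above can be illustrated with a minimal Python sketch. It is a toy model rather than any real chip's protocol: the class name `TwoLevelCache`, the block size of four addresses and the unbounded LLC are all assumptions made for illustration. A request pulls in the whole block of adjacent addresses, and the least recently used block is squeezed out of the private cache into the LLC.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy model: one private cache backed by an (unbounded) last-level cache."""

    def __init__(self, private_blocks, block_size=4):
        self.private = OrderedDict()  # block id -> True, least recently used first
        self.llc = set()              # blocks that have been squeezed out of the private cache
        self.private_blocks = private_blocks
        self.block_size = block_size

    def access(self, address):
        """Return where the requested address was found."""
        # Adjacent addresses share a block, so fetching one brings in its neighbours.
        block = address // self.block_size
        if block in self.private:
            self.private.move_to_end(block)  # mark as most recently used
            return "private"
        source = "llc" if block in self.llc else "memory"
        self.llc.discard(block)
        self.private[block] = True
        if len(self.private) > self.private_blocks:
            # The idle (least recently used) block falls down the hierarchy.
            victim, _ = self.private.popitem(last=False)
            self.llc.add(victim)
        return source


cache = TwoLevelCache(private_blocks=2)
trace = [0, 1, 8, 16, 0]
results = [cache.access(address) for address in trace]
print(results)  # ['memory', 'private', 'memory', 'memory', 'llc']
```

Address 1 hits in the private cache only because its neighbour, address 0, was requested first. By the time address 0 is requested again, its block has been pushed down to the LLC.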
While the principle of spatiotemporal locality is largely reliable, there are cases when it breaks down. And this is where the MIT chip comes in.
When an application's working set exceeds the private-cache capacity, the researchers' chip would simply split it up between the private cache and the LLC. Data stored in either place would stay put, no matter how recently it's been requested, preventing a lot of pointless swapping.
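To see why keeping data in place can help, consider a rough Python sketch that compares a least-recently-used private cache with a fixed split. This is not the researchers' design: the function names, the six-item working set and the four-entry private cache are hypothetical values chosen so the effect is easy to see.

```python
from collections import OrderedDict


def lru_private_hits(accesses, capacity):
    """Private-cache hits when the least recently used item is always evicted."""
    cache = OrderedDict()
    hits = 0
    for item in accesses:
        if item in cache:
            hits += 1
            cache.move_to_end(item)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used item
            cache[item] = True
    return hits


def split_private_hits(accesses, capacity):
    """Private-cache hits when the first `capacity` items stay put for good."""
    pinned = set()
    hits = 0
    for item in accesses:
        if item in pinned:
            hits += 1
        elif len(pinned) < capacity:
            pinned.add(item)  # first touch: claim a private-cache slot
        # Otherwise the item is served from the LLC and nothing is swapped.
    return hits


working_set = list(range(6))   # six items, looped over repeatedly
accesses = working_set * 10    # 60 requests in total
capacity = 4                   # the private cache holds only four items

lru_hits = lru_private_hits(accesses, capacity)
split_hits = split_private_hits(accesses, capacity)
print(lru_hits, split_hits)  # 0 36
```

In this toy trace the recency-based policy never hits in the private cache, because each item is evicted just before it is needed again. The fixed split serves four of every six requests from the private cache after the first pass and leaves the other two in the LLC.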
Similarly, if two cores working on the same data are constantly communicating in order to keep their cached copies consistent, the chip would store the shared data at a single location in the LLC.
The cores would then take turns accessing the data, rather than clogging the network with updates.
The researchers also examined the opposite case, in which two cores are working on the same data but communicating only infrequently.
While the LLC is usually treated as a single large memory bank, physically it is distributed across the chip in discrete chunks. The researchers therefore decided to develop a second circuit that can treat these chunks as extensions of the private cache.
"If two cores are working on the same data, each will receive its own copy in a nearby chunk of the LLC, enabling much faster data access," explained George Kurian, a graduate student in MIT's Department of Electrical Engineering and Computer Science.
There are some drawbacks to the technique, however. Because the system has to be monitored continuously, additional circuitry is required, taking up an area equivalent to about 5% of that of the LLC.
Despite this, the researchers believe the technology could be commercialised, "because chip space is not as crucial a concern as minimising data transfer".