In order to know which cores have copies of which data, many-core devices keep a directory in shared cache memory, and this, the team says, takes up a large amount of space. As an example, the researchers suggest the directory in a 64-core chip might need 12% of the shared memory.
By scaling the memory requirement with the logarithm of the number of cores, rather than in direct proportion to it, more efficient use is made of the available memory. So, in a 128-core chip, the MIT technique would require only one-third as much memory as a conventional directory, while a 256-core chip would need only 20%.
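The difference between linear and logarithmic growth can be illustrated with a simplified model (this is only a sketch of the scaling argument, not the paper's actual accounting; the per-line bit counts are assumptions): a conventional directory keeps one presence bit per core for each tracked cache line, while a timestamp of log2(cores) bits grows far more slowly.

```python
import math

def conventional_bits_per_line(cores: int) -> int:
    """Full bit-vector directory: one sharer bit per core."""
    return cores

def logarithmic_bits_per_line(cores: int) -> int:
    """A timestamp/ID of ceil(log2(cores)) bits per line."""
    return math.ceil(math.log2(cores))

for cores in (64, 128, 256):
    full = conventional_bits_per_line(cores)
    log = logarithmic_bits_per_line(cores)
    print(f"{cores} cores: {full} bits vs {log} bits "
          f"({log / full:.1%} of the full directory)")
```

The gap widens with every doubling of the core count, which is why the savings the article quotes grow from 128 cores to 256 cores.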
The research, conducted by Xiangyao Yu and Professor Srini Devadas, has concluded that the physical-time order of distributed computations doesn’t matter, so long as their logical-time order is preserved. So core A can keep working on a piece of data that core B has since overwritten, provided the rest of the system treats core A’s work as having preceded core B’s.
In the Tardis system, each core has a counter and each data item in memory has an associated counter. When a program launches, all counters are set to zero. When a core reads a piece of data, it takes out a ‘lease’ on it, setting the data item’s counter to a particular number, say x. As long as the core’s internal counter doesn’t exceed that number, its copy of the data is valid.
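The lease rule amounts to a simple comparison (an illustrative sketch; the function and parameter names are assumptions, not anything from the paper):

```python
def copy_is_valid(core_counter: int, lease_expiration: int) -> bool:
    """A cached copy stays valid while the reading core's internal
    counter has not passed the lease number x on the data item."""
    return core_counter <= lease_expiration

print(copy_is_valid(core_counter=5, lease_expiration=10))   # still leased
print(copy_is_valid(core_counter=11, lease_expiration=10))  # lease expired
```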
When a core needs to overwrite data, it takes ‘ownership’ of it. Other cores can continue working on locally stored copies of the data, but if they want to extend their ‘leases’, they have to coordinate with the data item’s ‘owner’.
The overwriting core sets its own counter to x+1, which marks it as operating at a later logical time than the cores still holding leases. This idea of leaping forward in time is what gives the system its name.
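The whole scheme can be sketched as a toy model (illustrative only; the class names, the fixed lease length, and the exact counter updates are assumptions, not the Tardis implementation): reads take out leases, and a write jumps the writer's counter past every outstanding lease instead of broadcasting invalidations.

```python
LEASE = 10  # logical-time span granted on each read (assumed value)

class Line:
    """One shared data item with its logical-time metadata."""
    def __init__(self, value):
        self.value = value
        self.wts = 0   # logical time of the last write
        self.rts = 0   # latest lease expiration among readers

class Core:
    def __init__(self):
        self.ts = 0    # the core's own logical counter

    def read(self, line):
        # Reading takes a lease: the local copy stays valid until
        # the core's counter passes the lease expiration (rts).
        self.ts = max(self.ts, line.wts)
        line.rts = max(line.rts, self.ts + LEASE)
        return line.value

    def write(self, line, value):
        # Writing takes ownership at logical time x+1, leaping past
        # every outstanding lease rather than invalidating readers.
        self.ts = max(self.ts, line.rts + 1)
        line.wts = self.ts
        line.rts = self.ts
        line.value = value

a, b = Core(), Core()
x = Line(0)
a.read(x)          # core A leases x up to logical time 10
b.write(x, 42)     # core B jumps to logical time 11 and overwrites
print(a.ts, b.ts)  # prints "0 11": A still works "in the past"
```

Core A can keep using its stale copy at logical time 0, because the rest of the system treats A's work as having preceded B's write at logical time 11, just as the article describes.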
In addition to saving space in memory, Tardis is said to eliminate the need to broadcast invalidation messages to all cores sharing a data item. In massively multicore chips, Yu says, this could also lead to performance improvements.