Rolling the dice
Xilinx moves to a multidie architecture to meet demands for high fpga performance
The most aggressive users of fpgas demand ever larger devices with more and more on chip resources. And, because these leading edge companies tend to come from the communications sector, they want more and more bandwidth and ever lower power consumption. Leading fpga developers provide these features by adopting the latest process technology as soon as it becomes available.
This relentless pursuit of Moore's Law, however, is becoming increasingly difficult. While the use of Moore's Law – the so called More Moore – will continue, all companies working at the leading edge are having to develop innovative solutions, an approach being described as More than Moore.
Until recently, fpga developers took advantage of the pioneering work of memory manufacturers to move to the next technology node. FPGAs and memories, although different products, are similar in that their structures are broadly uniform across the die. But, in order to link the various parts of the device together, fpgas need to take maximum advantage of the number of metal layers, and that is challenging at the leading edge.
Both the leading fpga developers are now working at the 28nm node and have announced, but not yet delivered, products. And both recognise they need to adopt a More than Moore approach if they are to continue to meet their customers' needs.
Now, Xilinx has made a More than Moore move with the use of multidie fpga technology, along with stacked silicon interconnect, for some devices in its recently announced 7 series.
Giles Peckham, Xilinx' EMEA marketing manager, said: "Customers in a number of market segments want more capacity and they want that capacity in a device where they can retain the power, signal integrity and performance of a monolithic die."
Large fpgas have traditionally been used in massive SoC prototyping, but Peckham noted that parallel processing is being increasingly requested for high performance computing applications. "But there are still other areas where designers need processing capacity," he added.
Part of the problem in meeting these needs, however, is manufacturing yield. When companies such as Xilinx go to the next process node, it takes a while before the yield of the largest devices is satisfactory. Smaller devices, however, can be produced more readily. "We've put smaller devices together to get the capacity of a large device," Peckham continued.
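The yield argument can be illustrated with the simple Poisson yield model, in which the fraction of defect free die falls exponentially with die area. The sketch below is illustrative only: the defect density, die area and slice count are hypothetical values chosen for the example, not Xilinx figures, and it ignores the cost and assembly yield of the interposer and package.

```python
import math

def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free die under the Poisson model Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_device(slice_area_cm2: float, slices: int,
                            defects_per_cm2: float) -> float:
    """Average wafer area consumed per working device, assuming each slice is
    tested before assembly so only known good slices are packaged."""
    return slices * slice_area_cm2 / poisson_yield(slice_area_cm2, defects_per_cm2)

# Hypothetical numbers for illustration only (not Xilinx data).
D0 = 0.5        # defects per cm^2 on an immature process
BIG_DIE = 6.0   # cm^2, area of one large monolithic die
SLICES = 4      # the same total area split into four smaller slices

monolithic_yield = poisson_yield(BIG_DIE, D0)
slice_yield = poisson_yield(BIG_DIE / SLICES, D0)
mono_cost = silicon_per_good_device(BIG_DIE, 1, D0)
multi_cost = silicon_per_good_device(BIG_DIE / SLICES, SLICES, D0)

print(f"monolithic yield: {monolithic_yield:.1%}")
print(f"slice yield:      {slice_yield:.1%}")
print(f"silicon per good device: {mono_cost:.1f} cm^2 vs {multi_cost:.1f} cm^2")
```

Under these assumed values, a defect anywhere on the large die scraps the whole device, whereas a defect on one slice scraps only that slice, which is why the smaller die reach acceptable yield sooner on a new process.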
The alternative to multiple die is either a multichip module (MCM) or putting two or more devices on a pcb. "But both have latency problems," Peckham noted. "The signal path has a number of steps and, while it's shorter for an MCM, it's still a problem. And when you have to send signals through an I/O buffer, it consumes power."
Patrick Dorsey, senior marketing director for Xilinx, explained: "Xilinx has developed an innovative approach to building fpgas that offers bandwidth and capacity equalling or exceeding that of the largest possible fpga die with the manufacturing and time to market advantages of smaller die to accelerate volume production. These benefits are enabled by stacked silicon interconnect technology, which uses silicon interposers with microbumps and through silicon vias (TSV) to combine multiple highly manufacturable fpga die slices in a single package."
Stacked silicon technology is said by Xilinx to overcome a number of challenges. In particular, previous fpgas did not have sufficient I/O, while the latency involved with routing signals between two or more devices affected performance. Meanwhile, Xilinx notes that using standard I/O to create logical connections increases power consumption.
Xilinx says the solution applies several proven technologies in an innovative way. The combination of TSV and microbump technology with ASMBL – the Application Specific Modular Block Architecture introduced with the Virtex-4 family – has, it believes, created a new class of fpgas that delivers the required capacity, performance capabilities and power characteristics.
The approach combines enhanced fpga die slices and a passive silicon interposer to create a die stack in which tens of thousands of die to die connections are available to provide 'ultra high' interconnect bandwidth. This is said to be accomplished with much lower power consumption than standard I/O and with latency in the nanosecond range. "With die to die communications, we can keep latency between dies to 1ns," Peckham said. "It sounds simple, but the technology has been in development for more than five years."
Within the stacked silicon interconnect structure, data flows between a set of adjacent fpga die through more than 10,000 routing connections. This is said to improve the die to die connectivity bandwidth per Watt metric by a factor of 100.
Closer examination of Xilinx' announcement of the Virtex-7 family shows devices with 1.5million and 2m logic cells. Peckham said: "Those two devices have a bigger capacity than you might expect from a 28nm monolithic device and are being enabled using the multidie approach. This is an integral part of the Virtex-7 family." These parts use three and four fpga dies respectively.
The fpgas build on Xilinx' ASMBL architecture, with three key modifications. Each die has its own clock regime and configuration circuitry, with the routing architecture modified to enable direct connection through the passivation on the die's surface to resources within the device's logic array. This, said Dorsey, bypasses the usual parallel and serial I/O circuits. And each die is subjected to additional processing steps in order to fabricate microbumps that allow it to be attached to the silicon substrate.
FPGA dice are connected using the passive silicon interposer. Unlike the dice, the interposer is fabricated on a 65nm process, with four layers of metal enabling the interconnections.
Coarse pitch TSVs within the interposer lead to so called C4 bumps – controlled collapse chip connections – which link the fpga dies with the package substrate.
Peckham said the interposer could be built using a 65nm process because there's no need for the high density or performance of the 28nm process. "And, being standard cmos, there aren't thermal expansion issues. Neither are there thermal flux issues with the dies stacked on the interposer."
Xilinx believes its stacked silicon interconnect approach is capable of supporting die to die bandwidths in excess of 1Tbit/s; enough, it says, for most complex designs. The technology is being used in the Virtex-7 family to offer up to 2million logic cells, 65Mbit of block ram, 1200 SelectIO pins and 72 serial transceivers. The SelectIO pins enable LVDS parallel interfaces at 1.6Gbit/s, while the serial transceivers provide an aggregate bidirectional bandwidth of 1.886Tbit/s.
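A rough calculation using only the figures quoted in this article shows why so many die to die connections matter. The sketch below divides the 1Tbit/s die to die bandwidth across 10,000 routing connections, then works out how many 1.6Gbit/s LVDS channels would be needed to carry the same traffic over standard I/O. Both die to die figures are quoted as lower bounds, and treating them as applying to the same interface is an assumption, so the result is an order of magnitude estimate rather than a specification.

```python
import math

# Figures quoted in the article; the first two are lower bounds
# ("in excess of" and "more than"), so results are rough estimates.
DIE_TO_DIE_BANDWIDTH_BPS = 1e12   # 1 Tbit/s die to die bandwidth
ROUTING_CONNECTIONS = 10_000      # connections between adjacent die
LVDS_RATE_BPS = 1.6e9             # 1.6 Gbit/s per LVDS parallel interface

def per_connection_rate_bps(total_bps: float, connections: int) -> float:
    """Average data rate each connection must carry to reach the total."""
    return total_bps / connections

def channels_needed(total_bps: float, channel_bps: float) -> int:
    """Number of fixed-rate channels needed to carry the total bandwidth."""
    return math.ceil(total_bps / channel_bps)

rate = per_connection_rate_bps(DIE_TO_DIE_BANDWIDTH_BPS, ROUTING_CONNECTIONS)
lvds_channels = channels_needed(DIE_TO_DIE_BANDWIDTH_BPS, LVDS_RATE_BPS)

print(f"{rate / 1e6:.0f} Mbit/s per die to die connection")
print(f"{lvds_channels} LVDS channels for the same bandwidth")
```

On these numbers each interposer connection needs to run at only about 100Mbit/s, whereas matching the bandwidth with standard I/O would take 625 LVDS channels, each passing through the I/O buffers that Peckham identified as a source of power consumption.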
Because the device can be treated as a single die, development is said to be easier. "You can regard it as a single design project," Peckham claimed. This is said to make timing closure less problematic and allows the device to be programmed as a single fpga, with single bring up.
The technology has been proven using three test vehicles. The first test vehicle was created late in 2008 using a 90nm process. This allowed a number of functions to be validated. This was followed by 40nm and 28nm test vehicles in 2009.
Peckham said lead customers were already working with the beta version of Xilinx' ISE13.1 design suite to produce devices with 2m logic cells. "We will be delivering engineering samples in mid 2011," Peckham concluded.
Graham Pitcher is group editor at Findlay Media