DRAM refresher: Problems the technology is set to encounter
Ten years after being asked to pen a short article on trends for the magazine Electronics in 1965, Intel's Gordon Moore was invited to the International Electron Devices Meeting (IEDM) to provide an update. His expansion of the original article, in effect, became the core of 'Moore's Law', as circuit design researcher Carver Mead later tagged it.
There were three components to Moore's prediction of a doubling in chip capacity every two years: advances in process technology; improvements in circuit layout; and increases in die size. In 1975, he told IEDM delegates that, based on historical growth in die size, they could expect to see, within five years, devices measuring about 58mm^2. Built on wafers measuring 4in across or less, devices of the time were tiny by today's standards.
At the time, it looked as though memories would be the pace-setters in die size as well as process advances. But the memories Moore had in mind were charge-coupled devices; still experimental at the time, they seemed to have the features needed to make low-cost, high-density memories.
But CCD data values were easily corrupted and, in comparison with other technologies, the devices were hard to use for data storage. Ironically, and despite its own problems, the dynamic random-access memory (DRAM) developed by Intel in 1970 achieved dominance and became the device that, for 25 years, was the primary driver of semiconductor manufacturing in everything but die size.
The semiconductor industry did a little better than Moore expected in those five years, but microprocessors proved to be better at driving die size than DRAMs. As manufacturing yields improved, DRAM makers translated this improved efficiency into somewhat larger dice. However, better yields have not hidden the increased cost of the silicon used.
The limitations on DRAM die size have continued, leading, in the past decade, to a marked slowdown in the increase of DRAM capacity at the chip level. Process and circuit scaling have not been enough to compensate and provide the cost-effective doubling in capacity every two years that Moore projected.
The die of a 256Mbit DRAM in 1998 was more than four times bigger than the 256kbit die of 1982. Not only that, the 1998 device was more than seven times more expensive, even after adjusting for inflation.
Although chipmakers pumped out research devices that broke the 1Gbit barrier before the end of the 1990s, they were too big to be produced economically; memory makers had to wait for improvements in transistor density before they could introduce the first commercial 1Gbit devices. These did not appear until 2003 and, in an indication of how die size stopped scaling, were smaller than the first 256Mbit chips.
With a die size of some 170mm^2, they were still costly and outside the comfortable range that memory makers like to stay in. Modern memories don't tend to become high-volume products until die size gets closer to 100mm^2, and the sweet spot for volume is less than 60mm^2 (see figs 1 and 2).
So, 10 years after die shots were shown at a conference, the first production 4Gbit DRAMs have yet to supplant less dense memories in volume applications. If the market had driven a massive increase in DRAM capacity, vendors might have worked harder to increase density, but that has not happened. During the 1990s, DRAM capacity effectively lagged the market demand for increased PC memory capacity driven by operating systems such as Windows 3.0 and its successors.
But, once it became possible to support these predominantly 32bit environments with 4Gbyte of memory using no more than four modules, demand for rapid bit growth slackened. Flash has yet to suffer in this way.
In principle, as the number of chips needed to implement sufficient data and program storage for a system falls, DRAM should gradually move on-chip. But that highlights another weakness of DRAM; in order to deliver the capacities it has achieved, DRAM makes demands on process technology that make it difficult to mix with regular CMOS.
The core of the DRAM is a capacitor attached to the drain of an access transistor (see fig 3). To store a logic '1', the capacitor is charged. When the cell is read, the capacitor is discharged through the access transistor into a sense amplifier designed to distinguish the presence or absence of a tiny charge on a highly capacitive bit line. Because the capacitor has to be discharged, the read is destructive: the value must be written back into the cell once the read has been performed.
Simply leaving the capacitor alone is equally destructive. The DRAM capacitor is very leaky, so it needs frequent refreshing – each cell is read and its contents rewritten periodically in operations hidden from the host, except when a read or write targets the same bank as the one being refreshed at the time. This puts a limit on how small and leaky the capacitor can be; overly frequent refresh cycles would limit access speeds.
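The behaviour is easier to see in a toy model. The Python sketch below captures only the essentials described above – a leaky capacitor whose read is destructive and whose contents must be refreshed; the leak rate, sense threshold and timings are illustrative figures, not taken from any real device.

```python
# Toy model of a single DRAM cell: a leaky capacitor behind an access
# transistor. Leak rate, sense threshold and timings are illustrative only.

class DramCell:
    LEAK_PER_MS = 0.02       # fraction of full charge lost per millisecond (made up)
    SENSE_THRESHOLD = 0.5    # the sense amplifier reads '1' above this level

    def __init__(self):
        self.charge = 0.0    # 0.0 = fully discharged, 1.0 = fully charged

    def write(self, bit):
        self.charge = 1.0 if bit else 0.0

    def leak(self, ms):
        # Charge drains away whether or not the cell is used.
        self.charge = max(0.0, self.charge - DramCell.LEAK_PER_MS * ms)

    def read(self):
        # Destructive read: the stored charge is dumped onto the bit line and
        # compared against the threshold, so the value must be restored afterwards.
        bit = self.charge > DramCell.SENSE_THRESHOLD
        self.write(bit)      # write-back that follows every read
        return bit

    def refresh(self):
        # A refresh is simply a read-plus-restore hidden from normal accesses.
        self.read()

cell = DramCell()
cell.write(1)
cell.leak(20)                # 20ms later the charge has sagged but still reads as '1'
assert cell.read() == 1      # the read restores the cell to full charge
cell.leak(40)                # left unrefreshed for 40ms, the '1' decays away...
assert cell.read() == 0      # ...and is lost
```

Shortening the interval between refresh() calls keeps the data alive, but every refresh is a cycle the array cannot spend servicing accesses – the trade-off described above.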
Having capacitors and transistors side by side occupies more space than is desirable, so, as process technology improved, designers found ways to place one above or below the other. One approach was to layer capacitors on top of the control transistors. Early versions of this approach had the dielectric material spread between the bitline and wordline control transistors.
As process dimensions shrank, the stacked capacitor evolved into a tall cylindrical structure extending above the active devices and into the metal interconnect stack. But this has very poor compatibility with CMOS – the high temperatures needed to form the capacitors wreck the performance of transistors laid down in earlier steps.
A second design pushed the capacitor down into the wafer's substrate so it could be formed before the CMOS transistors in an SoC. With this technique, deep trenches are etched in the silicon and then filled with dielectric – an approach that is more compatible with standard processes. Even so, because of the extra process steps needed, embedded DRAM remains a minority on-chip memory technology, even though it is far denser than SRAM, which today calls for at least six transistors per cell.
In principle, 1T-SRAM offers the best of both worlds: the density of DRAM with few of the expensive process changes needed by embedded trench technologies. The capacitor in this case is the gate of a more or less standard transistor – the memory makes use of a normally undesired parasitic capacitance.
Because the charge that can be stored on a transistor gate is even smaller than that held by a dedicated, if tiny, capacitor, good noise tolerance is essential: it would be easy for the capacitance of a long bitline to overwhelm the tiny signal from an individual cell. The smaller stored charge also demands more frequent refresh, which increases the probability of interfering with memory accesses by the host processor. The way 1T-SRAM designers have worked around this is to divide the array into smaller banks.
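A rough calculation shows why banking helps. The sketch below assumes, purely for illustration, that refresh is in progress for a fixed fraction of the time, that accesses land on banks at random, and that only an access to the bank being refreshed has to stall; none of these figures comes from a real part.

```python
# Illustration of why splitting a 1T memory array into more banks hides refresh.
# Assumptions (illustrative only): refresh occupies a bank 10% of the time and
# accesses are spread uniformly across banks; an access stalls only if it hits
# the one bank currently being refreshed.

def stall_probability(refresh_duty_cycle, banks):
    return refresh_duty_cycle / banks

REFRESH_DUTY_CYCLE = 0.10    # made-up figure for a small, leaky storage node

for banks in (1, 4, 16, 64):
    p = stall_probability(REFRESH_DUTY_CYCLE, banks)
    print(f"{banks:3d} banks: {p:.3%} of accesses collide with a refresh")
```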
An alternative is the 1T-DRAM developed by Innovative Silicon (ISi), which uses the normally undesired charge that accumulates in the channel regions of transistors built on silicon-on-insulator (SOI) wafers. Initially, ISi promoted this technology to microprocessor and SoC makers as an alternative to 1T-SRAM and embedded DRAM but, in the past couple of years, it has switched its focus entirely to commodity DRAM, selling a licence to Hynix.
DRAM makers are reluctant to move to SOI because of the higher wafer cost, but ISi has developed a finFET-like structure that can be implemented more cost effectively on standard bulk silicon wafers. The company believes its technology will allow DRAM to scale further than is possible with standard trench or stacked-capacitor technologies, which are hitting problems similar to those being encountered by flash.
The problem for both trench and stacked-cell DRAMs is that aspect ratios are reaching the point where the cylinders collapse too easily during manufacture. However, advances in high dielectric-constant materials may stave off the inevitable.
While the internal layout of DRAM has changed, so too have the external interfaces. In the late 1990s, Rambus battled with mainstream memory makers to define the future of the PC memory interface. Then, its RDRAM interface promised higher datarates than the synchronous memory bus interface that had become the norm. Originally, DRAMs used an asynchronous interface. While this kept down the cost of DRAM dice, it was tricky to work with from a system-design perspective: designers had to deal with column and row access timing requirements and get them in the right order.
The SDRAM interface synchronised transactions to a common clock and became more of a command interface, so memories could support pipelined accesses in which the controller could issue a series of requests and then collect the responses in sequence as the data became ready. Somewhat confusingly, the databooks kept the 'strobe' pin names for column and row addresses – CAS and RAS, respectively – even though the pins were now simply used to pass command bits rather than acting as strobes to indicate when a valid address was present on the bus.
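The practical difference is in how requests overlap. The short sketch below models the pipelining that the command interface makes possible; the three-cycle CAS latency is an illustrative figure rather than one from any datasheet.

```python
# Sketch of pipelined SDRAM reads: the controller issues a READ command every
# cycle and each one returns its data a fixed CAS latency later.

CAS_LATENCY = 3    # bus cycles between a READ command and its data (illustrative)

def schedule_reads(issue_cycles):
    """Pair each READ's issue cycle with the cycle its data comes back."""
    return [(cycle, cycle + CAS_LATENCY) for cycle in issue_cycles]

# Three back-to-back reads: the second and third are issued before the first
# has returned any data - something an asynchronous interface could not do.
for issued, returned in schedule_reads([0, 1, 2]):
    print(f"READ issued on cycle {issued}, data returned on cycle {returned}")
```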
Once the command interface was in place, it became possible to increase massively the amount of data that could be transferred in each clock cycle. The concept of double datarate (DDR) was simple: transmit different values on the rising and falling edges of a clock instead of holding a constant value during an entire clock cycle (see fig 4). Memory makers found they could match RDRAM, which itself used a DDR scheme. However, in other respects, the change was incremental and did not demand the big change in system architecture that RDRAM did, although Rambus used its patents to extract royalties from those who went down the DDR route.
DDR memories were no longer distinguished by access time, but by the type of bus they supported. An 8byte-wide dual inline memory module – the edge connectors carried different signals on either side of the PCB to allow the bus to expand to 64bit – could transfer 1.6Gbyte/s on a 100MHz bus. So the memory supporting that became known as PC1600.
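The naming follows directly from the arithmetic; the sketch below simply reproduces the figures quoted above.

```python
# Where 'PC1600' comes from: peak bandwidth of a DDR DIMM on a 100MHz bus.

bus_clock_hz = 100e6        # 100MHz memory bus
transfers_per_clock = 2     # DDR: data moves on both clock edges
bus_width_bytes = 8         # 64bit-wide module

peak = bus_clock_hz * transfers_per_clock * bus_width_bytes
print(f"{peak / 1e6:.0f} Mbyte/s")   # 1600 Mbyte/s, hence PC1600
```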
DDR2 doubled the command rate to the memory by double-clocking those signals. At the same time, the standard called for lower voltages and lower internal clock speeds to save power. As a consequence, the effective latency of memory accesses, measured in clock cycles, increased – driving an increase in the size and number of cache levels in the processors that used these memories.
Today, DDR3 is the predominant memory interface, dropping the voltage to 1.5V from DDR2's 1.8V. It continues the trend set by DDR2 in that the bus clock is again doubled relative to the memory array's internal clock. Despite the drop in voltage, memory-bus power has gradually increased.
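The effect on latency is easiest to see with some representative timings. The CAS latencies below are typical round numbers rather than figures for any particular part, but they show the pattern: the absolute access time barely moves, so each doubling of the bus clock roughly doubles the latency counted in cycles.

```python
# Typical (illustrative) CAS latencies across three DDR generations.

parts = [
    ("DDR-400",   200,  3),    # (name, bus clock in MHz, CAS latency in bus cycles)
    ("DDR2-800",  400,  6),
    ("DDR3-1600", 800, 11),
]

for name, bus_mhz, cl in parts:
    print(f"{name}: CL{cl} = {cl / bus_mhz * 1e3:.1f} ns")
# All three land around 14-15ns: the array itself is no faster, so the count
# in cycles keeps climbing as the bus clock doubles.
```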
Power consumption has increased so much that, for gamers who like to overclock their PCs, memory-module maker Kingston Technology launched a family of heatspreaders fitted with water-cooling pipes and shaped to fit around standard DIMMs. It is a sign that things have to change.
As most of the energy is consumed by the bus drivers, rather than the memory core, attention has focused on ways to reduce interconnect power consumption. As with the introduction of DDR, Rambus is pitching an interface design against emerging JEDEC standards. And the IP company is not alone in trying to provide the basis for the next generation of low-power memories.
DDR4 should see a renewed push for reduced power consumption, although some options have proved too expensive for system designers. Ideally, the idea of a memory bus should go away – as it has for graphics memory architectures – in favour of a point-to-point protocol, which would avoid the need for power-hungry termination networks. Although the final JEDEC specification has yet to appear, recent announcements by vendors such as Samsung indicate that point-to-point has been rejected.
In practice, modules are likely to use load reduction, as employed by servers with large memory demands – buffers behind each module pin reduce the capacitive load seen by the system. Unbuffered DIMMs may become a thing of the past. Memory controllers will become even more complicated, as the DDR4 specification will allow deeper pipelining of read and write commands, which will only yield better performance if the memory devices can switch rapidly between internal banks.
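A simple scheduling model shows why. In the sketch below, each bank is assumed, purely for illustration, to be busy for ten cycles after accepting a read and commands issue at one per cycle; neither figure is a DDR4 timing parameter.

```python
# Why deep command pipelining only pays off when accesses spread across banks.
# The ten-cycle bank-busy time and one-command-per-cycle issue rate are
# illustrative assumptions, not DDR4 timing parameters.

BANK_BUSY_CYCLES = 10

def finish_cycle(bank_sequence):
    """Cycle on which the last of a series of reads completes."""
    bank_free = {}                         # bank number -> cycle it frees up
    issue = 0
    for bank in bank_sequence:
        start = max(issue, bank_free.get(bank, 0))
        bank_free[bank] = start + BANK_BUSY_CYCLES
        issue = start + 1                  # next command can issue a cycle later
    return max(bank_free.values())

print(finish_cycle([0, 0, 0, 0]))   # four reads to one bank: finishes on cycle 40
print(finish_cycle([0, 1, 2, 3]))   # spread over four banks: finishes on cycle 13
```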
Bigger changes are in store for DDR4 in mobile systems, where the power issue is far more pressing.
The JEDEC 42.6 committee has set very low power-per-bit targets for the next generation of memory for portable designs. And that means a big change in the way memories are implemented.
The interface is expected to hit 12.8Gbyte/s with a power budget of less than 1W. The only way to achieve that is to have a lot of connections – a radical departure from the past, when DRAM package cost was one of the key factors. However, this interface is not likely to be used between packaged devices, but within them.
Even package-on-package implementations cannot get to the interconnect density needed to achieve close to 13Gbyte/s at a transfer clock rate of just 200MHz. This will need more than 1000 interconnects, in turn requiring 3D package assembly techniques: either using through silicon vias (TSVs) between stacked dice or silicon interposers between flip-chip bonded devices. As a result, the Wide I/O Mobile Memory standard will be one of the first to embrace 3D architectures from the outset.
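A back-of-the-envelope check supports that interconnect count. In the sketch below, only the 12.8Gbyte/s target and the 200MHz clock come from the figures above; the single-data-rate transfer and the split between data and other connections are assumptions for illustration.

```python
# Rough check on the Wide I/O interconnect count from the figures above.

target_bandwidth = 12.8e9    # byte/s
transfer_clock = 200e6       # Hz; one transfer per clock assumed

bytes_per_transfer = target_bandwidth / transfer_clock   # 64 bytes
data_lines = int(bytes_per_transfer * 8)                 # 512 lines for data alone
print(f"{data_lines} data lines before any address, command, clock, power or "
      f"ground connections are counted - the total soon passes 1,000")
```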
So that a die from one vendor can be stacked on the memory controller of another, standardisation of dimensions will be critical. Samsung has pressed ahead with Wide I/O prototypes and Qualcomm has indicated its interest. But there are major issues about how to test the dice before they are stacked. The first Wide I/O memories may be standalone products mounted on the processor using conventional package-on-package techniques. The large solder bumps on these packages limit connection density, but the advantage is that all I/O pins go to what looks like one memory device – providing scope for a wider interconnect.
As the 3D technology progresses, it is likely to move into larger, mains-powered systems as their designers focus more on lowering power consumption. Energy will become the dominating factor in DRAM, with the result that systems will use multiple layers of the memory as caches. Overall per-die memory density will be less important than how the interfaces work and 'More than Moore' techniques will become more important than Moore's Law.