How are competitors differentiating Cortex-M3 based mcus?
When ARM launched the Cortex-M3 core in 2004, it opened new opportunities for microcontroller vendors. Previously, these companies had to create and maintain their own mcu cores; now ARM would do that work, leaving them free to concentrate on differentiating their products.
The approach has worked well. Since the launch by Luminary Micro of the Stellaris mcu in 2006 – said to be the first 32bit mcu with a $1 price tag – the market has developed to encompass devices from the likes of Energy Micro, NXP, STMicroelectronics, Texas Instruments and Toshiba.
The latest announcements from NXP and ST, the LPC1800 and the STM32F2, show how licensees can differentiate what appear to be broadly similar products. While both companies offer variants with a familiar range of peripherals, each has also implemented innovative systems to complement the Cortex-M3 core.
NXP and ST have both moved their Cortex-M3 based mcus to 90nm embedded flash processes. Geoff Lees, general manager of NXP's microcontroller business line, noted: "Although NXP has had a 90nm process available since 2006, we have had to bring leakage current down before it could be used for mcus." NXP has also transferred its two transistor (2T) flash memory to the 90nm process. "This has brought further improvements," Lees noted.
Both companies have identified flash memory and data handling as among the most important design parameters. By moving from a 140nm process to 90nm, NXP has been able to double its flash data width from 128bit to 256bit. "This means users can get more performance from flash memory and that's achieved without a penalty in density or die size," Lees asserted.
Alexander Czajor, ST's microcontroller marketing manager, EMEA, underlined the importance of wide flash memory. "If you have flash and it's slow, you have to add wait states because you can't fetch data at the speed it's wanted. If you have 128bit wide flash, you can load four to eight instructions at a time." But Czajor added this approach only works if the code is linear. "If it's not, you have to jump, get code and then wait."
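The trade-off Czajor and Lees describe can be put into rough numbers. The Python below is a simplified sketch, not either vendor's fetch logic: it assumes 32bit instructions, a 60ns flash access time and a 120MHz clock chosen purely for illustration, and a prefetcher that overlaps the next line read with execution of sequential code.

```python
import math


def wait_states(flash_access_ns: float, core_mhz: float) -> int:
    """Extra cycles a flash read needs beyond one core clock period."""
    cycle_ns = 1000.0 / core_mhz
    return max(0, math.ceil(flash_access_ns / cycle_ns) - 1)


def instructions_per_line(width_bits: int, instr_bits: int) -> int:
    """Instructions delivered by one read of a flash line width_bits wide."""
    return width_bits // instr_bits


def cycles_per_instruction(width_bits: int, instr_bits: int,
                           ws: int, linear: bool) -> float:
    """Average fetch cost per instruction in a simplified model.

    Linear code: each read of (1 + ws) cycles fills a whole line and is
    assumed to overlap with execution, so its cost is spread across the
    line (but never below one cycle per instruction).
    Branchy code (worst case): every instruction needs a fresh line, so
    each pays the full read time.
    """
    read_cycles = 1 + ws
    if not linear:
        return float(read_cycles)
    return max(1.0, read_cycles / instructions_per_line(width_bits, instr_bits))


# Illustrative figures only: 60ns flash behind a 120MHz core.
ws = wait_states(flash_access_ns=60, core_mhz=120)
for width in (128, 256):
    linear = cycles_per_instruction(width, 32, ws, linear=True)
    branchy = cycles_per_instruction(width, 32, ws, linear=False)
    print(f"{width}bit line, {ws} wait states: "
          f"linear {linear:.1f} cycles/instr, branchy {branchy:.1f}")
```

In this toy model, doubling the line width from 128bit to 256bit halves the per-instruction cost of sequential code, but does nothing for code that keeps branching, which is the case ST's accelerator is aimed at.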
ST's solution to this is ART: the adaptive realtime memory accelerator (see fig 1).
The accelerator has 64 registers, each 128bit wide, for code and a further eight 128bit registers for data. Czajor explained the approach. "If you have branches in the code, the instruction has to be fetched from memory the first time it's needed. After that, the accelerator will become the branch target and, in most cases, the instruction will be in the matrix and will be accessible with zero wait states."
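A rough way to see why a small store of flash lines helps branchy code is the Python model below. It treats the code side of ART as 64 lines of 128bit (16 bytes) each, as quoted, but the fully associative layout, least-recently-used replacement, three wait states and loop traces are assumptions for illustration; ST's actual organisation and replacement policy are not described in this article.

```python
from collections import OrderedDict

LINE_BYTES = 16   # one 128bit line
NUM_LINES = 64    # code-side line count quoted for ART


class LineCache:
    """Hypothetical fully associative LRU cache of 128bit flash lines."""

    def __init__(self, num_lines: int = NUM_LINES):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line tag -> present, kept in LRU order
        self.hits = 0
        self.misses = 0

    def fetch(self, addr: int) -> bool:
        """Return True on a hit (zero wait states), False on a miss."""
        tag = addr // LINE_BYTES
        if tag in self.lines:
            self.lines.move_to_end(tag)     # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.lines[tag] = True
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)  # evict least recently used line
        return False


def stall_cycles(trace, wait_states, cache=None):
    """Wait-state cycles for a trace of line fetches, with or without a cache."""
    total = 0
    for addr in trace:
        if cache is None or not cache.fetch(addr):
            total += wait_states
    return total


def loop_trace(base, lines, iterations):
    """Line-fetch addresses for a loop body `lines` lines long."""
    return [base + LINE_BYTES * i for _ in range(iterations) for i in range(lines)]


small_loop = loop_trace(0x08000000, lines=10, iterations=100)
print(stall_cycles(small_loop, 3), stall_cycles(small_loop, 3, LineCache()))
```

A loop that fits in the 64 lines pays the flash penalty only on its first pass. In this model, a loop just one line too big defeats LRU replacement entirely, a reminder that any such accelerator helps "in most cases" rather than all.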
Czajor said users will see a performance increase from this approach, adding that performance could be pushed without ART, but at a cost. "There would be the need for a bigger flash memory, which would draw more current and create more emi. The most economic solution is to make the flash smaller or slower, but with the memory accelerator. By integrating the memory accelerator with slower flash, you can still get the full performance of the core up to 120MHz." And because ART reduces the number of flash accesses, less power is consumed.
ST says competing Cortex-M3 mcus can now only outperform the STM32F2 by pushing their clocks beyond 120MHz, which would increase power consumption and heat dissipation.
NXP has implemented up to 1Mbyte of on chip flash in two banks (see fig 2) for the same reason: power.
"Smaller blocks are faster," Lees pointed out, "and use less power." The banks can be used in contiguous or 'front and back' modes. In contiguous mode, the two banks operate as if they were one bank; 'front and back' mode allows so called 'golden copy' preservation. "Each bank can be reset to zero while the other is executing," Lees said, "providing protection against programming mishaps." He also noted the 2T flash enables low voltage tunnelling, which provides an endurance of around 100k program/erase cycles.
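The 'golden copy' idea can be illustrated as an update routine that only ever programs the bank that is not executing, verifies it and switches banks only if the new image checks out. The Python below models this in memory; the class, the CRC32 check and the active-bank switch are hypothetical and do not represent NXP's flash programming interface.

```python
import zlib


class DualBankFlash:
    """Hypothetical model of two flash banks in 'front and back' mode."""

    def __init__(self, golden_image: bytes):
        self.banks = [bytearray(golden_image), bytearray()]
        self.active = 0  # bank the core currently executes from

    def inactive(self) -> int:
        return 1 - self.active

    def update(self, image: bytes, expected_crc: int) -> bool:
        """Program the idle bank; switch to it only if the result verifies."""
        target = self.inactive()
        self.banks[target] = bytearray()   # erase only the bank not executing
        self.banks[target].extend(image)   # program the new image
        if zlib.crc32(bytes(self.banks[target])) != expected_crc:
            return False                   # keep executing the golden copy
        self.active = target               # new image becomes the active bank
        return True


flash = DualBankFlash(b"v1 firmware")
good = b"v2 firmware"
print(flash.update(good, zlib.crc32(good)), flash.active)
# A mishap during programming: the written image no longer matches its CRC.
print(flash.update(b"v3 firmw@re", zlib.crc32(b"v3 firmware")), flash.active)
```

Because the failed write lands in the idle bank, the device keeps running the last good image.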
Neither NXP nor ST is looking to offer more than 1Mbyte of flash on chip – costs get too high otherwise – so arrangements need to be made to get data from off chip memory.
Lees pointed out that many application developers are now creating interfaces that require full xga resolution, which needs more than 1Mbyte of data.
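The 1Mbyte figure is easy to check. XGA is 1024 x 768 pixels; the frame buffer size then depends on colour depth, which is not specified here, so the Python sketch below tries a few common depths (1Mbyte taken as 1024 x 1024 bytes).

```python
XGA_WIDTH, XGA_HEIGHT = 1024, 768
MBYTE = 1024 * 1024


def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one uncompressed frame."""
    return width * height * bits_per_pixel // 8


for bpp in (8, 16, 24):
    size = framebuffer_bytes(XGA_WIDTH, XGA_HEIGHT, bpp)
    print(f"{bpp:2d}bpp: {size / MBYTE:.2f} Mbyte")
```

At 16bit colour or deeper, a single uncompressed frame exceeds 1Mbyte on its own.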
NXP has developed a quad spi link to handle that. Lees said: "It's an extension to the standard spi bus. After handshake pins, there are four data lanes, each running at 80MHz, with a bandwidth of 40Mbyte/s. It's a rational approach for those specifying parts without on chip flash." The link also supports execute in place (XIP) operation. "You can memory map peripherals," Lees explained, "and you don't need to write commands to move data." In fact, the quad spi link can address more than 16Mbyte of external memory.
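Lees's figures are self-consistent: four lanes at 80MHz, one bit per lane per clock, carry 320Mbit/s, or 40Mbyte/s. The Python sketch below reproduces that and estimates how long one 16bit XGA frame would take to move over the link. It assumes single data rate signalling and ignores command, address and dummy cycles, so real transfers would be somewhat slower.

```python
def qspi_bandwidth_bytes_per_s(lanes: int, clock_hz: float) -> float:
    """Raw data rate with one bit per lane per clock (single data rate)."""
    return lanes * clock_hz / 8


def transfer_time_s(num_bytes: int, bandwidth: float) -> float:
    """Time to move num_bytes at a sustained bandwidth in bytes per second."""
    return num_bytes / bandwidth


bw = qspi_bandwidth_bytes_per_s(lanes=4, clock_hz=80e6)
frame = 1024 * 768 * 2           # one XGA frame at 16 bits per pixel
t = transfer_time_s(frame, bw)
print(f"{bw / 1e6:.0f} Mbyte/s, {t * 1e3:.1f} ms per frame")
```

That is roughly 25 frames per second at best, before any other traffic shares the link.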
Moving data around on chip is also important, and both mcus make full use of the AHB bus. Czajor noted the STM32F2 has a seven layer AHB bus matrix that interconnects all masters and slaves, so transfers can proceed in parallel even when several high speed peripherals work simultaneously. "You can have code reading from flash while dumping data to sram," he said. "By establishing links in parallel, it speeds overall performance."
Lees noted there are eight AHB bus masters in the LPC1800. "Each can have part of the ram," he said, "and all are independent, which means no latency."
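Why a multilayer matrix helps can be shown with a toy scheduler. Each bus master issues single-cycle transfers to named slaves; a shared bus completes one transfer per cycle in total, while a matrix lets every slave serve one master per cycle, so only masters contending for the same slave wait. The masters, slaves, timing and fixed-priority arbitration in the Python below are illustrative assumptions, not a model of either vendor's bus.

```python
from collections import deque


def cycles_shared_bus(requests):
    """Single shared bus: one transfer per cycle in total."""
    return sum(len(q) for q in requests.values())


def cycles_bus_matrix(requests):
    """Multilayer matrix: each slave serves at most one master per cycle.

    Masters are arbitrated in fixed (alphabetical) priority order, a
    simplification; real matrices may use round-robin or programmable
    priorities.
    """
    queues = {m: deque(q) for m, q in requests.items()}
    cycles = 0
    while any(queues.values()):
        busy = set()                       # slaves already granted this cycle
        for master in sorted(queues):
            q = queues[master]
            if q and q[0] not in busy:
                busy.add(q.popleft())
        cycles += 1
    return cycles


# Hypothetical workload: single-cycle transfers, one slave name per transfer.
parallel = {
    "cpu_ibus": ["flash"] * 4,
    "dma1": ["sram1"] * 4,
    "ethernet": ["sram2"] * 4,
}
contended = dict(parallel, dma2=["sram1"] * 4)

for name, req in (("parallel", parallel), ("contended", contended)):
    print(name, cycles_shared_bus(req), cycles_bus_matrix(req))
```

Masters aimed at different slaves finish together, while two masters sharing one sram bank serialise. Splitting the ram so that each master has its own part, as Lees describes, removes that contention.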
NXP has also included a configurable timer system in the LPC1800 (see fig 3).
This comprises a timer array with a state machine, enabling complex functionality including event controlled PWM waveform generation, a/d converter synchronisation and dead time control. It allows designers to create user defined waveforms and control signals for applications such as power conversion, lighting and motor control.
According to the company, this introduces the concept of states – keeping track of where the timer is in a cycle – and events, which work with the current state to change a timer's output. Lees said NXP has been looking for a way to implement fpga style functionality in an mcu, but has yet to find a way to do it at low cost in a high volume product. "We've been looking to provide users with more configurability when they need it. For the moment, we are offering the state configurable timer subsystem. Users can sync the timer array to an external application, rather than to the cpu."
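The state-and-event idea can be sketched as a small table-driven machine: a counter runs through a period, an event fires when the counter matches a value while the timer is in a given state, and each event drives an output and moves the timer to a new state. The Python below uses that scheme to generate two complementary PWM outputs with dead time. The event table format, names and timings are hypothetical; this is not the LPC1800's register interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    state: int       # timer state in which this event is enabled
    match: int       # counter value that triggers it
    output: str      # output the event drives
    level: int       # level driven onto that output
    next_state: int  # state the timer moves to afterwards


def run_timer(events, period, periods, start_state=0):
    """Step a free-running counter and return the output levels at each tick."""
    outputs = {e.output: 0 for e in events}
    state = start_state
    trace = []
    for tick in range(period * periods):
        count = tick % period
        for e in events:
            if e.state == state and e.match == count:
                outputs[e.output] = e.level
                state = e.next_state
                break  # this sketch allows one event per tick
        trace.append(dict(outputs))
    return trace


# Complementary PWM, period 10 ticks, one tick of dead time at each edge.
PWM_WITH_DEAD_TIME = [
    Event(state=0, match=0, output="HI", level=1, next_state=1),
    Event(state=1, match=5, output="HI", level=0, next_state=2),
    Event(state=2, match=6, output="LO", level=1, next_state=3),
    Event(state=3, match=9, output="LO", level=0, next_state=0),
]

trace = run_timer(PWM_WITH_DEAD_TIME, period=10, periods=3)
print("HI", "".join(str(t["HI"]) for t in trace[:10]))
print("LO", "".join(str(t["LO"]) for t in trace[:10]))
```

Changing the duty cycle or dead time in this sketch means editing the event table rather than the loop that executes it.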
Czajor said the STM32F2 will allow users to take advantage of the full power of the Cortex-M3 core at 120MHz while drawing minimal power. ST claims 228.6 CoreMark at 120MHz, with a current consumption of 188µA/MHz, equating to 22.5mA at 120MHz. "The maximum theoretical performance of the Cortex is available at 120MHz," he asserted. "This is the highest 'coremark/MHz' figure amongst all Cortex-M3 based micros listed on www.coremark.org," he concluded.
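ST's headline figures can be cross-checked with simple arithmetic: 228.6 CoreMark at 120MHz is about 1.9 CoreMark/MHz, and 188µA/MHz at 120MHz comes to about 22.6mA, slightly above the quoted 22.5mA. No supply voltage is given, so the Python sketch stops at current rather than power.

```python
COREMARK = 228.6     # ST's claimed score
CLOCK_MHZ = 120      # core clock for that score
UA_PER_MHZ = 188     # ST's claimed current per MHz

coremark_per_mhz = COREMARK / CLOCK_MHZ
current_ma = UA_PER_MHZ * CLOCK_MHZ / 1000   # µA to mA
print(f"{coremark_per_mhz:.3f} CoreMark/MHz, {current_ma:.2f}mA")
```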