Porting designs to the 32-bit world without adding cost
An 8- or 16-bit CPU may be ideal for your application at present. However, to stay competitive, you need to differentiate your product with continuous enhancements, including new features, faster speeds, improved product specifications, and reduced cost. If you don't provide these, your competitors will.
One way to maintain your competitive edge is by incrementally improving your existing design. Over time, architectural limitations may make this process increasingly slow and expensive. Alternatively, you can port your design to a 32-bit platform. This can improve your product in several ways (Table 1).
Do you really need to port your design?
When porting from an 8-bit CPU to a 32-bit CPU, there are some considerations to keep in mind. One of the first is whether your existing CPU is still viable, and whether moving to a 32-bit CPU would meet a compelling need or offer an advantage you can use.
8-bit applications are usually basic sensing and control systems with simple calculations. 8-bit CPUs often do well at bit-level operations and in applications where the values involved fit in a single byte (0 to 255). A well-known 8-bit architecture is the 8051.
Even the smallest 32-bit CPUs can do everything that 8-bit CPUs can do, and more, as Figure 1 shows:
• More complex calculations. Examples include native-mode DSP, image processing, and gesture recognition
• Data mining and analysis, and database lookup
• Multitasking through a real-time operating system (RTOS)
Figure 1: MCU performance comparison using the Dhrystone Benchmark [3]
Even if you do not require any of these advanced features, 32bit CPUs can improve your design in the following ways:
Power: Consider a common low-power design in which the CPU sleeps in a low-power mode and periodically wakes up to execute code in active mode (Figure 2). A 32-bit CPU may draw more power than an 8-bit CPU in both modes, but it takes less time to execute the code, so it spends more of each period in the low-power mode. In many cases, this reduces the average power.
Figure 2: Average power consumption comparison for computationally intensive tasks
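The duty-cycle argument above can be checked with simple arithmetic. The following is a minimal sketch in C; the structure, the function name, and any numbers you feed it are assumptions for illustration and do not come from a datasheet.

```c
/* Power profile of a CPU in a sleep/wake design.
   All field values are supplied by the caller; none are measured data. */
typedef struct {
    double active_mw;   /* power drawn while executing code, in mW */
    double sleep_mw;    /* power drawn in the low-power mode, in mW */
    double active_ms;   /* time spent awake in each wake-up period, in ms */
} power_profile_t;

/* Time-weighted average power over one wake-up period of period_ms. */
double average_power_mw(const power_profile_t *p, double period_ms)
{
    double sleep_ms = period_ms - p->active_ms;
    return (p->active_mw * p->active_ms + p->sleep_mw * sleep_ms) / period_ms;
}
```

As a hypothetical case, take a 1000 ms period. An 8-bit CPU at 4 mW active and 0.002 mW asleep that needs 20 ms per wake-up averages about 0.082 mW. A 32-bit CPU at 6 mW active and 0.003 mW asleep that finishes the same work in 2 ms averages about 0.015 mW, despite the higher figures in both modes.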
Scalability: Today, most CPUs are marketed as a family of similar devices scaled from low to high performance. If your product needs to be scalable, then it makes sense that your CPU should be scalable too. CPU scalability is usually defined in terms of:
• Instruction set. Higher-end family members should have more instructions or more modes of operation for existing instructions, while maintaining backward compatibility with lower-end instructions.
• Additional registers, or more bit definitions in existing registers
• Additional functions, for example interrupt control and debug
The ARM Cortex-M processor family is a good example of CPU scalability, as Figure 3 shows.
Figure 3: ARM Cortex-M processor family overview
Cost: One perceived barrier to porting to 32 bits has been increased cost. With recent advances in technology, however, 32-bit devices are no longer necessarily more expensive than 8-bit devices, and a number of low-cost 32-bit devices are becoming available. For example, because of its simple design and small silicon area, the ARM Cortex-M0 CPU is particularly cost-effective. One MCU built around the Cortex-M0 is Cypress Semiconductor's entry-level PSoC 4000, which is priced as low as $0.29 in quantity.
In addition, Table 1 shows that the support for high code density and faster execution that 32-bit CPUs offer can help to lower cost.
It's not just about the CPU
It is common to focus just on porting your firmware code to the new CPU. However, remember that the CPU comes as part of an MCU device, and the MCU may offer as many opportunities as its CPU for meeting customer demands for improvements. For example:
• Does the MCU have peripheral hardware features that will enable product feature improvements?
• Can the peripherals operate using less code and put less load on the CPU? This may result in the system using less memory, possibly reducing cost.
• Can the device help you reduce board-level or system-level cost? For example, can you move certain functions off the PCB into the MCU?
• Is the MCU flexible enough to let you adapt to changing requirements without having to lay out a new PCB?
Finally, note that an MCU device is often only as good as the integrated development environment (IDE) that supports it. Confirm that the new IDE is more than just an editor, compiler, and debugger. IDEs that enable you to quickly construct an entire application using all of the MCU hardware features as well as the firmware can significantly speed design. Ample development kit and application note support can also help.
Code porting tips
If you decide to port a design to a 32-bit CPU, keep these considerations in mind:
Select an entry-level 32-bit CPU/MCU and IDE. For your first port into the 32-bit world, keep it simple. This reduces the risk of introducing defects while you become familiar with the differences in 32-bit design. Select a basic entry-level device, as well as an IDE that can simplify the porting process. One example is Cypress Semiconductor's PSoC 4000 MCU, supported by the PSoC Creator IDE.
Select a new compiler. When you port your code to a new CPU, you may also have to choose a new compiler. A number of compilers, some of which are free, are available for 32-bit CPUs. Examples include GCC, ARM/Keil MDK, and IAR.
Get your build and debug tools working. Create a small test program, for example to blink an LED. You will gain experience with the new tools that will help you with the remaining steps.
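A first test program of this kind can be very small. The sketch below shows the shape of an LED blink routine in C. The register variable, the pin mask, and the function names are all hypothetical. On real hardware, the stand-in variable would be replaced by the GPIO register definition from your device's header file.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped GPIO data register (assumption: on a real
   MCU this is a volatile register defined in the vendor's device header). */
static volatile uint32_t fake_gpio_data;

#define LED_PIN_MASK (1u << 3)   /* hypothetical: LED wired to pin 3 */

/* Invert the LED pin and leave the other pins untouched. */
void led_toggle(void)
{
    fake_gpio_data ^= LED_PIN_MASK;
}

/* Return 1 if the LED pin is currently driven high, 0 otherwise. */
int led_is_on(void)
{
    return (fake_gpio_data & LED_PIN_MASK) != 0u;
}

/* Crude busy-wait delay. The volatile counter keeps the compiler from
   removing the loop; the real delay depends on the clock speed. */
void delay_cycles(volatile uint32_t n)
{
    while (n > 0u) {
        n--;
    }
}

/* Toggle the LED the given number of times with a delay after each toggle. */
void blink(unsigned times)
{
    for (unsigned i = 0; i < times; i++) {
        led_toggle();
        delay_cycles(1000u);
    }
}
```

Calling blink() in an endless loop from main() is usually enough to prove that the compiler, programmer, and debugger all work together.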
Rewrite assembler code. Ideally, your existing code should be in C (or some other higher-level language). Any of your code that is in the assembly language of your 8-bit processor is probably not portable. If you have any assembler code in your current design, consider rewriting it in C before beginning the porting process.
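As an illustration of such a rewrite, consider a routine that sums a buffer of bytes into an 8-bit checksum. The 8051 assembly in the comment is a hypothetical original, and the C function name is likewise an assumption. The C version gives the same result for byte counts of 1 to 255 and compiles unchanged for a 32-bit target.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical 8051 original, shown for comparison only:
 *
 *         CLR  A
 * loop:   ADD  A, @R0      ; add the byte that R0 points to
 *         INC  R0
 *         DJNZ R7, loop    ; R7 holds the byte count (1 to 255)
 */

/* Portable C version: 8-bit sum of len bytes, discarding any carry. */
uint8_t checksum8(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}
```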
Encapsulate MCU-specific code. If your code is modular (a coding best practice), you may have already done this. The portion of your code that directly interacts with MCU registers, such as to read I/O ports, should be in files separate from the rest of the code. Encapsulate the code in those files in functions with generic names, such as UART_Receive(). Then you can rewrite those functions for the new MCU without having to change the rest of your code.
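One way to picture this separation is the sketch below. Only the name UART_Receive() comes from the text above. Its signature, the simulated backend, and the read_line() helper are assumptions made for illustration. In a real project the three parts would live in separate files, and only the backend would be rewritten for the new MCU.

```c
#include <stdint.h>
#include <stddef.h>

/* ---- Generic interface (would live in a header such as uart.h) ---- */
/* Fetch one received byte. Returns 1 if a byte was stored, 0 if none. */
int UART_Receive(uint8_t *byte_out);

/* ---- MCU-specific backend: here a host-side simulation ---- */
static const uint8_t *rx_data;   /* bytes waiting to be "received" */
static size_t rx_len;
static size_t rx_pos;

/* Test helper (hypothetical): preload the simulated receive buffer. */
void UART_SimLoad(const uint8_t *data, size_t len)
{
    rx_data = data;
    rx_len = len;
    rx_pos = 0;
}

int UART_Receive(uint8_t *byte_out)
{
    if (rx_pos >= rx_len) {
        return 0;                 /* nothing available */
    }
    *byte_out = rx_data[rx_pos++];
    return 1;
}

/* ---- Portable application code: uses only the generic interface ---- */
/* Read bytes up to a newline or until no more data is available.
   Stores a NUL-terminated string and returns its length. */
size_t read_line(char *dst, size_t cap)
{
    size_t n = 0;
    uint8_t b;
    while (n + 1 < cap && UART_Receive(&b) && b != '\n') {
        dst[n++] = (char)b;
    }
    if (cap > 0) {
        dst[n] = '\0';
    }
    return n;
}
```

Because read_line() never touches a register, it can be tested on a PC and then reused unchanged once UART_Receive() has been reimplemented for the new device.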
Other architecture changes
A new MCU may allow you to offload functions from the CPU to peripherals. Also, a new IDE may auto-generate code for you. To take advantage of these features, consider re-architecting some or all of your code.
Because it is easier to implement task switching in 32-bit CPUs, consider re-architecting your code as a set of separate tasks to be used with a real-time operating system (RTOS). Example RTOS vendors for 32-bit systems include Segger and Micrium.
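A useful first step toward a task-based design is to split the main loop into independent functions that each run at their own rate. The sketch below is a simple cooperative task table and not the API of any RTOS. All names, periods, and the two example tasks are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*task_fn)(void);

/* One periodic task: what to run, how often, and when it is next due. */
typedef struct {
    task_fn  run;
    uint32_t period_ticks;
    uint32_t next_due;
} task_t;

/* Example tasks (hypothetical): each just counts how often it ran. */
static unsigned sensor_runs;
static unsigned display_runs;

void sensor_task(void)  { sensor_runs++; }
void display_task(void) { display_runs++; }

/* Call once per timer tick. Runs every task whose due time has been
   reached. The signed difference keeps the comparison correct when the
   tick counter wraps around. */
void scheduler_tick(task_t *tasks, size_t count, uint32_t now)
{
    for (size_t i = 0; i < count; i++) {
        if ((int32_t)(now - tasks[i].next_due) >= 0) {
            tasks[i].run();
            tasks[i].next_due += tasks[i].period_ticks;
        }
    }
}
```

Once the code is organized this way, each function is a natural candidate to become a task under whichever RTOS you choose, which would then add preemption and priorities.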
Incremental build and debug
When designing new code, a coding best practice is to add, test, and debug code in small increments. This makes it easier to find and fix defects. The same is true for porting – port, test, and debug code on the new MCU in small increments.
Example CPU and MCU
To get a better understanding of the porting process, let us examine it in more detail in the context of the Cortex-M0 and the PSoC 4000. The ARM Cortex-M0 processor is the smallest ARM core available, and a natural and cost-effective migration path from 8-bit and 16-bit CPUs. Its register architecture (Figure 4) and instruction set make it an effective C engine.
Figure 4: Cortex-M0 register architecture
All registers are 32 bits wide, which enables 32-bit addressing and a 4-Gbyte address space. Most 8-bit CPUs are limited to a 64-Kbyte address space.
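The wider registers have a practical consequence for ported C code. Compilers for 8-bit CPUs typically make int 16 bits wide, while compilers for the Cortex-M0 make it 32 bits wide. Code that silently relies on 16-bit wraparound can therefore behave differently after the port. The sketch below uses hypothetical function names to contrast a width-dependent counter with one that uses a fixed-width type from stdint.h.

```c
#include <stdint.h>

/* Width-dependent: wraps to 0 after 65535 only where unsigned int is
   16 bits wide. With a 32-bit unsigned int it continues to 65536. */
unsigned int next_seq_native(unsigned int seq)
{
    return seq + 1u;
}

/* Fixed-width: wraps to 0 after 65535 on any target. */
uint16_t next_seq_fixed(uint16_t seq)
{
    return (uint16_t)(seq + 1u);
}
```

Reviewing counters, bit masks, and shift operations for this kind of assumption before porting can save debugging time later.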
There are 13 general-purpose registers, R0 – R12. (Low registers R0 – R7 have more support in the instruction set.) Special registers include:
• dual stack pointers (R13) to help implement a real-time operating system (RTOS)
• link register (R14) for fast return from function calls
• program counter (R15)
• program status register (PSR) contains instruction results such as zero and carry flags as well as the current exception number
• interrupt mask register
• control register controls which stack pointer is active
The Cortex-M0 core instruction set is simple but powerful, with a large number of addressing modes. It enables excellent code density [2]. C code ported from an 8-bit CPU to a Cortex-M CPU frequently uses less memory.
The ARM Cortex-M series CPUs have an instruction pipeline, as Figure 5 shows. This increases overall code execution speed because the CPU can execute one instruction while simultaneously fetching and decoding subsequent instructions.
Figure 5: Pipeline stages in the Cortex-M Processor (Source: ARM)
The ARM Cortex-M CPU series integrates support for interrupts directly into the CPU core, using a nested vectored interrupt controller (NVIC). NVIC features include:
• Dynamic priorities and automatically prioritized nesting of pending interrupts
• Low latency – the CPU automatically stores and restores its state with no instruction overhead
• Tail-chaining – back-to-back processing of pending interrupts without the overhead of state saving and restoration between interrupts
• Late arrival – a higher priority interrupt that arrives during the stack push operation of a lower priority interrupt is serviced first.
These features enable faster and more deterministic interrupt handling. A system timer, "SysTick", which facilitates RTOS usage and can operate during CPU sleep, is also included. With the high level of interrupt support available, you can consider changing your architecture to be more interrupt-based.
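A common interrupt-based pattern is for the interrupt handler to do as little as possible, typically just setting a flag, and for the main loop to do the real work. The sketch below shows that pattern in C. The handler name and the button example are hypothetical. On a Cortex-M device the handler would be placed in the vector table under the name your device's startup code expects.

```c
#include <stdint.h>

/* Written by the interrupt handler and read by the main loop, so it
   must be volatile. */
static volatile uint8_t button_event;

/* Number of button presses the main loop has processed. */
static uint32_t presses_handled;

/* Interrupt handler (hypothetical name): record the event and return. */
void Button_IRQHandler(void)
{
    button_event = 1;
}

/* One pass of the main loop: handle a pending event, if there is one. */
void main_loop_step(void)
{
    if (button_event) {
        button_event = 0;
        presses_handled++;
    }
}

uint32_t get_presses_handled(void)
{
    return presses_handled;
}
```

Between events the main loop has nothing to do, so the CPU can sleep until the next interrupt.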
ARM's Cortex-M processor series integrates debug features directly into the CPU core, which enables better debug support across a number of IDEs.
The Cortex-M0 core is part of a larger family of Cortex-M processors that all have the same register architecture and execute some or all of the Thumb-2 instruction set. This makes it easier to upgrade to a more powerful CPU such as the Cortex-M3 processor in Cypress's PSoC 5LP.
The PSoC 4000 is the entry-level member of the PSoC 4 family. In addition to the Cortex-M0 processor, it features a set of flexible and dynamically configurable peripherals, as Figure 6 shows.
Figure 6: PSoC 4000 block diagram
This MCU also features capacitive touch sensing, which Cypress calls CapSense. Capacitive touch sensing offers significant advantages over mechanical buttons in terms of cost, performance, and ESD protection. CapSense features include:
• Easy to implement buttons, sliders, and proximity sensing solutions, with up to 16 inputs routable to various I/O pins
• High signal-to-noise ratio (SNR) ensures touch accuracy in noisy environments
• Robust water tolerance for severe environments
• SmartSense Auto-Tuning speeds time-to-market and eliminates the need for calibration
The CapSense block includes two DACs and a comparator, which you can use for other purposes if CapSense is not required.
Cypress also offers PSoC Creator, an integrated design environment (IDE) for the PSoC 3, 4, and 5LP devices. PSoC Creator is a free Windows-based IDE which enables concurrent hardware and firmware design of PSoC-based systems.
You can design using classic, familiar schematic capture supported by over 100 pre-verified, production-ready PSoC Components. The Components include auto-generated API code, which can significantly reduce the amount of code that you have to write. Using PSoC Creator it is easy to port designs between PSoC families, at both the configurable hardware level and the firmware level, as Figure 7 shows.
You can also export PSoC Creator designs to other IDEs such as µVision and IAR.
Figure 7: Component configuration with PSoC Creator
It is now possible to upgrade legacy 8-bit and 16-bit designs to 32 bits, and still meet cost targets. Several considerations must be kept in mind when planning a port to a new CPU; one of them is to select an entry-level 32-bit MCU and an IDE that supports it well.
References:
1. Cypress Semiconductor's application note AN89610 on how to create optimized C code using the GCC or MDK compiler.
2. ARM microcontroller code size white paper.
3. Dhrystone is a computing benchmark program used to calculate the relative performance of an MCU. (DMIPS = Dhrystone million instructions per second.) Data referenced from The Definitive Guide to the ARM Cortex-M0, ISBN: 978-0-12-385477-3.