Designed for speed, NVMe has been about raising performance benchmarks right from day one. By focusing on parallelism, it has been able to unleash the full potential of today’s top enterprise and consumer SSDs. However, the embedded and industrial markets have very different goals from what most enterprise and consumer users expect. Instead of speed, embedded and industrial application deployments tend to focus on issues like reliability, power consumption and form factor. There is debate as to whether recent improvements in the NVMe spec will let it compete in the embedded arena, or if engineers dealing with such applications would be better off sticking with the tried-and-tested SATA protocol.
Understanding NVMe
Released in 2011, NVMe is a storage protocol designed for flash drives running over a PCI Express (PCIe) electrical bus. While legacy protocols such as SATA were conceived in the era of rotating disks, NVMe was designed from the start to unleash the full potential of flash storage. To do this, it takes advantage of the parallelism that is inherent in today’s computing systems, as well as the random-access nature of flash storage. Instead of SATA’s single command queue, which holds up to 32 commands, NVMe supports up to roughly 64,000 queues, each with up to roughly 64,000 commands.
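To put the queue figures in perspective, the short Python sketch below multiplies them out. It uses the rounded numbers quoted above rather than exact specification limits, and the function name is purely illustrative.

```python
# Queue limits as quoted in the text (rounded figures, not spec-exact values).
SATA_QUEUES = 1
SATA_QUEUE_DEPTH = 32
NVME_QUEUES = 64_000
NVME_QUEUE_DEPTH = 64_000


def max_outstanding(queues: int, depth: int) -> int:
    """Upper bound on the number of commands that can be queued at once."""
    return queues * depth


sata = max_outstanding(SATA_QUEUES, SATA_QUEUE_DEPTH)
nvme = max_outstanding(NVME_QUEUES, NVME_QUEUE_DEPTH)

print(f"SATA/AHCI: {sata:,} outstanding commands")
print(f"NVMe:      {nvme:,} outstanding commands")
print(f"Ratio:     {nvme // sata:,}x")
```

These are theoretical ceilings only. A real drive, and an embedded drive in particular, will typically expose far fewer queues than the protocol allows.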
Since it runs over the PCIe bus, NVMe is a scalable interface which can support very high maximum speeds. While SATA III tops out at 600MB/s, NVMe’s bus speed is determined by the number of PCIe lanes supported. With each PCIe Generation 3 lane carrying roughly 1GB/s, the available throughput is a multiple of this, depending on the lane count. As a protocol running directly on the PCIe bus, NVMe also has no need for the IO controller or host bus adapter that SATA and SCSI require. This reduces both latency and overall system power consumption.
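The lane arithmetic can be sketched in a few lines of Python. The figures are the approximate ones from the text (about 1GB/s per PCIe Generation 3 lane, 600MB/s for SATA III), the function name is hypothetical, and protocol overhead is ignored, so the results are bus ceilings rather than drive performance.

```python
SATA3_MB_S = 600             # SATA III ceiling quoted in the text
PCIE_GEN3_LANE_MB_S = 1000   # approximate per-lane rate quoted in the text


def nvme_bus_mb_s(lanes: int) -> int:
    """Approximate PCIe Gen3 bus ceiling in MB/s for a given link width."""
    if lanes not in (1, 2, 4, 8, 16):
        raise ValueError(f"unsupported PCIe link width: x{lanes}")
    return lanes * PCIE_GEN3_LANE_MB_S


for lanes in (1, 2, 4):
    bandwidth = nvme_bus_mb_s(lanes)
    print(f"x{lanes}: {bandwidth} MB/s ({bandwidth / SATA3_MB_S:.1f}x SATA III)")
```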
Unlike its main competitor SATA, which defines a connector, a bus and a logical protocol (AHCI), NVMe simply describes a logical protocol layer which runs over the PCIe electrical bus. NVMe devices can come in a variety of form factors, and be connected through a PCIe slot or an M.2 or U.2 connector.
Compared to SATA, NVMe is not only faster but also more efficient in general. Whereas AHCI requires four un-cacheable register reads to issue a command, NVMe doesn’t need any, which lowers latency further. It also uses a streamlined command set which needs less than half the number of CPU clock cycles that SATA does to process an IO request. Small random IO operations can also be performed more efficiently. A 4kB read request takes two instructions with SATA, yet just one instruction is needed with NVMe.
Embedded Implementation of NVMe
While the benefits of NVMe for high-end enterprise and consumer storage are clear, what is less certain at first glance is whether NVMe is appropriate for modern embedded applications. After all, SATA has been successfully employed for many years now, and if it works, then why fix it?
There is more to NVMe, however, than just speed. While NVMe was initially designed with the needs of enterprise customers specifically in mind, the intrinsic efficiencies that are present within the protocol offer tangible benefits to embedded engineers.
Power Considerations
Low power embedded applications are a huge market. Whether it is for IoT devices, Bluetooth beacons, smartphones, or wearables, these are generally battery-driven devices where the power budget really matters. In this market, speed is rarely the key driver. Instead, ensuring high reliability, compact formats and above all low power consumption are the main objectives.
NVMe has advanced error reporting and management capabilities, including end-to-end data protection. This end-to-end data protection uses metadata tags to ensure that data written to the drive and data read from the drive to the system are correct, which is useful for applications where data integrity is mission critical.
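The idea behind end-to-end protection can be illustrated with a simplified Python sketch: a checksum and the intended block address travel with the data and are checked again on read. The class, field names and the use of CRC-32 are stand-ins chosen for illustration. They are not the protection information format defined by the NVMe specification.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectedBlock:
    data: bytes
    guard: int    # checksum computed over the data when it was written
    ref_tag: int  # logical block address the data was intended for


def protect(data: bytes, lba: int) -> ProtectedBlock:
    """Attach metadata tags to a block before it is written."""
    return ProtectedBlock(data=data, guard=zlib.crc32(data), ref_tag=lba)


def verify(block: ProtectedBlock, lba: int) -> bool:
    """Check on read that the data is intact and came from the expected address."""
    return block.guard == zlib.crc32(block.data) and block.ref_tag == lba


block = protect(b"sensor log entry", lba=42)
print(verify(block, lba=42))   # intact data, correct address

corrupted = ProtectedBlock(b"sensor log entrY", block.guard, block.ref_tag)
print(verify(corrupted, lba=42))   # a flipped byte is detected
print(verify(block, lba=43))       # a misdirected read is detected
```

Carrying the address alongside the checksum is what lets this scheme catch data returned from the wrong location, not just data that was corrupted in flight.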
While NVMe does not require adherence to a particular size and shape, it is supported by the M.2 form factor, which is one of the smallest and most densely packed SSD form factors on the market. The M.2 standard allows for module widths of 12-30mm and lengths of 16-110mm. While consumer M.2 SSDs usually have to rely on the longer form factors to accommodate larger capacities, embedded systems which don’t need such large capacities can use extremely small M.2 SSDs. Besides M.2, NVMe SSDs can also come in BGA packages, which allows them to compete head on with eMMC storage.
NVMe technology also enables very low power consumption. In terms of processing efficiency, its IO operations (as already mentioned) take fewer CPU cycles to execute than SATA IO operations, because of a more streamlined command set. Since un-cached register reads are eliminated and write operations need one register read at most, small random IO operations can be extremely efficient as well. NVMe’s streamlined protocol and fast performance allow it to be more energy efficient when the drive is active. Its support of PCIe power management features also makes it efficient even when the drive is idle.
The PCIe link can consume significant power, up to 50mW, even when in its traditional L1 idle state, so advanced power management modes were introduced to lower idle power even further. The L1.1 sub-state reduces power usage while maintaining common mode voltage, and the L1.2 sub-state turns off high-speed circuits. By supporting these two modes, NVMe SSDs can achieve idle power consumption as low as 2.5mW, 50% less than the corresponding DevSLP idle mode of most SATA SSDs.
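A rough sense of what these idle figures mean for a battery budget can be had from the Python sketch below. The 50mW and 2.5mW values come from the text. The 5mW DevSLP value is only what the 50% comparison implies, not a measured figure, and the 24-hour idle period is an arbitrary assumption.

```python
# Idle power in milliwatts. The DevSLP entry is inferred from the "50% less"
# comparison in the text and is an assumption, not a datasheet value.
IDLE_POWER_MW = {
    "PCIe L1 (upper bound)": 50.0,
    "SATA DevSLP (implied)": 5.0,
    "NVMe with L1.2": 2.5,
}


def idle_energy_mwh(power_mw: float, hours: float) -> float:
    """Energy drawn in milliwatt-hours while idling at a constant power."""
    return power_mw * hours


IDLE_HOURS = 24  # assumed: one full day spent idle
for state, power_mw in IDLE_POWER_MW.items():
    energy = idle_energy_mwh(power_mw, IDLE_HOURS)
    print(f"{state}: {energy:.0f} mWh per {IDLE_HOURS} h idle")
```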
In addition, while SATA DevSLP relies on a signal sent by the operating system to enter power saving mode, NVMe drives can use autonomous power state transitions, which are programmed into the drive controller. This allows for the drive to quickly and autonomously enter and exit power saving mode at the hardware level, maximising the time spent idle and minimising drive wake-up latencies.
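Conceptually, an autonomous transition scheme amounts to a table of idle-time thresholds that the drive consults by itself. The Python sketch below models that idea only. The state names and thresholds are hypothetical and do not correspond to any particular drive or to values from the NVMe specification.

```python
from bisect import bisect_right

# Hypothetical table: (idle time in ms after which the state is entered, state).
# Entries must be sorted by threshold; the first entry covers the active state.
TRANSITION_TABLE = [
    (0, "active"),
    (100, "light sleep"),
    (2000, "deep sleep"),
]


def state_after_idle(idle_ms: int) -> str:
    """Return the power state the drive would occupy after idling for idle_ms."""
    if idle_ms < 0:
        raise ValueError("idle time cannot be negative")
    thresholds = [threshold for threshold, _ in TRANSITION_TABLE]
    # Select the last entry whose threshold has already been reached.
    index = bisect_right(thresholds, idle_ms) - 1
    return TRANSITION_TABLE[index][1]


for idle_ms in (0, 99, 100, 5000):
    print(f"{idle_ms} ms idle: {state_after_idle(idle_ms)}")
```

Because the lookup happens in the drive rather than in the operating system, no host signal is needed for each transition, which is the distinction from DevSLP drawn above.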
While NVMe is not a strong competitor to eMMC flash in low power, low cost embedded devices, its combination of streamlined stack, low power consumption and small form factor means that it makes good sense as an mSATA replacement in the next generation of battery-driven embedded devices.
NVMe for Mobile Devices and Edge Computing
Besides low power devices, mobile computing and edge computing are growing sectors within the embedded space. These applications include smartphones, tablets, laptops as well as routers and gateway devices designed to perform data processing and analytics at the network edge. Mobile devices are battery powered and have restricted energy reserves, but storage capacities are significant and storage performance matters. For these devices, NVMe is a very good fit thanks to its performance, small form factor and low power consumption attributes, plus its straightforward integration without requiring a host storage controller chip.
NVMe 1.2 has also introduced the host memory buffer (HMB) feature. This allows NVMe SSDs to use a portion of the host system’s DRAM to replace the DRAM usually built into the SSD controller. The NVMe SSD can thus be smaller, significantly cheaper, and more energy efficient, while still achieving fast storage performance.
Besides mobile computing, edge computing is another area where storage capacity and performance are paramount. Covering autonomous vehicles, drones, routers and gateway devices which perform data processing, edge computing is a recognition that not everything can, or should, be done in the cloud. In a highly automated factory, an IoT gateway device may send production data to the cloud for big data analytics, but may also perform some basic analytics on that data which can provide real-time feedback to improve manufacturing efficiency. In this case, waiting for the cloud may take too long for the processed data to be useful. These gateway devices may not have the same level of performance requirements as enterprise servers, but latency and bandwidth are still extremely important for them so that real-time processing of data can be achieved - making NVMe an obvious fit.
Figure 1: Intel’s 600p Series SSD module with NVMe interface
NVMe for Embedded Storage
The embedded storage market is large and diverse, with a range of needs from very low power consumption, to performance rivalling that of desktop or even server computers. While the focus of NVMe has been mostly on elevated input/output operations per second (IOPS) for data centres and suchlike, the efficiencies of this new protocol are making it an increasingly attractive choice for embedded storage applications.
For low power embedded devices, NVMe can be made to fit the smallest package formats. Its lightweight software stack and direct interface into the PCIe bus mean that it is fast, efficient and easy to implement. The ability to support PCIe low power states also helps it to keep power consumption to a minimum. While NVMe will probably never replace eMMC, its benefits make it a good candidate for applications where mSATA is currently being used.
Thanks to its PCIe Gen3 x4 NVMe interface, the 600p Series 3D NAND SSD module from Intel is up to 17 times faster than an HDD and 3 times faster than conventional SATA-based SSDs. It is able to lower power consumption by more than 90% compared to an HDD and thus significantly extend battery life. At the Flash Memory Summit 2017, held earlier this summer, Swissbit unveiled its N-10. This is an NVMe PCIe M.2 SSD module prototype with a 2 lane, 4 channel arrangement that is targeted directly at power and space constrained system designs within the embedded space. It will be able to deliver double the performance of an SSD with a SATA 6Gb/s interface, while also reducing power consumption significantly.