Originally a spin-out from Bristol University, XMOS has established itself as a fabless semiconductor company developing chips and embedded software that not only address the needs of the audio market but, increasingly, enable voice interaction across a wide range of products.
Today, XMOS is heavily focused on the fast-developing audio space, from which around 80 percent of its business is derived, and it has positioned itself to take advantage of a market that has not only undergone profound change in recent years but, with the rise of voice as the key interface in a growing number of devices, is expected to see even stronger growth in the coming decade.
The acquisition of Boston-based Setem Technologies, a pioneer in Advanced Blind Source Signal Separation technology, means that XMOS is now in a position to develop consumer devices capable of focusing on a specific voice or conversation within a crowded audio environment. It’s a technology that will help XMOS to optimise its speech-recognition systems and has been described as a game changer for the company.
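In broad terms, blind source separation aims to recover individual voices from the mixed signals picked up by multiple microphones. As a purely illustrative sketch (a generic independent component analysis approach with made-up test signals, not Setem’s or XMOS’s proprietary algorithm), the idea can be shown in a few lines of Python:

```python
# Minimal illustration of blind source separation via FastICA.
# Generic textbook approach with synthetic signals; not Setem's or XMOS's algorithm.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 16000)            # 1 second at a notional 16 kHz

# Two hypothetical "voices" (simple stand-ins for real speech)
voice_a = np.sin(2 * np.pi * 220 * t)
voice_b = np.sign(np.sin(2 * np.pi * 150 * t))

# Each microphone hears a different mix of the two sources
mixing = np.array([[0.7, 0.3],
                   [0.4, 0.6]])
mics = np.column_stack([voice_a, voice_b]) @ mixing.T

# Recover statistically independent components from the mixture
ica = FastICA(n_components=2, random_state=0)
separated = ica.fit_transform(mics)      # columns approximate the original voices
```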
At the head of XMOS is CEO Mark Lippett who, as employee ‘number 6’, has been with the company from almost the very beginning.
“Originally, we were set up to address what was seen at the time as a major problem for embedded software engineers in the consumer space. They were simply not being supported in their efforts to differentiate their products,” he explains.
“Cost has always been a crucial factor in this segment and the reference designs that were available were often inflexible and constraining, robbing design engineers of the opportunity to innovate. The only real option open to them was to develop an expensive ASIC; the FPGA community was focused elsewhere and FPGAs were, in any case, too expensive.”
In response, XMOS developed and went to market with a microcontroller architecture that enabled engineers to continue developing C-based solutions while creating differentiated products: they could now write their own IO protocols in software, effectively changing the nature of the hardware, while retaining the ability to write control and DSP software.
“We took that to the market and almost immediately were drawn to the audio space,” Lippett explains.
Fate also played an important part at this point in the company’s development. The launch of this first-generation architecture coincided with Apple’s decision to pull FireWire and its ‘suggestion’ that the ecosystem it supported should switch to USB Audio Class as the interface for audio peripherals.
That, in retrospect, was a critical decision that helped support XMOS’s platform and encouraged its move into the audio market.
“Apple’s decision provided a serious discontinuity in the audio market at the time,” explains Lippett. “Our platform made it possible to develop solutions far more quickly and it proved such a success that Apple started to recommend XMOS as a partner.”
According to Lippett, “While we were delivering a high-class audio solution, built on a general-purpose platform, we soon recognised that we could also take our core architecture and add DSP features to it, while retaining our IO capabilities.
“By adding those heavy-lifting DSP capabilities, we started to gain much greater access to the USB audio market and, as a company, we were able to position ourselves not only to address the audio playback space but, crucially, to start entering the fast-emerging voice market.
“The exciting thing about the audio space is that although it’s glamorous, it’s also very demanding. It gave us the opportunity to refine and hone our skills.”
Through 2013-14, XMOS noticed that customers were increasingly looking at far-field microphone technology and using the company’s architecture to deliver products.
“At the time we were experimenting with AI and thinking about how we might deploy it alongside our existing hardware and DSPs at the edge,” Lippett explains.
“The voice market was beginning to emerge, and we knew it had the potential to be huge. Today, it accounts for between 10 and 20 percent of our business.”
Lippett, however, suggests that we should draw a sharp distinction between the market as it is and the hype that surrounds it. The voice market still needs to develop: “It’s not in its fully developed form yet, but it will be the communication mechanism of choice.

“Audio had tended to be overlooked and was seen by many as a comparatively simple problem to solve. That has changed as audio has become a more important element in the consumer experience. The arrival of the voice market has been a game changer.”
Target markets
Like most companies of its size, XMOS has been very focused in terms of the specific categories within audio that it looks to address.
“We have had some high-profile wins in the smart speaker market, but that is what we call a ‘red ocean’ market, distorted by the likes of Amazon and Google,” says Lippett. “It’s a difficult one for a component vendor like XMOS to operate in.
“Our focus has been on smart TVs, set top boxes and sound bars, and there’s been a real pull from the high-volume TV vendors for our technology; you can find intelligence in a much broader range of products today.”
According to Lippett, the next big wave in terms of voice deployment will be in existing audio categories such as TVs, set top boxes and speakers, as well as in the automotive space.
“Voice interaction on earbuds has been a fantastic use case,” he enthuses.
“Beyond that, what we’re seeing, driven in many respects out of China, is growing demand for voice in things like domestic appliances and wellbeing devices.
“We are also seeing the development of solutions that are significantly more competitive in terms of cost. That’s critical if we are to see voice deployed in low-priced consumer products. You need a processing platform that is inexpensive but also low in power.”
With the advent of AI and voice, Lippett expects to see different sensors being developed and far more integration going forward.
“We will see more classes of sensors being part of the story as well as the growing use of AI,” he suggests. “Greater consolidation of sensors will be driven by AI and computing at the edge.”
“There’s a broad range of issues that need to be addressed when it comes to voice, and that usually requires a combination of signal processing and AI, which means using neural networks and DSP techniques to clean up the quality of the signal.
“The quality of that signal will be determined by the environment, so we have to employ techniques such as beamforming, interference cancellation and noise suppression to achieve the best outcomes and these all require power.”
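As a rough illustration of one of those techniques, a basic delay-and-sum beamformer aligns and sums the microphone channels so that sound arriving from a chosen direction adds coherently while off-axis interference partially cancels. The sketch below is a generic, integer-sample approximation for a notional microphone array; it is not drawn from XMOS’s pipeline, and the sample rate and geometry are assumptions.

```python
# Minimal delay-and-sum beamformer sketch (illustrative only; not XMOS's
# production pipeline). Assumes a notional 16 kHz mic array.
import numpy as np

SPEED_OF_SOUND = 343.0   # metres per second
SAMPLE_RATE = 16000      # Hz, a typical rate for voice capture

def delay_and_sum(mic_signals, mic_positions, look_direction):
    """Steer a microphone array towards look_direction (a unit vector).

    mic_signals:   (n_mics, n_samples) array of time-domain samples
    mic_positions: (n_mics, 3) microphone coordinates in metres
    """
    n_mics, n_samples = mic_signals.shape

    # Relative time of flight for each mic along the look direction
    delays = mic_positions @ look_direction / SPEED_OF_SOUND
    delays -= delays.min()                       # make all delays non-negative
    shifts = np.round(delays * SAMPLE_RATE).astype(int)

    # Advance each channel by its delay so the target wavefront lines up
    aligned = np.zeros_like(mic_signals, dtype=float)
    for m in range(n_mics):
        s = shifts[m]
        aligned[m, : n_samples - s] = mic_signals[m, s:]

    # Coherent averaging boosts the target and attenuates off-axis noise
    return aligned.mean(axis=0)
```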
Another area of interest is where the processing will take place, according to Lippett.
Today, most processing takes place in the cloud, but that is likely to change as voice spreads across so many devices; processing will have to be done at the edge, he argues.
“The issue with this, however, is how can we do that while balancing cost, performance and power?
“With the drive to make products smaller, our focus has to be on architectural innovation and delivering the additional processing load that’s going to be required. It might be OK to develop a co-processor to sit alongside existing silicon but that, in all honesty, will only be a temporary fix.
“AI capability is going to have to be absorbed into the system, otherwise the economics just won’t stack up.
“That’s why XMOS has been working on integrating the system into the same device, and we have plans to unveil a series of new solutions over the coming months.
“It’s all about technology and timing in this industry,” suggests Lippett, “and I believe we’re in a very strong position. We have an established legacy to build on and there are some great opportunities in this space going forward.”