Not only in the consumer space, however, but increasingly in the B2B space too.
Voice recognition software and interactive devices are starting to have a massive impact on how we consume and interact with technology but, for voice recognition technology to achieve a real breakthrough, it will need to become more aware of the entire sound space, not just voice and musical sounds.
Voice recognition AI currently doesn’t include data on human sounds or animal sounds, for example, but once sound recognition and contextual awareness have been added to the mix, the potential for this technology is unlimited.
Speaking at AudioCollaborative, an event organised by market analysts Futuresource in London, last month, Theunis Scheepers, UK Country Manager for Alexa Voice Services, Amazon, said that Alexa had originally been imagined as, "A computer in the Cloud, controlled by an individual’s voice. We wanted anyone to be able to use it, to ask it anything and for it to be able to respond to requests made at home, in work or on the move.
"The rise of computers has actually seen voice marginalised, what with the rise of keyboards, mice and then touchscreens – we wanted to create a more natural, and efficient, way to interact with technology."
The adoption of voice has been a lot slower than expected but since 2017, according to Scheepers, "there has been a step up in engagement. We’re seeing rapid growth and by the end of this year 87 million smart devices will have been deployed, and we expect that to rise to 388 million in five years. The growth curve in this space is incredible."
Alexa and other voice recognition devices continue to evolve, and Amazon is working to create a personality for the device.
"We see that as crucial for creating a sense of engagement with the technology," he explained. "Alexa told something like 100 million jokes to users this year and as its personality evolves we see more use cases developing. We’ll be able to make it more aware of specific cultural references, which will be increasingly important as we roll it out into different territories."
User requirements
Research conducted in the UK, US, France and Germany has found that listening to music tends to be the key activity for smart speaker owners, followed by checking the news, checking weather, transport or traffic and managing calendars and alarms.
"People are engaging with Alexa through music and asking it to provide a playlist; music wasn’t seen originally as a first use case and that has changed. It’s helping consumers to use voice recognition to discover new music and encourages greater engagement."
According to Scheepers, speaker identification is becoming increasingly important, especially as Alexa is being deployed in communal areas.
"We need to ensure that voice technology is able to identify the correct speaker and calls the right number, for example. The smart home is a fast-expanding use case for voice recognition and our understanding of the smart home is changing as a result."
That raises the question as to how Amazon engages with developers.
"Amazon has developed a voice service platform to help developers," said Scheepers. "We’ve invested in skills so that we can scale Alexa and help brands looking to voice-enable their products."
There are currently 50,000 skills, essentially apps, on the Alexa platform and Scheepers said that the company was keen to engage with industry.
"There are 20,000 Alexa-compatible devices deployed globally and we have a certification programme for platform designers and companies. Our aim is to make it easier to integrate Alexa into products – we can do the heavy lifting in the Cloud and tools are available to develop voice-recognition-enabled products. Some 5,000 brands have already engaged with us in the smart home space."
According to Scheepers, Amazon’s vision is for consumers to have the audio experience follow them from car to home and work.
"That will require many complex interactions for that to happen," he conceded. "We see a future in which multiple voice assistants will be deployed with Alexa talking to Cortana, and so on."
The impact on content is expected to be profound too but, as voice becomes the primary way of accessing content, questions about who will control and own that experience will need to be addressed.
According to Joel Sietsema, SVP of Brand Management at Sound United, "Music will remain hugely important to this platform. We tend to see other applications falling away, in terms of their use, especially as the market moves from early adopters to majority users."
One of the biggest challenges for the industry, according to Sietsema, is, "How we educate consumers in using voice and help them to get voice up and running in their homes; how do we troubleshoot problems – a voice agent would be very helpful going forward."
"We see voice as complementary to standard interfaces," suggested Brian Moller, VP Engineering, Roku. "It’s all about the user getting to the content as quickly as possible."
According to Gerry Holman, European Sales Director, Linkplay, "All manner of devices and use cases deploying voice are under development. We’ve been working with the audio entertainment world but now our clients come from the telecoms, home appliance and automotive sectors."
In terms of future development, Amazon thinks developers will want to be more experimental. "They find the personality aspect more interesting and that’s likely to provide them with the opportunity to develop unique services," said Scheepers. "The smart home is becoming more important and while users tend to start with music, they become more engaged and active within the smart home environment."
B2B and voice
The adoption of voice assistants in the B2B space has tended to lag two to three years behind the consumer space.
Traditionally, new technology and platforms have tended to be developed in the B2B space before moving to the consumer market; with voice, that has been reversed.
There is now, however, a huge buzz around voice recognition technology in the B2B space. How can voice be used and what services and experiences could be developed?
Research from Futuresource found that the B2B segment was being held back by a lack of clarity and understanding about the application of voice and what the commercial use cases were.
Bryan Sutton, Director, Technical Sales, Microsoft, said, "B2B adoption of voice is being held back because there really isn’t a use case that will drive speech. Alexa, like the iPad, delivered to consumers something that they didn’t really know they needed or wanted. We don’t have that killer app yet in B2B, whether that’s translation, transcription, conferencing and the like.
"Voice is a bit like ‘touch’ ten years ago. Why do I need it? Now I use it all the time – perhaps speech is at the same inflexion point?"
"When it comes to B2B the technology has got to be working from the start," said Dr Paul Neil, Vice President Product and Marketing, XMOS. "It will require an employee journey, and you’ll need to plan that very carefully.
"In the deployment of smart speakers, form factor will exercise a lot of control. If you’re looking to add it to a whiteboard, for example, you’ll have less control over the placement of microphones and where they sit, and that will require more flexible processing capabilities. Speakers will need to be able to adapt to different sound and physical landscapes."
IBM’s Andy Barnes, Executive IT Architect, Watson IoT, warned that delivering privacy and security would be crucial to the successful deployment of voice in the B2B space.
"However, if the benefits outweigh the security concerns then I believe it will be adopted. Businesses have to work with GDPR, so they have to be responsible for how data is collected and shared.
"Questions that need to be addressed include: is my data being monetised and do I want people learning about my business if data is vulnerable?
"There are two aspects to using voice – the employer and the employee perspective. We need to be able to provide assurances as to how data is used, where it’s going. Trust will be the number one barrier to adoption.
"Until you have that trust, data doesn’t go to the Cloud until you say it can. In the enterprise world, that’s critical. Edge processing could provide a solution."
Barnes also made the point that using voice may require some kind of secure biometric registration to ensure individual privacy and business security.
"Encryption will need to be built in," he suggested.
The challenge of deploying voice in the work environment is significant.
According to Dr Neil, "The difference between the home and work space is profound. You need to be able to examine the whole soundscape of a space and everything in it. You need to be able to identify individual speakers within that soundscape, and you’ll need an element of control when it comes to determining who is speaking, removing noise in that environment.
"That will take a lot of processing and for silicon providers that’s a challenge: we’re having to pack more capabilities into smaller devices."
Hospitality, in this case a hotel, could provide an interesting use case for voice in the B2B space.
Many businesses are looking to use voice but are looking to customise a pre-packaged product – they want voice to mirror their organisation’s culture, whether that’s through an accent or tone of voice, but that needs an extensive dialogue ‘tree’ and much better contextual awareness on the part of the technology.
Will one device win out? Unlikely. The focus, for now, will be on the functionality of devices and how they are dispersed in an environment.
Longer term, the aim of those working in this space is not only to create assistants that are more ‘human’ but also to differentiate products in what is becoming a highly competitive marketplace.
In the business market, return on investment will be crucial: questions such as ‘what am I paying for?’ and ‘what will I be getting for that investment?’ are front and centre in discussions about the deployment of voice.
But as Barnes suggested, "The benefits of voice can also be measured indirectly. Does it make employees happier and more productive?"
Over the next five years, B2B is expected to catch up with the consumer space and voice is expected to go mainstream. Not only will it be a more natural interface but, through AI, it will learn about its users and become more personalised. On top of that, companies will be able to develop further, differentiated applications.
Sutton suggested that this could lead to voice recognition devices "listening to emotions and understanding how you feel at a specific moment in time."
Whatever the future holds, the technology is in place or coming – "The basic building blocks, next-generation processors, Cloud-based services, the ability to support end-to-end encryption, all the basic requirements are in place," concluded Dr Neil.