Dharmesh Thakker  |  March 20, 2019
Can You Hear Me Now? AI Tackles Voice and Video

Unless you’ve been living under a rock, you know that artificial intelligence is a big business trend right now. Corporate America is agog at the possibility of using AI to better extract and analyze data on everything from insurance claims to X-rays to the contents of your smart refrigerator (so you can buy more milk before you run out).

But lost in the headlines is the fact that AI, in some form, has actually been around for decades. And many of the hot AI applications being trumpeted in the press today aren’t really that advanced. A lot of startup companies I see now are pitching technologies that parse data from things like emails and text messages, usually to sell you something. In some ways, it’s not that different from the “text mining” of materials like emails and other documents that we’ve seen over the last 10 to 20 years.

Now, though, AI may finally be realizing its true potential. This is because AI is now being applied to richer data sets, such as voice conversations and video, with more-sophisticated algorithms to analyze the data. What’s more, much of the resulting intelligence is being delivered in real time. Think how AI is fueling the autonomous-driving trend, for example: Multiple cameras and sensors on today’s Teslas gather huge amounts of data that help the electric cars maneuver appropriately and avoid accidents in real time.

Alexa for the office

I think one little-noticed, but hugely important, application of this type of AI is analyzing voice conversations. Voice-driven consumer technologies like Amazon’s Alexa have dramatically changed how you do things in your home (“Alexa, turn on the porch light! Play me some Ed Sheeran!”). But voice in the workplace, where it is arguably more important and drives higher-stakes decisions, is only now getting that type of AI makeover.

This is an area ripe for innovation and investment, I believe, because there is so much actionable intelligence just waiting to be gleaned from all sorts of business transactions that are still conducted through regular voice conversations rather than email, text or Slack. Until now, companies haven’t been able to pursue those insights, because capturing and parsing voice is extremely complex: There are so many elements of a spoken conversation that can be meaningful, from pauses to tone of voice to inflection. Using technology to make sense of it all is an important greenfield opportunity for new AI companies.

Think of the importance of an in-person sales pitch, for example, or a recruiter doing a phone screen on a job candidate. Then think about the productivity enhancements, and/or new revenue, that could be generated by using AI to analyze those conversations.

In the sales scenario, AI could capture and then parse thousands or even tens of thousands of sales calls to figure out what type of language, or specific pitch, worked best in convincing a customer to buy. Similarly, with recruiting, a company could use AI to analyze patterns in how stellar recruiters are able to close candidates, including which specific tactics or even words work best. Other applications could include analyzing stockbrokers’ conversations with clients, or even teaching.

Several startups are already working on this problem for sales—with technology some call “conversation intelligence” to analyze salespeople’s calls and meetings. These companies include*,, and ExecVision, among others.

Banks and other financial institutions, meanwhile, are leveraging different types of voice-based technologies to allow consumers to communicate directly with their banks through an iPhone, iPad, or Google Home device, without ever typing anything. The consumer can speak directly into the device or app to request services, like moving money from checking to savings. No human has to be involved. Startups working in this area include Kasisto, which makes a “conversational AI platform,” and Clinc, which sells a similar service to banks.

Where’s the hairdryer?

Similarly, hotels are using AI technology from a company called Volara that lets guests speak to a device to get answers to frequently asked questions (where’s the hairdryer, when does the gym open) and even recommendations for local attractions. Meanwhile, some car companies are using AI to transform in-auto voice assistants to make them smarter and more responsive to driver commands and questions.

There is a lot of talk these days about the “future of work,” and how technology is changing everything about how we do our jobs. But in many ways, I believe the future of work will be built on technologies that can analyze the spoken word. In some sense, voice, which seems almost analog in our fast-paced digital world, is the next human computing interface. Hopefully you can hear me now!


This post originally appeared on Forbes.

* Battery Ventures provides investment advisory services solely to privately offered funds. Battery Ventures neither solicits nor makes its services available to the public or other advisory clients. For more information about Battery Ventures’ potential financing capabilities for prospective portfolio companies, please refer to our website. For a complete list of portfolio companies, please click here. is a Battery portfolio company. No assumptions should be made that any investments identified above were or will be profitable. It should not be assumed that recommendations in the future will be profitable or equal the performance of the companies identified above. Please refer to Section 1 of our Terms of Use for further information.
