Otter: The future of voice AI
A talk with AISense CEO Sam Liang about a fascinating emerging landscape
I’ve been writing about new technologies since my days as a columnist at four magazines in the late 1990s. And lately I’ve been incorporating new tech — especially artificial intelligence — into my high-tech thrillers.
So when acclaimed tech author Shel Israel (co-author of “Naked Conversations,” “The Fourth Transformation,” et al.) invited me to the offices of AISense (AI as in artificial intelligence) in Los Altos this week, I jumped at the chance.
What drew us to AISense headquarters was the company’s new product, Otter.ai, which we saw in action at the VB Summit in Marin County last week. Both Shel and I knew we were seeing something remarkable: real-time transcription of the conversation on stage that was so fast and accurate, it almost seemed like a teleprompter that the speakers were reading from.
The first thing we did was go home and download the app. The Otter Voice Notes consumer app (free in the Apple App Store and on Google Play) lets you record and transcribe up to 600 minutes of audio every month at no cost. A premium version provides more minutes and lets you share the audio along with the text. The live transcription service made a splash at TechCrunch Disrupt SF in September, and Otter has received glowing press coverage in TechCrunch, Fast Company and elsewhere since its launch in March.
Excerpts from our chat with Liang
I came into our meeting with CEO Sam Liang without having done any homework about the product, which had its downsides (we asked some rudimentary questions) but also upsides (we were brainstorming ideas about possible applications and use cases for an accurate AI transcription service, since Shel and I both write about what’s coming down the pike).
Liang, a transplant from China, was the Google engineer who put the blue dot on Google Maps and then sold his location-based startup Alohar Mobile to Alibaba. Liang was forthcoming about the company’s plans, though coy about the number of employees at AISense (“under 20”).
Here are some short excerpts from our hourlong conversation — and yes, I used Otter for all of these notes, though I cleaned up the syntax and corrected the occasional errant word. Let’s start with some of the fascinating long-term implications for voice recognition technology.
Liang on a futurescape with always-on recording: “One big angle we are thinking about — and this could be scary to a lot of people — is what happens when basically your entire life can be recorded with Otter. We’re actually getting close to that. Eventually the wearable devices have microphones built in. Of course, this means the information has to be controlled. Who can access it? But for yourself, it could record constantly. I truly believe it will happen. We actually did a calculation and found your entire life can be recorded on a USB drive. It’s just a few terabytes. All the sounds in your entire life.”
“I actually wish I could listen to what my mother told me when I was in high school. I want to just understand how, through my college and early years when I came to America. I wish I could search on who did I meet, and when? What did that guy tell me?”
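Liang’s “few terabytes” figure is easy to sanity-check with rough arithmetic. The sketch below is a back-of-the-envelope version, not AISense’s actual calculation. It assumes an 80-year lifespan recorded around the clock and compressed speech at 8 to 32 kilobits per second; both the lifespan and the bitrates are illustrative assumptions, and the function name is made up for this example.

```python
# Back-of-the-envelope check of the "few terabytes" claim.
# Assumptions (not from AISense): 80 years of nonstop recording,
# constant-bitrate compressed speech, decimal terabytes (1 TB = 1e12 bytes).

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def lifetime_audio_terabytes(years: float, kilobits_per_second: float) -> float:
    """Storage needed to record continuously for `years` at a constant bitrate."""
    total_bits = years * SECONDS_PER_YEAR * kilobits_per_second * 1000
    return total_bits / 8 / 1e12  # bits -> bytes -> terabytes


if __name__ == "__main__":
    for kbps in (8, 16, 32):
        size = lifetime_audio_terabytes(80, kbps)
        print(f"{kbps} kbps for 80 years: {size:.1f} TB")
```

Under these assumptions the total comes to roughly 2.5 terabytes at 8 kbps and about 5 terabytes at 16 kbps, which is in line with “a few terabytes.” Recording only waking hours, or skipping silence, would shrink the number further.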
Liang on radical transparency and potential use of Otter in the classroom: He says he would like to see the day when he can hook up his two sons with Otter during the school year. “I want to understand how they actually learn, how they interact with their peers, how they interact with the teacher. Again, you know, we don’t really want to intrude on privacy or anything, but from the personal development point of view … it actually [offers] tremendous value there. For parents, what children are doing in school is a black box. Who’s a good or bad teacher? This is a controversial idea but I think all the classrooms should be recorded. I want to have the information locked and available privately just for you — not the government [or the school board]. And not every parent will want this.”
Liang on the company’s mission: “We sort of borrowed Google’s mission statement. We want to organize the world’s voice information and make it universally accessible and useful. Google hasn’t actually done much on that. It’s actually hard to search for voice information.”
Liang on what Otter brings to the marketplace: “We actually don’t see Otter as a transcription app. Transcription is just the first step to get text from voice, but we will do a lot more once the transcript is available, such as understanding the conversation, extract action items, summarize the conversations, recognize some key points, people’s names, understand people’s emotions, etc. It will understand the dynamics between multiple people, maybe even better than the humans themselves.”
Liang on how Otter can enhance business meetings: “Our intention is about better note taking in business meetings, about spending time focusing on the other people in the room instead of having to look down at your screen when you’re taking notes — Otter will do that for you. The goal is to be able to save time, to search things better, to get better insights.”
Simply put, it’s a productivity tool that doubles as a business intelligence archive and has the potential to do much more. The app (I didn’t grasp this when I first used it) keeps the recorded audio in sync with the transcript, so you just tap on a word and the audio starts playing from that point.
The competitive landscape: The Big Five (Google, Facebook, Apple, Amazon and Microsoft) have snapped up many of the voice recognition AI experts in the tech field. But AISense is going deeper than any of them. A smaller player, Nuance, maker of Dragon NaturallySpeaking software, gets most of its revenue from dictation in a medical setting. Liang doesn’t see it as a main competitor. “There are also a couple of startups focused on sales conversations. A company called Deepgram is doing video search,” he said.
Liang on video transcription and closed captioning: “We see it as much more powerful than the traditional closed caption. It’s actually an audio DVR in some sense. With DVR, you can always quickly rewind and watch the video again. With this, actually effectively it can achieve that [for audio].”
How Otter learns from its mistakes: When consumers correct a word through the UI, “we actually put that into the training system,” Liang said. The system also crawls the Internet to learn new words, though that’s not fully integrated into the program yet, he added.
Keywords, search and saving time with Otter
A couple of last points:
Atop each transcript, Otter provides a list of keywords that were used frequently in the conversation. You can click on each word to jump to that part of the transcript. The search feature is particularly cool.
AISense has raised $13 million in funding, including a $10 million Series A round. The company has already licensed its transcription technology to web conferencing platform Zoom, but the focus for now is optimizing the enterprise version of Otter. Consumers, meanwhile, will be able to take advantage of the free app for iOS and Android.
We touched on a lot of other topics, such as market opportunities for Otter among writers and journalists, college students who want to record their professors’ lectures, podcasters who want transcripts (to rank higher in search results) and medical professionals.
As a writer, it’s incredibly freeing to be able to focus on your subject instead of worrying about whether your recorder is picking up your subject. Otter is also a big time-saver: You no longer have to go home and spend hours typing up your notes.
I’ll leave it there. You can read more about Otter at AISense.com or in the press coverage mentioned above. Here are Shel’s impressions.
Otter is not yet perfect. Some words are left out, others are improperly transcribed. Transcripts can be choppy, with sentences cut off in the middle.
Those are quibbles, though. On the whole, this is the speech transcription app I’ve been looking for, for years and years. I no longer record in QuickVoice Pro and transcribe by hand. Now I just use Otter.