Ranjan here. This week I'll be writing about Amazon's Alexa, voice as a platform, and what it takes to realize the potential of an innovation.
I have four Echo devices in my household. For years, we used them to control the lights in our apartment, created endless timers, asked for the weather thousands of times, and destroyed my Spotify algorithm as my kids learned to say “Alexa, play Baby Shark.” It’s gotten a bit ridiculous as we don't have any physical clocks in our house and regularly ask Alexa the time. Is this not quite on brand for someone who rants about topics like data privacy and Amazon's competitive stranglehold? Yeah, I know, but we're complicated here at Margins.
Over the past year there is one thing that has gotten progressively worse: annoying follow-up questions. An increasingly typical interaction:
"Alexa, what's the weather?"
"It's 41 degrees and cloudy. Did you know I can also create a shopping list for you?"
When I looked around for a way to stop it, I found hacks and temporary fixes from Amazon staff, but it seems Amazon made the strategic choice to make this part of the Alexa experience.
The other day was the final straw. Someone in my household made a request, and during the follow-on prompt, in a terrifying concert, my wife, my 3- and 6-year-olds, and I all yelled, in a tone that no parent should ever encourage for their kids, "Alexa, NO!!!!" This really happened and was probably the impetus I needed to write again 😀.
The Promise
Amazon debuted the Echo device on November 6th, 2014. I just went back through my Amazon history and saw I purchased the speaker in March 2016.
For those who feel we're sometimes overly critical about tech at Margins, this was probably my peak "tech-can-do-no-wrong" phase. As you can see from these early tweets, it was a time of pure fascination and excitement.
I became a full convert for the magic of Alexa after having my first kid in December 2016. Cradling a newborn and then being able to turn off the lights (I had a Philips Hue setup) while starting some white noise, all without putting down your kid was a genuinely emotional thing.
What made Alexa so special relative to its cousin Siri? A big part of the magic was just how fast it responded to your query. There was nothing more annoying than asking your iPhone something and just watching a squiggly line think and process. And think and process. And think and process.
It was cool to learn that was part of Bezos’s master plan. This April 2016 Business Insider piece talked about the invention of the Echo:
One major concern for Echo was latency, or the time it took for Alexa — the name of the talking virtual assistant that powers Echo — to respond to any query.
The average latency of existing voice-recognition technology at the time was around 2.5 to three seconds, so the Echo team initially set the goal at two seconds, according to an early team member.
But when the team presented its plan to Bezos, Amazon’s CEO countered with a much more ambitious target.
“I appreciate the work, but you don’t get to where it needs to be without a lot of pain,” Bezos told the team in a meeting, according to one early team member. “Let me give you the pain upfront: Your target for latency is one second.”
The team was “shell shocked” because even companies that worked on voice recognition for decades were only able to bring latency down to three seconds at the time. But at the same time, Bezos' directive also motivated the team to go for what seemed like an impossible goal.
I love this story. It's so perfectly tech in early 2016. Bezos getting his teams to do the impossible. Going to war on those seconds of latency would launch the next revolutionary new computing platform. As a content guy, I really believed it would open up a whole new world of interactive storytelling. We'd be querying data and getting lost in choose-your-own-adventure audiobooks and sending emails, all with our voice.
It's seven years later, and in many ways the Echo's ended up the "glorified clock radio" Benedict Evans called it in 2019. The difficult thing to reconcile is my family still uses it tens, if not hundreds, of times a day. It should not be a failure. But as Amazon just laid off thousands in the division, former employees call it "a colossal failure of imagination" and "a wasted opportunity," and as no one talks about voice-as-a-platform anymore, it certainly feels like it is.
Et tu, Voice?
The Gartnerian hype cycle around the Echo and voice perfectly captures so much of tech in the 2010s. Big promises cut down simply because of how things worked at the time. Maybe our primary means of interacting with computers shouldn't be speaking to a computer, but I really would've thought that 12 years after the launch of Siri, the potential of voice would’ve been realized. Maybe my parents would be talking to computers as much as they typed, and at the least, I’d be talking to my AirPods more than I take out my phone while I walk around NYC.
Closed Off Ecosystems
I think four things stunted the promise of voice: closed off ecosystems, overly ambitious proclamations, distorted monopolistic incentives, and of course, too much capital. The first culprit is best represented by Apple and the closed-off ecosystem it built around Siri. Thanks to its device stranglehold (which certainly has some benefits), Apple could let Siri remain the hot pile of garbage it continues to be.
With Siri, there are the little things, like preventing Spotify from being your default music player in order to keep pushing Apple Music. Calling up a song while walking or driving should be the most logical and simple use case for voice. Living partially in the Apple ecosystem, I still try things like quickly dictating a task to Asana and it somehow ends up trying to create a Reminder (and that's even having spent countless minutes trying to figure out Shortcuts).
For all the hype around AI, Siri still regularly confuses me asking it to "call Janie", my wife who I've called maybe 10,000 times in the past few years, with "Jennie Grouper", a girl I met once on a Grouper (anyone remember that?) date over a decade ago. Sure, I should clean that contact (I’m going to get in trouble for writing this), but after listening to Eddy Cue spout off the promise of Apple's AI while wearing his classic untucked shirt, I'd hope simple algorithmic logic would be able to solve this.
That’s my mini-Siri-rant, and I acknowledge Google and Amazon were better about building more open platforms. But each still built its voice platform to push its own peripheral services, often to the detriment of the user. Apple is the worst here, but the relatively closed nature of the overall voice ecosystem played a big role in kneecapping any transformative potential. Maybe it's overly optimistic, but imagine if some new company were solely focused on the voice platform, were able to integrate into Android phones and iPhones, and had its entire team focused on building seamless connections to other apps, services, and devices (Note 1). Imagine a world where you paid the company that delivered your voice platform and had customer service. I know, it’s crazy.
Let’s Change the World
Then there's the bigger issue - how the tech zeitgeist of the mid-2010s meant every innovation had to be an earth-shattering, wholly transformative thing that would change every facet of our lives. It wasn't enough to simply build the best 'glorified alarm clock' and then leverage that network into a larger computing promise. From day one, voice had to be the next platform. Instead of focusing on making the core use cases incredibly simple (smart home aficionados will know well the pain of trying to make basic things just work), Amazon had to sell us all on a new world where all commerce and computer interaction was via spoken word.
Adam Neumann is probably the most famous version of this - WeWork couldn't just be a real estate company. It had to be some utopian future of work. Uber is another good one - a few years ago Can, who worked at Uber a while ago, introduced me to a video they had released called Bits and Atoms:
Uber, which was at its core a better taxi company, had to connect the physical and digital world and completely transform the way people and things moved around. Self-driving cars would traverse the land, forever transforming logistics and travel. Even the nature of work could evolve as we rented out our cars to send packages and move other people. Or something. I remember when Can first showed me that video in 2019, I still thought it was kind of profound. When I go back and watch it, it feels just so perfectly 2016. And the sad part is Travis played it perfectly. His net worth of billions never would’ve come to be if he just told the world he was making better taxis.
ZIRP
Of course I'm going to go there: ZIRP played a huge role. Just think about every cycle we saw over the past decade. There would be some new technology like blockchain, IoT, AI, or VR (and I'm sure I'm missing a few), and instantly everyone had to pretend it would change everything. All of these technological advancements could’ve been implemented gradually, finding product-market fit and building solid businesses from there. But instead, every startup in the space had to spout off big ideas to then be force-fed capital like a goose bred for foie gras. Then they’d never live up to their potential and be pushed into the trough of disillusionment. Startups that tried to grow responsibly would be blitzscaled into oblivion.
10x not 10%
Then you had the big tech companies. As each one sat comfortably on its own monopolized territory, gradual innovation simply didn't make any economic sense. If you're churning out profits, the incremental benefit from steady growth built on a new innovation would be a boring distraction. I still remember reading (and, at the time, buying into it) about the head of Google X saying that it was easier to create a 10x innovation than build a 10% improvement. People had to make statements like that because the 10% improvement would never get you the resources or promotions.
Voice couldn't simply be a cool feature that just gave you sports scores and told you the weather, and then evolve into something grander. Amazon is the flywheel king of losing money on certain things in order to build larger network effects, but it was mid-2010s blasphemy to simply have made money by…selling their speakers.
Alexa, what happened?
Now we enter a period where Amazon is laying off thousands and reportedly lost $3 billion in just the last quarter in its devices division. And those unprompted Alexa questions—I can’t describe just how dystopian they feel. As we try to normalize artificial interactions with human-like computers, having zero control over a device you’ve let into your home just so some product manager can juice their engagement KPIs is annoying and also kind of terrifying. Tech companies have to recognize just how powerless these interactions make the consumer feel.
Mobile and cloud, the last two great platform transformations, felt like they had organic evolutions. I waited 3 hours in line for the first iPhone and remember being excited to just pinch zoom and drink a fake beer. Cloud was the most boring thing that lived in the realm of HBR digital transformation articles and engineering forums. But both realized the platform promise and changed the technological landscape in the ways every IoT or AR/VR product is supposed to. Was it the constant 10% incremental improvement rather than the 10x rapid shift that allowed for this?
When I think about the degradation of voice computing, I really wonder how many other transformative technologies may have been beaten out of existence. It’s a powerful reminder that technological innovations, on their own, are not enough. The economic incentives underlying them, along with the user behaviors and prevailing attitudes projected on these technologies, all are vital to making a cool tech advancement become something much greater.
But I’ll end with a bit of optimism. As big tech companies pivot to efficiency, interest rates exist, and consumers demand better experiences, maybe, just maybe, things will make sense again and we can all enjoy the fruits of genuine technological innovation.
Until then, please, oh please, don’t ruin generative AI for me—that’ll be my next post, because I really, truly am bullish on the technology (Note 2).
Note 1: My co-host Can and I were talking and complaining about our respective smart-home setups. I’m trying to become a full HomeKit convert, and it’s astounding that little things like simply inviting your partner are broken (c’mon Apple!). But the Matter standard adds to my innovation optimism for the coming years. If anyone has any good pieces on how all these cutthroat companies finally got together to create a universal standard I’d love to read them. Maybe, just maybe, we’re at a point where the industry focus will move towards solving consumer pain points and creating widespread adoption.
Note 2: I am very excited about how generative AI could help advance voice computing. Processing a ton of information and generating one, kind of authoritative, conversational output is what this stuff is good at and should translate well to voice. I’ve been doing a lot of work in the space over the past 1.5 years and think there is tremendous potential. However, the other day I was visiting my parents and they were watching Kevin Roose on CNN and turned to me and asked “can you explain AI to me?” Please, oh please, don’t ruin generative AI, people!
Cortana and the Windows Phone. Cortana was so much more useful and responsive than Siri, and did so many more intelligent things for me (like "remind me to buy a cat toy next time I'm at PetSmart", or "schedule a meeting with Greg at the earliest opportunity"). Cortana lacked the ecosystem of Siri and Alexa and Google, relying on an open market for devices that never appeared. Windows Phone was trashed left and right by the sheeple, fanboys, and pundits, but I still miss those live tiles. One glance and I got weather, messages, headlines, inspiration. Didn't need the notification screen, didn't need to tap into apps to figure out the red dot. And gestures were welcome -- putting your phone face-down on the table to DND, face up when you don't mind being interrupted. Every now and then I read an article about how much the market needs a viable third phone OS. But alas.
Alexa annoys me to death. I used to have a half dozen around my house. Now I have PTSD and my trigger words are "by the way....". Amazon Prime includes a bunch of free music, and that used to be cool. But the incessant ads are killing it for me. Apple Music isn't much better. I listen to music with explicit lyrics, but three songs in, Apple Music has somehow decided I want to hear radio edits only, and three songs after that it's playing Baby Shark.
But it's not just these platforms. Advertising is popping up everywhere I don't want it. And it's pervasive and overwhelming. I don't know if persistent robocalling for extended auto warranties eventually got people to buy them, but I know being overwhelmed with upsells and subscription offers is pushing me the other direction. I don't need that. I don't want that. I just want the lights to turn on when I say, "Alexa, turn on the bedroom lights" and not hear the dreaded, "By the way, customers like you also purchased 50 gallon vats of industrial lubricant. Would you like me to add it to your shopping cart?"
AI is this decade's voice computing. As usual, they are knocking off the low-hanging fruit, creating the illusion of exponential progress, but that final 10% will prove economically not worth it. Then it will decay. I suspect it will be even worse than voice & AR/VR given that, like self-driving cars, they have done zero significant work on the regulatory & legal side, and people are getting wise to that more quickly today than they did in the past decade. I expect an avalanche of IP lawsuits and large amounts of data/content being closed off to AI models.