Voice Assistants in the Age of Generative AI

    When Apple introduced Siri to the world in 2011, it was nothing short of a revolution. A personal assistant that lived in your pocket, Siri could answer your queries, send messages, place calls, and perform other tasks, all through voice commands. But today, in the world of generative Artificial Intelligence (AI), Siri and similar voice assistants feel somewhat antiquated.

    How Siri Works

    To understand why Siri now seems outdated, we first need to grasp how it works. At its core, Siri operates like a sophisticated database search engine. Your voice commands are converted into text, and then the system looks for specific triggers or keywords within that text. The keywords are matched against a pre-defined database, and Siri responds with the corresponding pre-set response or action.

    For instance, if you say “Siri, when is Dwayne Johnson’s birthday?”, the system identifies the keywords “Dwayne Johnson” and “birthday” and fetches the corresponding information from its database. Specifically, it would look up Dwayne Johnson’s record, find the field labeled “birthday,” and return the date. This structured, rule-based approach is efficient but lacks the flexibility to handle complex or ambiguous requests. Siri is designed to respond to the specific question with a specific answer, offering a black-and-white interaction that leaves little room for the nuanced shades of human conversation.
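    To make the idea concrete, here is a minimal sketch of that kind of keyword-to-intent lookup. It is an illustration of the general rule-based pattern described above, not Siri's actual implementation; the fact table and triggers are invented for the example.

```python
# A minimal sketch of keyword-based intent matching, in the spirit of a
# rule-based assistant. The "database" below is illustrative only.

FACTS = {
    ("dwayne johnson", "birthday"): "May 2, 1972",
    ("dwayne johnson", "height"): "6 ft 5 in",
}

def answer(query: str) -> str:
    text = query.lower()
    # Look for a known entity and a known attribute in the transcribed text.
    for (entity, attribute), value in FACTS.items():
        if entity in text and attribute in text:
            return f"{entity.title()}'s {attribute} is {value}."
    # No trigger matched: a purely rule-based system has nothing to fall back on.
    return "Sorry, I don't know how to help with that."

print(answer("Siri, when is Dwayne Johnson's birthday?"))
# -> Dwayne Johnson's birthday is May 2, 1972.
```

    Notice that anything outside the pre-defined triggers falls straight through to the "I don't know" branch, which is exactly the rigidity described above.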

    Enter Generative AI

    Contrast this with generative AI, which relies on Large Language Models (LLMs) such as OpenAI’s GPT-4. These models operate in a fundamentally different way. Instead of searching a database for responses, they generate responses based on patterns learned from vast amounts of data.

    LLMs are trained on a diverse range of internet text. But rather than being explicitly programmed to understand the meaning of every word or sentence, the model learns to predict the next word in a sequence. Over time and with enough data, the model can generate coherent and contextually relevant responses.
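    The toy sketch below shows the core "predict the next word" idea on a tiny corpus. Real LLMs learn billions of parameters with neural networks rather than counting word pairs, so this is only a conceptual illustration.

```python
# A toy next-word predictor: count which word tends to follow each word in a
# tiny corpus, then sample from those counts. Real LLMs do this with deep
# neural networks over enormous datasets; the mechanism here is only a sketch.
from collections import Counter, defaultdict
import random

corpus = "the assistant answers the question and the assistant sends the message".split()

# Count which word follows each word.
next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def generate(start: str, length: int = 5) -> str:
    words = [start]
    for _ in range(length):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break
        # Sample the next word in proportion to how often it followed the last one.
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the assistant answers the question and"
```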

    For example, if you ask an LLM a question, it doesn’t search a database for the answer. Instead, it generates the answer based on its understanding of the words and the context, much like how a human would.

    The Promise and Shortcomings of Generative AI

    The promise of generative AI in voice assistants is immense. It could lead to more natural conversations and improved understanding of complex commands or queries. But, as of now, generative AI has a critical shortcoming – it’s relatively slow.

    While Siri can quickly look up a database and provide an immediate response, a generative AI model needs to generate the response, which can take some time. This latency may not matter in a text-based chatbot, but in a voice assistant where real-time responses are expected, it’s a significant hurdle.
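    The contrast can be illustrated with a rough timing sketch. The per-token delay below is an assumption chosen for demonstration; real latency depends on the model, hardware, and network, but the shape of the problem is the same: a lookup is one step, while generation is one step per token of the reply.

```python
# Illustrative latency comparison: a single lookup vs. token-by-token
# generation. The 50 ms-per-token figure is an invented stand-in, not a
# measurement of any real model.
import time

def lookup_answer(query: str) -> str:
    # Rule-based path: a single dictionary/database hit, effectively instant.
    responses = {"weather": "Sunny, 72 degrees."}
    for keyword, response in responses.items():
        if keyword in query.lower():
            return response
    return "Unknown"

def generate_answer(tokens_needed: int = 40, seconds_per_token: float = 0.05) -> str:
    # Generative path: the model emits one token at a time, so total latency
    # grows with the length of the reply.
    for _ in range(tokens_needed):
        time.sleep(seconds_per_token)  # stand-in for one model forward pass
    return "a 40-token generated reply"

start = time.perf_counter()
lookup_answer("What's the weather?")
print(f"lookup: {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
generate_answer()
print(f"generation: {time.perf_counter() - start:.3f}s")  # roughly 2s under these assumptions
```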

    The Future of Voice Assistants: A Glimpse

    Despite this, some companies are already incorporating LLMs into their voice assistants. Amazon’s Alexa, for instance, uses a variant of generative AI to handle more complex queries. While it still relies on the database lookup for many tasks, for complex interactions or those requiring contextual understanding, it leverages generative AI.
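    One way to picture such a hybrid setup is a simple router: well-known, simple intents take the fast rule-based path, and everything else falls through to a generative model. This is an illustrative sketch of the general pattern, not Amazon's actual architecture, and the intents and the call_llm placeholder are invented for the example.

```python
# An illustrative hybrid router: simple intents are handled by fast
# rule-based lookups, anything else falls back to a (hypothetical) LLM call.

RULE_BASED_INTENTS = {
    "set a timer": lambda q: "Timer set.",
    "what time is it": lambda q: "It's 3:00 PM.",
}

def call_llm(query: str) -> str:
    # Placeholder for a call to a large language model API.
    return f"[generated response to: {query!r}]"

def route(query: str) -> str:
    text = query.lower()
    for trigger, handler in RULE_BASED_INTENTS.items():
        if trigger in text:
            return handler(text)   # fast path: rule/database lookup
    return call_llm(query)         # slow path: generative model

print(route("Alexa, set a timer for ten minutes"))
print(route("Alexa, plan a weekend trip that works around my meetings"))
```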

    Amazon is further pushing the boundaries by continually improving Alexa’s LLM capabilities. They aim to create a more natural, conversational experience for users, moving beyond the somewhat mechanical interactions we currently have with voice assistants.

    The Future is Conversational

    While Siri and other similar voice assistants were revolutionary in their time, the advent of generative AI has exposed their limitations. The future of voice assistants seems to be moving towards a more conversational and contextually-aware model. Although this technology is still evolving and has its own set of challenges, progress being made by platforms like Alexa offers a glimpse into the future. With continuous advancements in AI, we can expect to see voice assistants becoming more sophisticated, intelligent, and capable of understanding and responding to us in ways that feel more human.

    Erik McNair

    Erik McNair is a digital marketing professional living in Arlington, OH. As co-owner of McNair Media, he has focused on developing and executing SEO and marketing strategies in a manner that supports each client’s consistent business growth and enhances brand equity and awareness. He graduated from Georgia College & State University in Milledgeville, GA with a degree in Mass Communications with a concentration in Telecommunications. He’s a certified Google AdWords, Google Analytics, and Bing Ads marketing professional. Outside of marketing, Mr. McNair is an avid technologist. He’s always running the latest software betas and testing out new and exciting products. He occasionally writes about his thoughts on technology, but his main focus has been on growing and establishing McNair Media.