How does one navigate and organize information?
This is the problem I’ve been tackling within my current role as a VUI designer in the Alexa Info UX org. From a product design perspective, this is something that cuts across a lot of different products and domains, but I do want to take a step back and talk about how this is unique for voice and navigating general knowledge information for Alexa.
When we think about how we navigate information, I think of this as a basic mental model.
Now this makes logical sense, and we can see it represented visually in how we organize our own information: in lists, tabs, and folders within our file storage. But how do you navigate this information through voice, when there is no visual to tell you what's behind, in front of, and to the side of you?
Navigation and deep dives are design topics we work on within our information domain for exactly that reason. But the other day one of my PMs said something kind of revolutionary: "what if there was no back button?"
Our concept of "back" exists because of how we work with software (phones, computers) day in and day out. What if that mental model was broken?
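To see why "back" is such a software-shaped idea, it helps to notice how it's typically built: in GUI software, "back" is just a history stack. Here's a minimal sketch of that mental model (the class and names are illustrative, not Alexa's or any browser's actual implementation):

```python
class NavigationHistory:
    """Toy model of GUI-style 'back' navigation as a stack."""

    def __init__(self, home):
        self._stack = [home]  # screens/topics visited so far

    def visit(self, screen):
        self._stack.append(screen)

    def back(self):
        # Pop the current screen and return the previous one;
        # stay on "home" if there's nowhere left to go back to.
        if len(self._stack) > 1:
            self._stack.pop()
        return self._stack[-1]


nav = NavigationHistory("home")
nav.visit("stocks")
nav.visit("amazon")
print(nav.back())  # → stocks
```

The stack only works because the screen keeps that history visible and tappable. Conversation has no such stack to pop, which is what makes the "no back button" question interesting.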
Another data point to think about here: we don't have back buttons in conversation. You don't rewind conversations while you're having them to say "let me go back to this topic" (you do in some sense, but you say "hey, what was that thing you were going to say before I cut you off?"). This is precisely the goal of Alexa: to shift the paradigm from a visual one to one of voice and conversation. And while a lot of users are fixated on what Alexa can and "can't do" because of the uncanny valley (Voice Tech Podcast), we've also trained people to use machines like machines, not like humans.
For example, the entire concept of skeuomorphism is how we've taken the real world and built our mental models of software from it. So why are we training ourselves to say things like "Alexa, what's the stock price as of four oh three PM?" when we would just ask a friend, "hey, what's Amazon trading at?" The entire concept of golden utterances in voice design is (IMO) kind of broken in that sense. We keep trying to get customers to say the "golden utterances" instead of what they would say organically (obviously golden utterances are based on research and are the most common phrases, but still). Golden utterances are also a necessary evil from a development perspective: they make the technical architecture tractable (otherwise we would constantly be boiling the ocean to push out one feature).
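To make that tradeoff concrete, here's a toy sketch of how golden utterances constrain an intent model: exact researched phrasings map cleanly to an intent, while organic phrasings fall through. (The intent name, phrases, and lookup are all hypothetical, for illustration only, and nothing like a real NLU system, which would generalize beyond exact matches.)

```python
# Hypothetical intent model keyed by "golden utterances":
# the exact phrasings research says are most common.
GOLDEN_UTTERANCES = {
    "what's the stock price of amazon": "GetStockPrice",
    "what is amazon trading at": "GetStockPrice",
}


def resolve_intent(utterance):
    """Return the matched intent, or None for organic phrasings
    the model was never designed around."""
    return GOLDEN_UTTERANCES.get(utterance.lower().strip())


print(resolve_intent("What is Amazon trading at"))   # golden phrasing: matches
print(resolve_intent("hey what's Amazon at today"))  # organic phrasing: no match
```

The brittleness is the point: the narrower the set of supported phrasings, the simpler the architecture, and the more the customer has to talk like the machine.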
All this goes to say that while I don't believe voice is THE modality of the future, it does help us think about our interaction models with machines very differently. Language constantly adapts and changes across time, cultures, and contexts. Getting machines to understand us is one thing; changing the way humans think about talking to them is definitely another.