Role: UX/UI Designer, UX Researcher
Deliverables: Voice Scripts, Limited Voice Prototype (Adobe XD)
Timeline: 150 hours
Tools: Adobe XD, Google Actions Console, Google Suite, Miro, Whimsical
I used Google’s Conversation Design guidelines to create additional interactions centered on organic social interaction (or as natural as you can get when conversing with a robot), in hopes of minimizing user frustration with smart assistants.
• Voice scripts detailing scenes, features, interactions, error states, and accepted intents for a voice forward VUI
• Supplemental visual interface for devices that accommodate visuals
• System persona for product design
Many current users reported ‘fun’ as a key reason to use smart assistants. Smart tech can be delightful, novel, and entertaining. From another angle, it also unlocks a new realm of accessibility in spaces that were previously cumbersome to navigate.
The potential of voice and voice-forward interfaces is immense. More devices are offering smart assistant compatibility (household appliances, etc.) and new integrations are being developed every day. Applying and standardizing good UX for this product type will help users become fluent and confident in voice interaction as a supplement to their visual devices, not as a last resort.
To improve the VUI experience, increasing the user base is crucial. By widening the dataset, we can better understand and adapt designs to respond appropriately even when users speak with implicit intent.
Training phrases, VUI IA (information architecture for voice design), and different implementations allow these systems to gradually get better at understanding user commands and utterances.
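To make that concrete, here's a minimal, hypothetical sketch (in TypeScript) of how one intent groups its training phrases. In Google Actions Console this is configured through the UI rather than written as code, and the intent name and phrases below are purely illustrative.

```typescript
// Hypothetical sketch: one intent with its training phrases.
// In Google Actions Console these are entered in the UI, not in code.
interface Intent {
  name: string;
  trainingPhrases: string[];
}

const substitutionIntent: Intent = {
  name: "get_substitution",
  trainingPhrases: [
    "what can I use instead of muscovado sugar",
    "I don't have muscovado sugar",
    "is there a substitute for brown sugar",
    "swap out the butter for something else",
  ],
};

// The more varied the phrases, the better the NLU generalizes to
// utterances it has never seen verbatim.
```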
What does voice-only and voice-forward product design entail? Where is this sector headed and who is leading the way? (Google, Apple, and Amazon are the main contenders right now, each with different strengths.)
Smart assistant users report using them for a focused subset of straightforward tasks: obtaining a piece of information, performing a simple action, and so on.
I focused less on technical efficacy than on user perception of system efficacy. For newer technologies, discoverability is a problem (especially for non-visual technology), and users are reluctant to retry features they previously had trouble with.
In this project I don't have performance data or a stakeholder telling me: 'x performed as such and we think it could do better. We want you to streamline y feature to promote usage.'
I went into testing in a unique position: I knew my project focused on voice interactions, but I didn’t have a clear picture of what precisely needed to be worked on. I went into research interviews with a very open mind and a vague set of questions.
I interviewed 4 smart assistant users — Google Assistant, Alexa, Siri, or some combination. Due to the pandemic, I conducted the interviews over Zoom.
What smart assistants were people using? If a combination, what differences did they notice if any?
How did they use their smart assistants? How often, for which routine tasks, and did they explore new features? (Did they notice a personality? Learning and growing?)
Observation of users querying/interacting with their assistant. I tasked users with a short, simple usability test and noted how they handled errors and engaged the assistant.
Personas were fun in this project — not only did I create personas to depict the main user groups identified in user interviews, I also had to create a system persona.
Ayaka and Desmond represent main user behaviors:
• People delegating simple tasks to the assistant
• People seeking information.
[Empathy map covering what each persona thinks, feels, says, and does]
I needed quantitative data to map out the information architecture. I played around with the Assistant's built-in cooking features as well as some cooking-oriented Google Actions. I noted existing features and what I felt was missing, but I had to see whether those sentiments were shared with the general public.
When it came to mapping out the IA, I was initially stumped. With voice-forward design, even the visual interface (Google Assistant, Siri, etc.) doesn't lay out a sitemap the way traditional apps and sites do.
After a bit more research, I was able to visually sketch out my understanding of how interactions work as follows:
I want the "follow the recipe" intent to be a thru-line to which all extra queries return once completed. Each of the satellite intents goes through the linear task flow architecture (shown above), then returns to the main task (following the recipe). Each step in the recipe is completed this way as well.
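As a rough illustration of that structure, here's a small TypeScript sketch of the return-to-thru-line idea, assuming hypothetical scene names; in the actual project these are defined as scenes in the Actions Console rather than in code.

```typescript
// Sketch of the thru-line: satellite intents run their own linear flow,
// then hand control back to the main recipe flow. Names are illustrative.
type Scene = "follow_recipe" | "substitution" | "conversion" | "cooking_tip";

// Every satellite scene returns to the recipe once its flow completes.
const returnTo: Record<Scene, Scene> = {
  follow_recipe: "follow_recipe", // the main task loops through recipe steps
  substitution: "follow_recipe",
  conversion: "follow_recipe",
  cooking_tip: "follow_recipe",
};

function onFlowComplete(current: Scene): Scene {
  // Resume the recipe at the step the user left off on.
  return returnTo[current];
}
```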
I created two task flows covering the main product tasks: finding and following a recipe, and getting help (making a substitution, conversion, or cooking tip). Lastly, I covered the peripheral features: sharing recipes and adding ingredients to the shopping list. These features enhance the user experience surrounding the main product features.
I initially wanted to use Adobe XD for the prototype but the voice prototyping feature is too limited for my purpose. Voice UX aims to test conversational abilities, training phrases, and natural user utterances — Adobe XD only accepts a single specific utterance for each interaction, so it wouldn’t do for testing.
Visually, I wanted the design to blend in with Google's own Assistant design: chips, cards, etc. I tried a few Google Actions that used their own unique aesthetic, but I found them jarring: I was interacting in the same panel as the Assistant, but everything was structured differently.
I took a quite linear approach: I completed and fine-tuned the sample dialogs (best case scenario for each interaction), possible intents, and working prototype prior to getting user feedback. The reason for this: technical and time constraints.
The MVP prototype consists of two main parts: the voice script documents and the Google Actions console voice prototype.
The Google Actions UI for user testing is bare-bones, but it allows users to speak conversationally, and the system will understand prompts beyond the verbatim suggestion chips (unlike Adobe XD).
Most of the screens for this app are simple and text-based. Again, this project focused primarily on learning conversation design, not the visual design aspect of this product. Keep in mind that the capabilities of Google Actions are far more limited than those of a true chatbot or smart assistant.
For Sous Chefbot, I'm keeping the visual aesthetic in line with Google's own Assistant styling. As this project was primarily focused on the conversation design, with an imagined context of "voice forward devices", the recipe steps are very pared back and simple.
I wanted to keep this content focused, because recipe sites vary in structure and bloat (ads, popups, auto-playing videos, etc.). If the user wants to view the full recipe on the site, a future feature of the action would allow Google Assistant to load the recipe website on a visually capable device (TV, phone, etc.).
The action will perform basic queries for recipes. When it returns a suggested recipe, it will supply a Google search results card (on visually capable devices).
I formatted this similarly to existing Google search results cards. The format is familiar to users, and they can quickly scan it without needing to learn something new.
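I didn't build webhook fulfillment for this prototype, but as a rough sketch, a card like this could be returned from an Actions webhook using the @assistant/conversation Node library. The handler name and recipe details below are placeholders, not the actual project content.

```typescript
import { conversation, Card, Image, Suggestion } from "@assistant/conversation";

const app = conversation();

// Hypothetical handler: return one suggested recipe as a card on
// visual-capable devices. All recipe details are placeholders.
app.handle("suggest_recipe", (conv) => {
  conv.add("Here's a recipe you might like:");
  conv.add(
    new Card({
      title: "Classic Chicken Soup",
      subtitle: "45 min · serves 6",
      text: "A simple stovetop chicken soup.",
      image: new Image({
        url: "https://example.com/chicken-soup.jpg",
        alt: "A bowl of chicken soup",
      }),
    })
  );
  // Chips remind users what they can say next, but spoken utterances
  // don't have to match them verbatim.
  conv.add(new Suggestion({ title: "Start recipe" }));
});

export { app };
```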
Checkpoints are really important in voice design, because the information architecture isn't visually decipherable to the user.
Coincidentally, creating this shallower design infrastructure for the user is also good for the designer. This transition screen is used many times throughout Sous Chefbot. We don't want to transition users to new scenes without them understanding what's going on (especially if they're being transitioned to a scene they don't want to be in).
By using the transition, we can gently guide the user back to the main task (following the recipe) but also remind them that they can access other features if they still need help or if their request wasn't fulfilled.
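Here's a hypothetical sketch of what that checkpoint prompt boils down to: say what just finished, offer the way back to the recipe, and re-surface the other help features as chips. The helper and wording are illustrative, not the actual project copy.

```typescript
// Illustrative checkpoint/transition prompt builder.
function checkpointPrompt(completedFeature: string): { speech: string; chips: string[] } {
  return {
    speech:
      `Okay, that's it for the ${completedFeature}. ` +
      `Ready to get back to the recipe, or do you need more help?`,
    chips: ["Back to recipe", "Substitution", "Conversion", "Cooking tip"],
  };
}

// e.g. checkpointPrompt("ingredient substitution") keeps the user oriented
// before transitioning them back to the main task.
```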
Working prototype done, time to test. For the usability test plan, my goal was to ensure people could successfully navigate through the action, get help when they needed it, and discover features.
"You’re going to call up the action, and choose a recipe to get started with. The recipe is going to be for “chicken soup”
"This time, you’re going to ask for a chocolate cake recipe. In step 1, there is going to be an ingredient you don’t have: muscovado sugar. Ask for an alternate ingredient that you can use instead."
"Repeat the last flow: invoke the assistant, start the cake recipe, but this time you want to convert measurements to cups."
I wanted to perform in-person, moderated usability tests due to the tech constraints of using the Google Actions portal for testing. The alternative was to do analog Wizard of Oz (WOZ) testing, which is not ideal for such an advanced stage of product design.
This drastically narrowed my potential participant pool to people I could test in person, but I was able to recruit 4 smart speaker users for the test.
I tried to anticipate how users might go about a task and design screens specifically for every possible intent. Here's a glimpse of the backend of the voice prototype showing possible user intents, scenes, and invocations.
All 4 participants used utterances that didn't match the chips. Users naturally want to be conversational.
2.5 out of 4 participants encountered an error state prompt from the system and successfully navigated back to the flow/their desired action without getting kicked out of the conversation.
All 4 participants had trouble with task #2, and it was largely the same issue: they told the bot what kind of help they wanted (a substitution) and the specifics of that request in the same utterance. V1 had the assistant's flow filter down to identify the issue over the course of 3 scenes. This architecture needed to be flattened so the system could understand these granular assistance intents (cooking tip, substitution, conversion) at any time.
All participants found Sous Chefbot to be friendly, playful and eager.
Based on the data from the usability tests I prioritized revisions for V2 of the prototype.
The one big experience change that was needed: flatten the hierarchy, allow users to access features more broadly throughout the flow, and allow the system to pivot and accept more intents rather than requesting a rephrase when handling errors.
Part of the prototype limitation was my own lack of technical knowledge. When working on V2, I converted "substitution," "conversion," and "cooking tips" to global intents, meaning that if the system identifies any of these intents in a user utterance at any time, it brings up the appropriate feature flow.
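Conceptually, the V2 routing change looks something like the sketch below. Intent and scene names are illustrative; in the actual project these are simply marked as global intents in the Actions Console rather than coded by hand.

```typescript
// Sketch of the V2 change: the three assistance intents become global,
// so they can be matched from any scene instead of only after a
// multi-scene filtering flow.
const GLOBAL_INTENTS = ["get_substitution", "get_conversion", "get_cooking_tip"] as const;

function routeIntent(matchedIntent: string, currentScene: string): string {
  // A global assistance intent jumps straight to its feature flow,
  // no matter where the user is in the recipe.
  if ((GLOBAL_INTENTS as readonly string[]).includes(matchedIntent)) {
    return matchedIntent; // hand off to the matching feature scene
  }
  // Anything else stays within the current scene's linear flow.
  return currentScene;
}
```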
Read the detailed change notes in this document.
I’m excited to work on more VUI projects in the future, particularly with a team, so I can see how different teams have chosen to convey this type of design, what platforms they use, how they usability test, and where this could go with even more sophisticated platforms to accommodate it.
I would love to publish this as an actual Google Action at some point. Before then, I would need to learn more about what a webhook actually is, how to set one up, and probably a heck of a lot more about different cooking techniques.
In the future, I'd like to do a similar project but implement more testing phases along the way, even if it's just lo-fi Wizard of Oz testing. Retrospectively, I think I got really lucky: the issues with V1 of the prototype were structural, fixable, and consistent across usability testing.
So when prioritizing revisions, I had just a few sweeping changes to make: adding global intents, which make the most important features accessible anywhere. I had to remember that suggestion chips can be used to remind users that they can do something (even if they always could do it, sometimes they don't remember).
It was helpful to witness people using the prototype, because they used some really simple phrases that I had overlooked. The system didn't pick up on them, so I was able to add them as additional training phrases to the project. Frequent testing is key!