Role: UX/UI Designer, UX Researcher
Deliverables: Voice Scripts, Limited Voice Prototype (Adobe XD)
Timeline: 150 hours
Tools: Adobe XD, Google Actions Console, Google Suite, Miro, Whimsical
I used Google’s Conversation Design guidelines to create additional interactions centered on organic social interaction (or as natural as you can get when conversing with a robot), in hopes of minimizing user frustration with smart assistants.
• Voice scripts detailing scenes, features, interactions, error states, and accepted intents for a voice forward VUI
• Supplemental visual interface for devices that accommodate visuals
• System persona for product design
Many current users reported ‘fun’ as a key reason to use smart assistants. Smart tech can be delightful, novel, and entertaining. From another angle, it also unlocks a new realm of accessibility in spaces that were previously cumbersome to navigate.
The potential of voice and voice-forward interfaces is immense. More devices are offering smart assistant compatibility (household appliances, etc.) and new integrations are being developed every day. Applying and standardizing good UX for this product type will help users become fluent and confident in voice interaction as a supplement to their visual devices, not as a last resort.
To improve the VUI experience, increasing the user base is crucial. By widening the dataset, we can better understand and adapt designs to respond appropriately even when users speak with implicit intent.
Training phrases, VUI IA (information architecture for voice design), and different implementations allow these systems to gradually get better at understanding user commands and utterances.
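To make that concrete, here's a minimal, hypothetical sketch (in TypeScript) of how one intent groups its training phrases. In Google Actions Console this is configured through the UI rather than written as code, and the intent name and phrases below are purely illustrative.

```typescript
// Hypothetical sketch: one intent with its training phrases.
// In Google Actions Console these are entered in the UI, not in code.
interface Intent {
  name: string;
  trainingPhrases: string[];
}

const substitutionIntent: Intent = {
  name: "get_substitution",
  trainingPhrases: [
    "what can I use instead of muscovado sugar",
    "I don't have muscovado sugar",
    "is there a substitute for brown sugar",
    "swap out the butter for something else",
  ],
};

// The more varied the phrases, the better the NLU generalizes to
// utterances it has never seen verbatim.
```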
What does voice-only and voice-forward product design entail? Where is this sector headed and who is leading the way? (Google, Apple, and Amazon are the main contenders right now, each with different strengths.)
Smart assistant users report using them for a focused subset of straightforward tasks: obtaining a piece of information, performing a simple action, and so on.
I focused less on technical efficacy than on user perception of system efficacy. For newer technologies, discoverability is a problem (especially for non-visual technology), and users are reluctant to retry features they previously had trouble with.
In this project I don't have performance data or a stakeholder telling me: 'x performed as such and we think it could do better. We want you to streamline y feature to promote usage.'
I went into testing in a unique position: I knew my project focused on voice interactions, but I didn’t have a clear picture of what precisely needed to be worked on. I went into research interviews with a very open mind and a vague set of questions.
I interviewed 4 smart assistant users — Google Assistant, Alexa, Siri, or some combination. Due to the pandemic, I conducted the interviews over Zoom.
What smart assistants were people using? If a combination, what differences did they notice if any?
How did they use their smart assistants? How often, for which routine tasks, and did they explore new features? (Did they notice a personality? Learning and growing?)
Observation of users querying/interacting with their assistant. I tasked users with a short, simple usability test and noted how they handled errors and engaged the assistant.
Personas were fun in this project — not only did I create personas to depict the main user groups identified in user interviews, I also had to create a system persona.
Ayaka and Desmond represent main user behaviors:
• People delegating simple tasks to the assistant
• People seeking information.
[Empathy map covering what each persona thinks, feels, says, and does]
I needed quantitative data to map out the information architecture. I played around with the Assistant's built-in cooking features as well as some cooking-oriented Google Actions. I noted existing features and what I felt was missing, but I had to see whether those sentiments were shared with the general public.
When it came to mapping out the IA, I was initially stumped. With voice-forward design, even the visual interface (Google Assistant, Siri, etc.) doesn't lay out a sitemap the way traditional apps and sites do.
After a bit more research, I was able to visually sketch out my understanding of how interactions work as follows:
I want the "follow the recipe" intent to be a thru-line to which all extra queries return once completed. Each of the satellite intents goes through the linear task flow architecture (shown above), then returns to the main task (following the recipe). Each step in the recipe is completed this way as well.
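As a rough illustration of that structure, here's a small TypeScript sketch of the return-to-thru-line idea, assuming hypothetical scene names; in the actual project these are defined as scenes in the Actions Console rather than in code.

```typescript
// Sketch of the thru-line: satellite intents run their own linear flow,
// then hand control back to the main recipe flow. Names are illustrative.
type Scene = "follow_recipe" | "substitution" | "conversion" | "cooking_tip";

// Every satellite scene returns to the recipe once its flow completes.
const returnTo: Record<Scene, Scene> = {
  follow_recipe: "follow_recipe", // the main task loops through recipe steps
  substitution: "follow_recipe",
  conversion: "follow_recipe",
  cooking_tip: "follow_recipe",
};

function onFlowComplete(current: Scene): Scene {
  // Resume the recipe at the step the user left off on.
  return returnTo[current];
}
```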
I created two task flows covering the main product tasks: finding and following a recipe, and getting help (making a substitution, conversion, or cooking tip). Lastly, I covered the peripheral features: sharing recipes and adding ingredients to the shopping list. These features enhance the user experience surrounding the main product features.
I initially wanted to use Adobe XD for the prototype but the voice prototyping feature is too limited for my purpose. Voice UX aims to test conversational abilities, training phrases, and natural user utterances — Adobe XD only accepts a single specific utterance for each interaction, so it wouldn’t do for testing.
Visually, I wanted the design to blend in with Google's own Assistant design: chips, cards, etc. I tried a few Google Actions that used their own unique aesthetic, but I found them jarring: I was interacting in the same panel as the Assistant, but everything was structured differently.
I took a quite linear approach: I completed and fine-tuned the sample dialogs (best case scenario for each interaction), possible intents, and working prototype prior to getting user feedback. The reason for this: technical and time constraints.
The MVP prototype consists of two main parts: the voice script documents and the Google Actions console voice prototype.
The Google Actions UI for user testing is bare-bones, but it allows users to speak conversationally, and the system will understand prompts beyond the verbatim suggestion chips (unlike Adobe XD).
Most of the screens for this app are simple and text-based. Again, this project focused primarily on learning conversation design, not the visual design aspect of this product. Keep in mind that the capabilities of Google Actions are far more limited than those of a true chatbot or smart assistant.
For Sous Chefbot, I'm keeping the visual aesthetic in line with Google's own Assistant styling. As this project was primarily focused on the conversation design, with an imagined context of "voice forward devices", the recipe steps are very pared back and simple.
I wanted to keep this content focused, because recipe sites vary in structure and bloat (ads, popups, auto-playing videos, etc.). If the user wants to view the full recipe on the site, a future feature of the action would allow Google Assistant to load the recipe website on a visually capable device (TV, phone, etc.).
The action will perform basic queries for recipes. When it returns a suggested recipe, it will supply a Google search results card (on visually capable devices).
I formatted this similarly to existing Google search results cards. The format is familiar to users, and they can quickly scan it without needing to learn something new.
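I didn't build webhook fulfillment for this prototype, but as a rough sketch, a card like this could be returned from an Actions webhook using the @assistant/conversation Node library. The handler name and recipe details below are placeholders, not the actual project content.

```typescript
import { conversation, Card, Image, Suggestion } from "@assistant/conversation";

const app = conversation();

// Hypothetical handler: return one suggested recipe as a card on
// visual-capable devices. All recipe details are placeholders.
app.handle("suggest_recipe", (conv) => {
  conv.add("Here's a recipe you might like:");
  conv.add(
    new Card({
      title: "Classic Chicken Soup",
      subtitle: "45 min · serves 6",
      text: "A simple stovetop chicken soup.",
      image: new Image({
        url: "https://example.com/chicken-soup.jpg",
        alt: "A bowl of chicken soup",
      }),
    })
  );
  // Chips remind users what they can say next, but spoken utterances
  // don't have to match them verbatim.
  conv.add(new Suggestion({ title: "Start recipe" }));
});

export { app };
```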
Checkpoints are really important in voice design, because the information architecture isn't visually decipherable to the user.
Coincidentally, creating this shallower design infrastructure for the user is also good for the designer. This transition screen is used many times throughout Sous Chefbot. We don't want to transition users to new scenes without them understanding what's going on (especially if they're being transitioned to a scene they don't want to be in).
By using the transition, we can gently guide the user back to the main task (following the recipe) but also remind them that they can access other features if they still need help or if their request wasn't fulfilled.
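Here's a hypothetical sketch of what that checkpoint prompt boils down to: say what just finished, offer the way back to the recipe, and re-surface the other help features as chips. The helper and wording are illustrative, not the actual project copy.

```typescript
// Illustrative checkpoint/transition prompt builder.
function checkpointPrompt(completedFeature: string): { speech: string; chips: string[] } {
  return {
    speech:
      `Okay, that's it for the ${completedFeature}. ` +
      `Ready to get back to the recipe, or do you need more help?`,
    chips: ["Back to recipe", "Substitution", "Conversion", "Cooking tip"],
  };
}

// e.g. checkpointPrompt("ingredient substitution") keeps the user oriented
// before transitioning them back to the main task.
```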
Working prototype done, time to test. For the usability test plan, my goal was to ensure people could successfully navigate through the action, get help when they needed it, and discover features.
"You’re going to call up the action, and choose a recipe to get started with. The recipe is going to be for “chicken soup”
"This time, you’re going to ask for a chocolate cake recipe. In step 1, there is going to be an ingredient you don’t have: muscovado sugar. Ask for an alternate ingredient that you can use instead."
"Repeat the last flow: invoke the assistant, start the cake recipe, but this time you want to convert measurements to cups."
I wanted to perform in-person, moderated usability tests due to the tech constraints of using the Google Actions portal for testing. The alternative was to do analog Wizard of Oz (WOZ) testing, which is not ideal for such an advanced stage of product design.
This drastically narrowed my potential participant pool to people I could test in person, but I was able to recruit 4 smart speaker users for the test.
I tried to anticipate how users might go about a task and design screens specifically for every possible intent. Here's a glimpse of the backend of the voice prototype showing possible user intents, scenes, and invocations.
All 4 participants used utterances that didn't match the chips. Users naturally want to be conversational.
2.5 out of 4 participants encountered an error state prompt from the system and successfully navigated back to the flow/their desired action without getting kicked out of the conversation.
All 4 participants had trouble with task #2, and it was largely the same issue: they told the bot what kind of help they wanted (a substitution) and the specifics of that request in the same utterance. V1 had the assistant's flow filter down to identify the issue over the course of 3 scenes. This architecture needed to be flattened so the system could understand these granular assistance intents (cooking tip, substitution, conversion) at any time.
All participants found Sous Chefbot to be friendly, playful and eager.
Based on the data from the usability tests I prioritized revisions for V2 of the prototype.
The one big experience change that was needed: flatten the hierarchy, allow users to access features more broadly throughout the flow, and allow the system to pivot and accept more intents rather than requesting a rephrase when handling errors.
Part of the prototype limitation was my own lack of technical knowledge. When working on V2, I converted "substitution," "conversion," and "cooking tips" to global intents, meaning that if the system identifies any of these intents in a user utterance at any time, it brings up the appropriate feature flow.
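Conceptually, the V2 routing change looks something like the sketch below. Intent and scene names are illustrative; in the actual project these are simply marked as global intents in the Actions Console rather than coded by hand.

```typescript
// Sketch of the V2 change: the three assistance intents become global,
// so they can be matched from any scene instead of only after a
// multi-scene filtering flow.
const GLOBAL_INTENTS = ["get_substitution", "get_conversion", "get_cooking_tip"] as const;

function routeIntent(matchedIntent: string, currentScene: string): string {
  // A global assistance intent jumps straight to its feature flow,
  // no matter where the user is in the recipe.
  if ((GLOBAL_INTENTS as readonly string[]).includes(matchedIntent)) {
    return matchedIntent; // hand off to the matching feature scene
  }
  // Anything else stays within the current scene's linear flow.
  return currentScene;
}
```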
Read the detailed change notes in this document.
I’m excited to work on more VUI projects in the future, particularly with a team, so I can see how different teams have chosen to convey this type of design, what platforms they use, how they usability test, and where this could go with even more sophisticated platforms to accommodate it.
I would love to publish this as an actual Google Action at some point. Before then, I would need to learn more about what a webhook actually is, how to set one up, and probably a heck of a lot more about different cooking techniques.
In the future, I'd like to do a similar project but implement more testing phases along the way, even if it's just lo-fi Wizard of Oz testing. Retrospectively, I think I got really lucky: the issues with V1 of the prototype were structural, fixable, and consistent across usability testing.
So when prioritizing revisions, I had just a few sweeping changes to make: adding global intents, which make the most important features accessible anywhere. I had to remember that suggestion chips can be used to remind users that they can do something (even if they always could do it, sometimes they don't remember).
It was helpful to witness people using the prototype, because they used some really simple phrases that I had overlooked. The system didn't pick up on them, so I was able to add them as additional training phrases to the project. Frequent testing is key!