Designing interactive interfaces for speech recognition users

Lindsay Silver
Published in UX Collective
6 min read · Apr 4, 2022

Summary: Follow these three principles to make interactive digital interfaces more usable for people who use speech recognition technology:

  • Have clear and consistent styles on interactive elements
  • Add clear text labels to interactive graphics and ambiguous icons whenever possible
  • Allow users to guess by applying better alternative text to interactive graphics

Voice and speech technology has become a significant part of our daily lives and culture. Alexa and Siri are not only household names, but also exist in many of our actual household electronics.

Alexa and Siri are intelligent assistants that “use machine learning to determine suggestions for users, answer queries, or control devices” (Source: Apple Insider). However helpful, intelligent assistants cannot do every task we require of our devices. These assistants assume that a user is still able to control their phone through another method, such as direct touch, a screen reader, or a keyboard. As an example, I can ask Siri to make a FaceTime call to one of my contacts, but I cannot ask Siri to activate the on-screen buttons to change the camera view to forward-facing or to end the call. For this kind of task, speech recognition technology is required, such as Dragon Speech Recognition or iOS’s Voice Control.

When discussing speech recognition in this article, I will be focusing specifically on users who control their devices with Dragon Speech Recognition technology, or simply, Dragon. Dragon is popular software used by legal and medical professionals to dictate notes. It is also widely used by people with limited mobility, such as people with quadriplegia, or people with very limited hand, arm, or finger use who prefer to use their voice to interact with their devices.

I will present three ways to make your interactive digital interfaces more inclusive of speech recognition users:

  1. Be consistent with your styles on interactive elements
  2. Use clear text labels on interactive graphic elements
  3. Allow (and code) for guessing on interactive graphic elements

Be consistent with your styles on interactive elements

In order to discern which elements are interactive (links, buttons, etc.) on a page, many users will hover their mouse over an element to see if there’s a state change in their cursor, or tab with their keyboard to see where their focus indicator lands. For speech recognition users, there is no way to directly test whether an element is interactive other than issuing a voice command to click or interact with it. It is exhausting and frustrating for these users to try many different voice commands only to learn that the thing they are trying to click isn’t interactive at all.

Good and consistent design for interactive elements will aid all users, but especially speech recognition users. Be consistent with how links look on your site, and limit the number of button styles so that clickable elements are clear to users. Interactive images should have styles that clearly distinguish them from static images.
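As a rough sketch, this kind of consistency can be centralized in a stylesheet so every link, button, and interactive image gets the same treatment site-wide (the selectors, colors, and class name here are illustrative, not from any particular site):

```html
<style>
  /* All links share one visual treatment everywhere on the site */
  a { color: #0645ad; text-decoration: underline; }

  /* One reusable button style, rather than many one-off variants */
  .button { background: #111; color: #fff; padding: 8px 16px; border-radius: 4px; }

  /* Images inside links or buttons get a visible affordance
     that plain static images lack */
  a img, button img { outline: 2px solid #0645ad; cursor: pointer; }
</style>
```

Defining these styles once, rather than per page or per component, is what keeps the affordances predictable for users who cannot probe the page with a mouse or keyboard.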

Use clear text labels on interactive graphic elements

Clicking a link that has a clear interactive style and plainly displayed text is very simple for a Dragon user. For example, take the following text-based link:

Black underlined link “Shop all our new markdowns”

In this case, a Dragon user could say “Click ‘Shop All Our New Markdowns’” and the speech recognition technology would find this link easily and click it.

However, what happens when we use a graphic as the base for a link? Unless the icon or graphic is extremely common and well-recognized (such as the magnifying glass icon for searching), a speech recognition user will have few ideas about what to say to activate or interact with this kind of element.

What would a user say as the voice command, for example, if they came across this heart icon which serves as a navigation link on Nike’s website?:

Nike’s main navigation on their website shows their logo, a search area, links to new releases, men, women, kids, and sale, as well as a heart icon and bag icon

They might try:

  • “Click heart”
  • “Click like”

Neither of these commands will work with Dragon speech recognition. The only voice command that will work on Nike’s site is “Click Favorites.” In the HTML code, this element has been labeled as “Favorites,” and thus no other command will work to interact with this link.

Code inspector of the heart icon on Nike’s website. The heart icon is wrapped in a link tag. The link has the aria-label of “Favorites”
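Based on the inspector screenshot, the markup likely looks something like the following simplified sketch (the URL and icon contents are illustrative; only the aria-label comes from the article):

```html
<!-- An icon-only link whose accessible name is "Favorites".
     Dragon matches voice commands against the accessible name,
     so "Click Favorites" works but "Click heart" does not. -->
<a href="/favorites" aria-label="Favorites">
  <svg aria-hidden="true" focusable="false">
    <!-- heart icon paths -->
  </svg>
</a>
```

Because the aria-label overrides anything a user might infer from the visual icon, the chosen label text is the entire voice interface for this element.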

Now, let’s take a look at how Uniqlo has designed this same icon in their navigation:

Uniqlo’s main navigation on their website shows their logo, links to women, men, kids, and baby, a search area, as well as a heart icon and bag icon with labels under each saying “wishlist” and “cart.”

In their navigation, Uniqlo has added text labels below each icon. This addition to the design eliminates the speech recognition user’s guesswork and frustration. The user knows exactly what to say to click on the heart icon on this website: “Click Wishlist.” By labeling the icons, Uniqlo has helped speech recognition users reach their goal faster and with ease.
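A visible text label can double as the element’s accessible name, so what the user sees is exactly what they say. A minimal sketch of this pattern (the URL and icon contents are illustrative):

```html
<!-- The visible text "Wishlist" becomes the link's accessible name,
     so the voice command "Click Wishlist" matches what's on screen. -->
<a href="/wishlist">
  <svg aria-hidden="true" focusable="false">
    <!-- heart icon paths -->
  </svg>
  <span>Wishlist</span>
</a>
```

No aria-label is needed here: keeping the name in visible text avoids any mismatch between what sighted speech recognition users read and what the software listens for.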

I recognize, however, that not all designs will be flexible enough to add text labels below every interactive graphical element. Let’s talk about some techniques that can be used when text labels cannot be added to the design.

Allow (and code) for guessing on interactive graphic elements

If a speech recognition user were shopping on REI’s website for clothing, there is often a point in their shopping journey where they will have to select a color option for a product. What might a user say to select one of these color swatches, such as the red option?

REI’s product page showing the product title, rating stars, price, and the color options: red, grey, dark grey, blue, and green.

The user might try:

  • “Click red”
  • “Click dark red”

Neither of those voice commands will work with Dragon because of how REI has coded these color options with specific text alternatives. If a user doesn’t guess “Bordeaux” as the label for the red option, or “Rainstorm” for the blue option, they will be out of luck and will have to employ another, more complex and time-consuming method to try clicking on these colors.

Code inspector of the red color option on REI’s website. The red color option is an img tag wrapped in a button tag. The img has the alt text of “Color: Bordeaux”

In this case, we should allow for the possibility of a speech recognition user guessing at what the color might be, such as “blue” or “red,” to aid their user experience.

To do this, REI would not have to abandon their unique color naming, but would need to include the potential for logical guesses in the alternative (alt) text for each color option. By changing the image’s alt text to “Color: Bordeaux red” instead of just “Color: Bordeaux,” the speech recognition user could now guess “Click red” or “Click dark red” and would be able to select this option. Additionally, this new alternative text will likely benefit blind users, who might not be sure what the color “Rainstorm” is but could now understand that it falls under the category of blue.
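The before-and-after can be sketched as follows (the file name is illustrative; only the alt text values come from the article):

```html
<!-- Before: only the brand color name is exposed, so a user must
     guess "Bordeaux" exactly to select this swatch by voice. -->
<button>
  <img src="swatch-bordeaux.jpg" alt="Color: Bordeaux">
</button>

<!-- After: appending a plain color word lets logical guesses like
     "Click red" succeed while keeping the brand name intact. -->
<button>
  <img src="swatch-bordeaux.jpg" alt="Color: Bordeaux red">
</button>
```

Because Dragon matches commands against the accessible name, adding the generic color word to the alt text widens the set of voice commands that work without changing the visual design at all.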

Conclusion

By using clear and consistent styles on interactive content, leveraging text labels in conjunction with interactive graphic elements, and building in an option that allows for the user to guess, we can expand our digital interfaces to be inclusive of and better for speech recognition users.
