Building inclusive Mixed Reality: don’t go it alone

“We don’t have to do all of it alone. We were never meant to.” ― Brené Brown

Arathi Sethumadhavan
UX Collective

--

Co-authored with Joe Garvin

A hybrid of augmented reality and virtual reality, mixed reality (MR) blends virtual elements with the real world. MR devices scan your surroundings and create a 3D map, and as you move around, you interact with objects using gestures and your voice. Driving this immersive experience is a combination of cutting-edge technologies including sensors, optics, and next-generation computing power.

For us in the tech industry, building MR experiences puts us at the vanguard of computing and engineering. And as we build this new technology, we’re not only creating novel and delightful experiences; we’re also on the frontier of human social interaction, reshaping the way people communicate. With such great potential impact, how can we remain focused on inclusion as we face complex challenges? How can we make sure we leave no one out?

As technology creators, we’re reshaping social rules and norms in this new world. How can we create more inclusive AI experiences? When dealing with complex and ambiguous problem spaces, we’re often daunted by the responsibility and may not always know where to begin. The answer is simple: we shouldn’t do this alone. Instead, we should regularly look sideways to other fields for inspiration and help, because accommodating the full spectrum of our global audience is an incredibly complicated undertaking. Acknowledging the limitations of any single subject area, we should stay curious and creative. Let’s think expansively and imaginatively about how other disciplines can help us build inclusive MR.

Hand-tracking

Hand gestures and hand-held controllers are commonly used in MR devices to traverse virtual menus, issue commands, and more. The device tracks your hand motions, recognizes the gesture, and performs the requested action. For hand-tracking to work, the controllers must first physically fit users’ hands well. The set of gestures must also be easy for everyone to perform, and the AI must recognize many different types of hands to track and interpret gestures correctly. That’s tough to pull off, since human hands differ in quite a few ways: shape, skin tone, tattoos, and medical conditions or injuries, to name a few. All considered, it’s an exceedingly difficult task for a team of even the best UX researchers and engineers to tackle solo.
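To make the challenge concrete, here’s a minimal sketch of the kind of decision rule a gesture recognizer might apply, assuming the tracker already emits 3D hand landmarks (the common 21-keypoint hand layout is assumed here, and the 2 cm pinch threshold is purely illustrative, not a value from any shipping product):

```python
import math
from dataclasses import dataclass

@dataclass
class Landmark:
    x: float  # position in meters, in the headset's coordinate frame
    y: float
    z: float

def distance(a: Landmark, b: Landmark) -> float:
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)

# Indices follow the common 21-keypoint hand layout: 4 = thumb tip, 8 = index tip.
THUMB_TIP, INDEX_TIP = 4, 8

def is_pinch(landmarks: list[Landmark], threshold_m: float = 0.02) -> bool:
    """Classify a 'pinch' when the thumb and index fingertips nearly touch."""
    return distance(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) < threshold_m
```

A fixed 2 cm threshold is exactly where inclusivity can break down: it behaves differently for small hands, large hands, and hands with limited thumb mobility.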

Fortunately, professionals in relevant disciplines can offer us crucial help. Ergonomics experts, for example, specialize in anthropometry, the study of the measurements and proportions of the human body. Anthropometry offers robust databases of human body measurements, along with a reliable correlation between hand size and an easily obtainable measure: height. By referring to these databases, we can construct a diverse dataset.
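As a sketch of how that correlation might inform recruitment, the snippet below estimates hand length from height and splits a participant pool into evenly spaced hand-size bands; the slope and percentile figures are illustrative placeholders, not values from any real anthropometric database:

```python
# Sketch: planning recruitment across the hand-size spectrum.
# The slope and percentile figures below are illustrative placeholders; real
# values would come from an anthropometric database chosen with an
# ergonomics expert.

def estimate_hand_length_mm(stature_mm: float, slope: float = 0.106) -> float:
    """Hypothetical linear estimate: hand length as a fixed fraction of height."""
    return slope * stature_mm

def recruitment_bands(p5_mm: float, p95_mm: float, n_bands: int = 5):
    """Split the 5th-95th percentile range into evenly sized recruitment bands."""
    step = (p95_mm - p5_mm) / n_bands
    return [(p5_mm + i * step, p5_mm + (i + 1) * step) for i in range(n_bands)]

for lo, hi in recruitment_bands(p5_mm=165.0, p95_mm=205.0):
    print(f"recruit participants with estimated hand length {lo:.0f}-{hi:.0f} mm")
```

Sampling evenly across bands, rather than letting recruitment cluster near the average, is what keeps the tails of the population in the dataset.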

Ergonomics experts aren’t the only ones who could help us improve hand-tracking for MR. Specialists in health care can also provide necessary insight into scenarios we would need to consider to craft truly inclusive hand controllers: orthopedic hand specialists could advise us on medical conditions affecting fine motor skills, such as arthritis and tendonitis, while physical and occupational therapists could help us accommodate people who have burn injuries or use prosthetic limbs. Such expertise feeds directly into the preliminary requirements for our datasets, helping us make hand-tracking work for as many people as possible.

Eye-tracking

Eye-tracking with appropriate privacy measures is another integral component of MR. Embedded in a headset or pair of glasses, tiny video cameras pointed closely at your eyes find your pupils and track how your eyes behave. Vision AI processes the video in real time to learn what grabs your attention, as well as how you’re responding to what you see. In fact, the patterns of our eye movements can reveal quite a lot to computers about how we feel, how we think, and what we know. For instance, looking at something immediately and staring intently suggests sustained interest, while quickly looking away might indicate boredom or disgust. In addition to gaze, the size of the pupil and its rate of change also tell the story of a person’s inner emotional and cognitive states.
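Here’s a simple sketch of the kinds of signals such a system might compute from a raw gaze stream; the sample format and field names are assumptions for illustration, not any vendor’s API:

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    t: float         # timestamp, seconds
    x: float         # gaze position, normalized display coordinates
    y: float
    pupil_mm: float  # pupil diameter, millimeters

def dwell_time(samples: list[GazeSample],
               region: tuple[float, float, float, float]) -> float:
    """Total time the gaze stayed inside a rectangular region of interest."""
    x0, y0, x1, y1 = region
    return sum(b.t - a.t for a, b in zip(samples, samples[1:])
               if x0 <= a.x <= x1 and y0 <= a.y <= y1)

def pupil_dilation_rate(samples: list[GazeSample]) -> float:
    """Average rate of pupil-diameter change (mm/s), a rough engagement proxy."""
    dt = samples[-1].t - samples[0].t
    return (samples[-1].pupil_mm - samples[0].pupil_mm) / dt if dt > 0 else 0.0
```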

MR of the future will rely heavily on eye-tracking. Another promising application is foveated rendering: providing high resolution only where you’re looking (similar to how the human retina works), which would greatly reduce the size, cost, and computing power required for MR devices. MR devices could also detect where you’re looking and show the most relevant UI elements based on what you’re focusing on.
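As a rough illustration of the foveated-rendering idea, here’s a sketch that assigns a shading rate based on distance from the gaze point; the radii and rate tiers are invented for illustration, not tuned values from any real headset:

```python
import math

def shading_rate(pixel_xy: tuple[float, float],
                 gaze_xy: tuple[float, float],
                 fovea_px: float = 200.0,
                 falloff_px: float = 600.0) -> float:
    """Pick a rendering-resolution tier from distance to the gaze point."""
    d = math.dist(pixel_xy, gaze_xy)
    if d <= fovea_px:
        return 1.0   # full detail where the user is actually looking
    if d <= falloff_px:
        return 0.5   # medium detail in the near periphery
    return 0.25      # coarse detail in the far periphery
```

The savings come from spending full rendering effort only on the small patch the fovea can actually resolve, which is why accurate, inclusive gaze tracking is a prerequisite.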

Human eyes vary by iris color, eye shape, and eyelid type. Some folks rely on eyeglasses or contact lenses, and some people have common medical conditions like lazy eye. Maybe someone has red and puffy eyes from seasonal allergies, or is wearing eyeliner makeup that day. All these factors, if overlooked during product development, might interfere with tracking someone’s eye movements. Here, as with hands, experts in relevant fields provide invaluable guidance for designing tracking technology that works well for all.

With their specialized knowledge, experts in eye-related fields can help researchers define requirements and design datasets. For example, ophthalmologists can advise on medical conditions and offer population-level perspectives on eye features such as color, while optometrists can provide input on eyeglasses and contact lenses. We can leverage such expertise to devise informed and thoughtful data-collection strategies, the foundation of inclusive MR.
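One way such expert input could be operationalized is as coverage metadata attached to every capture session, so gaps in the dataset become visible early. This sketch is hypothetical; the attribute names and category lists are our assumptions and would be defined with eye-care specialists:

```python
from dataclasses import dataclass, field

@dataclass
class EyeCaptureMetadata:
    """Attributes a collection plan might track per capture session."""
    iris_color: str             # e.g., "brown", "hazel", "blue", "green"
    eyelid_type: str            # e.g., "monolid", "double lid", "hooded"
    eyewear: str                # "none", "glasses", "contacts"
    makeup: bool                # eye makeup present at capture time
    conditions: list[str] = field(default_factory=list)  # e.g., "strabismus"

def coverage_gaps(sessions: list[EyeCaptureMetadata], attr: str,
                  required: set[str]) -> set[str]:
    """Return required categories with zero captured sessions so far."""
    return required - {getattr(s, attr) for s in sessions}

# e.g., coverage_gaps(sessions, "eyewear", {"none", "glasses", "contacts"})
# flags eyewear categories the dataset hasn't covered yet.
```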

Voice AI

Complementing hand-tracking and eye-tracking is a third mode of interaction in MR: automatic speech recognition. Picture yourself, headset on and controllers in hand, making gestures, looking around, and, yes, issuing commands or dictating text with your voice. Using your voice to communicate your intent is extremely natural, and that’s part of voice input’s huge potential in MR. It allows you to do things without using gestures, lets you cut through nested menus with a single command, and speeds up text entry.

It’s seamless when it works well, but the process is far from simple. AI models listen to what you say, analyze it based on the thousands of hours of voice recordings they’ve been trained on, and make an interpretation. Ideally, voice AI hears and understands speakers of every accent and dialect, but AI is only as fluent as we’ve trained it to be. Human speech is profoundly multi-faceted, both expressing and creating culture, history, and social identity. And in addition to the many distinct language variations that exist, we must also consider non-native speakers, people with speech impediments or medical conditions, and other “non-standard” ways of speaking.

Mapping out the layered landscape of human speech is a job beyond the technical scope of engineering or computer science. So where might we look for the insights we’ll need to collect the right kind of speech data? We might start with a sociolinguist who can shed light on different speaking patterns at play. A data scientist or demographer could help determine how many people in which groups need to be sampled. By teaming up with professionals in these relevant disciplines, we’re able to capture a fuller, more accurate picture of the incredible variety of human speech.
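For instance, a demographer might propose proportional-to-population allocation with a minimum floor per group, so small groups still yield enough training data. A sketch of that idea follows; the group names, shares, and floor are illustrative only:

```python
def allocate_speakers(population_shares: dict[str, float],
                      total_n: int, floor: int = 50) -> dict[str, int]:
    """Split a recruitment budget across speech groups.

    Proportional-to-population allocation, with a minimum floor so small
    groups still get enough samples for the model to learn from.
    """
    return {group: max(floor, round(share * total_n))
            for group, share in population_shares.items()}

plan = allocate_speakers(
    {"dialect_A": 0.55, "dialect_B": 0.30,
     "non_native": 0.12, "speech_condition": 0.03},
    total_n=2000,
)
print(plan)
# {'dialect_A': 1100, 'dialect_B': 600, 'non_native': 240, 'speech_condition': 60}
```

The floor matters: purely proportional sampling would leave the smallest groups with too little data, reproducing exactly the exclusion we’re trying to avoid.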

Letting people be themselves

In addition to creating inclusive multi-modal experiences, we must also allow people to express themselves and their identities in MR. Think of your MR persona as your digital twin: if you were sending your mixed-reality-self to a digital trade show as part of your job, for example, you’d want your avatar to look like you. Or perhaps you’re someone who enjoys adopting an alternative virtual identity. In any case, the product team should strive to maximize the number of identity options in MR, so everyone can be themselves.

Consider, for instance, how MR must offer people a good variety of hairstyles to choose from. Hair is a crucial aspect of self-expression and identity, so it’s important we get this right. But as we begin to work, we need to acknowledge that a huge variety of hairstyles exists across the globe, and there’s no comprehensive hairstyle database for us to refer to and no clear standard for hairstyle classification. Here’s an occasion where we might look over to the realm of video games and apps, studying the avatar options in both. We might even look to biological anthropology or the beauty and cosmetology industries. When we’re inquisitive and open-minded, no field is off-limits.
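Absent a standard, a team might begin with a working taxonomy that decomposes hairstyle into independent axes and then audits the avatar catalog for gaps. This sketch is only a starting point; the axes and values are our assumptions, to be validated with the experts mentioned above:

```python
from itertools import product

# A working taxonomy sketch: decomposing "hairstyle" into independent axes
# makes coverage gaps visible. The axis values are illustrative starting
# points, not an accepted standard.
TEXTURES = ["straight", "wavy", "curly", "coily"]
LENGTHS = ["shaved", "short", "medium", "long"]
STYLES = ["loose", "braids", "locs", "twists", "updo", "covered"]

def catalog_gaps(existing: set[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """List texture/length/style combinations an avatar catalog doesn't offer yet."""
    return [combo for combo in product(TEXTURES, LENGTHS, STYLES)
            if combo not in existing]

# e.g., if the catalog ships only ("coily", "short", "loose"), every other
# combination shows up as a gap to prioritize with users and experts.
```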

Creating inclusive MR takes time and intention

Whether it’s ways of interacting or ways of expressing yourself, the inclusive MR of tomorrow starts with the carefully crafted datasets of today. But we know building such complex new technology for a global audience is an extraordinarily difficult undertaking, and no single discipline could possibly provide the knowledge and nuance required for the job. That’s why we in the tech industry shouldn’t build MR alone.

First, we look sideways to other domains, bodies of knowledge, and experts to create our data-collection strategy. Next, we balance these analytical methods by working with the actual users of the system and listening to their feedback, because nothing can substitute for genuinely engaging with the real people who will use the technology.

This deliberate, multi-disciplinary approach requires extra forethought, and it requires extra time and resources, too. But we must be willing to make this investment today. It’ll pay dividends in the future, in the form of genuinely inclusive MR and higher user satisfaction.

Acknowledgments

The authors would like to thank Marta Wilczkowiak, Flavia Amaral, Magdalena Vukosavljevic, and Ben Noah for their valuable suggestions on this manuscript.

Additional Resources

Inclusive Design

- An introduction to the world of inclusive design (MSFT Inclusive Design manual/toolkit)

- How does VR fare in terms of accessibility and inclusion? (UX Collective)

- Designing inclusive virtual reality experiences (Springer)

Eye-tracking

- What does your gaze reveal about you? The privacy implications of eye tracking (Springer)

- Meta wants to track people’s facial expressions in metaverse (nypost.com)

- Eye-tracking tech is another reason the metaverse will suck (Vice Motherboard)

Collaborating with Domain Experts

- AI safety needs social scientists (OpenAI)

- Domain expertise: The key ingredient for successful AI deployment (IndiaAI)

- How to staff an AI team: 11 key roles (The Enterprisers Project)

- Ergonomic guidelines for VR/AR interactions (NIU researchers)

--

Head of User Research, Ethics & Society, Microsoft | Fellow, World Economic Forum