Daring more raw AI-ness: a critique of personified voice assistants

Alexa Steinbrück
Published in UX Collective
8 min read · Jan 9, 2022


Illustration by Tamara Siewert, edited with https://constraint.systems/

Personality and gender in voice assistants are not features that emerge from the nature of “AI systems”. Rather, they are intentionally created based on the logic of market demand, gender biases, and prevalent unrealistic narratives about AI. Raw AI-ness could mean: actively deconstructing narratives that parallel artificial intelligence with human intelligence. Not trying to blur the borders between humans and machines, but instead highlighting the distinct capabilities and limitations of machine-learning-backed technology.

In 2019 I had the chance to be interviewed by design writer and editor Madeleine Morley (“Eye on Design”) for an article about alternative voice assistants. You can read the full article, “What would a feminist Alexa look, or rather sound like?”, here! The article has also been published on Adobe XD’s blog.

MM: Why do we feel the need to personify technology?

AS: I think there are two sides: the side of the users and the side of the companies that build and design the technology. There is a feedback loop between these two parties: users come with their expectations, which are often influenced by media and science fiction. Companies are guided by market demand, but they also create new expectations in the way they market their innovations.

Anthropomorphism, that is, ascribing human-like qualities to non-human entities, is an innate human trait, found across centuries and cultures. We’re totally ok with projecting intentionality and feelings onto geometrical shapes like triangles that move in a certain way across a screen. We give nicknames to our cars and stuffed animals. Chatbots and voice assistants are a bit different, though, because they are capable of reacting to our natural-language commands and their output is natural language too. They trigger anthropomorphism to a much higher degree because language is probably the most defining trait of us humans. Our brains are just not used to interacting with a speaking voice detached from a human being.

In the field of Human-Computer Interaction (HCI), many researchers see anthropomorphism and personification as a successful design technique, a mechanism that helps enable the interaction between humans and machines. In the early development of voice assistants, however, personification wasn’t that important. The early versions of Siri, for example, were pretty utilitarian user experiences. The designers didn’t see a point in creating a pseudo-humanity or endowing the character with humour and playfulness. Then they gave it a chance and started experimenting with it. When Siri finally came out, its pseudo-humanity was the feature its users valued most.

Users ask their voice assistants all kinds of questions that would imply they are human. It’s very common for people to poke at this kind of technology to see where its boundaries are.

The companies behind voice assistants have dedicated personality teams consisting of writers with backgrounds in screenwriting, acting, psychology or linguistics. They are tasked with writing entertaining and believable answers to the questions and statements that users repeat regularly. It’s a data-driven back and forth between writers and users.

When the makers of voice assistants give a name to their gadgets (Alexa, Siri, Cortana) and marketers prompt you to “get to know Alexa”, they’re essentially creating affordances for users to anthropomorphize their devices.

A 2017 study (link) found that users who anthropomorphized their voice assistants were also more satisfied with the product. This raises the question of causality: “Does satisfaction with the device lead to personification, or are people who personify the device more likely to be satisfied with its performance?”

I want to highlight a second angle on explaining the personification of voice assistants: these gadgets are different from other objects like cars or stuffed animals because they belong to the realm of “intelligent” technology, and voice assistants are marketed as “an AI”. When you ask Alexa “Who are you?”, Alexa will tell you “I am an Artificial Intelligence”.

There is a mismatch between the expectations the public has about AI technology and the actual state of development in the field. There are multiple reasons for this: inaccurate narratives about AI, overpromising researchers, and inaccurate language about AI in the news.

Most importantly, many people aren’t aware of the difference between narrow and strong AI. But this distinction is crucial! Narrow AI is the main focus of current AI research and it has immediate real-world applications: systems that excel at recognizing patterns and making predictions in a very narrow domain. They don’t possess the capacity for generalisation or fluid intelligence that we humans have. Think of a human Go player who has always played on a 10x10 board. When faced with an 11x11 board, the person will be able to generalize and apply what she’s learned to this slightly different environment. A program based on machine learning, by contrast, will be lost and would have to be retrained for this specific new environment, as the sketch below illustrates.
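To make this concrete, here is a minimal, purely hypothetical sketch in Python (not any real Go engine): a model whose weights are sized for exactly one board shape cannot even accept a slightly different one, let alone transfer what it has learned.

```python
import numpy as np

BOARD = 10  # the model was built and trained for 10x10 boards only

rng = np.random.default_rng(0)
# One linear layer mapping a flattened board to a score for every point.
# Its weight matrix is hard-wired to the 10x10 input shape.
weights = rng.normal(size=(BOARD * BOARD, BOARD * BOARD))

def score_moves(board: np.ndarray) -> np.ndarray:
    """Score every point on the board -- only works for the trained shape."""
    return board.flatten() @ weights

board_10 = rng.integers(-1, 2, size=(10, 10))  # the familiar environment
print(score_moves(board_10).shape)             # (100,) -- works fine

board_11 = rng.integers(-1, 2, size=(11, 11))  # a slightly new environment
score_moves(board_11)  # raises ValueError: shape mismatch -- no transfer
```

A human player’s knowledge, by contrast, isn’t tied to the dimensions of the input.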

The goal of strong AI (or Artificial General Intelligence, AGI), by contrast, is to create systems that have human-level intelligence or beyond (the “Singularity”) and are able to reason across a multitude of domains. Hard fact: strong AI is science fiction, not a reality.

Voice assistants are an example of very sophisticated narrow AI, but they’re marketed as strong AI. Every trait that makes them appear as strong AI (the character, the opinions and desires, the apparent self-awareness) is scripted by a team of writers whose task is to create an entertaining pseudo-humanity that users want to engage with. By personifying voice assistants and marketing them as “an AI”, the companies are leveraging people’s unrealistic expectations of what contemporary AI is capable of.

My position in the debate about whether personification is a good thing: I see the advantages of using personification as a design technique, but I also argue that there are significant downsides: harmful gender stereotypes, a negative impact on our AI literacy (by selling us wrong narratives about the actual capabilities of AI), and the shallowness of the constructed personalities.

[Note about terminology: I am currently using the term “Anthropomorphism” for the innate human capacity to project human characteristics on non-human things. And the term “Personification” for a design strategy that endows objects and technology with human characteristics so that it stimulates and facilitates anthropomorphism in its users. I’m not sure if there’s such a clear separation between them though.]

MM: Why are we gendering technology?

AS: From the perspective of the makers of this technology, the concept of believability is very important. UX research on conversational agents (like chatbots and voice assistants) has shown that believability/credibility is linked to feminine gender personification. Why is that?

My point of view is that this is due to gender stereotypes that assign women certain characteristics which are also deemed beneficial for the role of an assistant: a lower status level, an obliging and polite personality, a certain degree of servility, etc. A female-gendered voice assistant therefore matches the expectations and assumptions that a user might already have, be it openly or subconsciously. Choosing not to meet these expectations would create a little bit of confusion and friction in the user experience. And what these companies want is for the user experience to be as frictionless as possible. The problem is that they are setting standards this way and reinforcing these stupid stereotypes even more. If they could instead use their power and market dominance to challenge these old stereotypes and establish new standards and expectations not based on gender stereotypes…

Another aspect is trust and the challenge of convincing people to adopt a new type of technology. Apparently, a female voice and personality make new privacy-invading tech appear a lot less intimidating…

MM: I’m wondering if you’ve come across any other non-human personas as voice assistants, or how you’ve explored this idea of “raw AI-ness” in your own work and workshops?

AS: I realised that the makers of voice assistants have a contradictory position towards personification: Amazon says they don’t have “an explicit desire for customers to anthropomorphize more or less than they do” (source), but they create affordances for anthropomorphizing: they give it a female name, they endow it with thousands of preferences and opinions and a detailed backstory, and they make it speak in the first person. Ask a voice assistant if it is human and it will emphatically deny it, yet most of them will tell you about preferences and hobbies for which you would have to be human, or at least have a body, like doing somersaults. Hundreds of engineers work on making synthetic voices sound as human as possible (just look at Google Duplex).

I perceive the personification and pseudo-humanity as narratives, additional layers wrapped around the technology. The question is: do we need these narratives and layers? I see a parallel to skeuomorphism: it not only hides the underlying mechanisms like a black box, it even wraps them in representations that mimic a different mechanism. In the case of voice assistants, it’s mimicking a real human being.

Raw AI-ness could mean: actively deconstructing narratives that parallel artificial intelligence with human intelligence. Not trying to blur the borders between humans and machines, but instead highlighting the distinct capabilities and limitations of machine-learning-backed technology. Avoiding the first-person pronoun in the language output. Developing a distinct new sound for synthetic voices that distinguishes them from human ones (but still sounds pleasant).

In my workshops I explain how voice assistants work: the internal information-processing pipeline from input to output, and where exactly which AI techniques are applied. I also show the participants where no AI is used and where actual human beings, writers, invent the responses of a voice assistant (see the sketch below). This demystification part is important because we can see that the gendered pseudo-humanity of a voice assistant is not a law of physics but a conscious design decision, and one that can be revised. Personality and gender in voice assistants are not features that emerge from the nature of “AI systems”. They are intentionally created based on the logic of market demand, gender biases, and prevalent unrealistic narratives about AI.
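As a rough illustration, here is a deliberately simplified, hypothetical pipeline in Python. The stage names and the tiny response table are my own stand-ins, not any vendor’s actual architecture; the point is only to show where trained models do the work and where a hand-written script does.

```python
# Hand-written by a personality/writing team -- no AI at this stage.
SCRIPTED_RESPONSES = {
    "ask_if_human": "No, I'm a piece of software.",
    "ask_preferences": "This answer was invented by a human writer.",
}

def speech_to_text(audio: bytes) -> str:
    """ASR: a trained speech-recognition model (machine learning)."""
    return "are you human"  # stand-in for a real model's transcription

def classify_intent(text: str) -> str:
    """NLU: a trained text classifier (machine learning)."""
    return "ask_if_human" if "human" in text else "fallback"

def choose_response(intent: str) -> str:
    """Dialog management: rules plus a lookup table. The 'personality'
    comes straight from the scripted table above, not from AI."""
    return SCRIPTED_RESPONSES.get(intent, "Sorry, I don't know that one.")

def text_to_speech(text: str) -> bytes:
    """TTS: a trained speech-synthesis model (machine learning)."""
    return text.encode()  # stand-in for synthesized audio

# End to end: ML at the edges, human-authored content in the middle.
print(choose_response(classify_intent(speech_to_text(b"..."))))
```

Seen this way, the “character” lives entirely in the hand-written table; the machine learning around it merely transports it.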

Once we have stripped down the narrative, I prompt the participants to think of alternative representations of voice assistants. Exciting ideas emerge from that. At the workshop in London, at Mozfest, a team came up with the idea of a mountain, which was really beautiful. You might say: but that’s also not raw AI, because it’s also a narrative wrapped around the real technology! But I think this narrative is different, because it’s far enough away from the truth; it’s conscious and playful animism. We know that we’re not speaking to a real mountain, but we can imagine doing so.

What I’m hoping for is this: as we integrate AI technology into our everyday lives, object recognition into our apps, and speech recognition into our phones, and as the general public gains more knowledge of the field, personification will become a distraction, a weird bag of cheap tricks that we eventually get rid of. I just saw that Google added a section about machine learning to its Material Design guidelines (link). The guidelines focus on apps that use object-recognition technology. Object recognition and computer vision are core fields of AI. Just the fact that an app has the capacity to “see” by recognizing objects in images doesn’t make it “an AI” that needs its own name and personality. We’re able to see it as a technical extension of a specific sense we have, not as its own entity.
