Spatial audio and player performance

The positive and negative effects of audio spatialization.

Denis Zlobin
UX Collective


A black-and-white AI-generated image of a man sitting inside an abstract sphere of speakers and sound sources
Image generated with the Image Creator from Microsoft Bing

Earlier this year, I wrote two articles on how to intentionally sound design a game to help players deal with in-game challenges. As some pointed out, I ignored the elephant in the room — the recent advancements in spatial audio that enable players to localize sounds with greater precision.

Back then, I decided to skip this topic because of the somewhat conflicting evidence I kept stumbling upon during my literature review. Some sources claimed that spatial audio helps players perform better, while others warned that it introduces distractions that can degrade the experience. Recently, I had a few fruitful discussions with fellow game audio folks who inspired me to take a deeper look at the literature and write this post.

Ludic effects of spatial audio

Spatial audio undeniably enhances sensory immersion and improves the ease and accuracy of localizing sound sources in space. Consequently, its ability to provide more precise information about sound source locations impacts player behavior and performance. Personalized HRTFs improve the reaction times of esports athletes. Better spatial accuracy helps players achieve their goals in time-sensitive navigation tasks. Spatialization can also lower the perceived threat of on-screen enemies by making the player more confident about where those enemies are, which might be either a desired or an undesired effect depending on the game.

On the other hand, there is strong evidence that increased spatial audio accuracy can harm performance in selective attention tasks by distracting the audience and disrupting their focus on the current visual goal. Spatialized sounds coming from outside the field of view or screen space capture our attention more easily, creating an additional, sometimes unnecessary, perceptual challenge.

If you followed the links, you have probably already spotted the pattern. In the studies that highlight the positive impact of spatial audio on player performance, spatialization mostly applies to sounds that relate to the player's goals. For example, an enemy's location in a first-person shooter will always matter to a player who wants to defeat the enemies and survive. The studies that show a negative impact, by contrast, spatialize distractors: sounds that carry no useful information for whatever the participant is doing. This suggests that the positive and negative effects of spatialization depend on which sounds we choose to spatialize and whether the information they carry is useful to the player.

Allow me a little digression to better explain what I mean. Experienced product sound designers know that an effective user interface sound focuses on the essential message and carries as little redundant information as possible. Let's imagine the process of designing the UI sound for unlocking a mobile device.

One of the first ideas that typically comes to mind is something like “Let’s record unlocking a physical padlock!”. But if we use such a sound as-is, it will likely perform poorly and become annoying after a few repetitions. The source of annoyance in this case is the amount of unnecessary information the sound carries. Apart from signifying the act of unlocking, the unedited sound informs us about the materials of the padlock and the key, the structure of the mechanism, the force we applied, and the space where the recording took place. All of this information is unnecessary for the user who unlocks a mobile device, and if we force them to process it every time they engage with their gadget, they reasonably grow tired of it. This is why, to be effective, the sound needs to be “devoid” of most of its context¹. This is also why a user interface designer will never use a photo or a highly detailed image as an icon.

An AI-generated image showing someone holding what is presumably a microphone over something mildly disturbing that blends into their hand.
I prompted the Adobe Firefly image generator with “Let’s record the sound of unlocking a physical padlock!”, and it gave me something moderately disturbing. I had to share it with you.

As I said above, a highly spatialized sound provides more precise information about the location of its source. Essentially, this is an additional layer of information the player’s brain needs to process. Depending on the player’s goals, this information might be either useful or redundant, leading to opposite effects. This brings me to a cautious conclusion: spatialization improves player performance when spatial information is relevant to the player’s goal and has the opposite effect when it is not.

Disclaimer: Note that I focus on the ludic function of game audio here, omitting the narrative function, which might be more important to your project. In many games, decreased player performance can be a reasonable tradeoff for the increased immersion and presence that spatial audio enables. For instance, in a narrative-driven horror game, we’d rather focus on eliciting a certain emotional state than on helping the player efficiently avoid or defeat the monsters.

Dynamic spatialization?

If my conclusion holds true, we might need to develop ways to dynamically control sound spatialization in games with greater precision and granularity. I find it useful to view spatialization (meaning the ease and accuracy of locating a sound source in space) as a distinct perceptual property of a sound, just like loudness or pitch. The loudness analogy is especially illustrative. A loud sound that relates to the player’s goal (such as a nearby explosion) is helpful because it is easy to hear. A loud sound that is irrelevant to what we are doing (think of a repetitive loud bird call) is disruptive because it is difficult to ignore. Audio designers intuitively make such observations to inform their choices when pre-mixing audio files and mixing the game. Why don’t we apply the same approach to spatialization? Currently, I see two potential obstacles to doing so.

First, in many discussions, people still compare “spatial” audio with “non-spatial”, as if it were a binary choice between two paradigms. It doesn’t have to be! Yes, not all spatialization techniques and technologies are compatible with each other, and some provide very little control over spatialization. But games that use a 3D audio pipeline normally have a mix of spatialized and non-spatialized sounds, and some apply a varying degree of spatialization (like different orders of ambisonics) depending on the sound type. On top of that, some sounds are naturally easier for us to localize because of their timbral structure. From the perceptual perspective, spatialization is already a spectrum (meaning we localize some sounds better than others), and we should adapt our vocabulary to reflect this.
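
To make the spectrum concrete, here is a minimal sketch using the Web Audio API (picked only because it exposes both an HRTF-panned and a non-positional path in a few lines; game engines offer comparable blend controls). It treats spatialization as a single 0-to-1 property per sound by crossfading between a non-spatialized path and an HRTF-panned one. The routing, the helper names, and the 0-to-1 convention are my own illustration, not an established practice.

```typescript
// A minimal sketch: spatialization as a continuous 0-to-1 property.
// Each sound runs through two parallel paths, and a single parameter
// crossfades between them. Names and routing are illustrative only.

interface SpatializedVoice {
  dry: GainNode;      // non-spatialized path: no positional cues
  wet: GainNode;      // spatialized path: full HRTF panning
  panner: PannerNode; // position of the source in 3D space
}

function createVoice(ctx: AudioContext, source: AudioNode): SpatializedVoice {
  const dry = ctx.createGain();
  const wet = ctx.createGain();
  const panner = new PannerNode(ctx, {
    panningModel: "HRTF",  // binaural cues: the most localizable option
    distanceModel: "inverse",
  });

  source.connect(dry).connect(ctx.destination);
  source.connect(panner).connect(wet).connect(ctx.destination);
  return { dry, wet, panner };
}

// 0 = hard to localize, 1 = as localizable as the pipeline allows.
// The equal-power crossfade keeps perceived loudness roughly constant,
// so the parameter changes how locatable the sound is, not how loud.
function setSpatialization(voice: SpatializedVoice, amount: number, when: number): void {
  const a = Math.min(1, Math.max(0, amount));
  voice.wet.gain.setTargetAtTime(Math.sin((a * Math.PI) / 2), when, 0.05);
  voice.dry.gain.setTargetAtTime(Math.cos((a * Math.PI) / 2), when, 0.05);
}
```

The point of the crossfade is to decouple the two perceptual properties: the sound stays equally loud across the whole range while its localizability varies, which is exactly the kind of independent control the loudness analogy calls for.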

Second, if spatialization is a perceptual property of a sound, we’d better learn to measure it and agree on standard units to describe it. Audio engines already provide tools to affect spatialization, but we lack a shared measure of how accurately localizable a sound is relative to other sounds: something we can universally track and maybe even standardize for different sound types. The introduction of such a measure, along with the further evolution of tools, could enable a whole new exciting dimension in mixing interactive audio experiences.
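
I don’t have such a measure to offer, but even a placeholder makes the idea tangible. The sketch below is purely hypothetical: asset metadata that tracks a normalized localizability value next to a loudness figure, with per-category budgets the way mix specs already budget loudness in LUFS. None of these fields, numbers, or categories exist as a standard.

```typescript
// Purely hypothetical asset metadata: what tracking a shared
// "localizability" measure could look like. Loudness already has a
// standard unit (LUFS); the 0-to-1 localizability scale and the
// per-category budgets below are invented for illustration.

type SoundCategory = "enemy" | "player_feedback" | "ambience" | "ui";

interface SoundAssetMetadata {
  id: string;
  category: SoundCategory;
  loudnessLUFS: number;   // measured against an agreed standard
  localizability: number; // 0..1 placeholder until a real unit exists
}

// Budgets per category, analogous to loudness targets in a mix spec.
const localizabilityTargets: Record<SoundCategory, number> = {
  enemy: 0.9,           // goal-relevant: localize as precisely as possible
  player_feedback: 0.3, // the player's own actions rarely need positioning
  ambience: 0.5,
  ui: 0.0,              // no spatial information to convey
};

function exceedsBudget(asset: SoundAssetMetadata): boolean {
  return asset.localizability > localizabilityTargets[asset.category];
}
```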

Eventually, I’d like to see us build dynamic systems that change the spatialization of groups of sounds based on the information the player needs in a specific gameplay situation. Many modern multiplayer shooters have a system that makes enemy gunshots louder the more directly the enemy aims at the player. Is there any reason for such sounds not to also deliver greater spatial accuracy than the rest of the combat soundscape? Conceptually, many of the dynamic mixing techniques we use to control the loudness of sounds could be transposed to spatialization. And I think we can get there fairly easily once we solve the two problems mentioned above.
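
As a thought experiment, the gunshot example could look like the sketch below: a threat score derived from the enemy’s aim drives the spatialization amount, exactly the way such scores already drive loudness boosts. Every name and number here is a placeholder of my own, building on the setSpatialization sketch above.

```typescript
// Hypothetical dynamic-mix rule: the more directly an enemy aims at
// the player, the more precisely localizable their gunshots become,
// while background combat keeps a modest baseline.

type Vec3 = { x: number; y: number; z: number };
const dot = (a: Vec3, b: Vec3) => a.x * b.x + a.y * b.y + a.z * b.z;
const len = (a: Vec3) => Math.sqrt(dot(a, a));

// 1 when the enemy aims straight at the player, 0 when aiming away.
function aimThreat(aimDirection: Vec3, enemyToPlayer: Vec3): number {
  const cos = dot(aimDirection, enemyToPlayer) / (len(aimDirection) * len(enemyToPlayer));
  return Math.max(0, cos);
}

// Map threat to a spatialization amount for the gunshot's voice,
// e.g. setSpatialization(voice, gunshotSpatialization(threat), now).
function gunshotSpatialization(threat: number): number {
  const baseline = 0.4; // a mixer would tune this, like a ducking amount
  return baseline + (1 - baseline) * threat;
}
```

The control signals (aim, distance, objective state) already exist in most games; today they drive volume ducking and boosts, and nothing conceptually prevents them from driving localizability as well.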

As usual, feel free to share this post if you agree with my conclusions, and don’t hesitate to challenge me, here or anywhere you encounter me online, if you disagree. I’m also always happy to exchange ideas on game audio UX and functional sound design, and the easiest way to reach me is to message me on LinkedIn. Thank you for reading!

¹This is not necessarily true for video game UI sounds that often have a narrative function.
