Ludic audio and player performance

Part 1: Comprehensive and indicative sounds

Published in UX Collective, January 14, 2023

An AI-generated stylized image of a blindfolded man wearing headphones and holding a weird gaming controller in his hand — Image generated with Stable Diffusion 2.1

In my post about ludic and narrative sound in games, I defined the ludic function of audio as the sound’s ability to help players overcome challenges and achieve their goals. If you’ve read my blog, you might know that I’m really into this topic. That’s why I’m super excited to start a series of three(?) articles on how audio can enhance player performance in gameplay tasks. In this first part, I describe a few methods to support players in challenging gameplay situations.

Disclaimer: This article might seem like it’s trying to apply Perceptual load theory to video game audio. This idea crossed my mind when I started writing, but I soon realized I do not have the expertise to reconcile all the confusing and conflicting research I found. So even though I borrow the basic concepts from the theory (such as cognitive load and perceptual load), most of my conclusions are based on my own experience and a few studies I mention in the text. As usual, I try to be open about my thought process, so if anything here rubs you the wrong way, don’t be afraid to challenge me in the comments or elsewhere.

To tackle the complex topic of using sound to improve player performance, I started by thinking about the types of gameplay challenges that players face in video games. At some point, I realized that most of these challenges fall into three categories: cognitive, perceptual, and mixed.

Cognitive challenges involve learning, planning, and decision-making. They include solving puzzles, learning and mastering game mechanics, and coming up with a winning strategy.
Perceptual challenges involve processing sensory information and reacting to it. The most common examples are reacting to audiovisual information and tracking or differentiating multiple objects.
Mixed challenges are a combination of both. They typically involve making quick decisions while processing sensory information.

Turn-based games like chess, Civilization, or X-Com present mostly cognitive challenges. Old-school arcade games like Space Invaders and Asteroids challenge players perceptually, and so do visual puzzles like Where’s Wally? Many challenges we encounter in FPS, RTS, and action RPG games are a mix of both.

Cognitive challenges generate cognitive load, which is basically the mental effort required to process information, make decisions, and take action to complete a task or achieve a goal. There are a few different definitions of cognitive load in psychology, but let’s go with this one. Let’s also assume that at this stage of processing the brain has collected the information from all our senses and translated it into its own language. So cognitive load is not specific to any particular type of sensory input.

Perceptual challenges create perceptual load: the amount of attentional resources needed to process sensory information. Perceptual load is specific to a particular sense, and when I use the term in this article, I’m usually talking about (surprise!) visual perceptual load, for two reasons. First, there’s still some debate about whether the concept of perceptual load applies to hearing. Second, high auditory perceptual load in a video game usually comes from a cluttered mix, which most audio designers know how to handle.

AI-generated fake screenshot of a Diablo-style action RPG game — Action RPGs challenge players both cognitively and perceptually. Image generated with Stable Diffusion 2.1

The levels of load the player experiences usually change throughout the game, and the range of these changes varies by genre. Open-world action adventures alternate between demanding and relaxing moments, while competitive multiplayer games might have less variation. The levels of load also depend on external factors. For example, an esports player might experience a higher cognitive load than a casual player in the same game, because they have more at stake and are trying to process more information. Someone playing on a mobile device might experience a different perceptual load than someone playing the same game on a TV.

As usual, when I say sound, I mean the way we perceive a soundwave, not the soundwave itself. It is a higher-level concept than an audio file or a sound event in audio middleware. Sound is subjective. The same sound event produces different sounds in different contexts, while different audio files can produce the same sound in similar contexts. More on this in a separate post. To use game sounds effectively in the context of challenges and load, I suggest dividing them into two types based on whether the information they provide is sufficient on its own.

Comprehensive and indicative sounds

Comprehensive sounds deliver specific, gameplay-related information on their own. Indicative sounds direct the player’s attention to a visual element that provides more information. In other words, comprehensive sounds tell you something, and indicative sounds tell you to look at something. There is a very fine line between the two categories, and small contextual factors can push a sound into either of them. Let’s take an imaginary action RPG and pick a few examples from it:

1. Quest progression sounds

As the aspiring Chosen One, you need to kill 20 goblins. Each time one dies by your hand, you hear a sound indicating that you made progress in this questionable quest. However, the sound doesn’t tell you exactly how much progress you made, and to get that information you have to look at the counter on the screen. So, the sound you heard is indicative.

Once you’ve killed the 20th goblin, you hear a different sound signaling that your current objective is complete, and the remaining goblins can go on with their lives. Although the screen shows visual confirmation, the sound itself delivers enough information for the player to understand what happened. This sound is comprehensive. But it could also be indicative if the game let you track multiple objectives at once and you couldn’t tell which of them the goblin’s death had completed.

2. Health status sounds

Each time you take damage, you hear a sound indicating that your character got hurt. If the game has a health bar or some other visual indicator of your current health status, the damage sound is indicative: you hear it and then visually check your health. If the game has a health regeneration mechanic with no visual indication of current status (this is probably a bad choice for an action RPG), the damage sound is comprehensive, because the game does not provide any further information on your health.

Imagine you’ve been in a tough fight and have less than 25% health left. The game applies a low-pass filter to some mixer buses to communicate this. Such a change in the mix is a comprehensive sound: even if you don’t know exactly what health level triggered it, you can still work out what it means. This sound remains comprehensive even if the game has no visual representation of the player character’s health. The message changes from “You have less than 25% health left” to “You are in danger”, but its type stays the same, because in both cases hearing the sound alone is enough to understand what is going on.
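To make the low-health example concrete, here is a minimal Python sketch of the rule behind such a mix change. The 25% threshold comes from the example above. The bus names and cutoff frequencies are hypothetical values chosen for illustration. In a real project, this logic would usually live in the audio middleware as a parameter-driven filter rather than in game code.

```python
"""Sketch: a mix-state change that signals low health (hypothetical values)."""

LOW_HEALTH_THRESHOLD = 0.25      # from the example: below 25% health
BYPASS_CUTOFF_HZ = 20_000.0      # assumed: filter effectively open
DANGER_CUTOFF_HZ = 1_500.0       # assumed: muffled "you are in danger" mix
FILTERED_BUSES = ("ambience", "music", "sfx_world")  # hypothetical bus names


def lowpass_cutoff_hz(health_fraction: float) -> float:
    """Return the low-pass cutoff for the filtered buses at this health level."""
    if not 0.0 <= health_fraction <= 1.0:
        raise ValueError("health_fraction must be in [0, 1]")
    if health_fraction < LOW_HEALTH_THRESHOLD:
        return DANGER_CUTOFF_HZ
    return BYPASS_CUTOFF_HZ


def bus_filter_settings(health_fraction: float) -> dict[str, float]:
    """Map every affected bus to its cutoff; other buses are left untouched."""
    cutoff = lowpass_cutoff_hz(health_fraction)
    return {bus: cutoff for bus in FILTERED_BUSES}
```

The filter state depends only on current health, so the sound keeps its meaning whether or not the game also shows a health bar. In practice, you would also smooth the transition between the two cutoffs so the change isn't abrupt.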

3. Active ability cooldown sounds

In the last fight, you used an active ability to cast a fireball. The game puts a cooldown on active abilities to prevent spamming, and now the fireball’s cooldown is over, so you can cast it again. A special sound tells you that the ability is available. If the game only lets you equip one ability at a time, the sound you heard is comprehensive: you know you have a fireball, and you know it will fly toward your target when you press the dedicated button. But what if the game lets you equip four active abilities? Then, unless we have different sounds for each slot or ability, the “cooldown is over” sound is indicative, because it doesn’t tell you which cooldown is over. However, if you as a player have chosen to use only one ability and know that the other three were available before you cast the fireball, the sound becomes comprehensive again, although only for you and only in this particular case.
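The cooldown example boils down to one question: can the player tell from the sound alone which ability is ready? The Python sketch below encodes only that question. The function and parameter names are hypothetical. The sketch deliberately covers just this example, because classifying sounds in general depends on context that a single rule cannot capture.

```python
from enum import Enum


class SoundType(Enum):
    COMPREHENSIVE = "comprehensive"
    INDICATIVE = "indicative"


def cooldown_ready_sound_type(abilities_on_cooldown: int,
                              distinct_sound_per_ability: bool) -> SoundType:
    """Classify a 'cooldown is over' sound from the listener's point of view.

    abilities_on_cooldown: how many abilities the player knows could be the
        one that just came off cooldown.
    distinct_sound_per_ability: whether each slot or ability has its own sound.
    """
    if abilities_on_cooldown < 1:
        raise ValueError("the sound only plays when some cooldown ends")
    if distinct_sound_per_ability or abilities_on_cooldown == 1:
        return SoundType.COMPREHENSIVE  # the sound alone says which ability is ready
    return SoundType.INDICATIVE         # the player has to check the HUD
```

Consider the case where four abilities are equipped but only the fireball was used. The player knows only one cooldown is running, so this corresponds to `cooldown_ready_sound_type(1, False)`, which is the special case described above.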

An AI-generated image of a wizard casting a fireball — Image generated with Stable Diffusion 2.1

Human attention is scarce, but research suggests that we can extend it by using different sensory channels. Using the classification I propose, we can make better choices about which sound type to use in challenging gameplay situations. Comprehensive sounds are great for offloading some visual perceptual load to audition, as they communicate information without requiring us to look away. Indicative sounds, on the other hand, distract us from our current task and push us to reallocate our visual attention resources.

Perceptual load theory proposes that when perceptual load is high, we allocate all our attention resources to task-relevant information and filter out distractions. This makes us perform better, because we naturally stay focused on the task. But since perceptual load is modality-specific, high visual load doesn’t prevent us from attending to distracting sounds. In fact, evidence suggests that under high visual perceptual load, we are more prone to auditory distractions. This is where indicative sounds become problematic: even though they exist in the auditory domain, they can pull our visual attention away from the task.

Let me illustrate this with an example. Imagine you are in the middle of a challenging parkour sequence and you hear a UI notification sound that drags your gaze to the corner of the screen. Looking there, analyzing the message, and refocusing your eyes on your initial target will take a few hundred milliseconds, and likely affect your concentration, increasing your chances of failure. The more sounds like this a game has, the more challenging (in a non-fun way) it becomes.

Why don’t we just make every sound comprehensive? First, that would be costly. For a large game, we’d need to build an extensive information hierarchy and provide tons of content to support every kind of message we want to deliver to the player.

Second, every sound the player needs to learn comes with a cognitive cost. Let’s return to our cooldown signal example. If we use the same sound for every active ability, learning it is easy. If we have 4 ability slots and make a distinct sound for each slot, players will need a bit of time to learn them to the point where they can reliably understand each message. If we have hundreds of active abilities and give each one a unique sound for when its cooldown is over, most players will never learn what exactly we are telegraphing.

And third, instead of delivering an explicit message, we often want to give the player the means to solve the challenge or uncover some information. Indicative sounds are useful to guide players and enable their agency, letting them discover the missing pieces of the message on their own. Imagine a survival horror game. You stand in the corridor of an abandoned house and hear a door opening somewhere nearby. It is an indicative sound that invites you to explore the scene and adds to the sense of suspense. If the sound managed to communicate exactly where the door is and why it opened, it would simply not serve those functions.

AI-generated fake screenshot from a survival horror game, the character is standing in a creepy-looking corridor. — Image generated with Stable Diffusion 2.1

Indicative sounds are instrumental when we need to attract the player’s attention to something important. For instance, in a multiplayer shooter, you want to know when someone is approaching from behind. You appreciate the sound of footsteps prompting you to turn the camera, even though it doesn’t explicitly say who is coming and why.

Practical applications

By now, you may have realized that it’s not always easy to tell comprehensive and indicative sounds apart. Luckily, this classification isn’t really important outside of the context of challenging gameplay, and I don’t recommend trying to classify every sound in your game as one or the other. In my opinion, this method is most useful for refining the existing design when most of the sound content is already in the game, but you can still make significant changes to the soundscape.

Disclaimer: Although the ideas I express here are backed by some practical experience, I am only shaping and refining them by writing this post. So the guidelines below reflect how I think this method should be used rather than my actual workflow. I might update this article as I try out the method more. And of course, I’m open to any improvement ideas you might have.

Step 1. Identify gameplay scenarios in which the player is expected to experience high cognitive load, high visual perceptual load, or both at the same time. If you have trouble doing this (or even if you don’t), ask game and UX designers for help. These scenarios could be generic (like a boss battle, a ranked PvP match, or a platforming segment) or specific (like a puzzle at the end of level 5, or the final phase of a team deathmatch when the player’s team is losing and needs to regain the advantage). Note that I’m not treating high auditory perceptual load as a relevant scenario, for the reasons I mentioned in the introduction.

Step 2. Play through the identified scenarios and list all the frequent and prominent sounds you hear. To reduce bias, you can ask colleagues from other job families to list the sounds while you play. You can of course also analyze a profiler session from your middleware of choice, but make sure to exclude any inaudible events. Then rank the sounds by how often they play and how prominent they are in the mix, so that the most frequent and noticeable sounds end up at the top of the list.
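If you collect the list in a spreadsheet or export it from a profiler capture, the ranking itself is simple. The sketch below makes two assumptions, neither of which is a standard. First, each sound has a plays-per-minute count and a subjective prominence rating from 0 (inaudible) to 5. Second, sounds are scored by the product of those two numbers.

```python
from dataclasses import dataclass


@dataclass
class HeardSound:
    name: str                 # hypothetical event name
    plays_per_minute: float   # counted during the playthrough or from a profiler capture
    prominence: int           # assumed subjective rating: 0 = inaudible, 5 = dominates the mix


def rank_sounds(sounds: list[HeardSound]) -> list[HeardSound]:
    """Drop inaudible sounds, then put frequent and prominent ones first."""
    audible = [s for s in sounds if s.prominence > 0]
    # Assumed scoring rule: frequency times prominence, ties broken by prominence.
    return sorted(audible,
                  key=lambda s: (s.plays_per_minute * s.prominence, s.prominence),
                  reverse=True)


sounds = [
    HeardSound("footsteps", plays_per_minute=30, prominence=2),
    HeardSound("quest_progress", plays_per_minute=2, prominence=4),
    HeardSound("headshot_confirm", plays_per_minute=6, prominence=5),
    HeardSound("distant_bird", plays_per_minute=50, prominence=0),
]
ranked = rank_sounds(sounds)
```

Here `ranked` lists footsteps, headshot_confirm, and quest_progress in that order, and the inaudible event is dropped. A plain sort on two spreadsheet columns would serve just as well; the point is only to make the ranking criteria explicit.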

Step 3. For each sound you select in this step, you’ll need to answer three questions:

What message does the sound carry?
Why does it exist in the game?
When does it play?

Depending on the size of your list, you’ll want to select a limited number of sounds to work with. The higher the ranking, the more impact you can make by working on the sound. But since the highest-ranking sounds typically relate to the core mechanics of the game, they likely already work well and don’t need any treatment. This is why I recommend going through the list with the questions in mind and selecting the sounds for which good enough answers don’t immediately come to mind.

Now take some time to actually answer the questions. Make sure you write down your answers! The answers could look like this:

What: The player has delivered a headshot to the enemy
Why: To provide feedback and reward the player for a challenging shot
When: Whenever the game registers that the player has successfully performed a headshot

What: The cat says “Meow”
Why: To enhance immersion by creating an illusion of a lively world with interactable pets
When: Whenever a character interacts with the cat, or when the player is nearby and the cat makes a random noise

What: There are enemies nearby
Why: To alert the player and let them prepare for the fight
When: Whenever there is at least one enemy within a 100m range

Step 4. The answers you have should be enough to label the sounds you selected as either comprehensive or indicative in the given context and to assess their ludic value. Now you are ready for the main part.
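Writing the answers down in a consistent format makes Step 4 easier. Here is one possible shape for such a worksheet, sketched in Python. The field names, the example event names, and the four-level ludic value scale are assumptions for illustration. The labels given to the headshot sound are just one plausible reading.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SoundType(Enum):
    COMPREHENSIVE = "comprehensive"
    INDICATIVE = "indicative"


class LudicValue(Enum):
    """Assumed ordinal scale, using the terms this article uses."""
    NEGATIVE = -1
    NEGLIGIBLE = 0
    LOW = 1
    HIGH = 2


@dataclass
class SoundReview:
    sound: str   # hypothetical event name
    what: str    # the message the sound carries
    why: str     # why the sound exists in the game
    when: str    # when the sound plays
    sound_type: Optional[SoundType] = None     # filled in during Step 4
    ludic_value: Optional[LudicValue] = None   # filled in during Step 4

    def is_labeled(self) -> bool:
        return self.sound_type is not None and self.ludic_value is not None


def unlabeled(reviews: list[SoundReview]) -> list[str]:
    """Names of sounds that still need a type label and a ludic value."""
    return [r.sound for r in reviews if not r.is_labeled()]


reviews = [
    SoundReview("headshot_confirm",
                what="The player has delivered a headshot to the enemy",
                why="To provide feedback and reward the player for a challenging shot",
                when="Whenever the game registers a successful headshot",
                sound_type=SoundType.COMPREHENSIVE,
                ludic_value=LudicValue.HIGH),
    SoundReview("cat_meow",
                what='The cat says "Meow"',
                why="To enhance immersion with a lively world and interactable pets",
                when="On interaction, or as a random noise when the player is nearby"),
]
```

With this data, `unlabeled(reviews)` returns `["cat_meow"]`, which shows which sounds still need a decision before the mix work starts.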

High perceptual load scenarios

In high perceptual load scenarios, our main focus should be on the player’s visual attention. The two main strategies for this are offloading some visual information to comprehensive sounds and reducing the number of indicative sounds that distract the player from important things.

First, make sure you haven’t missed any opportunities to move information from the screen to sound. Look for information that is only presented visually, and consider how important it is to the player and how difficult it would be to communicate with sound. If the information is important but a comprehensive sound is difficult or impossible, adding an indicative sound is still a lot better than nothing.

Next, see which indicative sounds can easily be turned into comprehensive ones. This may require adding more sound effects and changing the playback rules. While it might be tempting to convert all indicative sounds, many of them do valuable work by directing the player’s attention to important things. The answers to the “Why” question should help you decide which sounds to focus on.

Finally, suppress the least valuable indicative sounds in the mix. Sounds with negligible or negative ludic value can be muted until the high load scenario is over, and sounds with low ludic value can be turned down for the same period.
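The muting and ducking rule above can be expressed as a small lookup. This Python sketch assumes the four-level ludic value scale (negative, negligible, low, high). It also assumes a hypothetical attenuation of 9 dB for low-value indicative sounds. In a real project, the same logic would usually be set up as snapshots or ducking rules in the audio middleware.

```python
import math
from enum import Enum


class SoundType(Enum):
    COMPREHENSIVE = "comprehensive"
    INDICATIVE = "indicative"


class LudicValue(Enum):  # assumed ordinal scale
    NEGATIVE = -1
    NEGLIGIBLE = 0
    LOW = 1
    HIGH = 2


DUCK_DB = -9.0  # assumed attenuation for low-value indicative sounds


def high_load_gain_db(sound_type: SoundType, ludic_value: LudicValue,
                      high_load: bool) -> float:
    """Extra gain in dB to apply to a sound while the high-load scenario lasts."""
    # Outside high load, and for comprehensive sounds, nothing changes:
    # the rule only targets indicative sounds.
    if not high_load or sound_type is SoundType.COMPREHENSIVE:
        return 0.0
    if ludic_value in (LudicValue.NEGATIVE, LudicValue.NEGLIGIBLE):
        return -math.inf   # mute until the high-load scenario is over
    if ludic_value is LudicValue.LOW:
        return DUCK_DB     # quieter, but still audible
    return 0.0             # valuable indicative sounds keep their level


def db_to_linear(db: float) -> float:
    """Convert a dB offset to a linear gain factor (-inf dB becomes 0.0)."""
    return 10.0 ** (db / 20.0)
```

Comprehensive sounds are left alone here, since they are the ones offloading visual load to audition.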

High cognitive load scenarios

I think the best way to assist the players during high cognitive load scenarios is to prepare them for such scenarios. Comprehensive sounds are only effective when the players know what they mean. Some sounds are intuitive enough that players understand their message after hearing them once. These sounds often communicate simple messages or use clever metaphors. But not every message can be conveyed using metaphors alone, so players often need to learn the meanings of sounds as they explore gameplay systems. Our goal is to make this learning process as easy as possible.

One way to do this is to ensure consistency in the design language of sounds. For example, you can add an indicative element to comprehensive sounds that communicate similar things. Feedback sounds for drinking a health potion, a mana potion, and an antidote can share the same layer to communicate the high-level message (“drink a potion”) mixed with distinct layers for each potion type. This way, players will first learn the shared part and later learn the distinctions. When they encounter a completely new potion, they will easily learn the new sound because they already understand the sonic language of the game.
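The shared-layer idea can be sketched as a simple mapping from potion type to the layers that play together. All asset names here are hypothetical. A potion type without its own layer falls back to the shared layer alone, so it still carries the high-level message. Giving it a distinct layer later is a one-line change.

```python
SHARED_LAYER = "potion_gulp"  # hypothetical asset: the shared "potion" part of the message

# Hypothetical per-type layers that carry the specific message.
TYPE_LAYERS = {
    "health": "potion_health_shimmer",
    "mana": "potion_mana_hum",
    "antidote": "potion_antidote_fizz",
}


def potion_feedback_layers(potion_type: str) -> list[str]:
    """Layers to play together when the player drinks a potion."""
    layers = [SHARED_LAYER]               # always present: the high-level message
    distinct = TYPE_LAYERS.get(potion_type)
    if distinct is not None:
        layers.append(distinct)           # the part players learn later
    return layers
```

For example, `potion_feedback_layers("mana")` returns the shared layer followed by the mana layer, while an unmapped type such as `"strength"` plays only the shared layer.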

Remember that players don’t want to learn sounds for their own sake; they want to understand mechanics and how gameplay systems behave. To support their learning, use clear, reliable, and predictable signs and feedback. Keep in mind that people have a limited learning capacity, so it’s often best to reduce the number of comprehensive sounds in favor of indicative sounds that communicate high-level information. It’s also important to remove distractions by turning down sounds with low ludic value, so the advice from the previous section holds true here as well.

In addition to supporting learning, we should look for ways to save the player’s short-term memory resources. Let’s revisit the cooldown signal example from above. When such a sound is comprehensive, the experience somewhat resembles microwaving your food. A microwave oven lets you put food inside and forget about it until you hear a signal. In the same way, the sound lets you use the ability and forget about it until you hear that you can use it again. An indicative sound would do a decent job here as well, although you’d have to verify that each signal you hear relates to the right ability. When there is no sound at all, the experience is closer to heating your food on a stove burner: you need to keep in mind that it is there and check it from time to time, which is a bit annoying if you also have to fight skeleton hordes at the same time.
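The microwave analogy maps onto a small piece of game logic: start a timer when the ability is used, and post a sound event when the timer runs out. The Python sketch below simulates this with a hypothetical tracker class and hypothetical event names. The `distinct_sounds` flag switches between one sound per ability (comprehensive) and a single shared sound (indicative).

```python
from dataclasses import dataclass, field


@dataclass
class CooldownTracker:
    """Posts a 'ready' sound when a cooldown ends, like a microwave's beep.

    Event names are hypothetical; a real game would send them to its audio
    engine instead of collecting them in a list.
    """
    distinct_sounds: bool = True  # True: comprehensive, False: indicative
    remaining: dict[str, float] = field(default_factory=dict)
    posted_events: list[str] = field(default_factory=list)

    def use(self, ability: str, cooldown_s: float) -> None:
        """Start (or restart) the cooldown for an ability."""
        self.remaining[ability] = cooldown_s

    def update(self, dt: float) -> None:
        """Advance time by dt seconds and post a sound for each finished cooldown."""
        for ability in list(self.remaining):  # copy: entries are deleted below
            self.remaining[ability] -= dt
            if self.remaining[ability] <= 0.0:
                del self.remaining[ability]
                event = (f"cooldown_ready_{ability}" if self.distinct_sounds
                         else "cooldown_ready")
                self.posted_events.append(event)
```

As a usage example, a tracker with a 3-second fireball and a 5-second frost nova posts `cooldown_ready_fireball` after three one-second updates and `cooldown_ready_frost_nova` two updates later. The player never has to keep either timer in mind.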

An AI-generated fake game screenshot showing a player character looking at hordes of skeletons near a kitchen stove — Hordes of skeletons approach the kitchen stove. Image generated with Stable Diffusion 2.1

Finding such opportunities is not easy. If you read my previous post, you probably remember that sound is very effective at communicating time and change over time. This is why I’d personally start by checking gameplay systems that use timers and applying the same logic. But this is just a starting point, and I’m genuinely interested in other examples.

High perceptual and cognitive load scenarios

Many games present mixed challenges, which create both kinds of mental load at the same time. I think this is a very common case, and I would approach it by combining all the methods listed above, leaning toward whichever seem more appropriate for the situation at hand.

Note that in both cases we benefit from reducing distracting sounds. Mixing indicative and comprehensive layers, as described in the previous section, is equally helpful here, because it lets the player extract only the information they need for their current goal. Other than that, the key to success is finding the right balance between comprehensive and indicative sounds and making clever compromises. This cheat sheet should help:

Comprehensive sounds:

+ When learned, they reduce (or at least do not increase) the visual perceptual load
+ Useful for offloading some information from the short-term memory
+ Improve accessibility
- Cognitively demanding to learn and may be confusing until the player learns them
- May remove meaningful challenge and reduce player agency
- More costly and difficult to produce

Indicative sounds:

+ Efficient, as the same sound is shared across multiple messages
+ Engage and motivate players by attracting their attention to contextually important things
+ Particularly useful to attract attention to changes in surroundings
- Increase perceptual load, attracting the player’s attention to the objects on the screen
- May distract the player in challenging situations

It’s important not to forget that the goal is not to remove challenge, but rather to reduce non-fun elements and promote player agency. Sometimes we help players deal with less exciting challenges so they can focus on more fascinating elements. In other cases, hints and subtle messages work better than explicit communication. I hope this article helps you navigate such scenarios and make better design choices. Please reach out if you have any questions or feedback! In part 2, we’ll explore practical ways to take advantage of the efficiency of human auditory processing in game audio design. Stay tuned!