Misguided by research — The two dimensions of SUS

The System Usability Scale is largely considered unidimensional. Lewis and Sauro performed factor analysis to uncover a “hidden” dimension: learnability.

George Melissourgos
Jun 5, 2022
Mountains reflected on a lake; resembling the “two dimensions”
Photo by José M. Reyes on Unsplash

A “quick and dirty” unidimensional usability scale: that was the description of the System Usability Scale (SUS) when John Brooke first introduced it in 1986. Since then it has been used and validated numerous times [1]. Many variations of the SUS have been suggested: simple linguistic changes, internationalised versions, positively worded questions; the list goes on.

From its introduction, the System Usability Scale was largely considered unidimensional, measuring a single construct: the perceived usability of a system. Indeed, Philip Kortum and Claudia Z. Acemyan showed convincingly that the SUS can produce scores ranging from very low (< 20) to very high (> 90) [2], covering the whole spectrum of perceived usability even in simple systems.

[EDIT START]

Some time ago I stumbled upon a publication about a second dimension of the SUS, based on research by James R. Lewis and Jeff Sauro from 2009. The referenced paper [3] has been cited 1298 times (to date), and, fascinated by the results and demonstrations, I began using that second dimension in usability reports. A couple of days ago I decided to write this article to spread the word about this dimension, which I hadn’t seen used in the wild; we are all trying to make the most out of our research anyway. Only a couple of hours after this article was published, Lawton Pybus commented, linking to a blog post by one of the paper’s authors, Jeff Sauro, which pointed to a 2017 publication by James R. Lewis and Jeff Sauro [5] that revisits, and effectively walks back, the second dimension.

In a nutshell, Lewis and Sauro analysed 9000 SUS questionnaires in depth and found that the SUS is indeed two-dimensional, but the two dimensions are simply the negatively worded questions and the positively worded questions [5], a factorisation of little practical use. In that analysis, questions 4 and 10 still contributed more than the others to the negatively worded factor, but not enough to form a dimension of their own. This suggests that in some research contexts the “learnability” dimension might still emerge [5].

This serves as a warning to treat the rest of the article as describing a largely obsolete method. If you decide to use it, that decision should be driven by data demonstrating that the two dimensions reliably occur in your research context.

[5] Lewis, James R., and Jeff Sauro. “Revisiting the Factor Structure of the System Usability Scale.” Journal of Usability Studies 12.4 (2017).

[EDIT END]

However, time not only validated the SUS but also revealed a hidden insight. James R. Lewis and Jeff Sauro handed out and analysed 324 questionnaires of their own and combined their findings with data from 2324 questionnaires reported by Bangor et al. [3].

The two dimensions of the SUS

Lewis and Sauro performed factor analysis on the data from the combined total of 2648 questionnaires. Eight questions loaded on the first factor and two on the second. Based on the content of the questions, the first factor was named “Usable” and the second “Learnable”.

The Usability dimension

The usable dimension consists of questions 1, 2, 3, 5, 6, 7, 8, and 9. Essentially, this dimension is statistically equivalent to the overall SUS score (how equivalent? r = .985, a = .91). Theoretically, if we handed out this modified eight-question SUS, we would get the same results, with the same reliability, as if we had handed out the original 10 questions.
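To make the scoring concrete, here is a minimal Python sketch that scores one made-up set of responses both the standard way and on the eight-item “Usable” subscale. The 3.125 multiplier is an assumption of this sketch, chosen simply because it is 100 divided by the subscale’s maximum raw score of 32; the responses are illustrative, not data from the paper.

def item_contribution(item_number: int, response: int) -> int:
    # Odd (positively worded) items contribute response - 1,
    # even (negatively worded) items contribute 5 - response.
    return response - 1 if item_number % 2 == 1 else 5 - response

def overall_sus(responses: dict[int, int]) -> float:
    # Standard SUS: sum of all 10 adjusted contributions, times 2.5.
    return 2.5 * sum(item_contribution(i, r) for i, r in responses.items())

def usable_score(responses: dict[int, int]) -> float:
    # Eight-item "Usable" subscale (items 1, 2, 3, 5, 6, 7, 8, 9),
    # rescaled to 0-100 with the assumed 3.125 multiplier (100 / 32).
    usable_items = (1, 2, 3, 5, 6, 7, 8, 9)
    return 3.125 * sum(item_contribution(i, responses[i]) for i in usable_items)

# Illustrative responses from one fairly positive participant.
responses = {1: 5, 2: 2, 3: 4, 4: 1, 5: 4, 6: 2, 7: 5, 8: 1, 9: 4, 10: 2}
print(overall_sus(responses))   # 85.0
print(usable_score(responses))  # 84.375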

The SUS questionnaire with the questions 1, 2, 3, 5, 6, 7, 8, 9 circled in red.

The Learnability dimension

Questions 4 and 10 contribute to the same dimension, which is separate from the “usable” dimension previously described. What this reveals is that these two questions provide an extra insight beyond the overall usability of the system: they measure learnability. Of course, this new dimension is not completely uncorrelated with the overall SUS score (the 10-item questionnaire), as it is included in it (r = .784). But it is independent enough from the “usable” scale to provide a new insight (r = .664, r² = 44%).

A question that might arise is: how reliable can a metric with only two questions be? The authors address this by calculating the coefficient alpha (Cronbach’s alpha) of the new dimension, which they concluded has sufficient reliability (a = .70).
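For anyone who wants to check this on their own data, here is a minimal sketch of the standard coefficient alpha calculation for a two-item scale. The response lists are made up for illustration; they are not the study’s data.

from statistics import variance

def cronbach_alpha(items: list[list[int]]) -> float:
    # items: one list of responses per item, all of equal length.
    k = len(items)
    item_variances = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return (k / (k - 1)) * (1 - item_variances / variance(totals))

# Hypothetical raw 1-5 responses to Q4 and Q10 from six participants.
q4 = [1, 2, 1, 2, 1, 3]
q10 = [2, 2, 1, 3, 1, 2]
print(round(cronbach_alpha([q4, q10]), 2))  # 0.7 with this made-up data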

An interesting note is that question 7 (“I would imagine that most people would learn to use this system very quickly”), the item most readily read as measuring some learnable aspect of the system, does not contribute to the “learnability” dimension.

The SUS questionnaire with questions 4 and 10 circled in blue.

How to calculate the Learnable scale

  1. Take the responses to questions 4 and 10 and subtract each one from 5 (they are negatively worded, as explained further below)
  2. Add the two adjusted scores and multiply the result by 12.5
  3. You will get a number between 0 and 100, which corresponds to the learnability of the system
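As a quick sanity check, here are the same steps in a short Python sketch (the two responses are made up):

def learnability(q4: int, q10: int) -> float:
    # Both items are negatively worded, so each response is subtracted from 5.
    return 12.5 * ((5 - q4) + (5 - q10))

print(learnability(q4=2, q10=1))  # (3 + 4) * 12.5 = 87.5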

Hands-on example

We now know that we can extract two measurements from the classic SUS:

  1. How usable the system is
  2. How learnable the system is

Let’s walk through an example of how to obtain these, using data from a recent SUS study with 110 participants. For simplicity, we have selected a random subset of the original data with a sample size of 12. This should lead to the same conclusions as the full study; a sample size of 12 has an accuracy of almost 100% [1].

Gathering the responses

The 12 participants answered the SUS through Google Forms and the results were saved to a Google Sheet. Each row corresponds to one participant: the first cell is the participant’s identifier, followed by their responses to each of the ten questions.

The table of results from the SUS questionnaire. The link to the Google Sheet with the same data can be found at the end of the article; the raw responses are on the first sheet, “Results”.

Calculating individual SUS scores

With all the data in place, we can compute the first participant’s score with the following Sheets formula:

((Responses!B2 + Responses!D2 + Responses!F2 + Responses!H2 + Responses!J2 - 5) + 25 - (Responses!C2 + Responses!E2 + Responses!G2 + Responses!I2 + Responses!K2)) * 2.5

In short, this is how the calculation comes about: for the odd-numbered (positively worded) items the score contribution is the response minus 1, and for the even-numbered (negatively worded) items it is 5 minus the response; the formula groups these adjustments together and multiplies the total by 2.5 to land on a 0 to 100 scale. We prefer the overall SUS over just the “usable” dimension, since this is the standardised process for calculating the score.
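If you would rather do the same calculation outside the spreadsheet, here is a sketch that scores every participant from a CSV export. The file name and the column order (identifier first, then the ten raw responses in question order) are assumptions about how the data might be exported, not part of the original study.

import csv

def overall_sus(responses: list[int]) -> float:
    # Odd (positively worded) items contribute response - 1,
    # even (negatively worded) items contribute 5 - response.
    odd = sum(responses[i] - 1 for i in (0, 2, 4, 6, 8))   # Q1, Q3, Q5, Q7, Q9
    even = sum(5 - responses[i] for i in (1, 3, 5, 7, 9))  # Q2, Q4, Q6, Q8, Q10
    return 2.5 * (odd + even)

# Hypothetical export: one row per participant, id followed by ten responses.
with open("sus_responses.csv", newline="") as f:
    for row in csv.reader(f):
        participant, answers = row[0], [int(x) for x in row[1:11]]
        print(participant, overall_sus(answers))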

Calculating the learnability is a piece of cake: we take the responses to questions 4 and 10 and, as they are negatively worded questions, subtract each one from 5. We then add the two adjusted scores together and multiply by 12.5. Or, in Sheets language:

12.5 * ((5 - Responses!E2) + (5 - Responses!K2))

The table of the calculated scores. The link to the Google Sheet with the same data can be found at the end of the article; these calculations are on the second sheet, “Analysis”.

Note that the learnability values in this case are either 87.5 or 100. Although learnability can be lower than 87.5, there is no in-between value between 87.5 and 100. Learnability is measured in steps of 12.5, meaning that it has only 9 possible values: 0, 12.5, 25, 37.5, 50, 62.5, 75, 87.5, and 100. One can argue that learnability measured this way, while reliable, lacks a bit of precision.

Combining the results

Having each participant’s perceived usability score and learnability score, we can calculate the system’s average scores. In Sheets, that is:

For usability:

SUM(B2:B13) / 12

For learnability:

SUM(C2:C13) / 12
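The same averages in Python, for completeness; the two lists below are made-up placeholders standing in for the per-participant scores, not the study’s data.

from statistics import mean

# Placeholder per-participant scores (illustrative values only).
sus_scores = [72.5, 80.0, 85.0]
learnability_scores = [87.5, 100.0, 87.5]

print(mean(sus_scores))           # average perceived usability
print(mean(learnability_scores))  # average learnability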

And visualise them in a simple bar chart:

Bar chart with an overall SUS bar at 78 and an overall learnability bar at 96, out of 100.

Conclusion

The SUS, while being a powerful tool for measuring perceived usability quantitatively, can also give reliable insights into the learnability of a system without any modification. UX researchers may benefit from putting a number on the overall learnability of a system, in order to better benchmark future designs and understand users’ struggles.

Further reading

This article is heavily based on the research work of James R. Lewis and Jeff Sauro on “The Factor Structure of the System Usability Scale”. You can read the full publication here: https://link.springer.com/content/pdf/10.1007/978-3-642-02806-9_12.pdf

If you’d like to dig deeper into the ins and outs of the SUS and the reasons it was first created: https://uxpajournal.org/sus-a-retrospective/

The sheet of the data described on the example can be found here: https://docs.google.com/spreadsheets/d/1ph9Rnyu3faU9RZj6WFhOGhstd1WDkrz_IFQL25WrCIs/edit?usp=sharing

References

[1] Tullis, Thomas S., and Jacqueline N. Stetson. “A comparison of questionnaires for assessing website usability.” Usability professional association conference. Vol. 1. 2004.

[2] Kortum, Philip, and Claudia Ziegler Acemyan. “How low can you go? Is the system usability scale range restricted?.” Journal of Usability Studies 9.1 (2013).

[3] Lewis, James R., and Jeff Sauro. “The Factor Structure of the System Usability Scale.” International Conference on Human Centered Design (HCII 2009). Springer, 2009.
