A/B testing sounds really good in theory, but…

Nam · Published in UX Collective · 5 min read · Jun 15, 2021

You have probably seen this illustration: two boxes side by side, one labeled with the letter ‘A’ and the other, ‘B’. The boxes contain visual elements indicating that A and B are different; in the example below, the difference lies in the arrangement of the grey rectangles. A “check” symbol sits on one of the boxes, signifying a validation, a win.

As you might already know, this common graphic visualizes the concept of A/B testing in product and user experience research: a technique that compares two variants of a design by randomly assigning users to each variant and measuring their responses.

[Image: An illustration of the A/B testing concept, in which a box A is placed next to a box B, with a check mark on box A denoting that A is the winning candidate.]
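
For a concrete anchor, here is a minimal sketch of the randomized assignment at the heart of the method, written in Python. The hash-based bucketing and the experiment name are assumptions for illustration; real experimentation tools handle assignment, logging, and exposure tracking in far more elaborate ways.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "homepage-layout") -> str:
    """Deterministically assign a user to variant 'A' or 'B'.

    Hashing the user ID together with the experiment name yields a stable,
    roughly uniform 50/50 split without storing a lookup table, so each
    user sees the same variant on every visit. The experiment name here
    is purely illustrative.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_variant("user-42"))  # 'A' or 'B', stable for this user
```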

You might also be very familiar with the following visual: a sequence of steps connected by arrows with the last one pointing back to the first step. The number of steps and the labels can vary slightly but often contain the following keywords: “goal,” “metric,” “hypothesis,” “run experiment,” “analyze results.” As you may already know, this linear and sometimes cyclic motion of steps is a common representation of the A/B testing process.

[Image: The four steps of an A/B testing process: set a goal and select a metric; develop a hypothesis; set up and run the experiment; and, lastly, analyze results. The last step connects back to the first to inform the creation of the next test.]

While these visuals are effective in democratizing knowledge, their reductive nature can obscure the nuances of the process and the influence of organizational politics on such seemingly uncomplicated procedures. On one hand, the promise of a snap, confident decision often overshadows the intensity and volume of development labor. A simple “let’s A/B test this” from a senior executive can turn into days and weeks of agony for the execution team. On the other hand, A/B testing elicits a deterministic binary of “winner” and “loser”: an attractive “science” trope that shifts the focus from learning the “why” to chasing the next big win.

The promise of a quick decision-making process obscures the messiness of development work.

The testing URLs have issues. The code cannot be deployed by the testing tool. From hypothesizing to developing and running experiments, technical issues materialize from time to time to haunt A/B testing’s image of efficiency. The engineering effort can scale quickly once teams graduate from testing a button’s color or call-to-action copy. For example, testing a front-end interface that integrates dynamically with a database might require new data fields and mechanisms to display that data, all of which demand additional time and labor.

The desire for a recursive, constant testing motion can also challenge an organization that has not yet committed, personnel- and process-wise, to such an endeavor. In a scrum environment, the pacing of experiments depends heavily on prioritization and sprint schedules. Say the team needs to launch a test immediately but the next deployment is two weeks away: should it invest in a mini deployment or just pick a design and hope for the best? Managing an A/B testing pipeline, in this sense, is about acknowledging and navigating the contingencies around planning and development cycles, the ups and downs that breed unforeseeable work.

A deterministic binary of winner and loser distracts the organization from learning the “why”.

Like most methods in an applied research environment, A/B testing is more than just a learning tool: it is also a decision-making tool. An A/B test can minimize the collective doubt around team-based design choices and provide the certainty necessary to move the process forward.

Framing research findings as “winner” and “loser” generates excitement, a sense of achievement, and in some cases, an unfortunate triumph of individual ego and personal politics. You might have heard phrases such as “I predicted that” or “you win” in a readout of an A/B test. Such phrases distract teams from making sense of the results with honesty and sincerity, foreclosing early spaces for doubt that are important to productive learning.

In addition, despite employing statistical significance as a guarantor of confidence, the result of an A/B test cannot easily explain why users prefer one design over another, especially in complex tests. Such an ill-defined causal relationship should be supplemented with other research methods if teams are to truly understand user preference. In fast-paced, resource-scarce product development cycles, this “backward” pursuit of “why” competes directly with the “forward” motion of development: the chase of the next improvement, the next win.
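
To make that gap concrete: a significance test can report that B outperformed A, yet nothing in the arithmetic says why. Below is a minimal two-proportion z-test in Python; the conversion numbers are invented for illustration.

```python
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: B converts 5.8% vs. A's 5.0% on 10,000 users each.
z, p = two_proportion_z_test(500, 10_000, 580, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # significant, but silent on *why* B won
```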

In these environments, A/B testing becomes the de facto “research” method but has limited utility in accumulating a deep understanding of the user base. In 2019, Uber laid off half of its researchers globally, citing a shift towards rapid A/B testing for product decisions rather than foundational and exploratory research.

There is no doubt that A/B testing is an efficient learning and decision-making tool in product development. Scaling A/B testing into practice, however, requires attention to the messiness of development work and how this method, as both a tool and a metaphor, is used and misused.

While visuals such as those presented above are effective in conveying a high-level understanding of this research method, it is important to delineate the technical complexity and level of effort involved in developing and running a test, especially for stakeholders who are removed from the nuances of the process. No visual is perfect, but perhaps more deliberation could help.

It is also helpful to situate (and re-situate) A/B tests within the team’s larger learning agenda: connect them to past research through referencing and archiving, and fold long-term theory-building into the short-term hunt for wins and losses. In fast-paced development cycles where teams are prone to short-term memory, metadata, tagging, and a studious attitude are critical to accumulating rich learnings about the user base and balancing out the obsession with the next big win.
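
What that studious attitude might look like in practice: a team could attach lightweight metadata to every archived test so future work can find and build on it. The sketch below is hypothetical; every field name and value is invented for illustration.

```python
# A hypothetical archive entry linking one A/B test to the team's wider learning agenda.
experiment_record = {
    "id": "exp-2021-06-checkout-copy",                # invented identifier
    "hypothesis": "Concrete CTA copy increases checkout starts",
    "result": "B won at p < 0.05",
    "tags": ["checkout", "copywriting", "trust"],     # for later retrieval
    "related_research": ["usability-study-2020-11"],  # ties back to past work
    "open_questions": ["Why did mobile users respond more strongly?"],
}
```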


Writings on research, methods, and the queer hope of knowing others. p-nam.com