ChatGPT cannot do user research

Why ChatGPT may produce inaccurate information about people, places, or facts.

Jason Godesky
UX Collective

--

I ask ChatGPT: “Create a user persona for a hotel booking app.” ChatGPT responds: “User Persona 1: Business Traveler. Name: Sarah Thompson. Age: 35. Gender: Female. Occupation: Sales Manager. Location: Chicago, IL. Tech Savviness: High. Goals: Find hotels close to her clients’ offices or conference venues; manage and track her travel expenses; ensure a comfortable and quiet environment for work and rest.” The response continues beyond what the screenshot shows.
This is not a persona. This is actually worse than nothing.

A controversy is brewing in the world of UX. On one side are those who see ChatGPT as an incredible tool that allows UX professionals to replace difficult, time-consuming user interviews with simple prompt engineering. The image above is the beginning of a result that GPT-4 gave me when I used a prompt copied verbatim from one such article. To some UX researchers, this might look like magic. It might not be the best persona you’ve ever seen, but it’s not bad, and we’ve all worked for days or weeks to produce personas this good. GPT-4 can do it in seconds. As one of these articles says:

“Personas are fictional characters that represent a group of users with similar characteristics. By training Chat GPT on a dataset of user demographics and behavior patterns, it can generate detailed user personas that can be used to inform the design of products and services.”

There’s just one small problem: that’s not what ChatGPT does.

GPT stands for Generative Pre-trained Transformer. Note that the “P” stands for “pre-trained.” You can’t take your dataset of user feedback and train GPT on it. As it will frequently tell you itself, it’s a language model. It doesn’t actually understand what user demographics or behavior patterns are. If you keep putting in your user feedback, you might get results that could trick you into thinking that it’s actually considering the data that you gave it — right up until the moment when it starts incorporating entirely different data sets that it read about once on UX Stack Exchange.
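To make that concrete, here’s a minimal sketch using the openai Python package as it existed when this article was written (the file name and prompt are hypothetical). Notice that “giving GPT your data” just means pasting text into the prompt; the model’s pre-trained weights never change, and nothing carries over between conversations.

import openai

openai.api_key = "sk-..."  # your API key

# "Training ChatGPT on your user feedback" is really just pasting that
# feedback into the prompt. The model only sees whatever fits in its
# context window; its pre-trained weights are untouched.
with open("interview_notes.txt") as f:  # hypothetical file
    user_feedback = f.read()

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user",
         "content": "Here is feedback from our users:\n\n"
                    + user_feedback
                    + "\n\nCreate a user persona based on this feedback."},
    ],
)

# Nothing was learned or stored. Start a new conversation without pasting
# the feedback in, and the model has no trace of it; it falls back on
# whatever it read on the internet.
print(response.choices[0].message.content)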

In other articles, this is heralded as a feature, not a bug. After all, ChatGPT has already been trained on a massive dataset, one that surely includes more information about user behavior than you could ever collect in a lifetime of interviews and research activities. It will produce personas based on far more data than you could ever sift through, meaning that the personas that it creates will be far better than anything a human UX researcher could ever hope to create.

I don’t want to oversimplify the process by which GPT-4 was trained — it’s actually rather fascinating and complex — but it is primarily going by things that it read on the internet.

An illustration showing a young woman about to vomit. The young woman is labeled “GPT-4.” Another label over her stomach reads: “Gross stuff it ate on the internet.”
How ChatGPT actually works. (Credit: Fireship)

Do you generally find a lot of user interviews uploaded to the internet for all to see, or is that more often classified as proprietary information and kept out of any public database on which GPT-4 could have been trained? When you ask ChatGPT to write a persona, its response is not based on careful analysis of a rich dataset that you’ll never have access to; it’s based on forum posts and blogs.

ChatGPT is a language model. It never hesitates to tell you that. It can’t analyze data, it can’t consider user behavior, and it can’t come to a conclusion. What it does, and does really well, is come up with what a response would sound like:

“If you put in a scientific question, and it comes back with a response citing a non-existent paper with a plausible title, using a real journal name and an author name who’s written things related to your question, it’s not being tricky or telling lies or doing anything at all surprising! This is what a response would sound like! It did the thing!

But people keep wanting the “say something that sounds like an answer” machine to be doing something else, and believe it is doing something else.

It’s good at generating things that sound like responses to being told it was wrong, so people think that it’s engaging in introspection or looking up more information or something, but it’s not, it’s only, ever, saying something that sounds like the next bit of the conversation.”

Personas have been a tricky point in UX research from the beginning. Good UX researchers know that a persona is a fiction we write to tell the truth. It’s a way to communicate research findings, and so a persona is only ever as valuable as the research it’s communicating. But because the final deliverable takes the form of a piece of fiction, we’ve fought a constant battle against the misconception that a persona made up of nothing but our assumptions can be just as good. Junior UX researchers see these struggles and might even be tricked into thinking that personas without research must be good for something, even if it’s just to take a first step.

Such fake personas are dangerous because they look like they’re summarizing real research, as a real persona would, but there’s no research or reality behind them. They’re pure fiction. At what point in the process is it helpful to delude yourself into thinking you know more than you do? A fake persona is worse than nothing. While a blank space can tell you honestly how much you know, a fake persona tells you that you know something when you don’t.

ChatGPT will churn out a fake persona in seconds. If you’re willing to pay $20 per month, GPT-4 will fulfill your request for more fake personas 25 times every three hours, while a tiny disclaimer at the bottom reminds you all the while that

ChatGPT may produce inaccurate information about people, places, or facts.

Personas are a particularly good example because they’re a case where UX professionals sometimes get tripped up by the difference between the format and the value. That’s the core problem with ChatGPT in so many ways. It’s the same reason why posing your research questions directly to ChatGPT is just as useless. It can’t tell you what usability problems people will encounter with your site; it’s not even actually looking at your site. It’s just giving you an example of what a real UX research report might plausibly sound like.

The AGI UX Researcher of the future

Such might be the state of GPT-4, but what about the future? What about the next generation of AGI (Artificial General Intelligence)? Surely that will be able to automate UX research, right?

Setting aside the larger question of whether or not AGI is plausible (I think we’ll need to do a lot more work on the implications of the social brain hypothesis before we get anywhere near AGI, myself), such an AGI could remind you of some UX “best practices” in seconds, but what “best practices” do we have in our field that don’t come with the same footnote of “may not apply to all user groups”?

Ostensibly, you’re working on something that has some kind of unique value proposition. If it doesn’t, why not just use the thing that already exists and provides everything you were planning to provide? Since we’re talking about a unique thing, the set of people who use it is going to be different from every other set of people. They may not deviate terribly far from the norm in most regards, but you won’t know until you start talking to them. And precisely because your thing is unique in some regard, this set is likely to deviate from the norm in the regards that relate directly to that unique value proposition.

If you’re making the first social network for orchestra conductors, then you’re likely to have a lot more orchestra conductors among your users than other sites do. Are orchestra conductors more bossy and demanding than average? Or are they actually more patient? Does it take so long to become a conductor that they’ll skew older and need larger type and more contrast? Or has a lifetime of reading sheet music trained their eyes such that larger type would strike them as unwieldy and ugly? You can make guesses, and some of your guesses might even be proven correct, but you won’t know until you talk to some of them (and most UX researchers will tell you that it’s a rare effort that doesn’t yield something surprising).

UX design is pretty explicit about where its value comes from. It’s right there in the first letter: “U” for “user.” To know how people will use the things we make, we have to talk to them. For an AGI to do what a UX researcher does, it would have to do what a UX researcher does: talk to people. Perhaps one day we’ll develop AGIs that can analyze interviews, identify subtext from facial expressions and body language as reliably as a human can, isolate important points from what users say, and compile them automatically into all manner of deliverables, but such an AGI will still have to take the time to schedule and conduct interviews with human beings to accomplish anything worthwhile.

How ChatGPT hijacks human empathy

ChatGPT’s visually oriented kin are well known for the difficulty they have with hands. As human beings, we have an intuitive sense of the physicality of hands, and we easily notice small visual problems. We do the same with faces, but we take a lot of pictures of faces and describe them in some detail in the captions that these AIs are trained on. We rarely take the time to write captions detailed enough to describe where each finger of a hand is placed.

ChatGPT is not fundamentally smarter than Midjourney or DALL-E; it’s just dealing with us through language. There is no objective reality, like our visual perception of hands, against which to check ChatGPT. In conversation, we look behind the words that we hear or read for the meaning, for the intent of the mind that produced them. It’s our natural extension of empathy that allows any communication to take place. Without at least a minimal application of imagination, we wouldn’t be able to make sense of even the simplest, most straightforward declaration. So to convince us of its intelligence, ChatGPT doesn’t have to do everything itself; we’re more than willing to meet it halfway. Where generative AIs that deal in images might produce incongruities that we can easily see (like non-Euclidean hand-like horrors), when ChatGPT makes such mistakes, we paper them over with our own empathetic enthusiasm. As it turns out, the Turing test was always slanted in the machines’ favor.

Tim Levine has championed the idea that “truth and honesty are the default modes of communication. People are typically honest unless they have a specific reason to communicate deceptively, and people tend to believe others unless suspicion, skepticism, or doubt is actively triggered.” It’s rare for us to meet human beings who will make something up on the spot and then pronounce it as boldly and confidently as ChatGPT does. Rare, but not unheard of, because pathological liars do exist. As with ChatGPT, we find it hard to deal with pathological liars simply because we find it hard to consider the possibility that someone could lie like that. We’re looking for the mind that’s forming the meaning behind the words. We can understand motivated deceptions (those triggers that Tim Levine refers to), where someone might lie to protect herself or someone else. Rather than breaking our ability to communicate, this type of lying only deepens the theory of mind that we develop in conversation with one another. But with a pathological liar, meaning dissolves into absurdity, as you come to realize that there may not be any meaning to apprehend whatsoever.

It’s anthropomorphizing to say that ChatGPT lies to us. As mentioned above, it’s always “saying something that sounds like the next bit of the conversation.” But because language is its interface, there’s always a part of our human brains looking for the mind behind those words. The mind that it finds is a projection. It acts like a pathological liar, but that’s just a projection, too.

So what can ChatGPT do for UX Researchers?

ChatGPT can’t do UX research for you. That’s fundamentally outside of what it can do. It’s beyond what any AI can do. If you take the users out of it, what you’re doing isn’t user experience design.

But I don’t want to just rain on everyone’s parade. ChatGPT can’t do user research, but it can be a great aid to human creativity. Creativity is fundamentally about finding new connections between things, and while you can’t really trust anything that ChatGPT says, it can be a great engine for giving you new things to connect. It can also be pretty great for writing a first draft for a human editor to go over.

So, to end this on a more hopeful note, here are 5 prompts that could be genuinely useful for UX professionals.

#1. Begin Competitor Analysis

If ChatGPT is the beginning and end of your competitor analysis, you’re going to run into some of the same problems you would if you tried to use it for user research. That said, the existence of your competitors is probably something better documented on the internet than user interviews, so asking ChatGPT to come up with a list of your competitors might fill in a few that you missed. Just remember that ChatGPT may very well have missed some important ones, too.

Prompt: Provide a competitor analysis for hotel booking websites. Include the most popular hotel booking websites in the analysis. Compare the websites based on factors like user interface, pricing, loyalty programs, and customer support. Discuss the unique selling points of each website. Analyze their target markets and demographics. Assess the market share of each competitor.

#2. Write Notes

If you can get a reliable voice-to-text algorithm to turn your interview recording into text — or just take the time to transcribe it by hand the old-fashioned way — you can pass it to ChatGPT to write up some notes for you. As always, look over GPT’s shoulder. You’ll still need to read over the notes that it produces and make edits when it gets things wrong.

Prompt: Below is a transcript of my interview with John Doe, conducted on Monday, April 10, 2023 at 3:00 PM EST over Zoom. Provide me with a summary that includes the main topics covered in the interview, key points made by the interviewee, and any notable insights or takeaways. The summary should be concise and well-organized.
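If you’d rather script this step than paste transcripts in by hand, here’s a minimal sketch, assuming the openai Python package and its Whisper speech-to-text endpoint; the file name is hypothetical, and the same caveat applies: read the output and correct it.

import openai

openai.api_key = "sk-..."  # your API key

# Step 1: transcribe the interview recording with Whisper.
with open("interview_john_doe.m4a", "rb") as audio_file:  # hypothetical file
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

# Step 2: ask GPT-4 to turn the transcript into interview notes.
summary = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user",
         "content": "Below is a transcript of a user interview. Summarize "
                    "the main topics covered, key points made by the "
                    "interviewee, and any notable insights or takeaways.\n\n"
                    + transcript["text"]},
    ],
)

# A human still needs to check this summary against the recording.
print(summary.choices[0].message.content)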

#3. Find Inspiration

Sometimes it helps to look at how others have tackled similar problems. Finding inspiration by searching online can be great, but it’s also something that ChatGPT might be able to help with.

Prompt: Provide a list of at least ten global websites for booking luxury hotels, along with a brief summary of their key features and notable aspects of their user experience. The list should highlight what makes each website exceptional, and showcase best practices for designing a great hotel booking website. When evaluating each website, consider aspects such as ease of use, visual design, filtering options, payment and cancellation policies, customer support, and overall user experience. Highlight any unique or standout features of each website, such as loyalty programs, personalized recommendations, or exclusive perks for members.

#4. Improve Microcopy

Microcopy is essential to a great user experience, but it’s frequently overlooked. If you’re not sure how many different ways a piece of microcopy could be phrased, try asking ChatGPT for some ideas. It might not come up with anything that you can use directly, but at the very least it could give you a new way of thinking about the problem. With a few good candidates, you can set up multivariate tests to see which ones lead to the best outcomes (see the sketch after the prompt below). If you have a voice and tone guide, adding a few choice passages from it to the prompt will produce far better results.

Prompt: Generate ten different phrasings for the microcopy on a call-to-action button for booking a hotel room on a website.
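Once a human editor has whittled those ten phrasings down to a few good candidates, the test itself can be simple. Here’s a minimal sketch of deterministic variant assignment for such a test; the variants and user ID are hypothetical, and in practice you’d log each assignment alongside your conversion events.

import hashlib

# Hypothetical candidates: generated by ChatGPT, vetted by a human editor.
VARIANTS = [
    "Book your stay",
    "Reserve your room",
    "Check availability",
]

def assign_variant(user_id: str) -> str:
    """Deterministically assign a copy variant, so each user always sees
    the same button text across visits."""
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return VARIANTS[digest % len(VARIANTS)]

# Record which variant each user saw, then compare booking rates per
# variant to find the copy that leads to the best outcomes.
print(assign_variant("user-1234"))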

#5. Engineer a Prompt

This one comes from Bret Littlefield, who says that he got it from a Discord. I’ve been using this one for all kinds of different purposes (including each of the prompts above). Just remember what ChatGPT is, and don’t let it fool you into thinking that it can do more than it really can just because it can talk to you.

Prompt: I want you to become my Prompt Creator. Your goal is to help me craft the best possible prompt for my needs. The prompt will be used by you, ChatGPT. You will follow the following process:

1. Your first response will be to ask me what the prompt should be about. I will provide my answer, but we will need to improve it through continual iterations by going through the next steps.

2. Based on my input, you will generate 3 sections. a) Revised prompt (provide your rewritten prompt. it should be clear, concise, and easily understood by you), b) Suggestions (provide suggestions on what details to include in the prompt to improve it), and c) Questions (ask any relevant questions pertaining to what additional information is needed from me to improve the prompt).

3. We will continue this iterative process with me providing additional information to you and you updating the prompt in the Revised prompt section until it’s complete.

--

I’m a product designer with full-stack development experience from Pittsburgh, Pennsylvania.