9 *Human* Challenges with Using AI Co-Pilots

Stephen P. Anderson
May 31, 2023

I recently gave a short talk on the dangers of LLMs (large language models). Most of what I shared echoed things I’ve read from Emily M. Bender and Timnit Gebru. I introduced folks to the “stochastic parrot” framing, and emphasized that we tend to attribute intelligence to something that has none.

Thank you, Jens Ohlig!

In the course of organizing my thoughts on all things LLM, I realized that most of my concerns are with the generative use cases; my central point being that many of the issues lie with humans, not the technology.

The problem is with humans, who fail to recognize flawed, generalized, or incoherent information and are fooled by false confidence coupled with no attribution, misattribution, or fictional attribution.

Since then, I’ve seen (and used) many examples of AI (specifically ChatGPT) as a ‘co-pilot’ for various text-oriented tasks. Writing a PR from the future. Generating potential ‘answers’ to a problem statement. Crafting a Product Requirements Document. Generating job steps in a customer journey. Identifying why we should or shouldn’t replace product managers with an LLM! 😂

While there is promise, I’m concerned (like really, really concerned) about what this does and will do to most people. What does this do to our ability to think critically? Critical thinking skills are already in the toilet; this new tool (and it is a tool) exacerbates the problem. Unless we’re very cautious, AI co-pilots will flush away what is left of understanding.

🤨

To be clear about what I’m rejecting: It’s not automation rendering skilled labor obsolete. It’s believing automation can render skilled labor obsolete — when in fact it cannot, not yet. It’s believing that the technology can accelerate common tasks — and failing to see what is lost.

And… to be clear, I’m not talking about the kind of skilled labor that is little more than rote, repetitive tasks. I’m talking about the domain of knowledge work, where there is (or should be) judgement, expertise, ongoing synthesis, reflection, empathy, adaptation, and on and on. The work of product teams, researchers, consultants, and similar positions. Using an AI co-pilot in these knowledge worker use cases should come with a warning label.

Here, I want to identify nine human problems you should be aware of if you’re using LLMs as an assistive tool for knowledge work:

The first three of these challenges relate to ignorance and a lack of domain expertise, so they lead with the “non-experts (and lazy thinkers)” phrasing.

Challenge 1: Non-experts (and lazy thinkers) confuse the 80% with 100%.

Here, I’m referring to secondary research—the kind of customer or domain research that gets you to a basic understanding of some new space.

The concept of “getting to 80%” comes from this webinar where Evan Shore shared how he’s using ChatGPT to accelerate his JTBD work:

I found ChatGPT to be particularly useful when I have the JTBD of: “When I am starting to work on a new product, help me go from zero to rough draft, so I can shortcut the learning curve and start to test and learn as soon as possible.” It dramatically accelerates onboarding to a new area and provides something to validate and refine instead of starting with a blank page.

This approach seems like a viable way to speed up the secondary work done at the start of any new project. According to Evan:

Feedback from the merchants [confirmed] that the output of those queries accurately covered 80% of their work. The remaining 20% is unique to Walmart. The result is a much faster discovery process, as the team could focus on filling in the 20% and prioritizing the opportunities versus starting from a blank page.

So far, all is good, with this caveat:

ChatGPT provides ~ 80% of the common/general insights and ideas I might have otherwise had, but it misses the 20% of nuanced insights and more creative ideas.

The challenge is this: Confusing the 80% with 100%. It’s one thing to use this as a tool to accelerate the starting line for learning. It’s another to confuse the starting line with the finish line. While this tool may work wonders for the person who knows what ‘good’ is, I fear what it will do in the hands of those (the non-experts) who fail to see that LLM-generated text is a draft, at best.

Challenge 2: Non-experts (and lazy thinkers) fail to recognize gross errors and oversights.

We’ve already seen a number of headline examples of LLMs citing non-existent sources, ‘making up’ facts, and making ‘incoherent’ statements (again, this isn’t a thinking tool!). As I write this, the most recent example is this “bonkers story about the lawyer using ChatGPT for federal court filings.” Punchline: It’s not going well.

If you’re a domain expert, you’ll laugh at many of the results. If you’re not, you’ll assume everything is correct. Gullibility is dangerous. I’ve done enough queries related to things I know about (design, product management, board games, etc.) to be very suspicious of responses on topics where I am not an expert or lack training. If you’re thinking this tool can replace human expertise in niche domains, think again.

Challenge 3: Non-experts (and lazy thinkers) confuse general information with specific details

This one is a bit harder to spot, as there’s often no clear line between general and specific information. But if you remind yourself that this is a probability tool, just looking for the next most likely autocomplete word or phrase, then you should expect generic information, lacking in nuanced detail.
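To make the “next most likely word” idea concrete, here is a deliberately tiny sketch. It is a toy bigram counter, not how ChatGPT actually works (real LLMs use neural networks over sub-word tokens and usually sample rather than always taking the top choice), and the four-sentence corpus and function names are made up for illustration. What it shows is the tendency described above: always picking the most frequent continuation reproduces the common phrasing and never surfaces the one specific, unusual detail.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count how often each word is followed by each other word."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def autocomplete(follows, start, max_words=5):
    """Greedily append the single most frequent next word, up to max_words times."""
    words = [start]
    for _ in range(max_words):
        options = follows.get(words[-1])
        if not options:  # no known continuation, so stop
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical toy corpus: three generic sentences and one specific one.
corpus = [
    "users want a faster checkout",
    "users want a simpler checkout",
    "users want a faster checkout",
    "users want a pharmacist on call",
]

model = train_bigrams(corpus)
print(autocomplete(model, "users"))  # prints: users want a faster checkout
```

The greedy output is simply the most common path through the corpus; the one sentence about a pharmacist, the kind of specific detail an expert might care about, never appears.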

I guess I started thinking more about this when I attempted to use ChatGPT to create several artifacts commonly associated with product management activities. I had this moment where, on one hand, I was impressed at the volume of seemingly good information filling out these work artifacts; while at the same time, I felt it was all… vapid? A waste of time? I couldn’t tell if this was a comment on product management theater, or if the information was just… bland. But, in this moment, as an expert, I was able to recognize that there was very little of actual value in all of the text that had been generated. While there was plenty of problem framing content (of the 80% or draft variety), there was nothing that hinted at a solution or actionable next steps. It was… kinda pointless. This challenge is about being able to discern between general and specific information.

While the first three might be easy to point out, especially in hindsight, the challenges that follow are far more insidious and more likely to go unnoticed. These challenges apply to everyone.

Challenge 4: Anchoring bias makes it hard to explore other ideas

As humans, once information is handed to us, we tend to think in the same conceptual direction. Whether that’s building upon, refining, rejecting, editing, whatever, we are reacting to that initial information. It’s a feature (or bug?) of our silly little brains. This shows up in a number of documented ways: Anchoring bias, primacy effect, conformity bias, confirmation bias, functional fixedness, availability heuristic… In the case of using an AI co-pilot, it’s the (very confident) generated text that shapes and steers our thinking. Unless we’re very, very careful with how and when we use this tool, it will handicap our thinking.

Most recently, I was doing some problem definition work related to [A WORK TOPIC]. Fortunately, my initial braindump began with some random searches and pulling a variety of relevant books from my personal library. When I finally did turn to ChatGPT to see what ‘it’ might generate on this same topic, I got a very different, complementary (and convincing) set of information. In this case, I ended up with a range of varied responses to consider. BUT, in this exercise, I saw the GIANT RED FLAG. Had I begun with ChatGPT in an effort to accelerate this work, I’m almost certain the nature of the responses would have kept me from even considering some of the other sources I turned to. This approach would have shaped and steered my thinking in a harmful way, effectively putting mental blinders on me.

If we’re going to turn to AI co-pilots for these kinds of generative activities, then we also need to develop new habits to counter the negative side effects: Brainstorm with other people, or by yourself, first. Do a mind-map on the topic, first. Build in ‘delay’ mechanisms that prompt users for their own ideas, first, before revealing the co-pilot’s string of text.

This challenge is harder to spot, as it’s all about paths not taken, and never even seen! This is about concepts we might have come across, had we held space and time for more serendipitous (and human) connections.

[Of all the challenges listed here, this one concerns me the most; it’s difficult for even the most critical of thinkers (and experts) to counter.]

Challenge 5: Skipping the writing process short-circuits reflection (‘Writing as Thinking’)

I love writing. And re-writing. And making my thoughts visual. It’s all part of a thinking process. The books and articles I’ve written? They’re all me working out answers to the things I’m wrestling with. I fear an AI co-pilot kind of short-circuits this process. Sure, we can crank out content faster, but at what cost? Part of composing an email or a written response is the personal reflection and thinking it requires of us. It’s not just about communicating outward; it’s about reflecting inward, through writing.

“How do I know what I think until I see what I say?” ― E.M. Forster

Heavy use of an AI co-pilot discourages this thinking-reflection loop that is a critical part of putting thoughts into words. Writing is a way of thinking—and we just outsourced that.

Challenge 6: It’s difficult to have empathy from just second-hand, text-only information

If you’re going to use an AI co-pilot for secondary research, go ahead. It can (as I mentioned above) get you to better questions, more quickly. But, and this is a big BUT, do not think for a moment that chatting with an AI co-pilot is a proxy for speaking with real humans. Seriously–go out and talk to people. It’s not just about the text you put down — it’s about the very act of getting out from behind a screen and having a real conversation with a real human being. And hopefully, developing some real, honest-to-goodness empathy for that person. You know what’s far more powerful than fancy personas? That memory of that conversation you had with that person, and how it shifted your understanding.

If we’re so focused on the artifacts (personas, target markets, customer journeys) that we neglect the messy, human process of creating them, then we’ve missed the point. The point is to understand, and empathize with, real people first-hand. That’s it. Any tool that distances us from speaking with real people, we should handle with caution.

Challenge 7: Confident language inspires confidence in responses

Here’s a fun, obvious-when-you-think-about-it bit of trivia: “Con man” is short for confidence man. A con is “an attempt to defraud a person or group after first gaining their trust.” Cons work because we (the human creatures that we are) are drawn to authority figures and people who speak confidently. People who are probably right, but speak with reservation, are far less likely to be listened to than those who are confidently wrong.

While there is no intentional con going on here, the danger of AI co-pilots—as they’ve been programmed today—is the confidence with which responses are delivered; this format takes advantage of a human vulnerability.

Challenge 8: Most uses of an AI co-pilot isolate individuals and hinder team collaboration

So let’s imagine (though we don’t have to) that many of the artifacts that might take a team months to generate can be created in hours. What’s lost?

Whenever Chris Risdon talks about mapping the customer journey, he’s quick to emphasize that it’s about the mapping, not the map. It’s the team process (the dialogue and debates and discussions and discoveries along the way) that is far more valuable than the thing on the wall.

“The process of mapping the journey is ultimately more important than the artifact it creates” —Chris Risdon

Here’s the same sentiment, from a product management POV, expressed in a different way:

“When teams interview together and visually express their thinking — through both experience maps and opportunity solution trees — they develop their knowledge and expertise together.” [Source]

“Develop their knowledge and expertise together.”

Sit with that for a moment.

If we’re focused on the documentation that comes at the end of a process, I think we’ve missed the point: It’s about the human process of arriving at these things; the artifacts are more of a confirmation of the alignment and agreements we’ve reached as a group.

And yet…

All the uses of an AI co-pilot that I’ve observed have been solitary ones. A single individual, chatting with a probabilistic autocomplete tool, cutting and pasting the resulting string of words. A solitary activity.

“But wait,” you might say, “we’re just jumpstarting the team conversation. This is a way to start with something, rather than a blank page.” To which I’d ask: Are you, really? I know from decades of experience that it takes a lot of mental energy to focus on someone else’s work product. And if it’s something really complex, as much of knowledge work is, we tend to opt for the TL;DR version. And when we do spend time with these things, we tend to react rather than co-author. Your use of a co-pilot just alienated the rest of the team.

We’re naturally distanced from anything that doesn’t originate with us—our creation (or co-creation). This is partly why, early in my career, I started working on walls—I wanted to include people in the thinking along the way, and avoid the whole ‘big reveal’ at the end of a process. This is why if I think I already know the answer to something, I ask questions: If I’m right, others will reach the same conclusion on their own; if I’m wrong, I’ll learn something I didn’t know. It’s the group collaboration process, not the resulting documentation.

To put a finer point on all this: Consider how the biggest shift in software, over the last decade, has been from tools built for personal computing to tools built for collaboration. Google Docs. Figma. Mural. These tools were built from the ground up as collaboration software.

Challenge 9: Forgetting the source material for these responses.

This last challenge is one I think about a lot when I think about using an AI co-pilot to help with coding and software development. Setting aside the IP and copyright issues, what we’ve seen in software is an evolution over many years: building and improving upon things that have come before. And yet, I doubt that a probabilistic text generator can pick out the best or correct answer to a problem, rather than just the most frequent one. Now think about knowledge work and… while there’s nothing we can debug, we should be critical of where this material is coming from (or based upon), and what the responses to that material were. Is this “answer” an outdated and rejected one? Does this string of words factor in recent changes? We should think a lot more about the source material and algorithms being used to generate these otherwise reasonable-sounding phrases.

Closing Thoughts

And… that’s it. Nine top-of-mind challenges I’ve observed.

Go ahead, use the AI co-pilot, but just be thoughtful about it—everything I’ve listed here is a human challenge for us to be critically aware of. This is an amazing tool that strings together likely words — but there is no intelligent thought behind this. At all. This is a fine new tool, if you understand how to use it.

And like all tools, we should be aware of the personal dangers, as well as what using this tool might do to us…

We shape our tools, and then the tools shape us.

Or something like that.

Stephen P. Anderson

Speaker, educator, and design leader. On a mission to make learning the hard stuff fun, by creating ‘things to think with’ and ‘spaces’ for generative play.