BY DAVE KARPF
A few months ago, I was speaking with a reporter about the implications of generative AI tools like ChatGPT for newsrooms and educators. I suggested that it would take a while to work these things out, but what seemed most likely was that we would come to treat ChatGPT a bit like Wikipedia – “A good place to start/a bad place to finish.”
I have since become convinced that this was incorrect. Generative AI tools are neither a good place to start nor a good place to finish, at least for research-based tasks. The boundaries for their appropriate use are still murky and inexact.
I came to this revised conclusion as the result of a strange email exchange. A student at a European university reached out to me, politely asking for help obtaining a copy of an old article of mine. I normally handle messages like this by attaching a pre-publication version of the article. (Anytime someone shows interest in reading some obscure piece I’ve written, I am happy to oblige.)
There was just one problem: I had no recollection of the article he mentioned. It was, he said, published in 2010, in the ANNALS of the American Academy of Political and Social Science. The title was “Tech lobbying in the United States: Exploring corporate political activity in the information age.” I don’t believe I’ve ever written an article with that title. I also don’t recall publishing a piece in that journal back in 2010. Maybe, I thought, he had the details a little wrong. It might be a book chapter for an edited volume, or perhaps a book review. (I’ve contributed a lot of book chapters to a lot of edited volumes over the years. Not all of them are particularly memorable.)
I checked my files and found nothing that fit the description. I checked my Google Scholar profile. Nothing there either. So I typed the following response:
“Thanks for your interest. Can you send me a link to the article you're referencing? I am not finding it anywhere in my records. It might have been published under an alternate name. I'd be happy to send it to you, just need a bit more help tracking it down.”
The student wrote back:
“Thank you for your response. The article should supposedly be here: https://journals.sagepub.com/doi/10.1177/0002716210373643”
That web link takes us to a file-not-found page. It’s a dead end. (Which, to be fair, is a perfectly good reason to email the author.) So I replied:
“That link leads me to an error message. What's the full article citation? Title, journal, date, etc?”
He responded with the following details: Karpf, D. (March, 2010). Tech lobbying in the United States: Exploring corporate political activity in the information age. The ANNALS of the American Academy of Political and Social Science, 628(1), 194-211.
Bewildered, I checked volume 628(1) of ANNALS. No such article was published in that journal. Pages 194-211 span three published articles, none of which have any topical overlap with the piece, and none of which reference me. So I replied, still trying my best to be helpful:

“I'm looking at the table of contents of that issue of ANNALS. That article does not seem to exist. Where did you find this reference? Again, I'm happy to help, but the article title isn't ringing a bell...”

At that point the student stopped responding. I had wasted half an hour on a complete snipe hunt. What was going on? Where did this phantom citation come from?
Giving it some thought, I developed a pretty strong hunch: I suspect the student used ChatGPT. I think he was trying to use it the right way. And that created trouble for both of us.
A historical analogue
Some reflection on recent history is in order. The emergence of Wikipedia in the ‘00s prompted an outpouring of frustration from professors and writers: students were turning to this collaboratively generated website instead of the old-fashioned encyclopedia for answers! Wikipedia was functioning as an untrustworthy Cliffs Notes. Professors tried banning it from the classroom, but fretted that it was too irresistible to students in a rush to write papers. Only after that lengthy period of mistrust and confusion subsided did Wikipedia come to be known as a great place to start your research, and a terrible place to finish.
Over time, we collectively learned which tasks Wikipedia was particularly good for and which it ought to be excluded from. It turned out that Wikipedia is an excellent resource when beginning to research a new topic. It provides (usually) competent summaries of complex topics, along with citations that encourage further research. But the research process ought not to end with Wikipedia. It should extend to the source materials and engage directly with the main underlying ideas.
Google now features Wikipedia entries as the top search result for many queries. Who would have thought 20 years ago that one of the main practical tools for combating misinformation would be the collaborative website that anyone can edit? Wikipedia is more trustworthy than the open web. It’s more trustworthy than Twitter or Facebook. And it includes citations. If you want to learn about, say, Moore’s Law, the reasonable first step is to visit Wikipedia for a summary. From there, you can dig into some of the page’s 181 citations, or pick up one of the books and articles suggested in the “further reading” section. If you are a student writing a research paper, you will cite the books and articles, not the Wikipedia entry. Wikipedia points to and summarizes the underlying research, but you should then take the time to review it yourself.
That is the appropriate, ethical way to make use of Wikipedia for research today.
Within the AI community, what happened to this unsuspecting student is referred to as “hallucination.” Generative AI is, as Arvind Narayanan and Sayash Kapoor (2022) note, a “bullshit generator.” They mean bullshit in the technical sense established by philosopher Harry Frankfurt in his celebrated book On Bullshit (2005) – speech that is “intended to persuade without regard for the truth.” ChatGPT, for instance, is an extraordinarily advanced version of autocomplete. It has no underlying theory of truth or facts. It does not think, reason, or theorize in the manner that humans are accustomed to. Products like ChatGPT are exceptionally overpowered guess-the-next-word engines. So they are prone to hallucinating facts, claims, and fake citations that sound remarkably plausible, because they are the types of facts, claims, and citations one might expect to see. (If one were to brainstorm a list of academic authors who might publish an article on “Tech lobbying in the United States,” I would be on it. If you were to generate a list of journals that might publish such an article, ANNALS would be included. It is all remarkably plausible, without being in any sense true.)
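The “guess-the-next-word” intuition can be sketched with a deliberately tiny stand-in. The toy below is a bigram model, nothing like the transformer architecture or scale of a real system, and the mini-corpus and names in it are invented for illustration. It simply samples whichever word plausibly follows the last one, which is exactly why its output can be fluent without being constrained by truth.

```python
import random
from collections import defaultdict, Counter

# Toy "guess-the-next-word" engine (a bigram model; a drastic
# simplification of how systems like ChatGPT actually work).
# The corpus is invented for illustration.
corpus = (
    "karpf published an article in a journal . "
    "callahan published a book about lobbying . "
    "karpf published a book about advocacy ."
).split()

# Count how often each word follows each other word.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def generate(start, length=8, seed=0):
    """Emit a statistically plausible next word at each step."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length):
        options = followers[words[-1]]
        if not options:
            break
        # Sample proportionally to observed frequency.
        choices, weights = zip(*options.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

# Every adjacent word pair in the output is locally plausible, but
# nothing constrains the whole sentence to match anything that
# actually exists in the corpus -- a fluent fabrication machine.
print(generate("karpf"))
```

Scaled up by many orders of magnitude, this is why the phantom citation was so convincing: every piece of it was a statistically likely continuation of the pieces before it.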
Here's another example of hallucination: I recently received access to Google’s Bard, the company’s competitor to ChatGPT. I decided to test it out by asking for a summary of my 2012 book, The MoveOn Effect. Bard instantly fabricated a book with the same title and a different subtitle, attributing it to an author named David Callahan, and invented a bullet-point summary that has little to do with the actual book as written. The citation was plausible enough that I took some time making sure David Callahan had not written such a book. Searches of Google and Amazon confirmed for me that no such book existed. But they also confirmed that Callahan is the sort of author who might have written such a book.
Some Generative AI proponents insist that this is merely an early-days problem. The simplest explanation is that the Large Language Model powering Google Bard still isn’t large enough to include my obscure book in its dataset. A colleague fed the same prompt into OpenAI’s GPT4, which with several billion parameters is far more powerful than Google Bard or GPT3.5, and it correctly identified the book and produced a reasonable bullet-point summary. With enough data, it is possible that the hallucination problem will be reduced.
But this still leaves us with a central conundrum. How does one ethically and responsibly make use of these new technologies?
Imagine you are a student (or a freelance journalist) conducting research today. You have heard constantly about these new Generative AI tools. You know that people are using Generative AI to write their papers for them. You know that’s plagiarism. You don’t want to do that. You aren’t trying to cheat. You’re trying to learn. So what you might reasonably do is treat Generative AI the same way we treat Wikipedia — a good place to start/terrible place to finish. So you ask the AI to create an essay summarizing the history of, for instance, tech lobbying in the United States. That’s an important and interesting topic. There is no Wikipedia entry on it. That machine-generated essay, just like Wikipedia, includes citations. Brilliant. You can now do the responsible thing, tracking down those citations, reading them, and producing a better essay yourself. This all seems reasonable, responsible, and ethical. But it is also a completely untrustworthy foundation for future research, because you have no way of judging which plausible claims are based in fact versus, well, bullshit.
When is hallucination an asset?
I suspect that the platonic ideal use-case for generative AI can be summarized as “Clippy, but awesome.” Clippy, as some readers might recall, was the Microsoft Office virtual assistant in the late 1990s and early 2000s. It was a talking paper clip that would appear next to your writing and offer formatting tips. Clippy was widely derided as being useless and annoying. The advice it could offer simply wasn’t worth the constant, automatic interruptions.
Microsoft has invested over $11 billion in OpenAI. It has incorporated GPT3.5 into its Bing search engine, and has plans to integrate generative AI into its wider suite of tools. What would a GPT-powered version of Clippy look like? It might offer helpful insights into PowerPoint formatting and Excel spreadsheet tips. It might pick out the narrative forms of various Word documents and conjure time-saving structural suggestions. These would be uses for which hallucination is an asset rather than a burden. Leave the facts and analysis to human minds. Trust a neural network to generate presentation options.
At least in these early days, I suspect this rule of thumb provides an ethical guidepost. Assume that ChatGPT and similar tools are untrustworthy narrators, prone to long bouts of hallucination that are impossible for the unsuspecting reader to identify without significant effort. For creative purposes where hallucination is a valuable tool, one can comfortably deploy such tools (setting aside copyright challenges and the ethical dilemma of exploiting human labor – each a substantive conundrum in its own right). But for research purposes where hallucination is a barrier to be overcome, the tools should be distrusted or outright rejected.
I suspect we have entered the early stages of a well-deserved moral panic surrounding the use and misuse of generative AI. It is important to recognize that this is not much like Wikipedia at all. Generative AI, in fact, wreaks havoc on many of the practices that normally ensure good media literacy and hygiene. Citations are not proof of anything unless you have the time available to read them. Images and videos are no longer proof of much – generative AI tools are already being applied to video and sound files, producing convincing deep fakes at minimal cost. Strategic actors, pursuing power and/or profit, are going to find creative ways to pollute the information ecosystem with utter garbage as soon as they see a clear advantage in doing so.
Businesses are also already looking to generative AI as an opportunity for substantial cost-savings. The tools do not need to be better than professionals; they just need to be cheaper and faster while also being good enough. And, given the pace of investment and public interest in this sector, it is unlikely that we will see the type of mass public rejection that seemingly “inevitable” technologies like Google Glass encountered. The future will, most likely, integrate many forms of generative AI into our daily lives.
For researchers, educators, students, and writers, we are entering barely-mapped terrain, filled with warning signs. It is wise to proceed carefully and ask hard questions.
My own early mistake lay in presuming too quickly that the role of this technology would likely resemble the role of a previous technology. The resemblance is faint at best.
There will eventually be a plethora of ethical uses for generative AI. But they will be constrained to those uses to which the technology is actually well-suited. Evaluating the ethics of AI requires clear engagement with its limitations. The better we demystify and understand it, the better we can prepare.
Frankfurt, Harry G. 2005. On Bullshit. Princeton, NJ: Princeton University Press.
Karpf, David. 2012. The MoveOn Effect: The Unexpected Transformation of American Political Advocacy. New York: Oxford University Press.
Karpf, David. March 20, 2023. “On Generative AI, phantom citations, and social calluses.” Blog post, Substack.com. https://davekarpf.substack.com/p/on-generative-ai-phantom-citations
Narayanan, Arvind and Sayash Kapoor. December 6, 2022. “ChatGPT is a bullshit generator. But it can still be amazingly useful.” Blog post, Substack.com. https://aisnakeoil.substack.com/p/chatgpt-is-a-bullshit-generator-but
Note: This essay was adapted from a piece published to Substack on March 20, 2023, “On Generative AI, phantom citations, and social calluses.”
- Dave Karpf is an Associate Professor in the George Washington University School of Media & Public Affairs. He teaches and conducts research on strategic political communication in the digital age. His current research is focused on the history of the digital future. He writes regularly on Substack, where an earlier version of this essay appeared.
Image by Prostock-studio on Envato Elements.