Collaborative AI Paper & Audit
Assignment Overview
- Students will use AI to generate a typical 5-paragraph essay about one of our course’s literary-historical topics and annotate it for quality, persuasiveness, and reliability
- Students work individually but workshop collaboratively
- AI paper due no later than February 9; annotated version due no later than February 16
Details
Many current discussions of AI writing tools critique the ways language models—systems designed to predict which words typically appear alongside which other words—are being marketed and used as knowledge or information-retrieval systems, i.e. systems we can use to find and convey reliable facts. We even use an evocative, metaphorical name for AI models’ inventions: hallucination. What makes such hallucinations most concerning is that they are not always obviously or patently false; instead they blend snippets of truth, half-truth, and outright falsehood, often employing generic features that make the mix seem reliable. What makes this difficult to interrogate is that many of these systems are corporate “black boxes” whose algorithms and training data are deliberately obscured.
In many cases, it is neither the technology nor the data itself that is the “black box” so much as the corporate structures that prevent scholars or users from interrogating the technologies or their underlying data. It is difficult to write a media archeology of an AI like ChatGPT without direct access to its training data or systems. But even the most closely-guarded systems can be studied. Increasingly, researchers interested in algorithmic justice have turned to approaches such as “algorithmic auditing”, “reverse engineering”, or “probing” to interrogate closed systems, seeking to understand a program and its training data through iterative uses or inquiries that test its boundaries and assumptions. On-the-ground investigation also helps illuminate these systems, as in Time Magazine’s January 2023 exposé of OpenAI’s use of low-wage Kenyan workers to filter toxic content from ChatGPT.
To explore such nuances, this assignment will unfold in three parts:
1. Generate an AI Essay
In the first step of this assignment, you will impersonate a high school student trying to use AI to cheat on their five-paragraph essay assignment. You will first choose one of the literary-historical topics we have studied in our class that you are interested in exploring more deeply. This could be a literary work we read together—a short story or our novella—or a historical topic we discussed—e.g. Luddism, automata. You will then use an AI writing tool such as ChatGPT to generate a critical essay on some facet of your topic. We will brainstorm ideas in class, but think of the ways you narrow down an essay topic in other classes by focusing on a particular theme, motif, or critical framework for evaluation. I would not expect you to generate a perfect essay in a single pass—this assignment will likely require you to iterate and experiment with prompts until you create something that reads like a stereotypical five-paragraph essay (if you want to experiment programmatically, see the optional sketch after the guidelines below). Here are a few key guidelines to keep in mind:
- First, your essay should center a claim of some kind—you should guide the AI away from simply reporting encyclopedic facts and towards writing an essay that is more interpretive.
- Second, your essay must include citations. I am not saying these must be viable citations (see part #2 of the assignment), but you should prompt the AI to quote and cite sources.
- As much as possible, you want your essay to look and read like a plausible essay. Our goal is not to generate obvious hallucinations, but essays that would seem passable to a general reader.
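If you would rather iterate on your prompts programmatically than through a chat interface, the minimal sketch below shows one way to do so. It is only an optional illustration: it assumes the OpenAI Python SDK and an API key, and the model name, topic, and prompt wording are placeholders you would adapt to your own topic. Any chat-based tool works just as well.

```python
# Optional sketch: iterating on essay prompts with the OpenAI Python SDK.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable;
# the model name, topic, and prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_essay(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Send a single prompt to the model and return its essay text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Iterate: start broad, then tighten the prompt until the output reads like a
# stereotypical five-paragraph essay that centers a claim and cites sources.
prompts = [
    "Write a five-paragraph critical essay on Luddism in nineteenth-century literature.",
    "Write a five-paragraph critical essay arguing that literary depictions of Luddism "
    "reflect anxieties about deskilled labor. Quote and cite at least three scholarly "
    "sources in MLA style.",
]

for i, prompt in enumerate(prompts, start=1):
    print(f"--- Draft {i} ---")
    print(generate_essay(prompt))
```

Whatever workflow you use, keep a record of the successive prompts and outputs you try; they will become evidence for the reflection in part 3.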
2. Critical Annotation
We will generate a rubric for stage two together, but in brief, the second stage of this assignment will ask you to undertake a critical evaluation and annotation of your AI-generated essay. We will evaluate our essays across several metrics, including the quality of the writing, the rhetorical structure of the argument, and the reliability of the facts it cites. While the precise details of our rubric are not yet determined, we will work towards continuous scales rather than binary judgments. In other words, one citation might be 100% accurate and verifiable; another might be in line with a critic’s work but not a literal quote—i.e. “somewhat reliable”; and yet another might be an outright fabrication—e.g. the cited author does not actually exist. The goal of our annotation is to tease out the nuances of mapping a language model onto an information retrieval task: in what ways does it perform well, and where precisely are the points of failure?
Note that your annotation will almost certainly require research, particularly if you are uncertain about the veracity of the model’s claims or evidence.
3. Reflection
Finally, you will draw on your prompt engineering and annotation to write a critical reflection of approximately 1500 words that uses your experiences to theorize about the language model itself and its applications to scholarly or critical writing. Essentially, you are using the iteration and probing of this assignment to interrogate the “black box” system: the assignment asks you to begin probing the boundaries and assumptions of language models by comparing different inputs and outputs.
Concretely, your reflection might explore:
- What outputs do different models produce in response to the same prompts? Do their differing outputs allow us to make hypotheses about their training data or the assumptions made by their respective language models? Based on their outputs, might you draw conclusions about the imagined audiences for each model? (The optional sketch after this list shows one way to set up such a comparison.)
- How did incremental changes to your prompts to a single model change the output, and what might that tell us about the training data or language model? For instance, if you request essays drawing on two different critical approaches, does the output help you understand what writing seems to be included in the training data, or perhaps what kinds are not? Are there genres the model seems better able to reproduce than others, and what might that say about the data or the model’s assumptions?
- What did your annotation reveal about the model’s outputs? Do you find them accurate or inaccurate, up-to-date or outdated, balanced or imbalanced? Can you determine what the model is drawing from established sources and what it is creating in the moment? To what extent did the model seem to reflect existing knowledge or to make up facts to fit your prompt?
- Did you find a limit case: e.g. a style, form, genre, or author the model seems incapable of reproducing? What might that teach us about the model, its training data, or its fine-tuning?
- Can you find cracks where the “wires” of the machine are somewhat exposed? Language models are not simply reflections of “raw data,” as the example of the Kenyan workers helping OpenAI filter offensive, gruesome, or otherwise toxic content from ChatGPT illustrates. Not only has their source data been curated to some extent, but explicit guidelines also appear to be programmed in to steer the models toward or away from particular kinds of outputs. ChatGPT, for example, is already famous for repeatedly highlighting its limitations in its output, rebuffing prompts that try to encourage the model to claim sentience or intention. Did you find such instances of, if not transparency, then thinner opacity that give us slant-wise peeks into the createdness of the model?
- Based on your annotations, what grade would you give the model’s essay, and why? What feedback would you offer this “writer” for revision?
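For those who want to automate the model comparison suggested in the first question above, the sketch below sends one identical prompt to several models and saves each output for annotation. As before, it is only an optional illustration: it assumes the OpenAI Python SDK, the model names are placeholders that will vary by provider and over time, and other providers’ APIs could be swapped in the same way.

```python
# Optional sketch: sending one identical prompt to several models so their
# outputs can be compared side by side. Assumes the same OpenAI SDK setup as
# the earlier sketch; the model names below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Write a five-paragraph critical essay on <your chosen topic>, "
    "quoting and citing at least two scholarly sources."
)

MODELS = ["gpt-4o-mini", "gpt-3.5-turbo"]  # placeholder model names

for model in MODELS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    # Save each draft to its own file so it can be annotated and compared.
    with open(f"essay_{model}.txt", "w", encoding="utf-8") as f:
        f.write(response.choices[0].message.content)
```

Reading the saved drafts against one another is one concrete way to ground hypotheses about each model’s training data, assumptions, and imagined audience.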