2023 wasn’t a great year for AI detectors. Leaders like GPTZero surged in popularity but faced a backlash as false positives led to incorrect accusations. Then OpenAI quietly tossed ice-cold water on the idea with an FAQ to answer whether AI detectors work. The verdict? “No, not in our experience.”
OpenAI’s conclusion was correct—at the time. Yet reports of the demise of AI detectors are greatly exaggerated. Researchers are inventing new detectors that perform better than their predecessors and can operate at scale. And these come alongside “data poisoning” attacks that individuals can use to safeguard their work from being scooped up against their wishes to train AI models.
“Language model detection can be done with a high enough level of accuracy to be useful, and it can also be done in the ‘zero shot’ sense, meaning you can detect all sorts of different language models at the same time,” says Tom Goldstein, a professor of computer science at the University of Maryland. “It’s a real counterpoint to the narrative that language model detection is basically impossible.”
Early AI detectors played at detective by asking a simple question: How surprising is this text? The assumption was that statistically less surprising text is more likely to be AI-generated. It’s an LLM’s mission to predict the “correct” word at each point in a string of text, which should lead to patterns a detector can pick up. Most detectors answered by giving users a numerical probability that the text submitted to them was AI-generated.
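To make that idea concrete, here is a minimal sketch of this kind of perplexity-style scoring, assuming a Hugging Face causal language model (GPT-2 as an illustrative stand-in) and an arbitrary cutoff; it is not any particular detector’s actual implementation.

```python
# Sketch: score text by how "surprised" a language model is by it.
# Low perplexity = less surprising = more "AI-like" under this heuristic.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated average per-token cross-entropy under the model."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean token loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

def looks_ai_generated(text: str, threshold: float = 30.0) -> bool:
    # The threshold here is arbitrary; a real detector would calibrate it
    # on labeled human- and machine-written samples.
    return perplexity(text) < threshold

print(looks_ai_generated("The quick brown fox jumps over the lazy dog."))
```

A real system would report the score as a probability rather than a hard yes/no, but the underlying signal is the same: how predictable the text is to a language model.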
But that approach is flawed. AI-generated text can still be surprising if it’s generated in response to a surprising prompt, which the detector has no way to deduce. The opposite is true as well: humans may write unsurprising text when covering a well-worn topic, and get flagged for it.