Patterns purposely buried in AI-generated text could identify it as machine-written. These “watermarks” are invisible to the human eye but let computers detect that the text probably comes from an AI system. If embedded in large language models, they could help prevent some of the problems these models have already caused. For example, since OpenAI’s chatbot ChatGPT launched in November, students have already started cheating by using it to write essays for them.
Building the watermarking approach into systems before they are released could help address such problems. AI language models work by predicting and generating one word at a time. After each word, the watermarking algorithm pseudorandomly divides the language model’s vocabulary into words on a “greenlist” and a “redlist,” using the word just generated as the seed so that the same split can be reproduced later, and then prompts the model to choose words from the greenlist.
For example, after the word “beautiful,” the watermarking algorithm could classify the word “flower” as green and “orchid” as red. A model with the watermarking algorithm would then be more likely to follow “beautiful” with “flower” than with “orchid.” The more greenlisted words a passage contains, the more likely it is that the text was generated by a machine; text written by a person tends to contain a more random mix of green and red words.
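The idea above can be sketched in a few lines of Python. This is a toy illustration, not the researchers’ actual implementation: the tiny vocabulary, the choice to seed the split with the previous word, and the uniform sampling from the greenlist are all simplifying assumptions (a real language model would instead reweight its next-word probabilities toward green words).

```python
import random

# Toy vocabulary; a real model's vocabulary has tens of thousands of tokens.
VOCAB = ["flower", "orchid", "garden", "sunset", "river", "cloud",
         "meadow", "breeze", "stone", "light"]

def green_list(prev_word, green_fraction=0.5):
    """Pseudorandomly split the vocabulary into a greenlist, seeded by the
    previous word so a detector can reproduce the exact same split later."""
    rng = random.Random(prev_word)      # deterministic seed from the word
    shuffled = sorted(VOCAB)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * green_fraction)
    return set(shuffled[:cut])

def generate_watermarked(first_word, length, seed=0):
    """Generate text that always picks its next word from the greenlist.
    (A real model would merely bias its probabilities, not hard-restrict.)"""
    rng = random.Random(seed)
    words = [first_word]
    for _ in range(length - 1):
        words.append(rng.choice(sorted(green_list(words[-1]))))
    return words

def green_fraction(words):
    """Fraction of words that fall on their predecessor's greenlist.
    Human text should hover near 0.5 here; watermarked text scores near 1."""
    hits = sum(1 for prev, word in zip(words, words[1:])
               if word in green_list(prev))
    return hits / max(len(words) - 1, 1)
```

A detector that knows how the lists are seeded simply recomputes the greenlist at each position and counts the hits, which is why the split must be reproducible rather than truly random.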
The breathtaking speed of AI development means that new, more powerful models quickly make our existing tool kit for detecting synthetic text less effective. It is a constant race to build safety tools that can keep pace with the latest generation of AI models.