Talk Title: Robust Distortion-free Watermarks for Language Models


Abstract: We describe a protocol for planting watermarks into text generated by an autoregressive language model (LM) that is robust to edits and does not change the distribution of generated text. We generate watermarked text by using a secret watermark key to control the source of randomness that the LM decoder uses to convert probabilities of text into samples. To detect the watermark, any party who knows the key can test for statistical correlations between a snippet of text and the watermark key; meanwhile, our watermark is provably undetectable by anyone who does not know the key. We apply these watermarks to the OPT-1.3B, LLaMA-7B, and instruction-tuned Alpaca-7B LMs to experimentally validate their statistical power and robustness to various paraphrasing attacks.

arXiv: https://arxiv.org/abs/2307.15593
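To make the abstract concrete, here is a minimal Python sketch of one way to instantiate such a protocol, using exponential-minimum (Gumbel-style) sampling, one of the schemes the paper considers. Everything in it is illustrative rather than the authors' released implementation: the toy vocabulary, the hash-based derivation of per-position randomness from the key, and the simple sum-of-scores detector are assumptions made for the demo. The paper's actual detector additionally aligns the text against a shared key sequence, so the test statistic survives insertions, deletions, and substitutions.

    import hashlib
    import numpy as np

    VOCAB_SIZE = 8    # toy vocabulary; a real LM has tens of thousands of tokens
    SEQ_LEN = 200     # length of the generated sequence in this demo

    def key_randomness(key: bytes, position: int) -> np.ndarray:
        # Derive pseudorandom uniforms in [0, 1) for one position from the secret key
        # (a simplification of the paper's shared key sequence).
        digest = hashlib.sha256(key + position.to_bytes(4, "big")).digest()
        seed = int.from_bytes(digest[:8], "big")
        return np.random.default_rng(seed).random(VOCAB_SIZE)

    def watermarked_sample(probs: np.ndarray, u: np.ndarray) -> int:
        # Exponential-minimum sampling: pick argmax_i u_i ** (1 / p_i).
        # Marginally over the key randomness, the sampled token is distributed
        # exactly according to probs, so the watermark leaves the text
        # distribution unchanged.
        return int(np.argmax(u ** (1.0 / np.maximum(probs, 1e-12))))

    def detection_score(tokens, key: bytes) -> float:
        # Under the null (text independent of the key), u[tok] is Uniform(0, 1),
        # so each term is Exp(1) and the score concentrates around len(tokens).
        # Watermarked text systematically lands on large u values, inflating it.
        return float(sum(-np.log(1.0 - key_randomness(key, pos)[tok])
                         for pos, tok in enumerate(tokens)))

    if __name__ == "__main__":
        key = b"secret-watermark-key"
        lm = np.random.default_rng(0)  # stand-in for an LM's next-token distributions
        tokens = []
        for pos in range(SEQ_LEN):
            probs = lm.dirichlet(np.ones(VOCAB_SIZE))
            tokens.append(watermarked_sample(probs, key_randomness(key, pos)))
        print("score with the right key:", detection_score(tokens, key))      # >> SEQ_LEN
        print("score with a wrong key:  ", detection_score(tokens, b"oops"))  # ~ SEQ_LEN

Running the demo, the right-key score sits far above SEQ_LEN while the wrong-key score hovers near it; the paper turns this gap into a principled p-value and shows it persists under paraphrasing attacks.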


Bio: John Thickstun is an assistant professor of Computer Science at Cornell University. His current research interests include improving the capabilities and controllability of generative models, as well as applications of these methods to music technologies. Previously, he worked with Percy Liang as a postdoctoral researcher at Stanford University. John completed his PhD at the University of Washington, advised by Sham M. Kakade and Zaid Harchaoui. His work has been featured in media outlets including TechCrunch and the Times of London, recognized by outstanding-paper awards at NeurIPS and ACL, and supported by an NSF Graduate Fellowship and a Qualcomm Innovation Fellowship.