Unifying Autoregressive and Diffusion-Based Language Models
Welcome to AI Research Bites. This series of short, informative talks showcases cutting-edge work from the ServiceNow AI Research team. AI Research Bites talks are open to all, especially those interested in keeping up with the fast-paced AI research community.
In this presentation, Pierre-André Noël will show that, if you squint hard enough, autoregressive language models are a special case of diffusion models. We make this idea more concrete by introducing hyperschedules, allowing different token positions to get different noise levels. Other developments include new hybrid processes (combining the strengths of two major discrete diffusion approaches) and adaptive correction sampling (helping the model fix its own mistakes). All our innovations support efficient training and fast inference with KV-caching.
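As a rough illustration of the hyperschedule idea, the sketch below treats a hyperschedule as a table of noise levels with one row per generation step and one column per token position. The function names and the convention that 1.0 means fully noised and 0.0 means clean are assumptions made for this example; they are not taken from the paper or its code.

```python
# Illustrative sketch only: names and conventions here are hypothetical,
# not the paper's implementation.
# A hyperschedule is represented as a list of rows, one per generation step.
# Each row holds one noise level per token position
# (assumed convention: 1.0 = fully noised, 0.0 = clean).

def autoregressive_hyperschedule(seq_len):
    """At step t, positions before t are clean and the rest are fully noised,
    so exactly one new token is resolved per step, left to right."""
    return [
        [1.0 if pos >= step else 0.0 for pos in range(seq_len)]
        for step in range(seq_len + 1)
    ]

def uniform_hyperschedule(seq_len, num_steps):
    """Every position shares the same noise level, which decreases linearly
    from 1.0 to 0.0, as in a schedule with no dependence on position."""
    return [
        [1.0 - step / num_steps] * seq_len
        for step in range(num_steps + 1)
    ]

if __name__ == "__main__":
    for row in autoregressive_hyperschedule(3):
        print(row)
    for row in uniform_hyperschedule(3, 2):
        print(row)
```

In this picture, the autoregressive table is a staircase (each row cleans one more position than the previous row), while the uniform table has constant rows. Schedules between these two extremes would give nearby positions different, intermediate noise levels.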
Paper: https://arxiv.org/abs/2504.06416
ServiceNow AI Research team: https://www.servicenow.com/research/