WebMMU: Multimodal and Multilingual Evaluation of Agent Reasoning on Web
Welcome to AI Research Bites, a series of short, informative talks showcasing cutting-edge work from the ServiceNow AI Research team. The talks are open to all, especially those interested in keeping up with the fast-paced AI research community.

Modern web agents can read, but few can see holistically. Despite rapid progress in multimodal LLMs, today's models falter when asked to visually ground UI elements, reason over DOM structures, or edit complex layouts across diverse languages and domains. WebMMU is our attempt to course-correct.