The Unsolved Challenges of LLMs in Open-Ended Web Tasks: A Case Study on the Future of Work

Welcome to AI Research Bites. This series of short, informative talks showcases cutting-edge work from the ServiceNow AI Research team. AI Research Bites is open to all, especially those interested in keeping up with the fast-paced AI research community.

This session features work by Alexandre Lacoste and our team exploring how capable web agents are at solving common knowledge work tasks. These large language model-based agents must perform the daily work of knowledge workers who use browser-based enterprise software. As part of this effort, we introduce WorkArena, a remotely hosted benchmark built around the widely used ServiceNow platform. We also present BrowserGym, an environment our team built for designing and evaluating these agents.

You won’t believe what the agent clicked on!

WorkArena site:
https://servicenow.github.io/WorkArena/

Paper:
https://arxiv.org/abs/2403.07718

Repo WorkArena:
https://github.com/ServiceNow/WorkArena

Repo BrowserGym:
https://github.com/ServiceNow/BrowserGym

ServiceNow AI Research team:
https://www.servicenow.com/research/