Monitoring LLM Agents’ Tool Use
Supervisors
Suitable for
Abstract
Monitoring LLM Agents’ Tool Use
*Modern LLM agents invoke tools and act on external information (search, code, files), creating new safety risks: malicious instructions may be hidden in tool outputs, and retrieved files may contain harmful instructions or hijacks (Aichberger et al, 2025). Given LLM agents’ ability to act autonomously in the real world, it is crucial that we develop more sophisticated techniques to address the unique challenges and emerging risks. This project aims to design lightweight monitors of LLM agents’ tool usage that can be run externally; catching and preventing injection of harmful instructions and hijacks. This project will likely be joint with collaborators from Microsoft as an industry partner.