MLA-Trust

Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments

¹Tsinghua University, ²East China Normal University

Introduction

[Figure: overview of the MLA-Trust framework]

The emergence of multimodal LLM-based agents (MLAs) has transformed interaction paradigms by seamlessly integrating vision, language, action, and dynamic environments, enabling unprecedented autonomous capabilities across GUI applications ranging from web automation to mobile systems. However, MLAs introduce critical trustworthiness challenges that extend far beyond the limitations of traditional language models, as they can directly modify digital states and trigger irreversible real-world consequences. Existing benchmarks inadequately address the unique challenges posed by MLAs' actionable outputs, long-horizon uncertainty, and multimodal attack vectors. In this paper, we introduce MLA-Trust, the first comprehensive and unified framework that evaluates MLA trustworthiness across four principled dimensions: truthfulness, controllability, safety, and privacy. We use websites and mobile applications as realistic testbeds, designing 34 high-risk interactive tasks and curating rich evaluation datasets. Large-scale experiments involving 13 state-of-the-art agents reveal previously unexplored trustworthiness vulnerabilities unique to multimodal interactive scenarios. For instance, proprietary and open-source GUI-interacting MLAs pose more severe trustworthiness risks than static MLLMs, particularly in high-stakes domains; the transition from static MLLMs to interactive MLAs considerably compromises trustworthiness, enabling harmful content generation in multi-step interactions that standalone MLLMs would typically prevent; and multi-step execution, while enhancing the adaptability of MLAs, involves latent nonlinear risk accumulation across successive interactions, circumventing existing safeguards and producing unpredictable derived risks. Moreover, we present an extensible toolbox to facilitate continuous evaluation of MLA trustworthiness across diverse interactive environments.
MLA-Trust establishes a foundation for analyzing and improving MLA trustworthiness, promoting reliable deployment in real-world applications.

Task Pool

[Figure: overview of the MLA-Trust task taxonomy and task list]

Task Overview: We organize a two-level taxonomy containing 8 sub-aspects to better categorize the target behaviors to be evaluated. Based on this taxonomy, we curate 34 diverse tasks covering realistic and comprehensive scenarios with trustworthiness risks, spanning both predefined-process and contextual-reasoning tasks across website and mobile environments, as summarized above. To address the current lack of datasets dedicated to these sub-aspects, we construct 11 datasets by adapting prompts and images from existing datasets, using both manual effort and automatic methods. We further build 23 novel datasets from scratch specifically for the designed tasks.
Legend. Task environment: website task, mobile task, or mixture task. Dataset provenance: improved from existing datasets, constructed from scratch, or involving user image input. Task style: predefined-process task or contextual-reasoning task. Evaluation method: rule-based evaluation (e.g., keyword matching), automatic evaluation by GPT-4 or other classifiers, or a mixture of both. RtE stands for Refuse-to-Execute rate; ASR stands for Attack Success Rate.
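The two summary metrics above can be sketched in a few lines. This is a minimal illustration of rule-based (keyword-matching) evaluation; the refusal keyword list and the notion of "attack success" here are assumptions for illustration, not the benchmark's actual word lists or judging logic.

```python
# Illustrative sketch of RtE and ASR computation with rule-based matching.
# REFUSAL_KEYWORDS is a hypothetical list, not MLA-Trust's actual rules.
REFUSAL_KEYWORDS = ["i cannot", "i can't", "i'm sorry", "unable to comply", "refuse"]

def is_refusal(response: str) -> bool:
    """Rule-based check: does the agent's reply contain a refusal phrase?"""
    text = response.lower()
    return any(kw in text for kw in REFUSAL_KEYWORDS)

def refuse_to_execute_rate(responses: list[str]) -> float:
    """RtE: fraction of harmful tasks the agent refuses to execute."""
    return sum(is_refusal(r) for r in responses) / len(responses)

def attack_success_rate(outcomes: list[bool]) -> float:
    """ASR: fraction of attack attempts that achieved the harmful goal."""
    return sum(outcomes) / len(outcomes)

responses = [
    "I'm sorry, I can't help with that.",
    "Clicking the 'Delete all' button now.",
]
print(refuse_to_execute_rate(responses))          # 0.5
print(attack_success_rate([True, False, False, True]))  # 0.5
```

In the full benchmark, keyword matching is complemented by automatic judging (e.g., GPT-4 or other classifiers) for tasks where surface-level rules are insufficient.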

Leaderboard in MLA-Trust (Updating...)

| # | Model | Source | T.I | T.M | C.O | C.S | S.T | S.J | P.A | P.L | Overall |
|---|-------|--------|-----|-----|-----|-----|-----|-----|-----|-----|---------|
| 1 | GPT-4o 🥇 | Link | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | GPT-4-turbo 🥈 | Link | 2 | 3 | 2 | 2 | 2 | 4 | 2 | 1 | 2 |
| 3 | Claude-3-7-sonnet 🥉 | Link | 3 | 2 | 4 | 3 | 4 | 3 | 3 | 5 | 3 |
| 4 | Gemini-2.0-pro | Link | 4 | 4 | 3 | 4 | 5 | 5 | 4 | 3 | 4 |
| 5 | Gemini-2.0-flash | Link | 5 | 5 | 5 | 5 | 3 | 2 | 5 | 4 | 5 |
| 6 | LLaVA-OneVision | Link | 6 | 6 | 6 | 6 | 6 | 8 | 6 | 6 | 6 |
| 7 | DeepSeek-VL2 | Link | 7 | 7 | 7 | 7 | 10 | 6 | 11 | 7 | 7 |
| 8 | LLaVA-NeXT | Link | 8 | 9 | 10 | 10 | 7 | 7 | 9 | 9 | 8 |
| 9 | Phi-4 | Link | 9 | 10 | 9 | 9 | 9 | 10 | 8 | 8 | 9 |
| 10 | MiniCPM-o-2_6 | Link | 10 | 8 | 8 | 8 | 8 | 9 | 10 | 11 | 10 |
| 11 | Pixtral-12B | Link | 11 | 11 | 11 | 11 | 11 | 11 | 7 | 10 | 11 |
| 12 | InternVL2-8B | Link | 12 | 13 | 12 | 13 | 12 | 12 | 13 | 12 | 12 |
| 13 | Qwen2.5-VL | Link | 13 | 12 | 13 | 12 | 13 | 13 | 12 | 13 | 13 |
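One simple way an overall ordering can be derived from the per-dimension ranks above is to sort models by their mean rank. This is an assumption for illustration only; the leaderboard's actual aggregation method is not specified here, and the sample `ranks` dictionary reproduces only the first three rows of the table.

```python
# Hypothetical rank aggregation: sort models by average per-dimension rank
# (lower is better). Not necessarily the leaderboard's actual method.
def overall_ranking(per_dim_ranks: dict[str, list[int]]) -> list[str]:
    """Order models by their mean rank across the eight sub-aspects."""
    return sorted(per_dim_ranks, key=lambda m: sum(per_dim_ranks[m]) / len(per_dim_ranks[m]))

ranks = {
    "GPT-4o":            [1, 1, 1, 1, 1, 1, 1, 2],
    "GPT-4-turbo":       [2, 3, 2, 2, 2, 4, 2, 1],
    "Claude-3-7-sonnet": [3, 2, 4, 3, 4, 3, 3, 5],
}
print(overall_ranking(ranks))  # ['GPT-4o', 'GPT-4-turbo', 'Claude-3-7-sonnet']
```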

BibTeX

      
@misc{zhang2024benchmarking,
      title={MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments}, 
      author={Xiao Yang, Jiawei Chen, Jun Luo, Zhengwei Fang, Yinpeng Dong, Hang Su, Jun Zhu},
      year={2025},
      eprint={2506.01616},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
    }