Market Research Report
Product Code: 1882073

Cockpit Agent Engineering Research Report, 2025

Publication Date: | Publisher: ResearchInChina | English: 248 Pages | Delivery Time: 1-2 business days at the earliest

Product Code: GX020

Cockpit Agent Engineering Research: Breakthrough from Digital AI to Physical AI

Cockpit Agent Engineering Research Report, 2025 starts with the status quo of cockpit agents, summarizes the technical roadmap of the R&D and engineering stages and the characteristics of agents from leading OEMs, and predicts the future trends and priorities of cockpit agent application.

Action: Last Mile Mission

Since foundation models were first installed in vehicles in 2023, cockpit AI assistants have assumed different tasks at different stages. In 2025, cockpit AI assistants focus on action, which means they "help users get things done" instead of "giving suggestions", marking an important step in the transformation from "assistants" to "agents".

One typical scenario for cockpit AI assistants in 2025 is ordering food at restaurants:

In 2024, when a user wanted to order coffee, a cockpit AI assistant could only find nearby coffee shops on the map for the user to manually select and navigate to; ordering and payment were handled entirely by the user, with no help from the AI assistant.

By 2025, when a user orders coffee, the cockpit AI assistant will be able to confirm the user's intent and automatically complete a series of operations such as placing the order and paying, without further user intervention, thus improving the user experience.

The entire process involves technologies related to long-term memory, tool calling, and multi-agent collaboration.

Case 1: Tool Calling

In early 2024, OpenAI's Function Calling was the mainstream technology used by cockpit agents to call tools, enabling direct interaction between a single model and a single tool.
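As an illustration, a minimal function-calling round trip might look like the sketch below, assuming the OpenAI Python SDK; the `navigate_to_poi` tool, its parameters, and the dispatch logic are hypothetical placeholders rather than an actual cockpit implementation.

```python
# Minimal sketch of single-model / single-tool Function Calling,
# assuming the OpenAI Python SDK; the tool itself is hypothetical.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "navigate_to_poi",          # hypothetical IVI tool
        "description": "Start navigation to a point of interest.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Find a coffee shop nearby"}],
    tools=tools,
)

# The model returns a structured tool call instead of free text;
# the cockpit application executes it and feeds the result back.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "navigate_to_poi":
        args = json.loads(call.function.arguments)
        print("Would start navigation to:", args["query"])
```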

The Model Context Protocol (MCP), introduced by Anthropic in November 2024, addresses the issue of "multi-component collaboration" on top of Function Calling, broadening its application scenarios and improving its efficiency.

In April 2025, Google proposed the A2A (Agent2Agent) protocol to further standardize the communication and collaboration between different agents.

For example, Lixiang Tongxue's 2025 agent application solution includes an MCP/A2A technical framework (alongside a separate CUA framework):

MCP/A2A: The IVI agent acts as the leader of the multi-agent system (MAS), assigning tasks to third-party agents, which then complete their respective workflows.

CUA (Cockpit Using Agent): The operating system calls a multimodal foundation model to understand, decompose, and plan instructions/tasks, generate the final action, and then call applets and apps to complete them. For example, in the payment scenario, after a series of understanding and planning steps, Lixiang Tongxue calls the API to connect to Alipay's automotive assistant and completes the payment with the relevant applet through Alipay's ecosystem.
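The MCP/A2A leader/worker split described above can be pictured with a small dispatch sketch; the agent names, skills, and routing rule below are invented for illustration and are not Li Auto's actual A2A registry.

```python
# Hypothetical sketch of an IVI leader agent delegating sub-tasks to
# third-party agents in a multi-agent system (MAS); not a real A2A client.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Task:
    skill: str      # e.g. "ordering", "payment", "navigation"
    payload: dict

class IVILeaderAgent:
    def __init__(self) -> None:
        self.registry: Dict[str, Callable[[dict], dict]] = {}

    def register(self, skill: str, agent: Callable[[dict], dict]) -> None:
        self.registry[skill] = agent

    def dispatch(self, task: Task) -> dict:
        # Route each sub-task to the third-party agent that owns the skill.
        worker = self.registry.get(task.skill)
        if worker is None:
            return {"status": "unsupported", "skill": task.skill}
        return worker(task.payload)

# Example: a coffee order decomposed into two delegated sub-tasks.
leader = IVILeaderAgent()
leader.register("ordering", lambda p: {"status": "ordered", **p})
leader.register("payment", lambda p: {"status": "paid", "amount": p["amount"]})

print(leader.dispatch(Task("ordering", {"item": "latte", "shop": "nearest"})))
print(leader.dispatch(Task("payment", {"amount": 4.5})))
```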

During training, the Lixiang Tongxue team uses MCP to manage tool services when optimizing the reward module in the agent reinforcement learning phase, for example using an MCP Hub to provide a catalog of callable tool resources matched to training tasks and business requests.
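A rough sketch of how a tool catalog served from an MCP Hub could feed such a reward module is shown below; the catalog contents, argument checks, and reward values are invented for illustration and do not reflect Li Auto's actual setup.

```python
# Hypothetical reward shaping for agent RL: the MCP Hub exposes a catalog
# of callable tools, and the reward module checks whether the agent's
# chosen tool call is valid for the current training task.
CATALOG = {
    "order_coffee": {"required_args": {"shop_id", "item"}},
    "pay_order":    {"required_args": {"order_id", "amount"}},
}

def tool_call_reward(tool_name: str, args: dict) -> float:
    spec = CATALOG.get(tool_name)
    if spec is None:
        return -1.0                       # hallucinated / unavailable tool
    missing = spec["required_args"] - set(args)
    if missing:
        return -0.5                       # right tool, malformed arguments
    return 1.0                            # valid, well-formed tool call

print(tool_call_reward("order_coffee", {"shop_id": "s1", "item": "latte"}))  # 1.0
print(tool_call_reward("book_flight", {}))                                   # -1.0
```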

In the next phase, Lixiang Tongxue plans to strengthen its multimodal capabilities and implement COA (Chain of Action), in which the same model continuously reasons about how to call external tools, solve problems, and take actions, further improving the synergy between tool calling, reasoning, and action execution across modules.
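Conceptually, a Chain of Action keeps a single model in a reason-act loop. The sketch below shows only that control flow, with a stubbed "model" and a hypothetical tool table standing in for the real system.

```python
# Hypothetical Chain-of-Action (COA) loop: one model alternates between
# reasoning about the next step and emitting a tool call, until it decides
# the task is finished. The "model" here is a stub for illustration.
def model_step(history: list[dict]) -> dict:
    # A real system would call a multimodal foundation model here.
    if not any(h.get("tool") == "order_coffee" for h in history):
        return {"thought": "Need to place the order first",
                "tool": "order_coffee", "args": {"item": "latte"}}
    return {"thought": "Order placed, task complete", "tool": None}

TOOLS = {"order_coffee": lambda args: {"order_id": "A100", **args}}

def run_chain_of_action(max_steps: int = 5) -> list[dict]:
    history: list[dict] = []
    for _ in range(max_steps):
        step = model_step(history)
        if step["tool"] is None:          # model signals completion
            history.append(step)
            break
        result = TOOLS[step["tool"]](step["args"])   # execute the action
        history.append({**step, "result": result})
    return history

for step in run_chain_of_action():
    print(step)
```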

Case 2: GUI Agent

A GUI agent (graphical user interface agent) is a specialized LLM agent that processes user commands or requests in natural language, understands the current state of the GUI through screenshots or UI element trees, and performs actions that simulate human-computer interaction, allowing it to operate across various software interfaces.

A GUI agent typically includes modules such as the operating environment, prompt engineering, model inference, action, and memory.
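Reduced to those module boundaries, a minimal GUI-agent loop might be sketched as follows; the UI element names, the decision rule, and the click executor are all placeholders rather than any vendor's implementation.

```python
# Hypothetical perceive-reason-act loop of a GUI agent. Screenshots /
# UI element trees, the model, and the click executor are all stubbed.
from dataclasses import dataclass, field

@dataclass
class GUIAgent:
    memory: list = field(default_factory=list)   # history of past actions

    def perceive(self) -> dict:
        # Real systems capture a screenshot or read the UI element tree.
        return {"elements": ["btn_medium_latte", "btn_confirm_order"]}

    def decide(self, state: dict, command: str) -> str:
        # Real systems build a prompt from state + memory and run inference.
        for element in state["elements"]:
            if element not in [m["action"] for m in self.memory]:
                return element
        return "done"

    def act(self, command: str) -> None:
        while True:
            state = self.perceive()
            target = self.decide(state, command)
            if target == "done":
                break
            self.memory.append({"action": target})   # simulate the click
            print("clicked", target)

GUIAgent().act("order a medium latte")
```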

GUI agent technology is still far from fully mature, but some OEMs, including Li Auto, Geely, and Xiaomi, have already started to deploy it.

In the aforementioned ordering scenario, Lixiang Tongxue leverages GUI agent technology when selecting a meal package, so that it can operate the on-screen components automatically without user intervention. The Lixiang Tongxue team has pointed out that the operation accuracy of the GUI agent also affects the final action of the CUA framework (because the payment process requires scanning screenshots, which involves the GUI agent). If the accuracy is too low, it may be difficult to guarantee a stable experience for complex tasks such as registering for parking and paying parking fees.

For example, Xiaomi has launched a GUI agent framework, "BTL-UI", which uses a Group Relative Policy Optimization (GRPO) algorithm within a Markov decision process (MDP). At each time step, the agent receives the current screen state, user commands, and historical interaction records, and outputs a structured BTL response, converting the multimodal input into a comprehensive output that includes visual attention zones, reasoning processes, and command execution.

Its implementation methods and core technologies include:

Bionic interaction framework: Based on the BTL-UI model, it simulates human visual attention allocation (blinking), logical reasoning (thinking), and precise execution (action), supporting complex multi-step tasks (such as cross-application calls and multimodal interactions).

Automated data generation: It automatically analyzes screenshots, identifies the interface elements most relevant to user commands, and generates high-quality attention annotations for these zones.

BTL reward mechanism: It evaluates each intermediate cognitive stage, checking whether the AI correctly identifies the relevant interface elements, performs reasonable logical reasoning, and generates accurate operation instructions.
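The group-relative policy optimization mentioned above scores each sampled response against the statistics of its own group rather than a learned value function; a bare-bones version of that advantage computation, with invented reward numbers, is sketched below.

```python
# Sketch of the group-relative advantage used by GRPO: for a group of
# responses sampled from the same prompt, each response's advantage is its
# reward standardized against the group mean and standard deviation.
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Example: four BTL responses to the same screen/command pair, scored by
# a reward mechanism (values are made up for illustration).
print(group_relative_advantages([1.0, 0.5, 0.0, 1.0]))
```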

OEMs are currently transitioning from L2 reasoners to L3 agents, with L3 further divided into four stages.

According to OpenAI's definition of AGI, Chinese OEMs are currently in the process of transitioning from L2 reasoners to L3 agents. Each stage poses different problems to solve, with corresponding characteristics:

At present, most OEMs' cockpit AI assistants have delivered "professional services" to a certain extent. The next goal is to achieve "emotional resonance" and overcome the hurdle of "proactive prediction".

For "emotional resonance", NIO's "Nomi" is a leading example.

In 2025, most AI assistants' emotional chats are implemented primarily through tone changes simulated by TTS technology, terminology from the knowledge base (such as colloquial interjections), and preset emotional scenario workflows. Compared to other cockpit agents, Nomi has two unique advantages:

1. Physical shell: Nomi can materialize more than 200 dynamic expressions through its physical shell "Nomi Mate" (upgraded to version 3.0 as of November 2025), delivering emotional value in the real world. For example, when interacting with people via voice, Nomi simulates the head movements people make in conversation and turns its head toward the source of a sound along an arc-shaped trajectory, as a person would.

2. Emotional settings:

In terms of architecture, a dedicated "emotion engine" module is set up. Through three sub-modules, namely "contextual intelligence", "personalized intelligence" and "emotional expression", it uses voice, vision, and multimodal perception technologies to perform contextual arbitration, derive an understanding of the current situation, and produce natural, human-like reactions in emotional scenarios.

In terms of settings, Nomi can be given a personality. Based on the settings, it performs search associations through a GPT-like streaming prediction model, exhibiting unique situational responses and providing a personalized experience for each user (such as simulating multiple MBTI personalities, in contrast to Lixiang Tongxue, which is set as ENFJ).

After achieving "proactive prediction," cockpit agents make a breakthrough from digital AI to physical AI.

Starting from L3.5+, generalization has become one of the limiting factors in agents' ability to flexibly cope with multi-scenario tasks. To improve generalization across scenarios, agents should not only learn policies (what actions to take in a given state), but also learn dynamic environment models (how the world will change after an action is performed), so that they can make predictions through direct interaction with the environment.
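The distinction between a policy and a dynamics (world) model can be made concrete with a toy sketch; both functions below are trivial stand-ins for learned models, and the environment is invented purely for illustration.

```python
# Toy illustration of the two components discussed above:
# a policy pi(a | s) chooses actions, while a dynamics model f(s, a) -> s'
# predicts how the world changes, enabling the agent to plan before acting.
import random

def policy(state: float) -> float:
    # What action to take in a given state (here: a trivial rule).
    return 1.0 if state < 5.0 else -1.0

def dynamics_model(state: float, action: float) -> float:
    # Learned prediction of the next state after taking `action`.
    return state + action + random.gauss(0.0, 0.1)

# Model-based lookahead: imagine a short rollout before acting for real.
state = 0.0
for step in range(3):
    action = policy(state)
    state = dynamics_model(state, action)
    print(f"step {step}: predicted next state = {state:.2f}")
```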

To avoid limitations caused by the shortage of high-quality data, one solution is to learn in a real physical environment to achieve a breakthrough from digital AI to physical AI.

For example, the Lixiang Tongxue team has found that, once massive Internet data has been used to train the base model, additional data does less to improve the model's capabilities; that is, the marginal benefit of the scaling law in model pre-training declines.

Therefore, the Lixiang Tongxue team has changed the training method for the next stage: it will focus on the interaction between the model and the physical world. Through reinforcement learning, the model will judge the correctness of its thinking process and accumulate experience and data from its interaction with the environment.

Fei-Fei Li's team from World Labs has proposed "augmented interactive agents," which feature multimodal capabilities with "cross-reality-agnostic" integration and incorporate an emergent mechanism.

In training intelligent agents, Fei-Fei Li's team has introduced an "in-context prompt" or "implicit reward function" to capture the key features of expert behavior. The agents can then be trained for task execution on physical-world behavior data learned from expert demonstrations, which are collected in the physical world in the form of "state-action pairs".
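Training on expert demonstrations stored as state-action pairs is, at its simplest, behavior cloning; the fragment below sketches that data flow with a nearest-neighbor stand-in for an actual learned policy, since the real training pipeline is not described in detail in the source.

```python
# Sketch of learning from expert demonstrations stored as state-action
# pairs. A real system would fit a neural policy; a nearest-neighbor
# lookup is used here purely to illustrate the data flow.
demonstrations = [
    # (state, expert_action) pairs collected in the physical world
    ((0.0, 0.0), "reach_forward"),
    ((0.5, 0.2), "grasp"),
    ((0.5, 0.8), "lift"),
]

def imitate(state: tuple[float, float]) -> str:
    # Pick the expert action whose recorded state is closest to the query.
    def dist(s):
        return sum((a - b) ** 2 for a, b in zip(s, state))
    closest_state, action = min(demonstrations, key=lambda d: dist(d[0]))
    return action

print(imitate((0.45, 0.25)))   # -> "grasp"
```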

In 2025, most OEMs chose a multi-agent approach to build their cockpit AI systems. Multi-agent collaboration is also one way to improve the generalization of agents: through "domain specialization + scenario linkage + group learning", the generalization limitations of existing agents can be overcome along multiple dimensions.

For example, GAC's "Beibi" agent can recognize intent in complex scenarios through multi-agent collaboration built on foundation-model intent recognition, tackling such problems of vertical agents as the lack of a unified interaction entry and inefficient collaboration. It eliminates the need for users to operate multiple agents separately (such as adjusting navigation and air conditioning individually), thus improving collaboration efficiency. Its principles include:

Build the core intelligent agent: Fine-tune the pre-trained language model using a pre-set dataset related to automotive scenarios (such as vehicle control, navigation, and other instruction records) to obtain an intent recognition model. Then, build an "intent understanding intelligent agent" based on this model, while adding a caching service to improve response speed.

Parse user intent: Receive user commands (such as voice or touch commands), obtain the intent recognition result from the intent understanding agent (including 1-3 intents and their corresponding confidence scores, e.g., "Find a gas station" with confidence 0.85, "Adjust temperature" with confidence 0.9), and cache the commands and results.

Call collaborative agents: Make collaborative decisions based on the current scenario (such as driving status, weather), call on target agents related to the intent (such as navigation and vehicle control agents) to work together, and receive the action results of each agent.

Arbitrate, feed back and execute: Arbitrate based on historical confidence scores (the agents' past success rates) and the current action result; fall back to the intent recognition model when no historical scores exist; finally, feed the result back to the actuation system (such as the IVI or voice broadcast) to complete the operation.
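Put together, the four steps above amount to a pipeline of intent recognition, dispatch, and arbitration. The sketch below mirrors that flow with made-up intents, agents, confidence numbers, and threshold, not GAC's actual implementation.

```python
# Hypothetical end-to-end flow modeled on the description above:
# recognize 1-3 intents with confidences, dispatch to matching agents,
# then arbitrate using historical success rates when available.
HISTORICAL_SUCCESS = {"navigation_agent": 0.92, "climate_agent": 0.75}

def recognize_intents(command: str) -> list[tuple[str, float]]:
    # Stand-in for the fine-tuned intent recognition model.
    return [("find_gas_station", 0.85), ("adjust_temperature", 0.90)]

AGENTS = {
    "find_gas_station": ("navigation_agent", lambda: "route set to gas station"),
    "adjust_temperature": ("climate_agent", lambda: "cabin set to 22°C"),
}

def handle(command: str) -> list[str]:
    results = []
    for intent, confidence in recognize_intents(command):
        agent_name, agent = AGENTS[intent]
        # Arbitrate: weight the model confidence by the agent's track record
        # when history exists; otherwise fall back to the model confidence.
        score = confidence * HISTORICAL_SUCCESS.get(agent_name, 1.0)
        if score >= 0.5:                 # arbitrary acceptance threshold
            results.append(agent())      # feed back to IVI / voice broadcast
    return results

print(handle("I need fuel and it's too hot in here"))
```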

Table of Contents

Definition

1 Status Quo and Trends of Cockpit Agents

  • 1.1 Overview of Cockpit Agents
  • Definition and Value
  • Functional Features and Workflow
  • Reference Architecture (1): Classic Module Design Applied
  • Reference Architecture (1): Derivative Module Design Applied
  • Reference Architecture (1): Derivative Module Design Applied: Functional Module Design Requirements (1)-(2)
  • Reference Architecture (2): Multi-Agent System Module Design
  • Reference Architecture (2): Multi-Agent System Module Design: Components and Their Functions
  • Reference Architecture (2): Multi-Agent System Module Design: Components and Their Features (1)-(8)
  • Reference Architecture (2): Multi-Agent System Module Design: Architecture Diagram
  • Reference Architecture (3): Agent Architecture Design: By Different Deployment Levels
  • Collaboration Mechanism between Cockpit Agents, LLMs and OS
  • 1.2 Overview of Cockpit Agent Scenarios
  • Classification of Cockpit Agent Application Scenarios (1)
  • Classification of Cockpit Agent Application Scenarios (2)
  • Typical Agent Scenarios (1): Workflow Decomposition of MAS in Mobility Scenarios (1)-(5)
  • Typical Agent Scenarios (2): Workflow Decomposition of MAS in Entertainment Scenarios (1)-(4)
  • Typical Agent Scenarios (3): Workflow Decomposition of MAS in Children Scenarios (1)-(2)
  • Typical Agent Scenarios (4): Workflow Decomposition of MAS in Emotional Scenarios (1)-(2)
  • Typical Agent Scenarios (5): Workflow Decomposition of MAS in Q&A Scenarios (1)-(2)
  • Typical Agent Scenarios (6): Workflow Decomposition of MAS in Education Scenarios (1)-(2)
  • Typical Agent Scenarios (7): Workflow Decomposition of MAS in Parking Scenarios (1)-(3)
  • Typical Agent Scenarios (8): Workflow Decomposition of MAS in Shopping Scenarios
  • Typical Agent Scenarios (9): Workflow Decomposition of MAS in Medical Scenarios
  • Typical Agent Scenarios (10): Workflow Decomposition of MAS in Office Scenarios (1)-(2)
  • Agent Scenario Cases (1)
  • Agent Scenario Cases (2)
  • Agent Scenario Cases (3)
  • 1.3 Status Quo of Cockpit Agents
  • Development History of Agents
  • OEM Agent Comparison
  • Comparison of Three Development Models for Automotive AI Agents: Advantages/Disadvantages
  • Comparison of Three Development Models for Automotive AI Agents: Cost
  • 1.4 Development Trends of Cockpit Agents
  • 5 Levels of AGI: Main Application Issues
  • Four Stages of Cockpit Agent Iteration
  • Agent Trends (1)
  • Agent Trends (2)
  • Agent Trends (3)
  • Agent Trends (4)
  • Agent Trends (5)
  • Agent Trends (5): Cases
  • Agent Trends (6): Key Goals of L3.5+ Agents: High-Frequency Emergence
  • Agent Trends (6): Key Goals of L3.5+ Agents: Emergent Technology Foundation
  • Agent Trends (6): Key Goals of L3.5+ Agents: Typical Emergent Scenarios
  • Agents with Emergent Capabilities (1): Interactive Agents
  • Agents with Emergent Capabilities (2): "Emergence" Mechanisms of Interactive Agents
  • Agents with Emergent Capabilities (3): Training Methods of Interactive Agents
  • Agents with Emergent Capabilities (4):
  • Agents with Emergent Capabilities (5): Two Strategies Accelerate "High-Level Emergence"
  • Agents with Emergent Capabilities (6):

2 OEM Agent Solutions

  • Overview Diagram of Cockpit AI Agents/AI Assistants in 2025
  • Overview Table of Cockpit AI Agent/AI Assistant in 2025
  • 2.1 Lixiang Tongxue
  • Upgrade to Agent
  • Ordering Scenario Analysis
  • Payment Scenario Analysis
  • Agent Architecture: Two Paths
  • R&D Insights (1): Focus of Agent Performance Improvement
  • R&D Insights (2): Planning 2.0
  • R&D Insights (3): Interactive Scenario Design and Evaluation
  • Functional Module Diagram
  • Underlying Capabilities: Base Model Performance Improvement
  • Underlying Capabilities: The Base Model Adds Agent Task Training (1)-(4)
  • Underlying Capabilities: Different Paths to Enhance Base Model Capabilities (1)-(6)
  • Underlying Capabilities: Base Model Engineering Capability Optimization Solution (1)-(6)
  • Underlying Capabilities: Base Model Engineering Capability Optimization Solution - Training Platform
  • Underlying Capabilities: Base Model Engineering Capability Optimization Solution - Inference Engine
  • Underlying Capabilities: From Models to CUA
  • Underlying Capabilities: Base Model Agent Capability Enhancement Solution
  • Underlying Capabilities: Full-Modal Foundation Models
  • Underlying Capabilities: Application Scenarios of Full-Modal Foundation Models (1): Speech Knowledge Q&A
  • Underlying Capabilities: Application Scenarios of Full-Modal Foundation Models (2)
  • Underlying Capabilities: Application Scenarios of Full-Modal Foundation Models (3)
  • Underlying Capabilities: Model Capability Assessment of Full-Modal Foundation Models (1)-(2)
  • Underlying Capabilities: Tool Capability Assessment of Full-Modal Foundation Models (1)-(2)
  • 2.2 NIO
  • NomiGPT and NomiAgent Deployment Architecture
  • Functional Modules of NomiGPT
  • Functions of NomiGPT (1): Multimodal Perception
  • Functions of NomiGPT (2): Command Distribution
  • Functions of NomiGPT (3):
  • Highlights of NomiGPT (1): EAI (1)-(2)
  • Highlights of NomiGPT (2): Emotional Interaction (1)-(4)
  • Highlights of NomiGPT (3)
  • 2.3 Xpeng
  • Cockpit Focuses on Edge AI
  • Cockpit AI Functions and Planning
  • 2.4 Geely
  • 5-Layer Architecture of Geely Agent System
  • OS Functional Features of Geely Agent (1)-(3)
  • Functions of Galaxy M9
  • Architecture of ZEEKR Agent
  • ZEEKR Cockpit Agent Scenarios (1): Life Services
  • ZEEKR Cockpit Agent Scenarios (2)
  • 2.5 Xiaomi
  • Application Scenarios of "XiaoAi Tongxue"
  • Architecture of "XiaoAi Tongxue"
  • GUI Agent Technology (1)
  • GUI Agent Technology (2)
  • 2.6 Great Wall Motor
  • Coffee Agent System (1): Application Scenarios
  • Coffee Agent System (2): Built on AI OS
  • Coffee Agent System: Cooperation Dynamics
  • 2.7 BAIC
  • Agent Platform Architecture: Baimo Huichuang
  • Agent Architecture
  • 2.8 SAIC
  • IM Introduces Alibaba Agent System (1): Functions
  • IM Introduces Alibaba Agent System (2): Features
  • Roewe Intelligent Assistant Base: Doubao
  • 2.9 Chery
  • Agent Brain System
  • Agent System Cooperation and Planning
  • 2.10 Others
  • Functions of GAC Agent
  • Cooperation Dynamics of BYD Agent

3 Supplier Agent Solutions

  • 3.1 Huawei
  • Agent System
  • HarmonySpace 5: MoLA
  • Agent Underlying Capabilities: LLM Architecture
  • Agent Underlying Capabilities: Multimodal Capabilities
  • Agent Underlying Capabilities: Thinking Capabilities
  • Xiaoyi Voice Technology
  • 3.2 Alibaba Cloud
  • Product System
  • Model Studio Supports Agent Construction
  • 3.3 Baidu Cloud
  • Product System
  • Multi-Agent Collaboration Mode
  • 3.4 Tencent Cloud
  • Product System
  • Cockpit System Upgrade of TAI 6.0 (1)
  • Cockpit System Upgrade of TAI 6.0 (2)
  • Tencent (Inference Service Solution)
  • Tencent (Generation Scenario Solution)
  • Q&A Scenario Solution
  • 3.5 ByteDance & Volcano Engine
  • Doubao Model System
  • Volcano Engine Cockpit Function Highlights
  • 3.6 SenseTime
  • Foundation Model System
  • Model Layout
  • Cockpit AI Product System
  • Foundation Model Training Facility
  • Customers
  • 3.7 Zhipu AI
  • Agent Evolves to LLM OS
  • Agent Architecture
  • Product System
  • Agent Model
  • Automotive Foundation Model Base
  • Technical Highlights
  • 3.8 iFLYTEK
  • Product System
  • Functional and Technical Highlights
  • 3.9 Thundersoft
  • Agent System Is Built Based on Aqua Drive OS
  • Agent Dynamics
  • 3.10 Kotei Agent
  • 3.11 Lenovo
  • Agent Architecture
  • 3.12 TINNOVE
  • Agent Architecture
  • AI System Service Forms
  • AI System Application Scenarios

4 Agent Practical Technology

  • 4.1 Intent Recognition
  • Cases (1)
  • Cases (2)
  • 4.2 Knowledge Graph and Search
  • Cases (1)
  • Cases (2)
  • Cases (3)
  • 4.3 Emotion Recognition
  • Cases (1)
  • Cases (2)
  • Cases (3)
  • 4.4 Inference Acceleration
  • Cases (1)
  • Cases (2)
  • 4.5 Recommendation System
  • Problems
  • Patent Technology: Refueling Recommendation
  • Cases (1)
  • Cases (2)
  • 4.6 Tool Calling
  • Synergy and Differences between Function Calling, MCP and A2A
  • MCP Application Cases (1)
  • MCP Application Cases (2)
  • A2A Application Cases
  • 4.7 MAS
  • Cases (1): Great Wall Motor
  • Cases (2)
  • 4.8 GUI Agent
  • Principle
  • Application
  • Functions and Features (1)-(4)

5 Problems in Agent Application

  • Problem 1: The Computing Power Balance Point of the "Edge-Cloud" Deployment
  • Problem 2: Architecture Design of Multi-Agent Systems (1)
  • Problem 2: Architecture Design of Multi-Agent Systems (2)
  • Problem 2: Architecture Design of Multi-Agent Systems (3) - Base Model Selection
  • Solution: "Intramodal Adaptive Fusion + Cross-modal Precise Interaction" is the Optimal Path for Multimodal Tasks
  • Case: How Anthropic Designs MAS (1)
  • Case: How Anthropic Designs MAS (2)
  • Problem 3: Business Model Design
  • OEM Agent Profit Model: Cost Types
  • OEM Agent Profit Model: 7 Cost Recovery Mechanisms at the Current Stage
  • Lixiang Tongxue Builds IP Ecosystem: Physical Goods
  • Lixiang Tongxue Builds IP Ecosystem: Virtual Products
  • NIO's Peripheral Product Sales
  • Aion Beibi IP Peripheral Products
  • OEM Agent Profit Model: Future Profit Methods
  • Cockpit Agent Business Model from the Perspective of OEMs: Tiered Payment
  • Problem 4: Effectiveness of Scenario Application
  • Problem 5: Training Bias
  • Problem 6: Data Privacy