Market Research Report
Product Code: 1777128
VLA Large Model Applications in Automotive and Robotics Research Report, 2025
ResearchInChina releases "VLA Large Model Applications in Automotive and Robotics Research Report, 2025"
The report summarizes and analyzes the technical origins, development stages, application cases, and core characteristics of VLA large models.
It reviews 8 typical VLA implementation solutions, as well as representative VLA large models in the fields of intelligent driving and robotics, and summarizes 4 major trends in VLA development.
It analyzes the intelligent-driving VLA application solutions of companies such as Li Auto, XPeng Motors, Chery Automobile, Geely Automobile, Xiaomi Auto, DeepRoute.ai, Baidu, Horizon Robotics, SenseTime, NVIDIA, and iMotion.
It covers more than 40 large-model frameworks and solutions, including robot general base models, multimodal large models, data generalization models, VLM models, VLN models, VLA models, and robot world models.
It analyzes the large models and VLA application solutions of companies such as AgiBot, Galbot, Robot Era, Estun, Unitree, UBTECH, Tesla Optimus, Figure AI, Apptronik, Agility Robotics, XPeng IRON, Xiaomi CyberOne, GAC GoMate, Chery Mornine, Leju Robotics, LimX Dynamics, AI2 Robotics, and X Square Robot.
In July 2023, Google DeepMind launched RT-2, a model built on the VLA architecture. By integrating a large language model with multimodal training data, it endows robots with the ability to perform complex tasks. Its task accuracy nearly doubled compared with the first-generation model (from 32% to 62%), and it achieved breakthrough zero-shot learning in scenarios such as garbage sorting.
The VLA concept quickly caught the attention of automakers and was rapidly applied to intelligent driving. If "end-to-end" was the hottest term in intelligent driving in 2024, "VLA" is expected to be the hottest term of 2025. Companies such as XPeng Motors, Li Auto, and DeepRoute.ai have released their respective VLA solutions.
When XPeng Motors launched the G7 in July, it became the first to announce mass production of VLA in vehicles. Li Auto plans to equip the i8 with VLA, expected to be unveiled at its launch event on July 29. Companies such as Geely Automobile, DeepRoute.ai, and iMotion are also developing VLA.
Li Auto and XPeng Motors have taken different positions on whether a VLA model should be distilled first or undergo reinforcement learning first when applied in vehicles.
At the pre-sale event for XPeng Motors' G7, He Xiaopeng used the brain and cerebellum as metaphors for the traditional end-to-end approach and VLA. The traditional end-to-end solution, he said, plays the role of the cerebellum, "making the car able to drive," while VLA, which introduces a large language model, plays the role of the brain, "making the car drive well."
XPeng Motors and Li Auto have taken slightly different routes in VLA application: Li Auto first distills the cloud-based base large model, and then performs reinforcement learning on the distilled end-side model; XPeng Motors first performs reinforcement learning on the cloud-based base large model, and then distills it to the vehicle end.
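To make the difference in training order concrete, here is a minimal, runnable sketch of the two pipelines in PyTorch. The toy model sizes, the random data, and the reward signal are illustrative assumptions only; neither company has published its training code.

```python
# Illustrative sketch contrasting the two training orders described above.
# The toy model sizes, random data, and reward are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
OBS_DIM, N_ACTIONS = 16, 8
teacher = nn.Linear(OBS_DIM, N_ACTIONS)  # stands in for the large cloud model
student = nn.Linear(OBS_DIM, N_ACTIONS)  # stands in for the small vehicle model

def distill_step(teacher, student, opt, temperature=2.0):
    """Knowledge distillation: the student matches the teacher's action distribution."""
    obs = torch.randn(32, OBS_DIM)  # toy driving observations
    with torch.no_grad():
        t_logits = teacher(obs)
    s_logits = student(obs)
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    opt.zero_grad(); loss.backward(); opt.step()

def rl_step(policy, opt):
    """Schematic REINFORCE update; the reward stands in for driving-scenario feedback."""
    obs = torch.randn(32, OBS_DIM)
    dist = torch.distributions.Categorical(logits=policy(obs))
    actions = dist.sample()
    reward = (actions == 0).float()  # toy reward signal
    loss = -(dist.log_prob(actions) * reward).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Li Auto's order (as described): distill the cloud model first, then run
# reinforcement learning on the distilled on-vehicle model.
s_opt = torch.optim.Adam(student.parameters(), lr=1e-3)
distill_step(teacher, student, s_opt)
rl_step(student, s_opt)

# XPeng's order: reinforce the cloud model first, then distill it down.
t_opt = torch.optim.Adam(teacher.parameters(), lr=1e-3)
rl_step(teacher, t_opt)
distill_step(teacher, student, s_opt)
```

The practical difference is which model sees the reinforcement signal: distilling first lets reinforcement learning adapt the small model that actually runs in the car, while reinforcing first optimizes behavior at full capacity and relies on distillation to preserve it.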
In May 2025, Li Xiang said at AI Talk that Li Auto's cloud-based base model has 32 billion parameters and that a 3.2-billion-parameter model is distilled from it to the vehicle end, where it undergoes post-training and reinforcement learning on driving-scenario data; in the fourth stage, the final driver Agent will be deployed on both the device side and the cloud.
XPeng Motors likewise divides its factory for training and deploying VLA models into four workshops: the first is responsible for pre-training and post-training of the base model; the second for model distillation; the third continues pre-training the distilled model; and the fourth deploys XVLA to the vehicle end. Dr. Liu Xianming, head of XPeng's World Base Model, said that XPeng Motors has trained "XPeng World Base Models" in the cloud at multiple parameter scales, including 1 billion, 3 billion, 7 billion, and 72 billion parameters.
Which route suits intelligent driving better remains to be seen, and will be judged by how each manufacturer's VLA solution actually performs once deployed in vehicles.
Recently, research teams from McGill University, Tsinghua University, Xiaomi Corporation, and the University of Wisconsin-Madison jointly released a comprehensive review of VLA models for autonomous driving, "A Survey on Vision-Language-Action Models for Autonomous Driving". The article divides the development of VLA into four stages: Pre-VLA (VLM as explainer), Modular VLA, End-to-end VLA, and Augmented VLA, clearly laying out the characteristics of each stage and VLA's gradual evolution.
There are over 100 robot VLA models, with teams exploring different paths
Automotive VLA large models run to tens of billions of parameters on nearly 1,000 TOPS of computing power. In the robotics field, by contrast, dedicated AI computing chips are still optional, and training datasets mostly contain between 1 million and 3 million samples. Technical routes are also contested, for example over how to mix real data with simulated synthetic data. One reason is scale: hundreds of millions of cars are on the road, while very few robots are actually deployed. Another important reason is that robot VLA models focus on exploring the microscopic world: compared with the grand automotive world model, robot application scenarios involve richer multimodal perception, more complex actions, and more fine-grained sensor data.
More than 100 VLA models and related datasets exist in the robotics field, and new papers keep emerging as teams explore different paths.
Exploration 1: VTLA framework integrating tactile perception
In May 2025, research teams from the Institute of Automation of the Chinese Academy of Sciences, Samsung Beijing Research Institute, Beijing Academy of Artificial Intelligence (BAAI), and the University of Wisconsin-Madison jointly released a paper on VTLA for insertion manipulation tasks. The research shows that integrating visual and tactile perception is crucial when robots perform contact-rich tasks with high precision requirements. By fusing visual, tactile, and language inputs, combined with a temporal enhancement module and a preference-learning strategy, VTLA outperforms traditional imitation-learning methods and single-modality models on contact-rich insertion tasks.
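As a rough illustration of the fusion pattern described above, the sketch below encodes vision, touch, and language separately, fuses them, and applies a temporal module before predicting an action. All module names and sizes are assumptions for exposition; the preference-learning component is omitted, and the real VTLA architecture differs in detail.

```python
# Toy sketch of a VTLA-style forward pass: separate encoders, additive fusion,
# temporal attention, action head. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ToyVTLA(nn.Module):
    def __init__(self, d=128, n_actions=7):
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, d))
        self.tactile = nn.Linear(32, d)           # e.g., a 32-dim taxel array
        self.language = nn.EmbeddingBag(1000, d)  # toy instruction encoder
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)  # temporal module
        self.action_head = nn.Linear(d, n_actions)  # e.g., 6-DoF pose + gripper

    def forward(self, images, touch, tokens):
        # images: (B, T, 3, H, W); touch: (B, T, 32); tokens: (B, L)
        B, T = images.shape[:2]
        v = self.vision(images.flatten(0, 1)).view(B, T, -1)
        t = self.tactile(touch)
        l = self.language(tokens).unsqueeze(1)  # broadcast instruction over time
        h = self.temporal(v + t + l)            # simple additive fusion
        return self.action_head(h[:, -1])       # action for the latest timestep

model = ToyVTLA()
actions = model(torch.randn(2, 8, 3, 64, 64),    # 8-frame image history
                torch.randn(2, 8, 32),           # matching tactile history
                torch.randint(0, 1000, (2, 6)))  # tokenized instruction
print(actions.shape)  # torch.Size([2, 7])
```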
Exploration 2: VLA model supporting multi-robot collaborative operation
In February 2025, Figure AI released Helix, a general Embodied AI model. Helix can run collaboratively across humanoid robots, enabling two robots to cooperate on a shared, long-horizon manipulation task. In the video shown at the launch, Figure AI's robots demonstrated smooth collaboration while putting away fruit: the robot on the left pulled a fruit bowl over, the robot on the right placed the fruit in it, and the left robot then returned the bowl to its original position.
Figure AI emphasized that this only scratches "the surface of possibilities" and that the company is eager to see what happens when Helix is scaled up 1,000 times. According to Figure AI, Helix runs entirely on embedded low-power GPUs and can be deployed commercially right away.
Exploration 3: Offline on-device VLA model in the robotics field
In June 2025, Google released Gemini Robotics On-Device, a multimodal VLA large model that runs locally and offline on embodied robots. The model handles visual input, natural-language instructions, and action output simultaneously, and it maintains stable operation even without a network connection.
Particularly noteworthy is the model's adaptability and versatility. Google pointed out that Gemini Robotics On-Device is the first robot VLA model to open fine-tuning to developers, enabling them to train the model for their own needs and application scenarios.
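Google has not published code-level details of this fine-tuning workflow, so the following is a generic, hypothetical sketch of what developer fine-tuning of an on-device VLA model typically involves: freezing the pretrained backbone and training a small task-specific head by behavior cloning on a handful of demonstrations. None of the names below come from Google's SDK.

```python
# Hypothetical fine-tuning sketch: freeze a pretrained backbone, train a small
# task head on demonstrations. Stand-in modules and toy data throughout.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for a pretrained VLA backbone and a new task-specific head.
backbone = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 256))
task_head = nn.Linear(256, 7)    # e.g., end-effector pose + gripper command

for p in backbone.parameters():  # keep the pretrained weights frozen
    p.requires_grad = False

opt = torch.optim.Adam(task_head.parameters(), lr=1e-4)
# Toy demonstrations: (observation features, expert action) pairs.
demos = [(torch.randn(8, 64), torch.randn(8, 7)) for _ in range(20)]

for obs, expert_action in demos:  # behavior cloning on the demonstrations
    pred = task_head(backbone(obs))
    loss = F.mse_loss(pred, expert_action)
    opt.zero_grad(); loss.backward(); opt.step()
```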
VLA robots are already deployed in large numbers in automobile factories
When the macro world model of automobiles is integrated with the micro world model of robots, the real era of Embodied AI will come.
As Embodied AI enters the VLA stage of development, automakers enjoy a natural first-mover advantage. Tesla Optimus, XPeng IRON, and Xiaomi CyberOne all draw on their makers' rich experience in intelligent driving, sensor technology, and machine vision, folding in the technical accumulation from intelligent driving. The XPeng IRON robot is equipped with XPeng Motors' AI Hawkeye vision system, end-to-end large model, Tianji AIOS, and Turing AI chip.
At the same time, automobile factories are currently the main application scenario for robots. Tesla Optimus robots are mainly used in Tesla's battery workshops. Apptronik cooperates with Mercedes-Benz: its Apollo robots work in Mercedes-Benz factories on car manufacturing, with tasks including handling, assembly, and other physical work. At the model level, Apptronik has a strategic partnership with Google DeepMind, and Apollo has integrated Google's Gemini Robotics VLA large model.
On July 18, UBTECH released a hot-swappable autonomous battery-swap system for its humanoid robot Walker S2, enabling the robot to replace its own battery in 3 minutes without manual intervention.
According to public reports, many car companies, including Tesla, BMW, Mercedes-Benz, BYD, Geely Zeekr, Dongfeng Liuzhou Motor, Audi FAW, FAW Hongqi, SAIC-GM, NIO, XPeng, Xiaomi, and BAIC Off-Road Vehicle, have deployed humanoid robots in their automobile factories. Humanoid robots from Figure AI, Apptronik, UBTECH, AI2 Robotics, Leju, and others are widely used in links such as automobile and parts production and assembly, logistics and transportation, equipment inspection, and factory operation and maintenance. In the near future, AI robots will be the main "labor force" in "unmanned factories".