The integration of artificial intelligence into assisted driving systems is accelerating, as reflected in the rising parameter counts of on-vehicle AI models, which now approach the scale of mainstream large models.
Case in point: according to 36Kr, Xpeng Motors’ upcoming on-vehicle AI model will feature at least seven billion parameters. Li Auto is expected to reach a similar scale once it begins installing its self-developed chip next year.
Xpeng’s on-vehicle AI model is distilled from its in-house, cloud-based world foundation model, a design meant to address the compute, memory, and bandwidth limitations of vehicle chips that prevent deployment of full-scale models.
In the second half of 2024, Xpeng began transitioning toward large-scale cloud models. The company is developing an ultra-large autonomous driving model with at least 72 billion parameters, slated for official release next month.
During a tech sharing session in April, Xpeng disclosed that its cloud model is built on a large language model backbone trained on extensive multimodal driving data. It reportedly interprets visual information, performs reasoning, and generates driving actions.
Once training is complete in the cloud, Xpeng plans to use knowledge distillation to compress the model, preserving key capabilities while creating a smaller version that can operate on vehicles. This process, also used by DeepSeek, reduces model size without significant performance loss.
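The distillation process described above can be illustrated with a minimal sketch. This is a generic textbook formulation (temperature-softened softmax plus a KL-divergence objective), not Xpeng's or DeepSeek's actual pipeline; the logits and temperature below are made-up values for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the teacher's output distribution so the
    # student learns the relative ranking of all options, not just the top one.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between softened teacher and student distributions;
    # minimizing it pushes the small student model to mimic the large teacher.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student whose logits track the teacher's incurs a lower loss
# than one that disagrees with it.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 4.0, 1.0]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

In practice the student is trained on a weighted mix of this soft-target loss and the ordinary hard-label loss, which is how size is cut without losing most of the teacher's capability.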
On the hardware front, Xpeng began developing its Turing chip in 2020. The chip entered mass production in June and debuted in the Xpeng G7. Designed specifically for AI and end-to-end large model workloads, it delivers about 700 TOPS (tera operations per second) of computing power, comparable to Nvidia’s latest Thor chip, and can process models with up to 30 billion parameters.
In early August, Xpeng held an internal meeting chaired by CEO He Xiaopeng, directing all AI resources toward the foundation model team to support the rollout of the seven-billion-parameter on-vehicle model.
Li Auto has likewise ramped up its AI efforts.
During its second-quarter earnings call, CEO Li Xiang said the company’s current on-vehicle large model has more than four billion parameters, marking a tenfold increase over its previous end-to-end version. Several industry insiders told 36Kr that once Li Auto’s self-developed chip rolls out next year, its vision-language-action (VLA) model will also reach around seven billion parameters.
Initially, Li Auto deployed a smaller, slower-running vision-language model (VLM) on its vehicles. In October 2024, the company launched an assisted driving system combining end-to-end and VLM architectures. The end-to-end model served as the fast system, while the VLM acted as a slower copilot, each running on a separate Orin X chip.
In this setup, the end-to-end system functioned as the driver's primary "brain," while the VLM offered only secondary input, meaning the larger model's capacity was never fully exploited.
Now, Li Auto is shifting focus to the VLA framework, originally introduced by DeepMind and now widely adopted in embodied intelligence. Generally speaking, VLA architectures integrate vision, language, and action reasoning to mimic human cognition, enabling perception, understanding, and task execution. This approach is gaining traction among automakers, including Li Auto and Xpeng.
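The three-stage structure of a VLA model can be sketched as a control loop: encode the camera input, condition on a language-level instruction or scene description, and decode a driving action. The sketch below is a deliberately toy illustration of that data flow (the encoders are stubs and all names are hypothetical), not any automaker's architecture.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Action:
    steering: float   # radians; negative steers left
    throttle: float   # 0.0 (coast) to 1.0 (full)

def vla_step(camera_frame: List[float], instruction: str) -> Action:
    """One perception-to-action step of a toy VLA-style pipeline."""
    # 1) Vision: encode the frame into a feature (stub: mean brightness).
    visual_feature = sum(camera_frame) / len(camera_frame)
    # 2) Language: interpret the instruction (stub: keyword match).
    wants_stop = "stop" in instruction.lower()
    # 3) Action: decode a driving command conditioned on both modalities.
    if wants_stop:
        return Action(steering=0.0, throttle=0.0)
    return Action(steering=0.1 * (0.5 - visual_feature), throttle=0.3)
```

In a real VLA model all three stages are a single learned network trained end to end, so language-level reasoning can directly shape the action output rather than being a separate rule, as it is in this stub.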
To accelerate VLA model deployment next year, Li Auto has restructured internally. In May, Xia Zhongpu, who led the company’s end-to-end assisted driving program, departed. In September, Li Auto reorganized its autonomous driving team into 11 sub-departments under a flatter structure to speed up model development.
Beyond Xpeng and Li Auto, Huawei’s WEWA architecture uses a cloud-based world engine to train its vehicle-side world model, while Nio is also deploying large world models on its vehicles.
Tesla, often regarded as the industry leader in assisted driving, has achieved regional robotaxi capability using end-to-end technology rather than large models. Suppliers such as Horizon Robotics and Momenta have also delivered strong performance through similar end-to-end approaches.
Meanwhile, some automakers that heavily promote their AI strategies have been caught up with, or even surpassed, by competitors in real-world assisted driving performance. This suggests that model size alone does not determine capability. End-to-end systems are effective at learning from human driving behavior, while larger models may offer advantages in reasoning and decision-making. However, the core of assisted driving remains spatial perception, where the benefits of large model reasoning are still limited.
For automakers still refining end-to-end systems, building ever-larger models can divert computing resources toward linguistic reasoning rather than perception, potentially degrading the overall driving experience.
Many automakers are now pursuing ambitions beyond vehicles. Some see themselves as players in the broader field of embodied intelligence, with Li Auto among them.
During a company live stream last December, CEO Li described Li Auto as an AI company that still makes cars but views vehicles as “spatial robots,” applying its broader AI vision to mobility.
Xpeng has similar ambitions. The company plans to use its Turing chip not only in vehicles but also in robotics and flying cars, extending its autonomous driving capabilities to other intelligent systems.
Marketing is another factor. Following the global attention generated by ChatGPT, the term "large models" has become a powerful industry buzzword, just as "end-to-end" gained traction after Tesla's V12 rollout in North America. For some automakers, highlighting large model development serves as both a technological statement and a way to capture public attention.
Still, regardless of motivation, improving real-world assisted driving performance should remain the industry’s core priority. Narratives that stray from this focus risk losing sight of what ultimately matters to drivers and passengers alike.
KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Fan Shuqi for 36Kr.
