Huawei Cloud is taking a different route in artificial intelligence as China’s token economy takes shape.
Tokens are the units of text processed by AI models, and they have become a key measure of usage, cost, and scale in the AI cloud business.
“At a time when domestic computing power is growing, Huawei Cloud does not care that much about total token volume, nor about total revenue. What we care about is the health of tokens produced by domestic computing systems, and whether they represent productivity gains rather than just emotional value,” Huawei Cloud CEO Peter Zhou said on June 5 at the 2026 Huawei Cloud Inspire conference in Shanghai.
Zhou gave a simple example. A person casually asking AI a question on a phone also generates tokens, but the value of those tokens is difficult to assess. In his view, a cloud platform should not be evaluated by how many trillions of tokens it runs, but by how much efficiency those tokens create for enterprises.

Over the past two years, Chinese cloud vendors have fought a prolonged token price war. In May 2024, after DeepSeek V2 triggered the first major round of price cuts, ByteDance’s Volcano Engine doubled down by setting Doubao’s price at RMB 0.0008 per 1,000 tokens. Alibaba, Baidu, Tencent, and iFlytek soon followed.
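To put that price in perspective, the arithmetic can be sketched in a few lines. This is only an illustration: the per-1,000-token price is the figure quoted above, the exchange rate is the one given in the note at the end of this article, and the function names are made up for the example.

```python
# Figures taken from the article; function names are illustrative only.
PRICE_RMB_PER_1K_TOKENS = 0.0008  # Doubao price quoted above
RMB_PER_USD = 6.79                # conversion rate from the article's closing note


def cost_rmb(tokens: int) -> float:
    """Cost in RMB of processing `tokens` tokens at the quoted price."""
    return tokens / 1_000 * PRICE_RMB_PER_1K_TOKENS


def cost_usd(tokens: int) -> float:
    """Same cost converted to USD at the article's stated rate."""
    return cost_rmb(tokens) / RMB_PER_USD


if __name__ == "__main__":
    for n in (1_000_000, 1_000_000_000):
        print(f"{n:>13,} tokens: RMB {cost_rmb(n):,.2f} / USD {cost_usd(n):,.2f}")
```

At this price, one million tokens cost about RMB 0.80, which helps explain how inference margins could turn negative.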
The strategy was to use low-priced models to attract traffic and drive public cloud sales, even if that pushed gross margins on inference computing into negative territory. After DeepSeek R1 shifted mainstream attention toward reasoning models, coding and video models brought price competition back into focus.
At the conference, Huawei Cloud did not respond with another price cut. Instead, it proposed a new paradigm called “Agentic Infra,” aimed at providing domestic computing power for AI agents. While others compete on cheaper tokens and higher call volumes, Huawei Cloud is choosing another route: controlled domestic computing infrastructure and enterprise productivity.
Built on domestic hardware
Huawei Cloud presented a full infrastructure stack for Agentic Infra. It defined the system across four layers: efficient “token factories,” continuous learning, integrated scheduling for general-purpose and intelligent computing, and secure autonomy. The company announced four corresponding products.
The primary product is what Huawei Cloud calls an AI cluster service, built to support clusters with up to 100,000 accelerator cards and total computing power of 200 exaflops. According to the company, it can reduce token generation latency to less than ten milliseconds, achieve throughput of 5 million tokens per second per 1,000 cards, and offer online service availability of 99.95%. Huawei Cloud calls this its “token factory.”
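The headline numbers can be related to one another with some back-of-the-envelope arithmetic. The sketch below uses only the figures quoted above. It assumes throughput scales linearly from 1,000 cards to the full cluster, which the company did not state, and the function names are illustrative.

```python
# Company-quoted figures from the article.
MAX_CARDS = 100_000
TOTAL_EXAFLOPS = 200
TOKENS_PER_SEC_PER_1K_CARDS = 5_000_000
AVAILABILITY = 0.9995  # 99.95%


def petaflops_per_card() -> float:
    """Average compute per card implied by the cluster totals (1 exaflop = 1,000 petaflops)."""
    return TOTAL_EXAFLOPS * 1_000 / MAX_CARDS


def cluster_tokens_per_sec(cards: int) -> float:
    """Throughput for `cards` cards, ASSUMING linear scaling (not claimed in the article)."""
    return cards / 1_000 * TOKENS_PER_SEC_PER_1K_CARDS


def max_downtime_hours_per_year(availability: float) -> float:
    """Downtime allowed per 365-day year at a given availability level."""
    return (1 - availability) * 365 * 24


if __name__ == "__main__":
    per_sec = cluster_tokens_per_sec(MAX_CARDS)
    print(f"Implied compute per card: {petaflops_per_card():.1f} petaflops")
    print(f"Full-cluster throughput:  {per_sec:,.0f} tokens/s")
    print(f"Tokens per day:           {per_sec * 86_400:,.0f}")
    print(f"Allowed downtime:         {max_downtime_hours_per_year(AVAILABILITY):.2f} hours/year")
```

Under those assumptions, a fully built-out cluster would average 2 petaflops per card, could generate about 500 million tokens per second, and could be offline for roughly 4.4 hours a year while still meeting the 99.95% target.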
Its CCE Volcano scheduling engine pools training and inference resources while integrating fragmented capacity, improving resource utilization by more than 30%, according to the company. Its agentic memory storage solution uses direct access to NPUs, or neural processing units, to create petabyte-level memory space, while AgentSphere provides a secure runtime environment for agents with 100-millisecond startup.
At the model layer, Huawei Cloud launched ModelArts, a new training and inference platform. Its model routing function automatically dispatches requests to the most suitable model based on request characteristics. Huawei Cloud said the platform currently connects to more than 15 advanced models, with routing accuracy above 95% and average call costs reduced by 20%.
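Huawei Cloud did not describe how the routing works. As a rough illustration of the general idea only, a router can pick the cheapest model able to handle a given request. Everything in the sketch below is hypothetical, including the model names, prices, and task labels. It is not the ModelArts API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical model name
    cost_per_1k_tokens: float  # hypothetical relative price
    tasks: frozenset           # task types the model is assumed to handle well


@dataclass(frozen=True)
class Request:
    prompt: str
    task: str  # hypothetical label, e.g. "chat", "code", "reasoning"


# Made-up catalog for illustration; a real platform would connect many more models.
CATALOG = (
    Model("small-chat", 0.2, frozenset({"chat"})),
    Model("code-model", 0.6, frozenset({"chat", "code"})),
    Model("large-reasoner", 1.0, frozenset({"chat", "code", "reasoning"})),
)


def route(request: Request, catalog=CATALOG) -> Model:
    """Return the cheapest model in the catalog that supports the request's task."""
    candidates = [m for m in catalog if request.task in m.tasks]
    if not candidates:
        raise ValueError(f"no model supports task {request.task!r}")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


if __name__ == "__main__":
    for req in (Request("Hello", "chat"), Request("Fix this bug", "code")):
        print(req.task, "->", route(req).name)
```

The cost saving in such a scheme comes from sending simple requests to cheaper models instead of routing everything to the largest one. A production router would presumably infer the task from the request itself rather than rely on a label.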
Huawei Cloud also released enterprise features for sectors where it has an established presence. ModelArts packages reinforcement learning as an enterprise service and offers confidential inference for sensitive scenarios such as finance and coding, keeping data inside protected environments.
The foundation for this route is the Ascend ecosystem. Earlier this year, when DeepSeek’s models were released to the public, Huawei Cloud and SiliconFlow deployed DeepSeek R1 and V3 on the Ascend CloudMatrix 384 supernode. At the time, its inference efficiency could match that of Nvidia’s H800, suggesting that domestic computing power can offer usable performance for mainstream AI inference.
From computing base to industry deployment
If infrastructure answers where computing comes from, Huawei Cloud now wants to answer where tokens go: vertical productivity.
Huawei Cloud opened public testing for AgentArts, an enterprise agent platform, and launched an open-source version, OpenJiuwen, which shares more than 90% of its core with the enterprise version.
Zhou emphasized Huawei Cloud’s ecosystem advantages in the AI era, citing the openness of Ascend and Kunpeng computing, openEuler, and the open-source ModelArts toolchain. He described Huawei Cloud as “the most open cloud in the agent era.”
That openness extends to the model ecosystem. At the conference, Huawei Cloud joined more than 20 model developers, including Zhipu AI (also known as Z.ai), DeepSeek, Moonshot AI, StepFun, and Baidu, to launch a broad collaboration plan.
Industry scenarios are the productivity endpoints Zhou sees as most important. Huawei Cloud said it has already helped enterprises apply AI across several sectors. In embodied intelligence, for example, it launched the CloudRobo development platform, allowing small and midsize companies to access computing power and share data and models at lower cost.
“China has more than 300 embodied intelligence startups, and most are not large. The pressure of each building its own computing and data chain is too high,” Zhou said.
Healthcare is another focus. Huawei formed a healthcare corps in March 2025 and made AI-assisted diagnosis a corps-level strategic direction. Huawei Cloud now has a representative case: a pathology large model jointly developed with Ruijin Hospital. China has only about 20,000 doctors who can read pathology slides, while remote hospitals face higher misdiagnosis risks. The model allows county- and city-level hospitals to call advanced diagnostic capabilities through the cloud, reducing the need for patients to travel long distances.
Huawei Cloud also released a hybrid cloud white paper and confidential computing solution for agents, targeting the data security and localization concerns of governments, financial institutions, and state-owned enterprises. The approach combines public and private cloud routes.
By Zhou’s description, Huawei aims to build a second computing plane. The goal is not to match the scale of Nvidia-centered global infrastructure, but to give developers another technical route and ecosystem option.
KrASIA features translated and adapted content that was originally published by 36Kr. This article was written by Deng Yongyi for 36Kr.
Note: RMB figures are converted to USD at a rate of RMB 6.79 = USD 1 based on estimates as of June 10, 2026, unless otherwise stated. USD conversions are presented for ease of reference and may not fully match prevailing exchange rates.
