China’s model-as-a-service market is expanding rapidly, growing from a small, narrow segment into a promising source of business growth. The latest data from market research firm IDC shows that, in 2025, call volume in China’s enterprise MaaS market rose 16-fold year-on-year to 1,944 trillion tokens. IDC expects growth to accelerate further in 2026.
In 2025, especially in the second half of the year, almost all of China’s cloud computing vendors and large model companies entered the market. They put more computing power, sales resources, and product resources behind MaaS, making it a higher priority and intensifying competition.
Conventional wisdom suggests that in a fast-expanding emerging market, the arrival of latecomers tends to dilute the leader’s share. That seemed especially likely in MaaS, which was once believed to be a space where large model APIs would have limited stickiness: in theory, developers need to change only a few lines of code to swap the underlying model or switch cloud platforms.
But IDC’s latest data points to a counterintuitive result: In 2025, Volcano Engine’s share of China’s MaaS market remained highly stable, rising from 49.2% in the first half of the year to 49.5% for the full year.
In other words, during the most competitive stretch in the second half, Volcano Engine was not diluted by latecomers. Instead, as the market expanded, it widened its lead.
Outside observers tend to attribute this to aggressive pricing. In May 2024, when Volcano Engine launched the MaaS service for its Doubao large model, it set prices 99.3% below the prevailing industry level. But subsidies alone cannot explain Volcano Engine’s continued share expansion. Other players quickly lowered the prices of their MaaS services to similar levels. What truly determines whether low prices can be sustained is call volume and inference engineering capability.
Model capability is just as critical. The rapid expansion of the MaaS market has largely come from new use cases opening up as models improve. Stronger coding capabilities have driven the popularity of vibe coding and agents, while video-generation models have entered production workflows for microdramas, animated comics, and advertising, continuously increasing token consumption.
This means MaaS may be more of a speed race in a growing market. Under this logic, companies that can turn model capabilities into products faster and provide stable, cost-effective services will be better positioned to absorb new use cases and expand their share as the market grows.
From the Doubao large language model to the Seedance video generation model, the Doubao model family has continued to iterate. On that basis, Volcano Engine has accelerated the conversion of its accumulated token scale into a broader competitive edge, including lower inference costs, higher engineering efficiency, and the infrastructure needed to run agents.
Behind Volcano Engine’s low prices are scale and engineering capability
Cloud computing typically has high fixed costs and low marginal costs. Servers, networks, R&D, and operations systems all require heavy upfront investment, but the marginal cost of each additional call declines. The larger the scale, the easier it becomes to spread R&D and infrastructure investment across usage.
Scale also amplifies the value of engineering optimization. Volcano Engine president Tan Dai once gave an example:
“Optimizing utilization by one percentage point across 10,000 servers versus across one million servers creates a 100-fold difference in returns. You can build a strong team to do it better.”
Scale was the variable Volcano Engine cared about most when it made MaaS a priority. The goal was not simply to sell model interfaces, but to increase token call volume as quickly as possible.
To that end, Volcano Engine made token consumption a core metric for business development and adjusted the performance evaluation system for its sales team. For MaaS products with the same sales value, the internal incentive weight was several times that of traditional cloud services.
Alongside the higher business priority, Volcano Engine also increased its technical investment in model inference. MaaS costs are mainly determined by the efficiency of token generation. If server utilization, cache hit rates, and computing resource scheduling efficiency improve, costs have room to fall.
“Lower costs can give rise to more applications and expand the overall market,” Tan later said when discussing the pricing strategy at the time. Once the company saw that “technology could bring costs down, it decided to cut prices all the way.”
The key technologies supporting Volcano Engine’s price cuts at the time were mainly prefill-decode (PD) disaggregation and key value (KV) cache, both of which it applied at scale relatively early:
- PD disaggregation splits “understanding the question” during inference, known as prefill, from “generating the answer,” known as decode, and matches each process with more suitable computing units.
- KV cache stores historical states during model generation, avoiding repeated computation of prior context every time new content is produced. This saves GPU memory bandwidth and reduces inference costs.
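The saving from a KV cache can be made concrete with a toy compute count. The sketch below is illustrative only, not Volcano Engine’s implementation: without a cache, generating each new token re-processes the entire context, while with a cache the prompt is processed once during prefill and each decode step handles only the newest position.

```python
# Toy illustration of why a KV cache lowers inference cost.
# "Positions processed" stands in for attention compute.

def attention_ops_without_cache(prompt_len, new_tokens):
    """Total positions processed when the full context is
    re-encoded at every generation step (no cache)."""
    ops = 0
    for t in range(new_tokens):
        ops += prompt_len + t + 1  # re-process the whole sequence
    return ops

def attention_ops_with_cache(prompt_len, new_tokens):
    """Prefill processes the prompt once ("understanding the question");
    each decode step ("generating the answer") then processes a single
    new position, reusing cached key/value states for all prior ones."""
    prefill = prompt_len
    decode = new_tokens
    return prefill + decode

# A 1,000-token prompt generating 100 tokens:
print(attention_ops_without_cache(1000, 100))  # 105050 positions
print(attention_ops_with_cache(1000, 100))     # 1100 positions
```

The roughly 100-fold gap in this toy case also shows why PD disaggregation pays off: prefill is one large parallel batch while decode is many small sequential steps, so the two phases suit different hardware configurations.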
But these technologies depend on scale. At small call volumes, maintaining complex cache and scheduling systems carries costs of its own, which may offset the computing power saved.
As technologies such as PD disaggregation and KV cache spread, token prices have gradually converged. For followers that lack economies of scale, matching low prices often means greater cost pressure and may even lead to losses.
Volcano Engine, with its larger call volume, faces less cost pressure and has more room to keep optimizing inference technology, forming a more sustainable low-price capability.
Volcano Engine is also looking for room to reduce costs beyond technology and engineering. On one hand, it uses differentiated pricing based on context-length ranges, giving customers more choice. On the other, it has launched a savings plan that combines customer usage across different models, including language models and video-generation models. The scale discount customers accumulate on language models can be used to offset the trial-and-error cost of new businesses such as video generation.
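The two pricing mechanisms described above can be sketched in a few lines. All tiers, prices, and thresholds below are invented for illustration; the article does not disclose Volcano Engine’s actual rates.

```python
# Hypothetical sketch of context-length-based pricing plus a pooled
# savings plan. Every number here is an assumption for illustration.

# Per-million-token price by context-length bracket (hypothetical).
CONTEXT_TIERS = [
    (32_000, 0.80),          # contexts up to 32k tokens
    (128_000, 1.20),         # up to 128k tokens
    (float("inf"), 2.40),    # longer contexts
]

def price_per_million(context_len):
    """Return the per-million-token price for a given context length."""
    for limit, price in CONTEXT_TIERS:
        if context_len <= limit:
            return price

def pooled_discount(total_tokens):
    """Savings-plan idea: the discount accrues on combined usage
    across all models (language and video alike), so scale built on
    one model offsets trial-and-error costs on another."""
    if total_tokens >= 1_000_000_000:
        return 0.20  # 20% off at 1B+ pooled tokens (hypothetical)
    if total_tokens >= 100_000_000:
        return 0.10
    return 0.0

print(price_per_million(8_000))        # short-context rate
print(pooled_discount(2_000_000_000))  # discount at 2B pooled tokens
```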
IDC’s latest China MaaS report noted that Volcano Engine had the highest market share by call volume. Its revenue share also ranked first, but was a few percentage points lower than its call volume share. Volcano Engine’s per-token price was below the industry average.
Notably, IDC’s statistics on China’s MaaS market mainly cover enterprise calls to models on public clouds. They do not include AI applications developed by ByteDance, such as Doubao and Dreamina (also known as Jimeng AI). They also exclude tokens generated when internal businesses such as Douyin and Lark (known domestically as Feishu) deploy large models.
These call volumes are not included in IDC’s market share statistics, but they still affect Volcano Engine’s cost structure and engineering efficiency.
Agentic AI turns MaaS into an infrastructure business
In an April interview on tech podcast Stratechery, OpenAI CEO Sam Altman said AI’s next stage will likely shift from the typical process of a user providing text and a large model returning text or code to AI agents running inside companies and completing many types of work. He said OpenAI is also working with Amazon Web Services (AWS) to develop a product similar to “virtual coworkers.”
MaaS is evolving from the standardized supply of model interfaces into enterprise infrastructure, with stronger stickiness. For an enterprise agent to run in production, it needs identity authentication, access control, a memory system, tool calling, a sandbox environment, logging, security governance, and other components. It also needs connections to a company’s internal systems.
This is also why more companies are paying attention to agent harnesses. “Harness” originally refers to equipment used to control a horse. In the context of agents, it refers to the engineering system that works with the foundation model. MaaS supplies stable model capabilities, while the harness turns inference into constrained, traceable, continuously running workflows.
The way cloud platforms provide large model services is changing along with this shift. Whether through Anthropic’s partnerships with multiple cloud vendors or OpenAI’s partnership with AWS in April, the arrangement is no longer just about placing model interfaces on a cloud platform. It also involves packaging them into the cloud platform’s native agent environment, allowing enterprises to develop and operate production-grade agents within the cloud environment.
Volcano Engine’s product evolution over the past few years can also be understood in this context. While improving MaaS competitiveness, it has expanded large-model services into infrastructure that covers agent development and operations.
“We were the first in China to launch a full suite of agent products and simplify agent development,” Tan said in an interview late last year. Customers can build a complex agent with just a few lines of code, “just like when you used to develop a complex website,” except that now, new AI middleware is required.
In his view, writing code in the past was essentially about writing if-else logic to define workflows. Now, when developers build agents based on models, they are more often writing prompts, while process planning, task decomposition, the creation of sub-agents, and other steps are increasingly handed over to the model itself. This is also the underlying logic of products such as OpenClaw.
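The shift described above can be sketched as a minimal agent loop: instead of hard-coding if-else workflow logic, the developer writes a prompt and hands planning to the model. The `call_model` function below is a stub standing in for any hosted MaaS chat API (a real deployment would make an HTTP call to a model endpoint); its canned response is invented so the example runs offline.

```python
# Minimal sketch: prompt-driven task decomposition instead of
# hand-written if-else workflows. `call_model` is a stub, not a real API.

def call_model(prompt):
    """Stand-in for a hosted model endpoint. A real agent would send
    this prompt to a MaaS API; here we fake a planning response."""
    return "1. gather requirements\n2. draft outline\n3. write report"

def run_agent(goal):
    # The developer writes a prompt; the model does the process
    # planning and task decomposition.
    plan = call_model(f"Break this goal into numbered sub-tasks: {goal}")
    subtasks = [line.split(". ", 1)[1] for line in plan.splitlines()]
    results = []
    for task in subtasks:
        # In a full harness, each sub-task could spawn a sub-agent
        # with its own tools, sandbox, and logging.
        results.append(f"done: {task}")
    return results

print(run_agent("produce a market report"))
```

The if-else logic has not disappeared; it has moved into the harness around the loop (retries, access control, tracing), which is exactly the infrastructure layer the article describes.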
That is why, early this year, while supporting the Lunar New Year gala televised by CCTV, Volcano Engine was able to quickly launch ArkClaw, a product based on OpenClaw with enhanced security capabilities, and to open-source OpenViking, a context database designed for agents’ long-term memory that makes ArkClaw more useful.
The company defines the personal edition of ArkClaw as an “agile-state agent.” It first allows employees to quickly experiment with ideas that can improve business efficiency, then consolidates validated capabilities so they can be reapplied consistently. The latter corresponds to HiAgent, the agent development and operations platform Volcano Engine launched in 2024.
By April, the number of companies that had consumed trillions of tokens on Volcano Engine had reportedly grown from 100 at the end of last year to 140, indicating stronger uptake among major MaaS customers.
The AI cloud flywheel has started to spin
In business analysis, the flywheel effect is often used to explain the success of AWS, the world’s largest cloud computing platform. Scale spreads out costs, price cuts attract more customers, and customer growth brings more feedback, cash flow, and ecosystem strength, driving continued iteration in technology and services.
Volcano Engine is building a similar flywheel in the AI era. But its flywheel does not fully follow the logic of traditional cloud computing. The traditional cloud computing flywheel mainly revolves around computing power, storage, networks, and software ecosystems. The MaaS flywheel adds model capabilities, token usage patterns, agent scenarios, and real-world business feedback.
The first layer of Volcano Engine’s flywheel is the loop among model capabilities, call volume, and inference costs.
ByteDance’s internal model R&D team, Seed, steadily supplies Volcano Engine with AI model capabilities. The stronger the model, the easier it is to expand call volume. The larger the call volume, the more engineering techniques can be used to lower costs. Once costs fall, Volcano Engine can attract more customers. This is similar to the scale flywheel of traditional cloud computing, except the unit of measurement has shifted from servers, storage, and bandwidth to tokens.
The second layer of the flywheel comes from feedback from real-world use cases within the ByteDance ecosystem. Doubao, Dreamina, and dozens of business lines, including Douyin and Lark, join external customers in developing and using Volcano Engine’s large model capabilities. In turn, they provide Volcano Engine with high-frequency, complex, real-world product feedback.
One end of this feedback loop flows to the Seed model team, helping its foundation models continue to iterate. The other flows to Volcano Engine’s agent team, helping it improve product capabilities.
Agentic products are especially dependent on this feedback. Anthropic has published articles suggesting that improving agent capabilities is not simply a matter of improving model capabilities. Internal employees, external users, production monitoring, A/B testing, user research, and customer requirements have all helped drive the iteration of products such as Claude Code.
Volcano Engine’s sizable share of China’s MaaS market in 2025 could therefore be an interim result, reflecting a flywheel that has only begun to spin.
Now, the agent boom is continuing to push up market demand, and the industry has at times faced shortages of computing power. Some companies have chosen to raise prices to improve short-term financial performance, but Volcano Engine has said it will not follow.
This pricing restraint reflects Volcano Engine’s assessment of the industry’s stage of development. Compared with securing higher short-term profits, what matters more now is expanding call volume, lowering barriers to use, and increasing real-world use cases so the flywheel can keep accelerating.
KrASIA features translated and adapted content that was originally published by 36Kr. This article was written by Xiao Xi for 36Kr.