How do industry professionals in China view DeepSeek V4’s strengths and limits?

Written by Cheng Zi · 13 mins read

Photo source: Dreamstime (Oasisamuel, ID: 454684967).
DeepSeek V4 has clear advantages, but it also shows why no AI model can serve every use case yet.

DeepSeek V4’s technical report has been among the most closely watched documents in the artificial intelligence sector since its release last month.

Is V4 powerful? From an engineering optimization perspective, the answer appears to be yes. For years, much of the AI industry operated on the assumption that model performance could be improved by scaling up high-quality computing resources and expanding parameter counts. V4 takes a different approach. It emphasizes restraint in model training.

Rather than relying on aggressive increases in computing resources and parameters, DeepSeek V4 uses a combination of architectural and systems-level optimizations. These include attention mechanisms, which help the model focus on the most relevant parts of a prompt, much as a reader might focus on key sentences in a long article. It also uses a mixture-of-experts (MoE) architecture, which can be understood as allowing different “experts” to handle different types of problems while activating only a small number of them at a time to save computing resources.
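
The routing idea behind MoE can be sketched in a few lines. This is a toy illustration, not DeepSeek's implementation: the expert count, top-k value, router, and experts below are all made up.

```python
import random

random.seed(0)

NUM_EXPERTS = 8   # total experts in the layer (hypothetical, far smaller than a real model)
TOP_K = 2         # experts actually activated per token

def route(token_scores):
    """Pick the top-k experts for one token from its router scores."""
    ranked = sorted(range(len(token_scores)), key=lambda i: token_scores[i], reverse=True)
    return ranked[:TOP_K]

def moe_layer(token, experts, router):
    """Run one token through only its top-k experts, blending by router weight."""
    scores = router(token)
    chosen = route(scores)
    total = sum(scores[i] for i in chosen)
    # Only TOP_K of NUM_EXPERTS experts do any work for this token.
    output = sum(scores[i] / total * experts[i](token) for i in chosen)
    return output, chosen

# Toy experts: each just scales its input by a different factor.
experts = [(lambda f: (lambda x: x * f))(i + 1) for i in range(NUM_EXPERTS)]
# Toy router: random scores stand in for a learned gating network.
router = lambda token: [random.random() for _ in range(NUM_EXPERTS)]

out, chosen = moe_layer(2.0, experts, router)
print(f"activated experts {chosen} -> output {out:.3f}")
```

The compute saving comes from the ratio TOP_K / NUM_EXPERTS: six of the eight toy experts never run for this token.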

Other optimizations include post-training, in which a model receives targeted reinforcement after its initial training, and inference systems engineering, which improves efficiency across the stages of real-world operation.

According to DeepSeek’s technical report, the computing resources required by V4-Pro to process a long context of 1 million tokens have been reduced to 27% of the level required by the previous-generation V3.2. Its KV (key-value) cache, the temporary storage used to hold conversational context, has been compressed to 10% of its previous size.
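
To see why the cache compression matters at this context length, a back-of-envelope sizing helps. The architecture numbers below are hypothetical placeholders, not V4's actual configuration; only the 10% figure comes from the report.

```python
# Back-of-envelope KV cache sizing. All architecture numbers are
# hypothetical placeholders, not DeepSeek V4's real configuration.
LAYERS = 60
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2          # fp16/bf16 storage
CONTEXT_TOKENS = 1_000_000   # the 1 million-token context from the report

def kv_cache_bytes(tokens):
    # Factor of 2 for storing both keys and values at every layer.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * tokens

baseline = kv_cache_bytes(CONTEXT_TOKENS)
compressed = baseline * 0.10   # the report's claimed 10% compression

print(f"baseline  : {baseline / 1e9:.1f} GB")
print(f"compressed: {compressed / 1e9:.1f} GB")
```

Even with these modest placeholder dimensions, an uncompressed million-token cache runs to hundreds of gigabytes, which is why cache size, not raw FLOPs, often bounds long-context serving.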

But engineering is still engineering, and benchmarks are still benchmarks.

The model’s value should be evaluated in relation to real deployment, development, and investment scenarios. To that end, 36Kr spoke with developers, founders, and investors who tested and used the model over roughly three days.

The following transcripts have been edited and consolidated for brevity and clarity.

How does DeepSeek V4 improve coding and agentic workflows?

Huang Dongxu, co-founder and CTO of PingCAP

I am migrating my workflow from Hermes to DeepSeek V4. I used to be fairly wasteful, using Claude Opus and GPT-5.4 as agents. Later, I realized that most everyday work does not require especially strong coding ability.

Daily office tasks mainly include organizing routine emails, writing articles, managing calendars, summarizing content, and browsing the web.

I have now fully switched to DeepSeek V4. Its performance is better than I expected. It may have been optimized for Mandarin Chinese, and its overall language ability is more aligned with the habits of native Chinese speakers than Opus and GPT.

My first conclusion is this: If you are currently using more expensive models as agents for daily work assistance, you can switch to DeepSeek V4 Pro with a fair degree of confidence.

Its capability is probably around the level of Claude Sonnet 4.5 or 4.6, but its price is less than one-quarter that of the leading models. Basically, I no longer need to pay attention to the cost of running agents.

DeepSeek V4’s paper repeatedly emphasizes its 1 million-token context length, but I do not think this is especially strong, because most mainstream state-of-the-art models now have at least a 1 million-token context window. This is just catching up.

Its real strengths are cost and openness. The cost is genuinely low, and the model is open source. I do not need to worry as much about Anthropic or OpenAI cutting off supply and making some of my previous workflows unusable. This has happened before. In this respect, switching to DeepSeek V4 gives me a greater sense of security.

Next, look at programming ability. Because the testing period was relatively short, I have not yet used it to develop very complex, large-scale system applications.

But at the scale of several thousand lines of code, when building small applications, or in scenarios that call external third-party systems, such as going to Supabase or TiDB Cloud and reading documentation to connect to an unfamiliar tool, my experience so far has been that there are basically no major problems.

At the scale of several thousand to 10,000 lines of code, V4’s one-shot success rate, meaning the ability to complete a task from sufficient examples and instructions in one go without additional debugging, remains relatively high.
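
The metric can be made concrete with a toy tally. The per-task outcomes below are invented for illustration only.

```python
# Hypothetical per-task outcomes: True if the model's first attempt passed
# the task's checks with no additional debugging round.
first_attempt_passed = [True, True, False, True, True, True, False, True, True, True]

def one_shot_success_rate(results):
    """Fraction of tasks completed correctly on the first attempt."""
    return sum(results) / len(results)

one_shot_rate = one_shot_success_rate(first_attempt_passed)
print(f"one-shot success rate: {one_shot_rate:.0%}")  # 8 of 10 tasks pass first try
```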

So if you are building simple websites or small applications, I think DeepSeek’s programming ability is much stronger than the previous generation.

That is because my harness framework is not built around complex human orchestration. It relies more on the model’s own collaborative ability, using Slock.ai.

Put simply, it can collaborate with agents that use other models, and it can complete simple, specific tasks.

So if stronger models, such as those at the GPT-5.5 level, provide direction and DeepSeek V4 Pro handles execution, I think this model can significantly reduce the cost of overall harness engineering.

Zhao Binqiang, vice president of technology and product center at 01.AI

DeepSeek V4 is not the most versatile model, but it is the most trustworthy. Its firm commitment to open source, complete technical report, extremely low inference cost, and full-stack domestic Chinese technology make it the foundation model with the best cost-performance ratio for enterprise scenarios.

Two things about DeepSeek V4 impressed me most.

The first is fundamental innovation in its model architecture. It still maintains high-quality reasoning ability with a context window of up to one million tokens, supported by innovation in hybrid attention mechanisms. This mechanism can be understood, in simple terms, as combining skimming to understand the big picture with close reading to grasp details precisely. Its exploration of context compression is especially advanced, and DeepSeek disclosed the details in full in its technical report. That kind of candor and open-source spirit is valuable.

The second is full-stack adaptation to domestic Chinese computing infrastructure. DeepSeek has completed adaptation for Huawei Ascend 910B and 950, and it has done detailed work in quantization, sparsification mechanisms, and domain expert optimization. This means that a full-stack domestic Chinese solution, from chips to underlying software to model training and inference, has taken a substantive step forward. We cannot say it has fully broken free from dependence on the Nvidia ecosystem, but it has found the right development path. The difficulty and significance of this should not be understated.

Li Bojie, chief scientist of Pine AI

What surprised me most was that DeepSeek combined a long list of architectural innovations, including MoE, CSA plus HCA hybrid attention, mHC, Muon, and FP4QAT, and ran them at 1.6 trillion parameters, which it says is the largest current open-source scale.

This is like taking a set of technologies that are theoretically advanced but often fail in small experiments, combining them into a giant engine, and making that engine run stably. We have tried more than 20 architectural innovations ourselves, and the conclusion was almost always that they work at the seven billion-parameter scale, but may break down or even backfire once scaled up.

Architectural innovations from other model developers often get stuck at this step as well. Making multiple innovations work together at the largest scale shows that DeepSeek has extremely deep technical accumulation in underlying training. Just one of these technologies, mHC, reduced a signal amplification effect that had reached nearly 3,000 times in the original 27 billion-parameter experiments to about 1.6 times, making training stable and controllable.

Song Chunyu, vice president of Lenovo Group, chief investment officer, and senior partner at Lenovo Capital

DeepSeek has shown that cost performance can become a structural advantage by design.

Its computing requirement is 27% of the previous generation’s, and its memory footprint is only 10%. At the same time, it has a large total parameter count of 1.6 trillion, but activates only 49 billion parameters each time, making it highly efficient.

This structural cost reduction, combined with V4-Flash’s API pricing of RMB 1 (USD 0.15) per million tokens, has made democratized ultra-long context a new benchmark for AI applications.

Chen Weipeng, founder and CEO of Loopit

What excites me most about DeepSeek V4 is not merely an improvement in a single capability. It is that Chinese large models have moved from catching up in foundation capabilities to participating in system-level competition in the agent era.

In the past, people cared more about whether models could answer questions, reason, and write code. Today, what matters is whether a model can reliably complete goals in complex tasks, and whether it can be integrated into real product systems at sufficiently low cost and with sufficient efficiency.

Where does DeepSeek V4 fall short?

Li Bojie

I mainly use it for coding and agentic tasks. In this category, V4-Pro’s tool-calling ability and general world knowledge have basically caught up with the tier just below frontier models, roughly equivalent to Claude 4.6 Sonnet.

But tool-calling stability and hallucination rates remain serious weaknesses. These two issues must be addressed at the agent harness level, such as by strengthening validation, automatically retrying after failures, using external knowledge bases to keep the model grounded, and defining tool use specifications strictly and clearly. Otherwise, in long-chain tasks, errors will continue to compound as the chain becomes longer.
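
The mitigations listed here (validation, automatic retries, strict tool-use specifications) amount to a wrapper around the model call. A minimal sketch follows; the model call, allowed tool list, and schema checks are all stand-ins, not any particular agent framework's API.

```python
import json

MAX_RETRIES = 3

def call_model(prompt):
    """Stand-in for a real model API call; returns a tool call as JSON text."""
    return '{"tool": "search", "args": {"query": "KV cache"}}'

# A strict tool-use specification: anything outside this set is rejected.
ALLOWED_TOOLS = {"search", "read_file"}

def validate(raw):
    """Reject malformed or out-of-spec tool calls instead of executing them."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if call.get("tool") not in ALLOWED_TOOLS:
        return None
    if not isinstance(call.get("args"), dict):
        return None
    return call

def robust_tool_call(prompt):
    """Harness layer: validate every model output and retry on failure."""
    for _ in range(MAX_RETRIES):
        call = validate(call_model(prompt))
        if call is not None:
            return call
    raise RuntimeError("model produced no valid tool call after retries")

print(robust_tool_call("find docs on KV cache compression"))
```

The point of catching errors at each step is exactly the compounding problem mentioned above: in a long chain, any unvalidated bad call propagates into every subsequent step.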

Once the harness layer makes up for these two shortcomings, overall inference cost can be several times lower than that of frontier models. That is the real leverage.

Another line of thinking is that V4-Flash is well positioned as a foundation model for vertical fine-tuning. Vertical fine-tuning means taking a general model and giving it additional training with professional data from a specific domain, making it more specialized for that industry.
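
Vertical fine-tuning starts with domain data in an instruction format. A minimal sketch of that preparation step, using invented Q&A pairs and a common chat-style JSONL layout; the exact schema depends on the training pipeline being used.

```python
import json

# Made-up domain Q&A pairs; a real dataset would use curated expert data.
domain_pairs = [
    ("How do I add an index to a large table without downtime?",
     "Use an online schema change so writes are not blocked while the index builds."),
    ("What is a hot region?",
     "A data range that receives a disproportionate share of reads or writes."),
]

# One chat-style record per example, a common input format for
# supervised fine-tuning pipelines.
records = [
    {"messages": [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}
    for question, answer in domain_pairs
]

jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
print(jsonl)
```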

Post-training a 1.6 trillion-parameter large model, whether through supervised fine-tuning or reinforcement learning, is too expensive for most companies. Models with 200 billion to 300 billion parameters are the main size range for post-training in the market. We previously did post-training on Qwen 235B, and its results were clearly weaker than V4-Flash at the same size.

Flash’s performance has already caught up with the previous generation of trillion-scale open-source models, surpassing DeepSeek V3.2 and the older version of Kimi. Flash will likely become a preferred foundation model for business fine-tuning.

Chen Weipeng

Judging from actual use of Loopit’s AI interactive content product, mainly in coding scenarios, we need to be objective: DeepSeek V4 still lags behind the strongest overseas closed-source models in the stability and task completion rate of complex, long-horizon tasks.

Capability gaps among China’s leading AI models are narrowing. This shows that model competition is entering a new stage. In the agent era, whether a model can understand long context, adapt to complex frameworks, reliably complete long-horizon tasks, and run at acceptable cost and speed will become equally important.

What truly creates differentiation is not just the model itself, but the overall system formed by the model, post-training, agent framework, evaluation system, and engineering efficiency.

Song Chunyu

The release of V4 did not include a native multimodal version, meaning a model that can process text, images, audio, and other inputs at the same time. This is somewhat regrettable in the current market environment.

But given its strategy of fully embracing domestic Chinese computing infrastructure, this was likely a phase-specific tradeoff made to concentrate resources on solving the most fundamental computing infrastructure problem.

Zhao Binqiang

Calling it below expectations would be nitpicking.

But from a consumer-facing perspective, the product polish is still insufficient. The Flash version is somewhat lacking in complex tasks involving creation and programming. The Pro version is close to top closed-source models in capability, but its baseline computing requirements are relatively high, creating an entry barrier.

Is AI just getting cheaper?

Chen Weipeng

One important trend is that AI is not simply getting cheaper.

The cost of calling the world’s flagship models is actually rising because they handle more complex tasks, longer contexts, and higher-value workloads. What is becoming cheaper quickly is the middle layer of models, including open-source and self-deployable models.

So future application companies will not simply ask which model is the strongest. Instead, they need to build model orchestration systems that determine which tasks must use the strongest model, which tasks can use a high cost-performance model, and which capabilities can be supplemented through agent frameworks and engineering systems.
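
An orchestration layer of this kind can start as a simple routing policy. The sketch below uses hypothetical tiers, model names, and thresholds chosen only to illustrate the idea.

```python
# Hypothetical routing policy: send each task to the cheapest tier that
# can handle it. Model names and thresholds are illustrative only.
TIERS = {
    "frontier": "frontier-model",     # strongest, most expensive
    "mid":      "deepseek-v4-pro",    # high cost-performance
    "light":    "deepseek-v4-flash",  # cheap bulk work
}

def route_task(task):
    if task["needs_long_horizon_planning"]:
        return TIERS["frontier"]
    if task["estimated_tokens"] > 100_000 or task["kind"] == "coding":
        return TIERS["mid"]
    return TIERS["light"]

task = {"kind": "summarize", "estimated_tokens": 3_000,
        "needs_long_horizon_planning": False}
print(route_task(task))  # bulk summarization goes to the cheap tier
```

In practice the routing signal would come from classifiers or heuristics over the request, but the economic structure is the same: reserve the frontier tier for the tasks that genuinely need it.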

The significance of DeepSeek V4 is that it further enriches the model supply layer.

For companies, it is not simply a replacement for a particular overseas model. Instead, it allows applications to pursue multimodel orchestration, self-deployment, and cost optimization more flexibly.

In the future, the moat for AI applications will not come from simply calling one model. It will come from organizing models, agents, product scenarios, and data feedback into a reliable, low-cost, and scalable production system.

For Loopit, this trend is crucial. We build AI interactive content. Model capability determines the ceiling of creation, while cost and speed determine whether creation can scale.

Only when models at different levels are sufficiently usable and can be effectively orchestrated can large volumes of creativity from ordinary users be generated, interacted with, and distributed in real time. DeepSeek V4’s progress will accelerate this process.

Li Bojie

In the vertical fine-tuning market, foundation models with 200–300 billion parameters, such as Qwen and Llama, will be systematically challenged by DeepSeek V4-Flash.

All teams doing post-training at this size will reevaluate their choices. Flash outperforms at the same size, and inference frameworks had full day-zero adaptation, including SGLang, vLLM, and TileLang. Within six months, it may become the default starting point for domestic Chinese open-source vertical models.

An ecosystem is building around inference using Huawei Ascend 950, and it will challenge premium-priced chip options from Nvidia.

This is the first fully working solution pairing a domestic Chinese chip with a top domestic Chinese open-source model. Neither Nvidia nor AMD received early adaptation for V4. After large-scale shipments of the 950 in the second half of the year, a wave of fully domestic inference replacement will emerge in agent long-context scenarios.

The indirect impact is that Nvidia’s valuation and premium in the China market will be repriced. This does not mean its sales will collapse, but its pricing power will be compressed.

The overall cost of using agents capable of completing complex long-horizon tasks will drop significantly.

V4-Pro pricing, at USD 1.74 per million input tokens on cache misses and USD 3.48 per million output tokens, combined with efficient KV caching at a one million-token context length and MegaMoE, has reduced per-token cost to one-sixth or one-seventh that of frontier models.
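
A rough cost comparison using the V4-Pro prices quoted above. The frontier prices and the workload shape are hypothetical, chosen only to show the arithmetic behind the claimed ratio.

```python
# V4-Pro prices quoted in the text (USD per million tokens).
V4_INPUT = 1.74    # cache-miss input
V4_OUTPUT = 3.48

def call_cost(input_tokens, output_tokens, in_price, out_price):
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A hypothetical long-context agent step: 1M tokens in, 20k tokens out.
v4_cost = call_cost(1_000_000, 20_000, V4_INPUT, V4_OUTPUT)
print(f"V4-Pro cost: ${v4_cost:.2f}")

# Hypothetical frontier pricing for comparison (not from the article).
frontier_cost = call_cost(1_000_000, 20_000, 12.0, 36.0)
print(f"ratio      : {frontier_cost / v4_cost:.1f}x")
```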

As long as the industry addresses DeepSeek V4’s tool calling stability and hallucination rate at the agent harness level, through validators, external grounding, strict schemas, and self-consistency voting, applications such as multistep research, long-horizon coding agents, and deep search, which previously could not be put into practical use because of cost, will move from demos into real businesses in the second half of this year. The inflection point for agent economics is arriving in this wave.

Closed-source frontier providers will not necessarily cut prices because of this. Their products remain significantly ahead, and V4 does not yet create direct pricing pressure on them.

Zhao Binqiang

The key challenge for enterprise AI applications is achieving full-cycle cost control while ensuring performance. The emergence of DeepSeek V4 provides a competitive solution to this challenge.

Flash covers simple tasks, while Pro covers high-complexity scenarios. Overall costs will be significantly lower than mainstream closed-source solutions, allowing 01.AI to improve the cost-performance ratio of its solutions during delivery.

More importantly, DeepSeek’s open-source stance appears firm. It is unlikely to suddenly announce a closed-source shift and cause application-level investments to go to waste. This open-source posture provides valuable certainty for enterprise technology selection.

01.AI has already fully launched product evaluations and capability validation based on DeepSeek V4. The focus is on evaluating its performance in core enterprise scenarios such as production scheduling, intelligent office, and investment management. Once validation meets standards, it will consider replacing existing models so more industry clients can use top-tier domestic Chinese large models.

After V4’s release, I believe we will see three changes:

  1. Domestic tech substitution will become more realistic. DeepSeek’s successful adaptation to Huawei Ascend means China’s AI sector has taken a substantive step toward full-stack domestic substitution across chips, frameworks, models, and applications. For government and enterprise clients with compliance requirements, this is a hard demand. The domestic substitution process in the enterprise market will clearly accelerate.
  2. Open-source large models will put pressure on closed-source models to cut prices, reducing the extent to which closed-source models extract value from AI application businesses. DeepSeek has achieved results close to top closed-source models at prices far lower than those of top closed-source models. Its demonstration effect will further raise the overall performance expectations for open-source models. This will also force high-price strategies from closed-source model providers such as Anthropic and OpenAI to face pressure. The profit center will shift from foundation models toward deep industry applications, which would benefit the long-term development of AI.
  3. Open-source models do not equal enterprise applications, and harness capability will become the new dividing line. Open source lowers the foundation-model threshold. Harness determines the height of deployment. Between a high-quality open-source model and a stable, reliable enterprise product sits the harness layer, which includes engineering capabilities such as hallucination reduction, instruction following, error validation, and domain expertise injection.

Because needs differ, no single harness is universal. This is 01.AI’s key strength: based on automated evaluation, automated feedback, automated improvement, and domain expertise injection, it can quickly build dedicated harness systems for different use cases, allowing large models to be put to use in real business.

Song Chunyu

First, million-token context will become a standard feature at the application layer, driving an agent boom. V4 turns long-context capability into accessible infrastructure.

Second, competition will shift from models to applications and data. When the performance of leading open-source models approaches that of closed-source models and costs fall sharply, the model itself will no longer be a moat. Future investment and competition will focus more clearly on who can use these foundation models to build data and application feedback loops in high-value vertical scenarios such as healthcare, finance, and law, thereby forming commercial moats.

Third, domestic Chinese computing infrastructure will see major investment opportunities. V4’s success has shown the industry that large models can also reach leading performance on domestic Chinese computing infrastructure. This is expected to create clear demand for domestic Chinese computing resources, driving investment across the industry chain, from chip design and servers to cloud services.

Our judgment is that the opportunity in domestic Chinese computing infrastructure this year is comparable to the overseas computing infrastructure opportunity last year. Its industry trend, and its reflection in capital markets, will be especially strong.

We will concentrate resources on projects that can commercialize quickly, enter industries, and form product moats, while maintaining long-term investment in underlying architecture and computing infrastructure.

KrASIA features translated and adapted content that was originally published by 36Kr. This article was written by Zhou Xinyu and Wang Yuchan for 36Kr.
