Beyond the cloud, Houmo.AI sees AI compute shifting to the edge

Written by 36Kr English

The Chinese startup reckons generative AI will eventually be processed directly on devices, not just in data centers.

Walk through Hall H1 of this year’s World Artificial Intelligence Conference (WAIC), and two themes dominate China’s computing landscape: “supernodes” and edge computing chips for generative artificial intelligence.

These two paths signal a clear divergence in how computing is evolving in the era of large models.

On one side is the continued rise of cloud-based training. Supernodes are now central to China’s push for cloud AI infrastructure. At WAIC 2025, companies including Huawei and several leading domestic chipmakers showcased supernode computing systems that reflected this ambition.

On the other side is a surge in edge AI computing. As generative AI gets embedded into real-world applications, demand is shifting toward compact, inference-ready edge chips. This trend accelerated after DeepSeek’s breakthrough, which sharply lowered the compute barrier for running large language models (LLMs). At WAIC, companies displayed sleek edge AI chips powering consumer hardware and industrial devices alike.

Together, these trends suggest that generative AI’s next chapter will hinge on hybrid computing: training in the cloud, inference at the edge. Wu Qiang, CEO of Houmo.AI, expects that 90% of generative AI inference will eventually run on local devices, with only 10% relying on cloud compute. That level of ubiquity is needed if AI is to reach every household, in every setting, he said.

Nvidia has already reaped enormous gains from the cloud side of this boom. Its market cap grew sixfold in just two years, surpassing USD 4 trillion. But edge AI represents a different frontier, one that is just taking shape and may ultimately be even larger in scale, with space for new players and approaches.

Houmo.AI is one of the key players betting on this future. Its founder, Wu Qiang, has deep roots in chip design, with stints at Intel, AMD, Facebook, and Horizon Robotics. In 2021, he launched Houmo with a singular goal: build next-gen AI chips using compute-in-memory (CIM) to solve the “last mile” of AI efficiency.

But why base the company’s core around CIM? What can this technology unlock?

The answer starts with the limits of conventional architecture. As model sizes grow into the tens or hundreds of billions of parameters, the classic von Neumann architecture is hitting real bottlenecks. Memory and power have become critical constraints. In many cases, the energy spent moving data between memory and compute units now exceeds the energy used for computation itself.

CIM addresses this head-on. By performing matrix multiply-and-accumulate operations directly within memory cells, it eliminates the need for constant data shuttling, removing one of AI’s most stubborn inefficiencies.
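
To make the point concrete, here is a minimal, hypothetical accounting in Python of where the energy goes in a conventional matrix-vector multiply versus an idealized in-memory version. The per-operation energy constants are rough, order-of-magnitude assumptions drawn from published architecture surveys, not Houmo figures.

```python
# Toy energy accounting for one INT8 matrix-vector product, the core
# kernel of LLM inference. Constants are illustrative assumptions only.
E_MAC_8BIT_PJ = 0.3      # assumed energy per 8-bit multiply-accumulate (pJ)
E_DRAM_BYTE_PJ = 150.0   # assumed energy per byte fetched from off-chip DRAM (pJ)

def conventional_energy_pj(rows: int, cols: int) -> float:
    """Von Neumann case: every INT8 weight is streamed from DRAM to the
    compute units once per matrix-vector product."""
    macs = rows * cols
    weight_bytes = rows * cols  # 1 byte per INT8 weight
    return macs * E_MAC_8BIT_PJ + weight_bytes * E_DRAM_BYTE_PJ

def cim_energy_pj(rows: int, cols: int) -> float:
    """Idealized CIM case: weights stay resident in the memory array, so
    the DRAM-traffic term disappears and the MAC cost dominates."""
    return rows * cols * E_MAC_8BIT_PJ

rows = cols = 4096  # roughly one transformer projection layer
print(f"conventional: {conventional_energy_pj(rows, cols) / 1e6:.1f} uJ")
print(f"in-memory:    {cim_energy_pj(rows, cols) / 1e6:.1f} uJ")
```

Under these assumed constants, the data-movement term is several hundred times larger than the compute term, which is the inefficiency CIM is designed to remove.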

Ahead of WAIC 2025, Houmo unveiled the Momagic 50, its latest CIM-based chip. Wu described it as a major architectural leap, enabled by a new generation of in-house IP that dramatically improves both energy and area efficiency.

At its core, the Momagic 50 integrates Houmo’s proprietary CIM-based IPU, which allows floating-point models to run natively within a CIM system, a key step toward more efficient inference at the edge. To smooth adoption, the chip is paired with a next-generation compiler toolchain, Houmo Dadao, designed for seamless compatibility with mainstream deep learning frameworks. Clients can migrate and adapt existing models without friction.

Performance numbers back it up: 160 TOPS (tera operations per second) at INT8, 100 TFLOPS at BFP16, up to 48 GB of onboard memory, and 153.6 GB per second of memory bandwidth, all while drawing just ten watts of power, about the same as a smartphone’s fast charger. This means devices like tablets, PCs, and robots can run LLMs with 7–70 billion parameters entirely offline, without cloud support.
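
As a rough sanity check on those figures, the snippet below estimates the token-rate ceiling implied by 153.6 GB/s of bandwidth, assuming batch-1 decoding is memory-bound and every generated token reads all INT8 weights once. It ignores KV-cache traffic and on-chip reuse, so treat it as an upper bound only.

```python
# Back-of-envelope decode-rate ceiling from memory bandwidth alone.
BANDWIDTH_GB_S = 153.6            # stated memory bandwidth of the Momagic 50

for params_billion in (7, 34, 70):
    weight_gb = params_billion    # ~1 byte per parameter at INT8
    tokens_per_s = BANDWIDTH_GB_S / weight_gb
    print(f"{params_billion}B model @ INT8: at most ~{tokens_per_s:.1f} tokens/s")
```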

Wu emphasized that edge AI computing is defined by decentralization and extremes—tight power budgets, compact form factors, and increasingly large models. To meet that reality, the Momagic 50 supports both x86 and Arm CPU architectures, enabling broad deployment flexibility.

With the product line established, Houmo is moving quickly to commercialize. Wu shared that several marquee clients are already in the pipeline, including Lenovo’s AI-powered PC line, iFlyrec’s smart voice systems, and China Mobile’s hybrid network deployments.

In a conversation with 36Kr, Wu spoke candidly about Houmo’s founding, the internal struggle behind its pivot, and the challenge of evolving from scientist to founder in the AI era.

The following transcript has been edited and consolidated for brevity and clarity.

36Kr: Houmo’s first-generation product focused on the smart driving market, but today you’re leaning heavily into general-purpose edge AI. What was the thinking behind that shift?

Wu Qiang (WQ): From day one, we were committed to building high-efficiency AI chips using CIM. That part never changed. But identifying the right application scenarios—that was an ongoing process, and it took some course corrections.

When we launched in early 2021, I drew from my past experience. Smart driving seemed like a promising direction. Tesla had already set the standard for “software-defined vehicles,” and China’s domestic market appeared open to new entrants.

But by the second half of 2023, we realized that path wouldn’t work for us. The field was crowded, and large incumbents had already locked in major ground. There was also a major issue with our first-gen product.

To highlight the energy and area efficiency of CIM, we built a chip with very high compute capacity: 256 TOPS of physical power, and up to 512 TOPS with sparsity. But high compute also meant high cost, and that no longer matched market demand. In 2023, the trend in smart driving was all about driving prices down. Vendors were marketing full-stack driving assistance systems for as little as RMB 1,000 (USD 140). And many in the space were saying Level 3 driving autonomy would never happen, stopping at Level 2 plus, plus, plus. Nobody needed heavy-duty compute anymore.

So our chip was overpowered, and excessively so. And on top of that, we were a new entrant asking others to adapt their systems to ours. That made adoption even harder. We thought about toning down the specs for our second-gen product to boost the performance-to-cost ratio, but it became increasingly clear that the window was closing. By the time we had it ready, the market would’ve moved on.

We knew we had to pivot. But pivoting was excruciating. Our team had already put a ton of effort into the next-gen chip for smart driving. Walking away from that was heartbreaking. I was deeply conflicted. I even worried that pivoting would be seen as a weakness, that we’d be viewed as quitters. But in the end, survival trumped pride. We made the call and began to change direction.

Then came the next question: pivot to what?

Starting in 2023, I began tracking LLMs closely. We did deep technical and market research. We realized that LLMs are fundamentally bandwidth- and compute-heavy workloads, and that makes them perfectly aligned with what CIM solves. We also noticed a trend: LLM computing was moving from the cloud to the edge. That presented an opportunity that aligned with Houmo’s strengths.

So at the start of 2024, we quickly revised our first-gen chip into the Momagic 30, streamlining it and tuning it for LLM inference.

We gave our first demo at the China Mobile booth at MWC Barcelona. Using the Momagic 30, we ran a six-billion-parameter model from Zhipu AI, and it worked great. That gave us tremendous confidence. Our shareholder China Mobile also encouraged us to explore more edge use cases for general-purpose large models. With those factors in play, we committed to this new direction. The team worked incredibly hard, and after more than a year, we launched the Momagic 50.

36Kr: You’ve mentioned marquee clients like Lenovo, iFlyrec, and China Mobile. Where else are you planning to expand?

WQ: We’re building general-purpose edge AI chips. Our current priorities are:

  • Consumer devices like tablets and PCs, where LLMs can support productivity.
  • Smart voice systems and enterprise meeting solutions.
  • Telecommunication edge computing, where melding 5G and AI is a big push. China Mobile invested in us partly because of the potential in that space.

Our bandwidth is limited, so we’re staying focused. But longer term, any scenario where inference needs to happen at the edge, especially where power efficiency matters, could become a target market. That includes smart office, smart industry, and robotics.

36Kr: You’ve been in the market yourself. How would you characterize edge AI?

WQ: It’s cost-sensitive. Power-sensitive. And size matters: your chip can’t be a giant card. Thermal design is critical, and deployment environments can be extreme.

36Kr: Your research background in high-efficiency chips and your startup’s pivot to CIM seem to align perfectly with the rise of large models. Looking back, do you feel like this was all leading up to now?

WQ: Maybe it was. Maybe everything was pointing to this moment. The industry and the market handed us this opportunity, and we were ready. At the time, the pivot was painful. But looking back, I’m glad we moved early. We laid the foundation and waited for the wind to blow.

36Kr: CIM is cutting-edge tech. What’s the current state of industry consensus around it?

WQ: Compared to four years ago when we just entered the space, there has been a huge shift.

First, more and more mainstream AI chipmakers are talking about CIM. Today, you’ll hear listed companies and unicorn startups pitching next-gen architectures that “break away from von Neumann.” That wasn’t the case four years ago. Back then, only some companies like Samsung were discussing CIM.

Second, the government is paying attention. The National Development and Reform Commission (NDRC) and the Ministry of Industry and Information Technology (MIIT) have held closed-door meetings that we’ve participated in. CIM is now grouped with other frontier chip technologies like photonic and quantum computing.

Third, investor understanding of the tech has matured. It’s no longer an obscure niche. Unlike before, when only a few funds understood CIM, now many firms have done their homework.

I’d say there’s growing consensus around the value CIM brings to AI workloads. But how to build CIM, and how to commercialize it, are still open questions. The community is divided.

For example, some still focus on low-power, small-scale CIM. But others are now chasing high-performance CIM. Even the choice of memory medium is up for debate, some are using NOR flash, some SRAM, some DRAM, some RRAM.

Everyone is racing to carve out territory. The key is who can deliver a truly usable product with high energy efficiency and high area efficiency. Among competitors, Houmo has been one of the earliest to focus on high-performance CIM using SRAM and DRAM. We’ve been in SRAM-based CIM longer than anyone else, and we’ve already spent over a year on DRAM-based PIM (processing-in-memory).

36Kr: CIM has clear technical advantages. But what are the major challenges in commercializing such an innovative architecture?

WQ: We’ve been working seriously on CIM for four years now. The road from academic concept to marketable product is long and tough.

The first hurdle is circuit design. Academia proves that a concept works, but turning it into a product requires breakthroughs in circuit design and layout that meet real-world requirements for compute density, precision, and reliability. That demands innovation on top of the academic foundation.

Next, mass production comes with engineering hurdles. To manufacture chips, you need to solve issues like testability and yield. We had to expand beyond standard EDA (electronic design automation) tools and develop our own design systems tailored to CIM, like MBIST (memory built-in self-test) and CBIST (continuous built-in self-test). After four years of trial and error, we’ve got a robust, validated solution.

Here’s an example: one of CIM’s biggest strengths is area efficiency by enabling more compute per square millimeter. But high density brings risks, like instantaneous current surges that cause voltage drops. Those engineering issues must be designed around in advance. We’ve hit many roadblocks but also solved many of them. In hindsight, all that struggle gave us invaluable knowledge.
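
For readers unfamiliar with the failure mode Wu describes, the sketch below plugs hypothetical numbers into the two standard terms of supply droop: the resistive IR drop and the inductive L*dI/dt drop that appear when many in-memory compute columns switch at once. Every value is a placeholder chosen for illustration, not a Houmo measurement.

```python
# Illustrative power-delivery droop when a dense CIM array switches at once.
# All numbers are hypothetical placeholders.
delta_i = 5.0       # assumed instantaneous current step (A)
r_grid = 0.005      # assumed effective power-grid resistance (ohm)
l_grid = 10e-12     # assumed package/grid inductance (H)
rise_time = 1e-9    # assumed current rise time (s)

ir_drop = delta_i * r_grid                  # resistive term: V = I * R
ldi_dt = l_grid * (delta_i / rise_time)     # inductive term: V = L * dI/dt
print(f"IR drop:      {ir_drop * 1e3:.0f} mV")
print(f"L*dI/dt drop: {ldi_dt * 1e3:.0f} mV")
print(f"total droop:  {(ir_drop + ldi_dt) * 1e3:.0f} mV")
```

Against the sub-1 V supplies typical of modern logic, droop on this order eats noticeably into margin, which is why it has to be designed around in advance rather than patched after tape-out.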

Finally, you need to design an AI processor architecture and compiler toolchain that fully unlocks the potential of CIM. Remember: CIM is a backend technology. The customer doesn’t see it. You need a frontend system to expose its benefits.

Think of it like building a house with high-tech bricks. If the architecture and control system aren’t great, those advanced bricks won’t matter. That’s what our CIM architecture IPU and Houmo Dadao compiler are for. They are the blueprint and the control panel.

36Kr: By adding CIM-based compute units, do you increase the complexity of integration with other parts of the chip? Will clients need to make major software changes?

WQ: Not really. On the customer side, everything still runs on PyTorch, TensorFlow, basically all the same frameworks. The entry point and output stay the same. The heavy lifting is on us. We design the chip internals, the architecture, and the toolchain so that everything just works seamlessly on the client end.
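
As an illustration of what keeping the entry point the same can look like in practice, the sketch below exports a stock PyTorch model through ONNX, a common interchange format that vendor compilers frequently ingest. The Houmo Dadao interface itself is not public, so this is only a generic, assumed hand-off; the point is that the framework-side code stays unchanged.

```python
# Client-side view: a standard PyTorch model exported unchanged.
# The hand-off format (ONNX) and everything downstream are assumptions;
# a vendor toolchain such as Houmo Dadao would take over from there.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))
model.eval()

example_input = torch.randn(1, 4096)
torch.onnx.export(
    model, example_input, "model.onnx",
    opset_version=17, input_names=["x"], output_names=["y"],
)
# Quantization, mapping matmuls onto in-memory compute arrays, and code
# generation for the chip would all happen inside the vendor compiler,
# out of sight of the client code above.
```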

36Kr: Does CIM depend on advanced process nodes?

WQ: Actually, it’s less reliant. Because CIM reduces the number of data transfers, you get efficiency gains without needing the latest process node. It’s less demanding in that sense.

36Kr: Is there a risk that system-on-chip (SoC) manufacturers will eventually build their own external NPUs and undercut you?

WQ: SoC and CPU design is a very different skill set from designing NPUs. We specialize in AI compute and compiler systems. If needed, we can build lightweight SoCs to showcase our technology. But that’s not our core competency. It’s just a vehicle to highlight our strengths.

There’s some overlap and SoC makers might try to enter the NPU space, and vice versa. But it’s not easy. Back when I worked at Intel in the early 2000s, it had a market cap of USD 200 billion, while Nvidia was worth a fraction of that. Intel tried to make mobile CPUs, and Nvidia tried CPUs too. Neither succeeded. That shows how deep the capability gap is.

Will CPUs eventually absorb NPUs? That depends on the use case. If the AI workload is light, then yes, integrated NPUs (iNPU) can handle it. But if the demand is high and growing, standalone NPUs will persist, just like discrete GPUs have thrived alongside iGPUs.

36Kr: Horizon Robotics founder Yu Kai once said that the best strategy is not to gamble. You often stress the word “survival.” Would you say you have a no-gambling mindset too?

WQ: I’ve seen the brink. At Intel, AMD, Facebook, and earlier startups, I’ve watched companies navigate existential threats. Those experiences taught me that cool tech alone isn’t enough. The business model has to close the loop. You can only keep building if you survive. And when you’ve seen what it’s like on the edge of collapse, you realize how critical survival really is.

36Kr: For scientist-founders, becoming a businessperson is often the hardest part. Was that shift something you chose, or something that was forced on you?

WQ: A bit of both. As a techie, I used to focus on competitions, publishing papers, and all that. But after founding Houmo, reality hit. If you only care about elegance, your company won’t make it.

Many scientist-founders don’t make that shift by choice. It’s forced on them.

36Kr: Was that transition hard for you personally?

WQ: It was tough. In earlier companies, even during crises, I wasn’t the one making final calls, so the pressure didn’t hit as hard. But after starting Houmo, I felt the full weight. Late 2020 to early 2022 was a great time. I could focus on technology and product development.

But from late 2022 onward, things got rough. The capital market cooled, US investors stopped backing Chinese semiconductor firms, and money dried up. That’s when it felt like I was trapped in a sealed room, gasping for oxygen. I had to break out. So I stepped out of my technical comfort zone. I helped with fundraising, pitched clients, and took on the role of entrepreneur. When survival is on the line, you have to shed your ego and do what’s necessary.

36Kr: What’s the story behind the company’s name?

WQ: When I started the company in late 2020, it was the peak season of “domestic substitution.” But I didn’t want to just replicate. I wanted to innovate and challenge the giants with new architectures. I was looking at next-gen approaches: CIM, photonics, quantum. CIM stood out as the most mature and the most practical, plus our founding team had the right background.

Even now, with the rise of large models, our mission hasn’t changed. We want to use next-gen technology like CIM to build extremely efficient chips for the next 50 years of computing. Not just to match the local standard, but to go toe-to-toe with Silicon Valley.

36Kr: After four years of building, what belief has stayed with you throughout?

WQ: It’s what I wrote in my WeChat bio: entrepreneurship means bravely walking through chaos.

KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Xiao Xi for 36Kr.
