TLDR
- Nvidia released new data showing its latest AI server can run mixture-of-experts models 10 times faster than previous-generation servers
- The new server packs 72 of Nvidia’s leading chips into one computer with fast connections between them
- Performance gains shown on Chinese AI models from Moonshot AI (Kimi K2 Thinking) and DeepSeek
- The AI industry is shifting focus from training models to serving them to millions of users, where Nvidia faces more competition from AMD and Cerebras
- AMD is developing a similar multi-chip server expected to launch next year
Nvidia dropped fresh performance data Wednesday showing its newest AI server delivers 10 times the speed of older systems. The company tested the hardware on popular Chinese AI models that have taken the industry by storm.
The new server crams 72 of Nvidia’s top-tier chips into a single machine. Fast links between the chips drive most of the performance gains, an area where Nvidia still holds an edge over competitors.
The timing matters. The AI world is moving from training models to actually using them. That shift opens the door for rivals like Advanced Micro Devices and Cerebras to grab market share.
Nvidia tested its server on mixture-of-experts models. Rather than running every request through the entire network, these systems route each piece of the input to a handful of specialized “experts” inside the model, so only a fraction of the model’s parameters are active at once. The approach saves computing power and training costs.
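For readers who want the mechanics, here is a minimal, illustrative sketch of that routing step in Python. It does not reflect Nvidia’s or any model vendor’s actual code; the names (`Expert`, `moe_forward`, `top_k`) and the tiny linear “experts” are assumptions chosen to keep the example self-contained.

```python
# Minimal mixture-of-experts routing sketch (illustrative assumptions only).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class Expert:
    """Stand-in for one expert sub-network: a single linear layer."""
    def __init__(self, dim, rng):
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x):
        return x @ self.w

def moe_forward(x, experts, router_w, top_k=2):
    """Route one token to its top_k experts and mix their outputs.

    Only the chosen experts run, which is why MoE models activate just a
    fraction of their parameters per token."""
    scores = softmax(x @ router_w)                    # router scores every expert
    chosen = np.argsort(scores)[-top_k:]              # keep the top_k experts
    weights = scores[chosen] / scores[chosen].sum()   # renormalize their scores
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

rng = np.random.default_rng(0)
dim, num_experts = 8, 4
experts = [Expert(dim, rng) for _ in range(num_experts)]
router_w = rng.standard_normal((dim, num_experts))
token = rng.standard_normal(dim)
print(moe_forward(token, experts, router_w).shape)   # (8,)
```

The key line is the top-k selection: because only the chosen experts do any work, serving cost scales with the number of active experts rather than the model’s total parameter count.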
China’s DeepSeek made waves in early 2025 with a high-performing open source model using this method. The model needed less training time on Nvidia chips than competing systems. OpenAI, France’s Mistral, and China’s Moonshot AI have all adopted similar designs since then.
New Hardware Targets Growing Market
Moonshot AI released its own highly ranked open source model in July. Nvidia’s latest server improved the performance of Moonshot’s Kimi K2 Thinking model by 10 times, and Nvidia saw similar gains with DeepSeek’s models.
The 72-chip design lets Nvidia pack more computing power into less space. Those fast connections between chips make the difference when handling millions of user requests.
Training AI models was just the first wave of the boom. Running those models for everyday users could become the bigger business. Companies need servers that can handle constant traffic, not just occasional training runs.
Competition Heats Up
AMD is building its own multi-chip server packed with powerful processors. The rival system is set to hit the market next year. That sets up a direct battle for the serving market.
DeepSeek proved you don’t need massive budgets to build competitive AI. The open source approach helped regional players like Moonshot AI and European companies like Mistral create their own models.
Training costs are dropping. But the hardware needed to serve models at scale is getting more specialized. That keeps Nvidia, AMD, and a small group of chipmakers at the center of AI infrastructure.
Nvidia is making the case that even if models need less training, companies will still need its hardware to serve them. The 72-GPU server is designed to become the default choice for high-traffic AI services.
The mixture-of-experts approach that DeepSeek popularized is spreading fast. Cloud providers and tech firms are looking for dense servers that can process billions of daily requests. Nvidia’s new system directly targets that spending.
AMD’s upcoming multi-GPU server will test whether Nvidia can maintain its dominance. The company has led the training market but faces tougher competition in the serving space.


