TLDR
- DeepSeek claims its R1 AI model cost only $294,000 to train using 512 Nvidia H800 chips over 80 hours
- The cost figure was published in a peer-reviewed paper in the journal Nature, the first time DeepSeek has disclosed its training expenses
- By comparison, OpenAI CEO Sam Altman said in 2023 that training his company’s foundational models cost “much more” than $100 million
- DeepSeek admitted for the first time that it owns A100 chips, which it used in preparatory stages of development
- The H800 chips were designed specifically for China after the US banned exports of more powerful AI chips
Chinese AI developer DeepSeek has revealed that its R1 artificial intelligence model cost just $294,000 to train. The figure represents a fraction of what US companies typically spend on similar AI development projects.
The Hangzhou-based company published this cost estimate for the first time on Wednesday, in a peer-reviewed article in the journal Nature. The disclosure marks a rare public statement from DeepSeek about its development expenses.
DeepSeek used 512 Nvidia H800 chips to train its reasoning-focused R1 model over 80 hours. The H800 chips were specifically designed by Nvidia for the Chinese market after US export restrictions took effect.
The US banned the export of more powerful H100 and A100 AI chips to China in October 2022. This forced Chinese companies to work with less powerful alternatives or find other solutions for their AI development needs.
DeepSeek’s cost claims contrast sharply with figures from US competitors. OpenAI CEO Sam Altman said in 2023 that foundational model training at his company cost “much more” than $100 million.
The Nature article listed DeepSeek founder Liang Wenfeng as one of the co-authors. A previous version of the research published in January did not contain the cost information.
Hardware Revelations
In supplementary documentation, DeepSeek acknowledged for the first time that it owns A100 chips. The company said it used these more powerful chips during preparatory stages of development.
“Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model,” the researchers wrote. This admission provides new details about DeepSeek’s hardware capabilities.
US officials told Reuters in June that DeepSeek has access to “large volumes” of H100 chips acquired after export controls were implemented. Nvidia has stated that DeepSeek used lawfully acquired H800 chips, not H100s.
Market Impact and Questions
DeepSeek’s release of lower-cost AI systems in January triggered a global sell-off in technology stocks, as investors worried the new models could threaten the dominance of AI leaders including Nvidia.
Since January, DeepSeek and founder Liang Wenfeng have largely stayed out of public view. The company has released only a few product updates during this period.
US companies and officials have questioned some of DeepSeek’s statements about its development costs and technology, and debate continues over China’s actual capabilities in AI development.
Training costs for large language models refer to the expense of running clusters of powerful chips for weeks or months as they process vast amounts of text and code to develop AI capabilities.
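As a rough sanity check on those figures, the short sketch below converts the reported chip count and runtime into GPU-hours and derives the implied hourly rate. The $294,000 total, the 512 chips, and the 80 hours come from the article; the per-GPU-hour figure is back-of-envelope arithmetic, not a number DeepSeek or Nature published.

```python
# Back-of-envelope check of the reported R1 training figures.
# Inputs are taken from the article; the implied per-GPU-hour
# rate is derived here for illustration only.

TOTAL_COST_USD = 294_000   # reported R1 training cost
NUM_GPUS = 512             # Nvidia H800 chips used
TRAINING_HOURS = 80        # reported training duration

gpu_hours = NUM_GPUS * TRAINING_HOURS        # 40,960 GPU-hours
implied_rate = TOTAL_COST_USD / gpu_hours    # ~$7.18 per GPU-hour

print(f"Total GPU-hours: {gpu_hours:,}")
print(f"Implied cost per GPU-hour: ${implied_rate:.2f}")
```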
Reuters previously reported that DeepSeek attracted top Chinese talent in part because it operated one of the few domestic A100 supercomputing clusters, a hardware advantage that helped it recruit leading researchers in the field.
The R1 model focuses on reasoning capabilities rather than general language processing. DeepSeek trained the model after completing initial experiments with smaller systems using A100 hardware.