DeepSeek's new chatbot is pitched as being able to answer virtually any question. Built by the Chinese startup of the same name, it has rapidly become a major market player, and its rise even triggered a significant drop in NVIDIA's stock price.
DeepSeek's success stems from its innovative architecture and training methods. Key technologies include:
DeepSeek initially claimed a remarkably low training cost of $6 million for its DeepSeek V3 model, using only 2,048 GPUs.
However, SemiAnalysis reported that DeepSeek uses approximately 50,000 NVIDIA Hopper GPUs (including 10,000 H800, 10,000 H100, and additional H20 units) spread across multiple data centers. This represents a total server investment of roughly $1.6 billion and operational expenses nearing $944 million.
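To put the gap between the claimed and reported figures in perspective, the short Python sketch below works through the arithmetic. It uses only the numbers quoted above, taken at face value. The variable names are illustrative, and the per-GPU figure is a simple average that assumes the server investment is spread evenly across all 50,000 GPUs, which the source does not state.

```python
# Figures as quoted in the article (USD). They are reported estimates,
# not independently verified values.
CLAIMED_TRAINING_COST = 6_000_000           # DeepSeek's stated V3 training cost
CLAIMED_GPU_COUNT = 2_048                   # GPUs in DeepSeek's original claim
REPORTED_GPU_COUNT = 50_000                 # SemiAnalysis estimate (approximate)
REPORTED_SERVER_INVESTMENT = 1_600_000_000  # SemiAnalysis estimate (approximate)

# How many times larger the reported GPU fleet is than the claimed one.
gpu_ratio = REPORTED_GPU_COUNT / CLAIMED_GPU_COUNT

# Naive average server investment per GPU (assumes an even spread).
server_investment_per_gpu = REPORTED_SERVER_INVESTMENT / REPORTED_GPU_COUNT

# The claimed training cost as a fraction of the reported server investment.
claimed_share_of_servers = CLAIMED_TRAINING_COST / REPORTED_SERVER_INVESTMENT

print(f"Reported fleet is about {gpu_ratio:.1f}x the claimed GPU count")
print(f"Average server investment per GPU: ${server_investment_per_gpu:,.0f}")
print(f"Claimed cost as share of server investment: {claimed_share_of_servers:.2%}")
```

On these figures, the reported fleet is roughly 24 times the size of the one in the original claim, and the $6 million figure is well under 1% of the estimated server investment.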
DeepSeek, a subsidiary of the High-Flyer hedge fund, owns its data centers, which gives it control over optimization and lets it innovate faster. Its self-funded status adds flexibility. Furthermore, DeepSeek attracts top talent, primarily from Chinese universities, with some researchers earning over $1.3 million annually.
DeepSeek's initial $6 million training cost claim is misleading; it only covers pre-training GPU usage, excluding research, refinement, data processing, and infrastructure. The company's total AI development investment exceeds $500 million. Its lean structure, however, allows for efficient innovation compared to larger, more bureaucratic companies.
DeepSeek's success highlights the potential of well-funded independent AI companies to compete with industry giants. While its "revolutionary budget" claims are exaggerated, its success is undeniable, resulting from substantial investment, technological breakthroughs, and a strong team. Even so, DeepSeek is still cheaper than its competitors, and the cost difference is stark: its R1 model cost $5 million to train, compared to ChatGPT4's $100 million.