DeepSeek R1 routs the competitors

Chinese AI startup DeepSeek disrupts tech markets with low-cost innovation

A Chinese artificial intelligence startup has sent shockwaves through global technology markets with its latest AI model, challenging U.S. tech dominance and raising questions about the effectiveness of export controls on advanced chips.

DeepSeek, founded in 2023 by hedge fund manager Liang Wenfeng in Hangzhou, rose to prominence after its chatbot became the most downloaded free app on Apple’s U.S. App Store, surpassing OpenAI’s ChatGPT. The surge in popularity followed the release of its new R1 model, which reportedly achieves performance comparable to leading AI models at a fraction of the cost.

DeepSeek R1 vs OpenAI O1

A detailed technical analysis reveals significant differences between the two models:

DeepSeek R1 employs a “thinking out loud” approach, showing its reasoning as it works toward an answer. It uses a Mixture-of-Experts (MoE) architecture with 671B total parameters, of which only about 37B are activated for any given token. The model was trained with reinforcement learning using Group Relative Policy Optimization (GRPO), relying far less on supervised fine-tuning than conventional pipelines (its precursor, R1-Zero, skipped that step entirely). It operates with a context length of 128K tokens and reports strong benchmark results: 97.3% in math, the 96.3rd percentile in coding, and 90.8% in general knowledge.
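The group-relative idea behind GRPO can be illustrated with a short sketch. For one prompt, the model samples several answers, each answer gets a reward, and every answer is then scored against the average of its own group rather than against a separately trained value model. The function name, the reward values, and the use of the population standard deviation are assumptions made for illustration; this is not DeepSeek's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one score per sampled answer to the same prompt.
    Answers above the group average get a positive advantage, answers
    below it get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers: 1.0 = correct, 0.0 = wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

In a full training loop these advantages would weight the policy-gradient update, so that answers better than their siblings are reinforced and worse ones are discouraged.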

In contrast, OpenAI’s O1 uses a “thinking before generating” approach and is reported to use a dense transformer architecture, in which all parameters remain active during computation. It combines supervised fine-tuning with reinforcement learning from human feedback (RLHF) and supports a longer context length of 200K tokens. Its reported benchmarks are 96.4% in math, the 89th percentile in coding, and 91.8% in general knowledge.
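The difference between sparse and dense activation can be made concrete with a small sketch. It first computes the share of R1's parameters that are active per token from the figures quoted above, then shows a generic top-k router that picks a few experts and ignores the rest. The number of experts, the value of k, the router scores, and the softmax weighting are toy assumptions; the article does not describe DeepSeek's actual routing scheme.

```python
import math

# Figures quoted in the article, in billions of parameters.
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Returns a list of (expert_index, weight) pairs. Experts outside the
    top k are skipped entirely, which is what keeps the computation sparse.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token over eight toy experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]
chosen = top_k_route(logits, k=2)

print(f"Active share of parameters: {active_fraction:.1%}")
print("Selected experts and weights:", chosen)
```

A dense model corresponds to running every expert for every token, so the same arithmetic would give an active share of 100%.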

Cost and accessibility

Perhaps the most striking difference lies in cost and accessibility. DeepSeek R1 was reportedly developed for approximately $5.58M, using 2.78M GPU hours. It is available as open-source software under the MIT license, allowing free use, modification, and distribution. Its API rates are also significantly lower: $0.14 for input tokens on a cache hit, $0.55 for input tokens on a cache miss, and $2.19 for output tokens, all per million tokens.

OpenAI has not disclosed O1’s training costs. The model is offered through a paid API at substantially higher rates: $7.50 for input tokens on a cache hit, $15 for input tokens on a cache miss, and $60 for output tokens, all per million tokens.
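To see what these per-million-token rates mean in practice, the sketch below prices one workload under both rate cards. The rates are the ones quoted above; the `Pricing` class, the `workload_cost` helper, and the token counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """API rates in U.S. dollars per million tokens."""
    cache_hit: float   # input tokens served from cache
    cache_miss: float  # input tokens not in cache
    output: float      # generated tokens


# Rates as quoted in the article.
R1 = Pricing(cache_hit=0.14, cache_miss=0.55, output=2.19)
O1 = Pricing(cache_hit=7.50, cache_miss=15.00, output=60.00)


def workload_cost(pricing, cached_in, uncached_in, out):
    """Total dollar cost for the given token counts."""
    total = (cached_in * pricing.cache_hit
             + uncached_in * pricing.cache_miss
             + out * pricing.output)
    return total / 1_000_000


# Hypothetical workload: 2M cached input, 8M uncached input, 1M output tokens.
r1_cost = workload_cost(R1, 2_000_000, 8_000_000, 1_000_000)
o1_cost = workload_cost(O1, 2_000_000, 8_000_000, 1_000_000)

print(f"R1: ${r1_cost:.2f}")
print(f"O1: ${o1_cost:.2f}")
print(f"O1 costs about {o1_cost / r1_cost:.0f}x as much for this workload")
```

For this made-up workload the totals come to about $6.87 for R1 and $195 for O1, roughly a 28-fold gap. The exact ratio depends on the mix of cached input, uncached input, and output tokens.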

The news has triggered significant market turmoil across the global technology sector. Major U.S. technology companies, including Nvidia, Microsoft, and Meta, saw their share prices decline. The impact extended to European tech firms, with Dutch chip equipment maker ASML’s shares falling over 10%, while Siemens Energy, which produces AI-related hardware, experienced a 21% drop in share value.

DeepSeek’s claimed ability to develop advanced AI models at dramatically lower costs is what sets it apart. The company reports developing its V3 model for approximately $6 million, which contrasts sharply with competitors like OpenAI, which reportedly spent over $100 million on GPT-4. DeepSeek claims to have used only about 2,000 specialized chips for training, compared to the 16,000 or more typically required by leading models.

DeepSeek’s approach represents a potential shift in AI development. The company leverages existing technology and open-source code, and its models reportedly require significantly less computing power. The startup has adapted to U.S. chip export restrictions by experimenting with new approaches and combining high-end chips with cheaper alternatives.

The development raises significant questions about U.S. technology policy. DeepSeek’s success suggests that U.S. export restrictions on advanced chips might inadvertently accelerate Chinese innovation. The company’s founder, Liang Wenfeng, reportedly accumulated a substantial cache of Nvidia A100 chips before export bans took effect. Recent U.S. initiatives, including a major AI infrastructure investment project supported by President Trump, aim to maintain American technological leadership.

The development has prompted varied reactions across the industry. Silicon Valley venture capitalist Marc Andreessen described DeepSeek’s emergence as “AI’s Sputnik moment.” Wall Street firm Citi maintains that U.S. companies still hold advantages due to their access to more advanced chips. Meanwhile, Meta has announced plans for $60-65 billion in capital investment, potentially in response to these developments.

The situation highlights an evolving dynamic in AI development, where open research and cost efficiency might prove more decisive than market share or access to cutting-edge hardware. This could signal a shift in the global AI landscape, challenging assumptions about the resources required for breakthrough innovations in artificial intelligence.

The new AI Cold War and its societal implications

The emergence of DeepSeek as a formidable competitor to U.S. AI giants bears striking parallels to the Cold War era’s technological race. Just as the Space Race between the United States and the Soviet Union defined the 1960s, the AI competition between the U.S. and China appears to be shaping the 2020s. However, this time the stakes may be even higher, as AI technology has the potential to transform virtually every aspect of modern society.

This technological rivalry presents both opportunities and challenges for global society. The competition is driving rapid innovation and cost reduction in AI development, which could accelerate the democratization of advanced AI capabilities. DeepSeek’s open-source approach and dramatically lower costs could make sophisticated AI tools more accessible to researchers, businesses, and developers worldwide, potentially fostering innovation across sectors from healthcare to education.

However, the geopolitical tensions surrounding AI development raise concerns about the fragmentation of the global technological landscape. The creation of separate AI ecosystems—one centered in the U.S. and another in China—could lead to divergent standards, incompatible systems, and reduced international collaboration. This scenario could hinder the global scientific community’s ability to address shared challenges and establish universal ethical guidelines for AI development.

Moreover, the race to achieve AI supremacy might pressure developers to prioritize speed over safety, raising crucial questions about AI governance and security. As both nations push to maintain or gain technological advantage, the international community faces the critical challenge of ensuring that AI development proceeds responsibly, with adequate attention to safety, ethics, and the broader implications for human society.

As this new technological Cold War unfolds, the true measure of success may not lie in which nation achieves AI superiority, but in how this powerful technology is ultimately harnessed to benefit humanity while managing its inherent risks. The DeepSeek story suggests that innovation can come from unexpected places and that technological progress might not follow predictable paths. This reality underscores the importance of maintaining open dialogue and collaboration across borders, even as nations compete for technological leadership in the AI age.
