DeepSeek tops the App Store, a week of seismic shifts in the U.S. tech industry triggered by Chinese AI
Author: APPSO
In the past week, the DeepSeek R1 model from China has stirred the entire overseas AI community.
On one hand, it achieves performance comparable to OpenAI's o1 at a far lower training cost, showcasing China's strengths in engineering and large-scale innovation; on the other, it upholds the spirit of open source and readily shares its technical details.
Recently, a research team led by Jiayi Pan, a PhD student at the University of California, Berkeley, successfully replicated the key technology of DeepSeek R1-Zero—the "aha moment"—at an extremely low cost (under $30).
So it's no wonder that Meta CEO Mark Zuckerberg, Turing Award winner Yann LeCun, and DeepMind CEO Demis Hassabis have all given high praise to DeepSeek.
As the popularity of DeepSeek R1 continues to rise, this afternoon, the DeepSeek App briefly experienced server congestion due to a surge in user traffic, even "crashing" at one point.
OpenAI CEO Sam Altman, trying to reclaim the international headlines, has also teased details of o3-mini's usage limits: ChatGPT Plus members will be able to query it 100 times a day.
However, it is little known that before its rise to fame, DeepSeek's parent company, Hquant, was one of the leading firms in China's quantitative private equity sector.
DeepSeek Model Shakes Silicon Valley, Value Continues to Rise
On December 26, 2024, DeepSeek officially released the DeepSeek-V3 large model.
This model performed excellently in multiple benchmark tests, surpassing mainstream top models, particularly in knowledge Q&A, long-text processing, code generation, and mathematics. For instance, on knowledge tasks such as MMLU and GPQA, DeepSeek-V3's performance is close to that of internationally leading models such as Claude-3.5-Sonnet.
In terms of mathematical ability, it set new records on tests such as AIME 2024 and CNMO 2024, surpassing all known open-source and closed-source models. Its generation speed also improved by 200% over the previous generation, reaching 60 tokens per second (TPS), which noticeably improves the user experience.
According to an analysis by the independent evaluation site Artificial Analysis, DeepSeek-V3 surpassed other open-source models on several key metrics and performed on par with the world's top closed-source models GPT-4o and Claude-3.5-Sonnet.
The core technological advantages of DeepSeek-V3 include:
Mixture of Experts (MoE) Architecture: DeepSeek-V3 has 671 billion parameters in total, but only 37 billion are activated for each input token. This selective activation greatly reduces computational cost while maintaining high performance (a minimal routing sketch follows this list).
Multi-Head Latent Attention (MLA): This architecture has been validated in DeepSeek-V2 and enables efficient training and inference.
Load Balancing Strategy without Auxiliary Loss: This strategy aims to minimize the negative impact of load balancing on model performance.
Multi-Token Prediction Training Objective: This strategy enhances the overall performance of the model.
Efficient Training Framework: Utilizing the HAI-LLM framework, it supports 16-way Pipeline Parallelism (PP), 64-way Expert Parallelism (EP), and ZeRO-1 Data Parallelism (DP), and reduces training costs through various optimization methods.
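To make the idea of selective activation concrete, here is a minimal, illustrative sketch of top-k expert routing in Python. The expert count, dimensions, and softmax gate are toy assumptions for illustration only, not DeepSeek-V3's actual configuration, whose routing is far larger and more sophisticated.

```python
# Minimal sketch of top-k expert routing in a Mixture-of-Experts layer.
# Illustrative only: expert counts, dimensions, and the softmax gate are
# placeholder assumptions, not DeepSeek-V3's actual design.
import numpy as np

rng = np.random.default_rng(0)

n_experts, top_k, d_model, d_ff = 8, 2, 16, 64   # toy sizes (assumed)
# Each expert is a small feed-forward network: d_model -> d_ff -> d_model.
experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02  # gating weights

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                          # indices of selected experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # renormalized gate weights
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0) @ w2)                # only k of n experts run
    return out

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,) -- computed with 2 of the 8 experts active
```

The key point is the routing loop: only the selected experts' weights participate in the computation for a given token, which is how a model with 671 billion total parameters can run with roughly 37 billion parameters active per token.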
More importantly, the training cost of DeepSeek-V3 was only $5.58 million, far lower than the $78 million training cost of GPT-4. Moreover, its API pricing remains extremely affordable.
Input tokens cost only 0.5 yuan per million (cache hit) or 2 yuan (cache miss), while output tokens cost only 8 yuan per million.
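For a sense of what these prices mean in practice, the back-of-the-envelope calculation below simply applies the quoted per-million-token rates to a hypothetical request; the request sizes are made up for illustration.

```python
# Cost of a single API call at the quoted DeepSeek-V3 prices
# (0.5 / 2 yuan per million input tokens for cache hit / miss,
# 8 yuan per million output tokens). Request sizes are illustrative.
PRICE_INPUT_HIT = 0.5 / 1_000_000   # yuan per input token, cache hit
PRICE_INPUT_MISS = 2.0 / 1_000_000  # yuan per input token, cache miss
PRICE_OUTPUT = 8.0 / 1_000_000      # yuan per output token

def request_cost(input_tokens, output_tokens, cache_hit=False):
    in_price = PRICE_INPUT_HIT if cache_hit else PRICE_INPUT_MISS
    return input_tokens * in_price + output_tokens * PRICE_OUTPUT

# e.g. a 10,000-token prompt with a 2,000-token completion
print(f"{request_cost(10_000, 2_000):.4f} yuan (cache miss)")                 # 0.0360 yuan
print(f"{request_cost(10_000, 2_000, cache_hit=True):.4f} yuan (cache hit)")  # 0.0210 yuan
```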
The Financial Times described it as "a dark horse that has shocked the international tech community," believing its performance is comparable to well-funded American competitors like OpenAI. Chris McKay, founder of Maginative, further pointed out that the success of DeepSeek-V3 may redefine the established methods of AI model development.
In other words, the success of DeepSeek-V3 is also seen as a direct response to the U.S. export restrictions on computing power, as this external pressure has stimulated innovation in China.
DeepSeek Founder Liang Wenfeng, the Low-Key Genius from Zhejiang University
The rise of DeepSeek has left Silicon Valley restless, and its founder Liang Wenfeng perfectly embodies the traditional Chinese trajectory of genius—achieving success at a young age and remaining relevant over time.
A good AI company leader needs to understand both technology and business, have vision and pragmatism, possess innovative courage and engineering discipline. This type of interdisciplinary talent is itself a scarce resource.
At 17, he was admitted to Zhejiang University, majoring in Information and Electronic Engineering. At 30, he founded Hquant and began leading a team to explore fully automated quantitative trading. Liang Wenfeng's story confirms that geniuses always do the right thing at the right time.
2010: With the launch of CSI 300 stock index futures, quantitative investing saw new opportunities, and the proprietary capital managed by Liang Wenfeng's team grew rapidly.
2015: Liang Wenfeng co-founded Hquant with fellow alumni and launched its first AI model the following year, with trading positions generated by deep learning.
2017: Hquant claimed to have fully AI-optimized its investment strategies.
2018: Established AI as the company's main development direction.
2019: The asset management scale exceeded 10 billion yuan, becoming one of the "four giants" in domestic quantitative private equity.
2021: Hquant became the first quantitative private equity firm in China to surpass a scale of 100 billion yuan.
This is not a company to remember only in its moments of success. A quantitative trading firm pivoting to AI may look surprising, but it is in fact a natural progression: both are data-driven, technology-intensive businesses.
Jensen Huang only wanted to sell gaming graphics cards to gamers like us, yet NVIDIA unexpectedly became the world's largest AI arsenal. Hquant's entry into AI is a similar story, and this kind of evolution carries far more vitality than industries that merely bolt large AI models onto their existing business.
Hquant accumulated deep experience in data processing and algorithm optimization through its quantitative investing, and it owns a large stock of A100 chips. Since 2017, it has been investing in AI computing power at scale, building high-performance computing clusters such as "Firefly One" and "Firefly Two" to provide solid hardware support for training AI models.
In 2023, Hquant formally established DeepSeek to focus on research and development of large AI models. Inheriting Hquant's accumulated technology, talent, and resources, DeepSeek quickly rose to prominence in the AI field.
In an in-depth interview with "Dark Current," DeepSeek founder Liang Wenfeng also revealed a distinctive strategic vision.
Unlike most Chinese companies that chose to replicate the Llama architecture, DeepSeek went straight at the model architecture itself, aiming for the grander goal of AGI.
Liang Wenfeng candidly acknowledged the significant gap between Chinese AI and the international state of the art: the combined gap in model architecture, training dynamics, and data efficiency means roughly four times the computing power is needed to achieve equivalent results.
▲ Image: screenshot from CCTV News
This attitude of facing challenges stems from Liang Wenfeng's years of experience at Hquant.
He emphasized that open source is not only about technical sharing but also a cultural expression; the real moat lies in the team's continuous innovation capability. DeepSeek's unique organizational culture encourages bottom-up innovation, downplays hierarchy, and values the enthusiasm and creativity of talent.
The team mainly consists of young people from top universities, adopting a natural division of labor model that allows employees to explore and collaborate autonomously. When recruiting, they place more emphasis on candidates' passion and curiosity rather than traditional notions of experience and background.
Regarding the industry's prospects, Liang Wenfeng believes that AI is currently in a period of technological innovation explosion, rather than an application explosion. He stressed that China needs more original technological innovations and cannot remain in a stage of imitation forever; someone needs to stand at the forefront of technology.
Even though companies like OpenAI are currently in a leading position, opportunities for innovation still exist.
Shaking Silicon Valley, DeepSeek Makes the Overseas AI Community Uneasy
Although the industry's evaluations of DeepSeek vary, we have also gathered some comments from industry insiders.
Jim Fan, head of the GEAR Lab project at NVIDIA, gave high praise to DeepSeek-R1.
He pointed out that this is a non-American company carrying forward OpenAI's original open mission, earning influence by publishing its original algorithms and learning curves; the remark also reads as a subtle dig at OpenAI.
DeepSeek-R1 not only open-sourced a series of models but also disclosed all of its training details. It may be the first open-source project to demonstrate significant and sustained growth of the RL flywheel.
Influence can be achieved through legendary projects like "ASI Internal Implementation" or "Strawberry Project," or simply by publicizing original algorithms and matplotlib learning curves.
Marc Andreessen, co-founder of the top venture capital firm a16z, called DeepSeek R1 one of the most astonishing and impressive breakthroughs he had ever seen, and, as open source, a profound gift to the world.
Lu Jing, a former Tencent senior researcher and postdoctoral fellow in artificial intelligence at Peking University, analyzed it from the perspective of technological accumulation. He pointed out that DeepSeek did not become popular overnight: it built on many innovations from previous model generations, and the related architectures and algorithms have been iteratively validated, so its impact on the industry was no accident.
Turing Award winner and Meta's chief AI scientist Yann LeCun offered a new perspective:
"For those who see DeepSeek's performance and think 'China is surpassing the U.S. in AI,' your interpretation is wrong. The correct interpretation should be, 'Open-source models are surpassing proprietary models.'"
DeepMind CEO Demis Hassabis's evaluation revealed a hint of concern:
"The achievements of DeepSeek are impressive. I think we need to consider how to maintain the lead of Western frontier models. I believe the West is still ahead, but it is certain that China has extremely strong engineering and scaling capabilities."
Microsoft CEO Satya Nadella stated at the World Economic Forum in Davos, Switzerland, that DeepSeek has effectively developed an open-source model that not only performs excellently in inference calculations but also has extremely high supercomputing efficiency.
He emphasized that Microsoft must respond to China's groundbreaking advancements with the utmost seriousness.
Meta CEO Mark Zuckerberg's assessment went further: he noted that DeepSeek's technical strength and performance are impressive, and that the AI gap between China and the U.S. has become minimal, with China's full-speed sprint making the competition increasingly fierce.
Reactions from competitors may be the best recognition for DeepSeek. According to Meta employees' revelations on the anonymous workplace community TeamBlind, the emergence of DeepSeek-V3 and R1 has thrown Meta's generative AI team into a panic.
Meta engineers are racing against time to analyze DeepSeek's technology, trying to replicate any possible techniques.
The reason is that the training cost of DeepSeek-V3 is only $5.58 million, a figure that is even less than the annual salary of some Meta executives. Such a stark disparity in input-output ratio has put pressure on Meta's management when explaining its massive AI R&D budget.
Mainstream international media have also paid close attention to DeepSeek's rise.
The Financial Times pointed out that DeepSeek's success has overturned the traditional perception that "AI R&D must rely on huge investments," proving that a precise technological route can also achieve excellent research results. More importantly, the DeepSeek team's selfless sharing of technological innovations has made this research-focused company a particularly strong competitor.
The Economist stated that China's rapid breakthroughs in AI technology in terms of cost-effectiveness have begun to shake the technological advantages of the U.S., which may affect the productivity enhancement and economic growth potential of the U.S. over the next decade.
The New York Times approached from another angle, noting that DeepSeek-V3's performance is comparable to high-end chatbots from American companies, but at a significantly lower cost.
This indicates that even under chip export controls, Chinese companies can compete through innovation and efficient resource utilization. Moreover, the U.S. government's chip restriction policies may backfire, inadvertently promoting China's innovative breakthroughs in open-source AI technology.
DeepSeek "Knocks on the Door," Claims to be GPT-4
Amidst the praise, DeepSeek also faces some controversies.
Many outsiders believe that DeepSeek may have used output data from models like ChatGPT as training materials during the training process, transferring the "knowledge" from this data to DeepSeek's own model through model distillation techniques.
This practice is not uncommon in the AI field, but skeptics are concerned about whether DeepSeek used output data from OpenAI models without adequate disclosure. This seems to be reflected in DeepSeek-V3's self-perception.
Earlier, users discovered that when asked about its identity, the model mistakenly identified itself as GPT-4.
High-quality data has always been a crucial factor in AI development; even OpenAI has faced controversy over data acquisition, and its practice of scraping data from the internet has drawn numerous copyright lawsuits. To date, the first-instance case between OpenAI and The New York Times has yet to be decided.
Thus, DeepSeek has also faced public insinuations from Sam Altman and John Schulman.
"Copying what you know works is (relatively) easy. Doing something new, risky, and difficult when you don't know if it will work is very hard."
However, the DeepSeek team explicitly stated in the R1 technical report that they did not use output data from OpenAI models and achieved high performance through reinforcement learning and unique training strategies.
For example, they employed a multi-stage training approach, including base model training, reinforcement learning (RL) training, and fine-tuning. This multi-stage cyclical training method helps the model absorb different knowledge and abilities at different stages.
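As a purely illustrative sketch (not DeepSeek's actual recipe, which is documented in its technical reports), the toy loop below shows the shape of such a multi-stage pipeline: base training first, then alternating RL and fine-tuning rounds, with each stage consuming the previous stage's checkpoint. The stage functions are stubs that only record which stages ran.

```python
# Toy, self-contained sketch of a multi-stage training loop of the kind
# described above (base training -> RL -> fine-tuning, repeated).
# The stage functions are stubs standing in for real training code.
def pretrain(model):  return {**model, "stages": model["stages"] + ["pretrain"]}
def rl_train(model):  return {**model, "stages": model["stages"] + ["rl"]}
def finetune(model):  return {**model, "stages": model["stages"] + ["sft"]}

def multi_stage_training(cycles=2):
    model = {"stages": []}
    model = pretrain(model)          # base model training
    for _ in range(cycles):          # cyclical RL + fine-tuning rounds
        model = rl_train(model)
        model = finetune(model)
    return model

print(multi_stage_training()["stages"])
# ['pretrain', 'rl', 'sft', 'rl', 'sft']
```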
Saving Money Is Also a Skill: DeepSeek's Technical Insights
The DeepSeek-R1 technical report mentions a noteworthy finding: the "aha moment" that occurred during DeepSeek-R1-Zero training. In the middle of training, DeepSeek-R1-Zero began to actively reassess its initial approach to a problem and to allocate more time to refining its strategy (for example, trying several different solutions).
In other words, through the RL framework, AI may spontaneously develop human-like reasoning abilities, even surpassing the limitations of preset rules. This also holds promise for developing more autonomous and adaptive AI models, such as dynamically adjusting strategies in complex decision-making (medical diagnosis, algorithm design).
At the same time, many industry insiders are digging into DeepSeek's technical reports. Andrej Karpathy, a founding member of OpenAI, commented after the release of DeepSeek V3:
DeepSeek (a Chinese AI company) is making it look easy today: it publicly released a frontier-grade large language model (LLM) trained on an extremely low budget (2,048 GPUs for 2 months, about $6 million).
For reference, this kind of capability is usually thought to require a cluster of around 16,000 GPUs, and the most advanced systems today use about 100,000 GPUs. For example, Llama 3 (405B parameters) used 30.8 million GPU hours, while DeepSeek-V3, which appears to be the stronger model, used only 2.8 million GPU hours (about 1/11 of Llama 3's compute).
If the model also holds up in practical tests (for instance, the LLM Arena rankings are still underway, and my quick tests went well), this will be an extremely impressive demonstration of research and engineering under resource constraints.
So, does this mean we no longer need large GPU clusters to train cutting-edge LLMs? Not necessarily, but it indicates that you must ensure that the resources you use are not wasted. This case demonstrates that data and algorithm optimization can still lead to significant progress. Furthermore, this technical report is also very impressive and detailed, worth reading.
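The figures Karpathy cites are easy to sanity-check. Assuming "about two months" means roughly 61 days, the quick calculation below is roughly consistent with the reported 2.8 million GPU hours and reproduces the ~1/11 ratio to Llama 3's 30.8 million GPU hours.

```python
# Quick sanity check of the figures quoted above (numbers taken from the text;
# the 61-day duration is an assumption for "about two months").
gpus = 2048
days = 61
deepseek_gpu_hours = gpus * 24 * days
llama3_gpu_hours = 30_800_000     # Llama 3 405B, as cited
reported_deepseek_hours = 2_800_000

print(f"2048 GPUs x ~2 months ≈ {deepseek_gpu_hours:,} GPU hours")                    # ≈ 2,998,272
print(f"Llama 3 / DeepSeek-V3 ≈ {llama3_gpu_hours / reported_deepseek_hours:.1f}x")   # ≈ 11.0x
```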
In response to the controversy surrounding DeepSeek V3's alleged use of ChatGPT data, Karpathy stated that large language models do not inherently possess human-like self-awareness. Whether a model can correctly answer its identity entirely depends on whether the development team specifically constructed a self-awareness training set. If not specifically trained, the model will respond based on the closest information in the training data.
Additionally, the model identifying itself as ChatGPT is not the issue; considering the prevalence of ChatGPT-related data on the internet, this response actually reflects a natural phenomenon of "emergent knowledge proximity."
Jim Fan pointed out after reading DeepSeek-R1's technical report:
The most important point of the paper: it is driven entirely by reinforcement learning, with no supervised fine-tuning (SFT) involved at all. The approach resembles AlphaZero, which mastered Go, Shogi, and Chess from scratch through a "cold start," without imitating human players' moves.
-- It uses real rewards computed from hard-coded rules, rather than learned reward models that reinforcement learning can easily "game."
-- The model's thinking time steadily increases as training progresses; this is not pre-programmed but an emergent property.
-- Self-reflection and exploratory behavior emerge.
-- It uses GRPO instead of PPO: GRPO removes the critic network used in PPO and instead takes the average reward of multiple samples as the baseline. This simple method reduces memory usage. Notably, GRPO was invented by the DeepSeek team in February 2024; this is a genuinely formidable team.
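To make the "group average instead of a critic" idea concrete, here is a minimal numerical sketch of a GRPO-style advantage computation with a rule-based reward. It illustrates only the core idea (no policy update, no KL penalty), and is not DeepSeek's implementation; the exact-match reward rule is a stand-in for the hard-coded rewards mentioned above.

```python
# Minimal sketch of the GRPO idea: instead of a learned critic (as in PPO),
# the baseline is the mean reward of a group of sampled answers to the same
# prompt. The reward here is a hard-coded exact-match rule, standing in for
# DeepSeek's rule-based rewards. Illustration of the advantage step only.
import numpy as np

def rule_based_reward(answer, reference):
    return 1.0 if answer.strip() == reference.strip() else 0.0

def grpo_advantages(sampled_answers, reference):
    rewards = np.array([rule_based_reward(a, reference) for a in sampled_answers])
    baseline = rewards.mean()          # group mean replaces the critic
    std = rewards.std() + 1e-8         # normalize within the group
    return (rewards - baseline) / std  # per-sample advantages

group = ["42", "41", "42", "forty-two"]  # four sampled completions for one prompt
print(grpo_advantages(group, reference="42"))
# correct samples get positive advantages, incorrect ones negative
```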
On the same day, Kimi also released similar research results, and Jim Fan found that the research of both companies reached similar conclusions:
Both abandoned complex tree search methods like MCTS, turning to simpler linear thinking trajectories and adopting traditional autoregressive prediction methods.
Both avoided using value functions that require additional model copies, reducing computational resource demands and improving training efficiency.
Both discarded dense reward modeling, relying as much as possible on real outcomes as guidance, ensuring training stability.
However, there are also significant differences between the two:
DeepSeek adopts an AlphaZero-style pure RL cold start method, while Kimi k1.5 chooses an AlphaGo-Master-style warm-up strategy, using lightweight SFT.
DeepSeek is open-sourced under the MIT license, while Kimi performs excellently in multimodal benchmark tests, with its paper covering richer system design details, including RL infrastructure, mixed clusters, code sandboxes, and parallel strategies.
However, in this rapidly iterating AI market, leading advantages are often fleeting. Other model companies will surely quickly absorb DeepSeek's experiences and improve upon them, perhaps catching up soon.
Initiator of the Large Model Price War
Many people know that DeepSeek has been dubbed the "Pinduoduo of AI," but few understand that this nickname actually stems from the large model price war that erupted last year.
On May 6, 2024, DeepSeek released the open-source MoE model DeepSeek-V2, achieving breakthroughs in both performance and cost through innovative architectures like MLA (Multi-Head Latent Attention) and MoE (Mixture of Experts).
The inference cost was reduced to only 1 yuan per million tokens, about one-seventh of Llama3 70B's cost and one-seventieth of GPT-4 Turbo's cost. This technological breakthrough enabled DeepSeek to provide highly cost-effective services without incurring losses, while also putting immense competitive pressure on other vendors.
The release of DeepSeek-V2 triggered a chain reaction, with ByteDance, Baidu, Alibaba, Tencent, and Zhipu AI all following suit and significantly lowering the prices of their large model products. The impact of this price war even crossed the Pacific, drawing significant attention from Silicon Valley.
As a result, DeepSeek has been dubbed the "Pinduoduo of AI."
In response to external skepticism, DeepSeek founder Liang Wenfeng stated in an interview with "Dark Current":
"Attracting users is not our primary goal. We lowered prices partly because we are exploring the structure of the next generation of models, and costs have come down; on the other hand, we believe that whether it's API or AI, it should be inclusive and affordable for everyone."
In fact, the significance of this price war goes far beyond competition itself; the lower entry barriers allow more enterprises and developers to access and apply cutting-edge AI, while also forcing the entire industry to rethink pricing strategies. It was during this period that DeepSeek began to enter the public eye and emerge prominently.
Spending a Fortune to Recruit an AI Genius
A few weeks ago, DeepSeek also experienced a notable personnel change.
According to Yicai Global, Lei Jun poached Luo Fuli with an annual salary in the tens of millions of yuan, appointing her head of the large model team at Xiaomi's AI Lab.
Luo Fuli joined Hquant's DeepSeek in 2022, and her name appears on important technical reports such as DeepSeek-V2 and the latest R1.
Subsequently, DeepSeek, which had focused on the B-end, began to move into the C-end by launching a mobile app. As of publication, DeepSeek's app had climbed to second place on the free chart of Apple's App Store, showing strong competitiveness.
A series of smaller peaks had already brought DeepSeek into the spotlight, while building toward an even bigger one: on the evening of January 20, the ultra-large model DeepSeek R1, with 671 billion parameters, was officially released.
This model performed excellently on mathematical tasks, achieving a pass@1 score of 79.8% on AIME 2024, slightly surpassing OpenAI-o1, and scoring as high as 97.3% on MATH-500, on par with OpenAI-o1.
In programming tasks, it achieved a 2029 Elo rating on Codeforces, surpassing 96.3% of human participants. In knowledge benchmark tests such as MMLU, MMLU-Pro, and GPQA Diamond, DeepSeek R1 scored 90.8%, 84.0%, and 71.5% respectively, slightly lower than OpenAI-o1 but better than other closed-source models.
In the latest comprehensive ranking of the large model arena LM Arena, DeepSeek R1 ranked third, tied with o1.
In "Hard Prompts," "Coding," and "Math," DeepSeek R1 ranked first.
In "Style Control," DeepSeek R1 tied for first with o1.
In the test of "Hard Prompt with Style Control," DeepSeek R1 also tied for first with o1.
In terms of open-source strategy, R1 is released under the MIT License, granting users maximum freedom of use and explicitly permitting model distillation, so its reasoning ability can be distilled into smaller models; for instance, the 32B and 70B distilled models match o1-mini on several capabilities. This degree of openness even surpasses Meta, whose licensing has long drawn criticism.
The emergence of DeepSeek R1 allows domestic users to access models comparable to o1 level for free for the first time, breaking long-standing information barriers. The discussion frenzy it sparked on social platforms like Xiaohongshu is comparable to the initial release of GPT-4.
Going Global, Avoiding Involution
Looking back at DeepSeek's development trajectory, its success formula is clear: strength is the foundation, but brand recognition is the moat.
In a conversation with "Late Point," MiniMax CEO Yan Junjie shared his thoughts on the AI industry and the company's strategic shift. He emphasized two key turning points: recognizing the importance of a technology brand and understanding the value of open-source strategies.
Yan Junjie believes that in the AI field, the speed of technological evolution is more important than current achievements, and open-source can accelerate this process through community feedback. Additionally, a strong technology brand is crucial for attracting talent and acquiring resources.
Taking OpenAI as an example: despite its later management turmoil, the image of innovation and openness it established early on earned it a first wave of goodwill. Even though Claude has gradually caught up technically and is encroaching on OpenAI's enterprise (B-end) users, OpenAI still leads by a wide margin among consumer (C-end) users thanks to path dependence.
In the AI field, the real competitive stage is always global; going out, avoiding involution, and promoting oneself is also a solid path.
This wave of going global has already stirred ripples in the industry, with earlier players like Qwen, Mianbi Intelligence, and recently DeepSeek R1, Kimi v1.5, and Doubao v1.5 Pro making significant noise overseas.
While 2025 is labeled as the year of intelligent agents and AI glasses, this year will also be an important milestone for Chinese AI companies embracing the global market. Going out will become an unavoidable keyword.
Moreover, the open-source strategy is a smart move in itself, attracting large numbers of tech bloggers and developers who volunteer to promote DeepSeek. Technology for good should not be just a slogan; in moving from the slogan "AI for All" toward genuine technological inclusivity, DeepSeek has forged a path purer than OpenAI's.
If OpenAI has shown us the power of AI, then DeepSeek has made us believe:
This power will ultimately benefit everyone.