DeepSeek's Holistic Approach to Progress

During the New Year celebrations, I found myself immersed in leisure, feeling guilty as I overlooked the readers’ requests for us to write about DeepSeek. To my surprise, I discovered that this urgency was not limited to technology bloggers. In the comment section of a comedy blogger’s post, completely unrelated to AI, someone mentioned, “Why haven’t you talked about DeepSeek yet? It has shaken the U.S. stock market and shattered American fantasies. All those tech companies in the West are in panic mode.”

This kind of fervor escalated quickly. Comments like “Tech giants are doomed,” “AGI is just around the corner,” and “Ordinary people must learn DeepSeek now or it will be too late” flourished. There were even sensationalist narratives about DeepSeek being targeted by large-scale cyberattacks from abroad, with top talents from various tech companies teaming up with China’s underground hackers to rescue it.

Public sentiment around DeepSeek has become increasingly absurd. Whenever the topic comes up, we seem to slip into a euphoric frenzy. Some of this excitement does stem from DeepSeek’s impressive capabilities and the recent boom in AI technology, but it is hard to deny that geopolitical factors are also at play. Many people need a narrative in which “foreigners are afraid and bow down,” especially in the tech realm.

To cater to this sentiment, media outlets and public figures sensationalize discussions of DeepSeek, elevating it to a matter of philosophy, national fortune, and historical direction. Such discussions, exaggerated and disconnected from reality, load DeepSeek with expectations and responsibilities it should not have to carry. This is what we call being “put on a pedestal.”

However, the view from that pedestal isn’t particularly pleasant. Past experiences remind us that often the next step leads to backlash such as “the bubble bursts” or “great figures fade away.” For the emerging star DeepSeek and its development team, this trajectory could be more harmful than beneficial.

Thus, we want to discuss the consensus that might facilitate a more objective conversation about DeepSeek at this stage. In other words, let’s try to dismantle the pedestal of public opinion and return to a more genuine and simpler understanding of DeepSeek.

First, let's clarify a bold statement: Contrary to popular belief circulating on social media, DeepSeek has not achieved a core technological breakthrough from zero to one.

Following the rise of DeepSeek, its development team and tech industry insiders have been echoing that Chinese AI should not merely follow but must achieve breakthroughs from zero to one. This assertion is undoubtedly correct, but DeepSeek itself may not yet be an example of it.

A core technological breakthrough typically involves a major shift in the primary technological pathway or a step change in results. DeepSeek’s most notable technical features are the R1 model’s visible reasoning process, known as “chain of thought” reasoning, and its strong performance when combined with web search through RAG (Retrieval-Augmented Generation). However, neither path is original to DeepSeek. The recent wave of chain-of-thought reasoning models is generally traced to OpenAI’s release of the o1 model, which pushed labs around the world to build similar capabilities.

Furthermore, regarding web-connected retrieval, many competing firms have been working in this area, using RAG to address large models’ lack of real-time information and to curb their “hallucinations.” As early as 2023, Baidu listed RAG as a core capability of its Wenxin Yiyan (ERNIE Bot) model.
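To make the idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern behind RAG. Everything in it is an assumption for illustration: the toy DOCUMENTS list, the word-overlap scoring, and the function names are hypothetical, and it is not a description of how DeepSeek, Baidu, or any other vendor implements retrieval. Production systems typically use a search engine or vector index instead of word overlap.

```python
import re

# Hypothetical toy knowledge base; a real system would query a search index.
DOCUMENTS = [
    "The library opens at 9 a.m. on weekdays.",
    "Parking permits are renewed every January.",
    "The cafeteria serves lunch from noon to 2 p.m.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Rank documents by how many tokens they share with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, documents, top_k=2):
    """Prepend the retrieved passages so the model answers from them."""
    passages = retrieve(query, documents, top_k)
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("When does the library open?", DOCUMENTS, top_k=1))
```

The resulting prompt would then be sent to the language model. Because the answer is grounded in retrieved text, the model can cite information newer than its training data and has less room to invent facts.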

However, it is crucial to note that a lack of innovation from zero to one does not equate to a lack of innovation altogether. DeepSeek has done substantial pioneering work in optimizing model capabilities, including using the GRPO (Group Relative Policy Optimization) algorithm to make reinforcement learning training more efficient. One could argue that DeepSeek integrates industry-leading, proven technological pathways and, on that foundation, achieves model optimization, capability enhancement, and user experience upgrades.
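For readers curious about what GRPO changes, the sketch below shows only its group-relative advantage step as it is publicly described: several answers are sampled for the same prompt, and each answer's reward is normalized against the group's mean and standard deviation instead of against a separately trained value model. The function name, the example rewards, and the use of the population standard deviation are assumptions for illustration. The full algorithm also includes a clipped policy-ratio objective and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples for the same prompt.

    rewards: scores for several sampled answers to one prompt.
    eps: small constant (an assumption) to avoid division by zero when
         every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards: two correct answers (1.0) and two wrong ones (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Answers that beat the group average get a positive advantage and are reinforced, while those below it are discouraged. Skipping the value model is one reason this approach is cheaper to train.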

We often yearn for innovation that springs from nothing and expect dramatic breakthroughs. But objectively speaking, the first step and the millionth step cover the same distance.

So where does the real value lie that stirred global interest in DeepSeek? After the burst of hype around the New Year holiday, many have likely forgotten that what first set DeepSeek apart was its ability to train the DeepSeek-V3 model at remarkably low computational cost through software and architectural innovations.

DeepSeek-V3 is the base model for the R1 model we use today. In a paper published by the development team, they state that they trained a model with 671 billion parameters at a compute cost of only about 5.5 million dollars. Even though this figure covers only the training of the base model and excludes later application work, reinforcement learning, inference, and other overall expenses, it still overturns the prevailing cost assumptions for training large models.
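The headline number is simple arithmetic: GPU-hours multiplied by an assumed rental price. The sketch below reproduces it using the figures I recall from the DeepSeek-V3 technical report, roughly 2.788 million H800 GPU-hours at an assumed 2 dollars per GPU-hour. Treat both inputs as assumptions to check against the report itself; the variable and function names are mine.

```python
# Assumed inputs, recalled from the DeepSeek-V3 technical report.
GPU_HOURS = 2_788_000          # total H800 GPU-hours for the training run
PRICE_PER_GPU_HOUR_USD = 2.0   # assumed rental price per GPU-hour


def training_cost_usd(gpu_hours, price_per_gpu_hour):
    """Estimated rental cost of a training run in US dollars."""
    return gpu_hours * price_per_gpu_hour


if __name__ == "__main__":
    cost = training_cost_usd(GPU_HOURS, PRICE_PER_GPU_HOUR_USD)
    print(f"Estimated training cost: ${cost / 1e6:.3f} million")
```

Note what the calculation leaves out: it prices only the rented compute for the final run, not research, failed experiments, data, salaries, or the hardware itself, which is why the figure should be read as a training-run cost and not as a total budget.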

Ultimately, DeepSeek has achieved results comparable to those of mainstream models such as o1. It is hard to say that it has fully surpassed them in performance, but it has reduced hardware costs through software innovation. This means low-cost models can hold their own against high-cost ones, and open-source models can catch up with their closed-source counterparts.

Moreover, the breakthrough of DeepSeek in "reducing AI computational costs" coincided with a time when major AI players globally were hoarding high-end GPUs to cement their industry strongholds, alongside the U.S. prohibiting the sale of advanced AI chips to China in an attempt to curb its AI capabilities. This created a dramatic narrative where a Chinese AI model momentarily rattled the American stock market.

The efficiency gains and cost reductions DeepSeek delivered triggered cascading effects only because of this particular industrial and international context. Yet many people who do not normally follow AI are unaware of the scaling laws, under which more compute and larger models yield better results, or of the backdrop of compute monopolies and trade bans. They simply hear that DeepSeek has risen to prominence and made foreigners worried and afraid. This gap in understanding has built a pedestal for DeepSeek that it should not have to stand on.

Additionally, it is important to recognize that many of us are enamored with dramatic, genius-like innovations. Yet, in reality, it is often through engineering capabilities—constant adjustments, cost reductions, efficiency improvements—that technological innovation finds its application and widespread acceptance.

Take Thomas Edison’s light bulb, for instance. The invention itself is widely celebrated, but it is easy to overlook how much the large-scale electric grid lowered the cost of electricity. If each household had to generate its own power, most of the world would still be in the dark.

“We have cut down the costs.”

This somewhat absurd, clichéd, and ironic statement captures the essence of China’s industrial capabilities.

We shouldn’t shy away from acknowledging that Chinese AI, including DeepSeek, will be most adept at significantly lowering costs for a considerable amount of time to come.

The success of DeepSeek, to a large extent, also rests on its humanistic capabilities.

After DeepSeek’s meteoric rise, many proclaimed it a monumental victory for technology. Some even speculated that the advent of AI would render humanities and liberal arts valueless. Commentary like “With DeepSeek here, is there still any significance to studying the liberal arts?” went viral.

However, if we employ DeepSeek extensively and analyze its distinct qualities compared to other models, we arrive at a contrary viewpoint: DeepSeek underscores the immense significance of the liberal arts and humanistic abilities in the age of AI.

If we were to casually ask individuals about their experience using DeepSeek, they would likely mention how conversing with it feels more personable.

This “personability” largely emerges not from the understanding or reasoning typical of AI technologies, but from the model's display of humor, internet savvy, and dialogue conventions that resonate with younger demographics. Additionally, DeepSeek demonstrates a refined sense of rhetoric, an elegant writing style, and impressive overall humanistic literacy.

Such attributes make DeepSeek align more closely with the conversational patterns and aesthetic needs of younger audiences, while also yielding responses that are more engaging and shareable. Yet, it’s essential to understand that these qualities are not primarily technological but closely tied to the selection of training materials and other humanistic factors.

For instance, if you request DeepSeek to write a poem, it can produce lines that echo the tone and rhetoric beloved by the artistic youth. In contrast, other mainstream domestic models might offer compositions that, while structurally sound and rich in embellishment, tend to resemble simple “fatherly prose.”

Additionally, when asked to predict future trends, DeepSeek’s responses often carry the imaginative flair of online science fiction. The analysis may not always be rigorous, yet it gives young readers a sense of strength and fervor.

These advantages stem not from technology but from a young, aesthetically minded development team that values humanistic elements in the training process. In contrast, many leading models fall flat because their development teams tend to be older, often over 45, and lack liberal arts backgrounds, which leads to conversations heavy on jargon and short on relatable content. Rather than saying that young people support DeepSeek, it is more accurate to say that they are supporting their own voice and aesthetics.

Moreover, DeepSeek currently operates under fewer restrictions. Regulatory oversight may lag, but we should not expect AI to stay this bold and sharp for much longer.

DeepSeek cleverly enhances user conversational experiences and optimizes dissemination effects. These developments, occurring beyond the technological sphere, could inspire AI companies to reflect on product experiences and the importance of humanistic qualities.

It would be unfortunate if we overwhelmingly praised DeepSeek’s technology while neglecting its humanistic experience.

Considering the facets discussed, we can piece together a more accurate and undistorted portrait of DeepSeek:

It represents a composite leap—comprising technological innovation, humanistic sensibility, open-source strategies, and low-cost approaches under a unique industrial cycle and international backdrop.

DeepSeek may not be a groundbreaking technological revolution, yet it is certainly mature and brimming with novelty. This also clarifies why AI experts and leaders in Europe and the U.S. tend to describe it as “impressive.”

DeepSeek has not shot straight to the pinnacle, and there is no need for us to fantasize about such a leap. What it has done is take a significant step forward, and we should take pride in that advancement and draw confidence from it.

I also strongly oppose the notion that China only has DeepSeek. In reality, China boasts a clear AI industrial landscape, a robust self-sufficient system for AI software and hardware, a vast pool of AI developers, and a proactive policy environment concerning AI. These elements provide fertile ground for more DeepSeek-like models to emerge. Considering these factors, I am confident that we will see additional models akin to DeepSeek arise until we reach the threshold of an AI industrial revolution and behold the dawn of AGI.

Why not take DeepSeek down from its pedestal? The rational and composed approach is to regard it with clear eyes, harness its potential wisely, and appreciate every creation of Chinese AI, as this represents genuine maturation of AI in China.

Wang Yangming once said, “Though a mountain may be a thousand fathoms high, one can only ascend one step at a time.” DeepSeek’s step holds significance, and after taking it, we may pause to appreciate the mountain breeze and sing softly. Yet, we must also stay aware that we remain stationed in the foothills.

Once we’re rested and joyous enough, there’s only one thing to do: take the next step.