Tumbling stock market values and wild claims have accompanied the release of a new AI chatbot by a small Chinese company. What makes it so different?
The release of China’s new DeepSeek AI-powered chatbot app has rocked the technology industry. It quickly overtook OpenAI’s ChatGPT as the most-downloaded free iOS app in the US, and caused chip-making company Nvidia to lose almost $600bn (£483bn) of its market value in one day – a new US stock market record.
The reason behind this tumult? The “large language model” (LLM) that powers the app has reasoning capabilities comparable to those of US models such as OpenAI’s o1, but reportedly costs a fraction as much to train and run.
DeepSeek claims to have achieved this by deploying several technical strategies that reduced both the amount of computation time required to train its model (called R1) and the amount of memory needed to store it. Cutting these overheads led to a dramatic reduction in cost, says DeepSeek. R1’s base model, V3, reportedly required 2.788 million GPU hours to train (that is, computation spread across many graphical processing units, or GPUs, running at the same time), at an estimated cost of under $6m (£4.8m), compared with the more than $100m (£80m) that OpenAI boss Sam Altman says was required to train GPT-4.
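To put those numbers in context, here is a rough sanity check. The rental rate of about $2 per GPU-hour is an assumption introduced for illustration (the article does not state one), but it shows how millions of GPU hours translate into a training bill of this size.

```python
# Back-of-the-envelope check on the reported training cost.
# Assumption (not stated in the article): GPUs are rented at roughly
# $2 per GPU-hour, a commonly quoted ballpark for H800-class hardware.
gpu_hours = 2.788e6          # reported GPU hours to train the V3 base model
cost_per_gpu_hour = 2.0      # assumed rental price in US dollars

estimated_cost = gpu_hours * cost_per_gpu_hour
print(f"Estimated training cost: ${estimated_cost / 1e6:.2f}m")
# -> Estimated training cost: $5.58m, consistent with the "under $6m" figure
```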
Despite the hit taken to Nvidia’s market value, the DeepSeek models were trained on around 2,000 Nvidia H800 GPUs, according to one research paper released by the company. These chips are a modified version of the widely used H100 chip, built to comply with US export rules on sales to China. They were likely stockpiled before those restrictions were further tightened by the Biden administration in October 2023, which effectively banned Nvidia from exporting H800s to China. It is likely that, working within these constraints, DeepSeek has been forced to find innovative ways to make the most effective use of the resources at its disposal.
Reducing the computational cost of training and running models may also address concerns about the environmental impacts of AI. The data centres they run on have huge electricity and water demands, largely to keep the servers from overheating. While most technology companies do not disclose the carbon footprint involved in operating their models, a recent estimate puts ChatGPT’s carbon dioxide emissions at more than 260 tonnes per month, the equivalent of 260 flights from London to New York. So, increasing the efficiency of AI models would be a positive direction for the industry from an environmental point of view.
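The flight comparison rests on an implicit per-flight emissions figure. A quick check, assuming roughly one tonne of CO2 per passenger for a London to New York flight (an assumption for illustration; the article does not give this number):

```python
# Sanity check on the flight comparison.
# Assumption: ~1 tonne of CO2 per passenger on a London-New York flight.
monthly_emissions_tonnes = 260        # estimated monthly CO2 emissions for ChatGPT
co2_per_flight_tonnes = 1.0           # assumed per-passenger figure

equivalent_flights = monthly_emissions_tonnes / co2_per_flight_tonnes
print(f"Equivalent flights per month: {equivalent_flights:.0f}")   # -> 260
```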
Of course, whether DeepSeek’s models do deliver real-world savings in energy remains to be seen. It is also unclear whether cheaper, more efficient AI might simply lead to more people using these models, and so to an increase in overall energy consumption.
If nothing else, it could help to push sustainable AI up the agenda at the upcoming Paris AI Action Summit so that AI tools we use in the future are also kinder to the planet.
What has surprised many people is how quickly DeepSeek appeared on the scene with such a competitive large language model – the company was only founded in 2023 by Liang Wenfeng, who is now being hailed in China as something of an “AI hero”.
The latest DeepSeek model also stands out because its “weights” – the numerical parameters of the model obtained from the training process – have been openly released, along with a technical paper describing the model’s development process. This enables other groups to run the model on their own equipment and adapt it to other tasks.
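As a rough illustration of what “running the model on their own equipment” can look like in practice, here is a minimal Python sketch using the Hugging Face transformers library. The repository name and prompt are assumptions used for illustration, and the larger DeepSeek models need far more memory than a typical desktop machine offers.

```python
# Minimal sketch: loading openly released weights and generating text.
# The repository identifier below is an assumption used for illustration;
# check the model card for the exact name and hardware requirements.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain, step by step, why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```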
This relative openness also means that researchers around the world are now able to peer beneath the model’s bonnet to find out what makes it tick, unlike OpenAI’s o1 and o3, which are effectively black boxes. Some details are still missing, though, such as the datasets and code used to train the models, so groups of researchers are now trying to piece these together.
Not all of DeepSeek’s cost-cutting techniques are new either – some have been used in other LLMs. In 2023, Mistral AI openly released its Mixtral 8x7B model which was on par with the advanced models of the time. Mixtral and the DeepSeek models both leverage the “mixture of experts” technique, where the model is constructed from a group of much smaller models, each having expertise in specific domains. Given a task, the mixture model assigns it to the most qualified “expert”.
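To make the idea concrete, here is a toy sketch of a mixture-of-experts layer in Python. It is a conceptual illustration only, not DeepSeek’s or Mixtral’s actual code: a gating function scores each “expert” for a given input, and only the top-scoring ones do any work.

```python
# Toy mixture-of-experts layer: route each input to its top-k experts,
# so only a fraction of the total parameters is used per input.
import numpy as np

rng = np.random.default_rng(0)
dim, num_experts, top_k = 16, 4, 2

# Each "expert" is just a small weight matrix in this sketch.
experts = [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(num_experts)]
gate = rng.standard_normal((dim, num_experts)) / np.sqrt(dim)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_forward(x):
    scores = x @ gate                        # how relevant is each expert to this input?
    top = np.argsort(scores)[-top_k:]        # pick the top-k experts
    weights = softmax(scores[top])           # turn their scores into mixing weights
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(dim)             # a toy "token" representation
print(moe_forward(token).shape)              # -> (16,)
```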
DeepSeek has even revealed its unsuccessful attempts at improving LLM reasoning through other technical approaches, such as Monte Carlo Tree Search, long touted as a potential way to guide the reasoning process of an LLM. Researchers will use this information to investigate how the model’s already impressive problem-solving capabilities can be further enhanced – improvements that are likely to end up in the next generation of AI models.
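For readers unfamiliar with Monte Carlo Tree Search, the sketch below shows its central ingredient, the UCB1 selection rule, applied here to choosing which candidate reasoning step to explore next. It is a generic illustration; the article does not describe how DeepSeek actually wired the search into its models.

```python
# Core of Monte Carlo Tree Search: the UCB1 rule balances favouring branches
# that have worked well so far against exploring branches tried rarely.
import math

def ucb1(value_sum, visits, parent_visits, c=1.4):
    if visits == 0:
        return float("inf")                  # always try unvisited branches first
    exploitation = value_sum / visits        # average reward of this branch
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration

def select_next_step(candidates, parent_visits):
    """Pick the candidate reasoning step with the highest UCB1 score."""
    return max(candidates, key=lambda s: ucb1(s["value_sum"], s["visits"], parent_visits))

# Toy statistics for three candidate next steps in a reasoning chain.
candidates = [
    {"name": "step A", "value_sum": 3.0, "visits": 5},
    {"name": "step B", "value_sum": 1.0, "visits": 1},
    {"name": "step C", "value_sum": 0.0, "visits": 0},
]
print(select_next_step(candidates, parent_visits=6)["name"])   # -> step C (unvisited)
```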
So what does this all mean for the future of the AI industry?
DeepSeek is potentially demonstrating that you don’t need vast resources to build sophisticated AI models. My guess is that we’ll start to see highly capable AI models being developed with ever fewer resources, as companies figure out ways to make model training and operation more efficient.
Up until now, the AI landscape has been dominated by “Big Tech” companies in the US – Donald Trump has called the rise of DeepSeek “a wake-up call” for the US tech industry. But this development may not necessarily be bad news for the likes of Nvidia in the long term: as the financial and time cost of developing AI products falls, businesses and governments will be able to adopt the technology more easily. That will in turn drive demand for new products, and for the chips that power them – and so the cycle continues.
It seems likely that smaller companies such as DeepSeek will have a growing role to play in creating AI tools that have the potential to make our lives easier. It would be a mistake to underestimate that.