DeepSeek says its R1-0528 update cuts hallucinations by 45–50% and now rivals Gemini 2.5 Pro in reasoning tasks.
Developers claim R1-0528 mimics Gemini’s expressions and “traces,” hinting at potential unauthorized data use.
Microsoft detected large-scale data exfiltration from OpenAI accounts linked to DeepSeek in late 2024.
DeepSeek, the famous Chinese AI startup, has shaken the global tech stage once again. Last week, it released an updated version of its R1 reasoning model, called R1-0528, which has impressed many with strong results on math and coding benchmarks. However, critics are questioning whether DeepSeek’s new AI was trained on data from Google’s Gemini.
While DeepSeek has not disclosed the sources of its training data, experts are picking up on clues. The similarities between R1-0528 and Gemini 2.5 Pro are difficult to ignore, and some developers believe DeepSeek may have used outputs from Google’s AI to improve its own.
Melbourne-based developer Sam Paech analyzed R1-0528 closely. He found that the model uses expressions and words often seen in Gemini’s responses. He shared his findings on social media, pointing out that the two models appear to think in similar ways.
Another developer, known online for building SpeechMap, also noticed something unusual. According to them, the way DeepSeek’s model ‘thinks’ as it solves problems feels very similar to Gemini’s reasoning paths. These ‘traces’ are the internal steps a model takes to reach an answer. While these claims are not hard proof, they have sparked real concern in the AI research community.
This isn’t the first time DeepSeek has faced questions about its training data. Last December, users reported that its older model, V3, sometimes identified itself as ChatGPT, fueling suspicions that DeepSeek may have trained on ChatGPT’s responses.
In early 2025, OpenAI told the Financial Times it had found evidence linking DeepSeek to model distillation. This is a method where one AI model is trained using outputs from another, more powerful one.
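To make the idea concrete, here is a minimal, illustrative sketch of classic distillation in PyTorch: a small "student" network is trained to match the softened output distribution of a larger "teacher". The toy models below are placeholders for illustration only; the accusation against DeepSeek concerns the black-box variant, which relies on a model's text outputs rather than its internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: the student learns to match the teacher's
    softened output distribution (Hinton et al., 2015)."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence; the T^2 factor keeps gradients well-scaled
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Toy stand-ins: a larger "teacher" and a smaller "student"
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Linear(32, 10)
x = torch.randn(4, 32)

with torch.no_grad():        # the teacher is frozen
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits)
loss.backward()              # gradients update only the student
```

In the scenario described in this article, a rival would not have access to the teacher's internal logits at all, only its generated text, which leads to the synthetic-data approach discussed below.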
Bloomberg later reported that Microsoft had tracked large volumes of data being pulled through OpenAI developer accounts believed to be connected to DeepSeek. Although distillation is a common practice, OpenAI’s terms of service forbid using its outputs to build competing AI models.
Proving that one model copied another is getting harder, because the internet is now flooded with AI-generated content. Many AI systems are trained on the open web, where content farms and bots post AI-written material, so many models end up sounding alike.
Nathan Lambert, a researcher from the nonprofit AI2, says it’s possible DeepSeek used Gemini-style outputs. “If I were DeepSeek,” he wrote on X, “I’d generate tons of synthetic data from the best API model.” He explained that DeepSeek may not have enough GPUs, but it likely has enough money to buy access to strong models.
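As a hypothetical sketch of what Lambert describes, the snippet below queries a strong API model and stores the prompt-response pairs as fine-tuning data for a smaller model. The model name, prompts, and output file are illustrative placeholders, and an OpenAI-compatible API key is assumed; nothing here reflects what DeepSeek actually did.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prompts; a real pipeline would use millions of them
prompts = [
    "Prove that the sum of two even numbers is even.",
    "Write a Python function that reverses a linked list.",
]

with open("synthetic_sft_data.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o",  # stand-in for "the best API model"
            messages=[{"role": "user", "content": prompt}],
        )
        pair = {"prompt": prompt,
                "response": resp.choices[0].message.content}
        f.write(json.dumps(pair) + "\n")

# The resulting JSONL file can then be used to fine-tune a smaller
# model, which is precisely the kind of reuse OpenAI's terms forbid.
```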
To curb distillation, companies like OpenAI and Google are tightening access. OpenAI now requires identity verification to use its advanced models: users must upload a government-issued ID, and China is not on the list of supported countries.
Google, meanwhile, has begun summarizing the reasoning traces its models produce, which makes it harder for others to learn from them. Anthropic is doing the same to protect its Claude models. These moves are meant to stop rivals from copying performance by studying reasoning steps.
Even if DeepSeek trained on Gemini-style outputs, it still falls short where it matters most: true reasoning. A recent Apple paper tested DeepSeek-R1 and other models on complex logic puzzles, and all of them, DeepSeek included, broke down as the puzzles grew harder. Mimicking powerful AIs, it seems, doesn’t guarantee real understanding or advanced problem-solving ability.
DeepSeek has not confirmed using data from Gemini, but many signs point to the possibility. Whether it is creative engineering or clever borrowing, the results are both impressive and controversial.
The global AI race is heating up, and if companies keep training on each other’s models, the line between fair play and foul could disappear. DeepSeek’s rise is a wake-up call: the future of AI may depend just as much on ethics as it does on innovation.