DeepSeek’s AI: Gemini Training Speculation

Chinese AI lab DeepSeek recently released an updated reasoning model, R1-0528, showing strong results on math and coding tests. However, the company hasn’t revealed its training data sources, sparking speculation that it may have used outputs from Google’s Gemini AI.

Similarities to Google’s Gemini. Developer Sam Paech observed that R1-0528 favors words and phrasings strikingly similar to those of Google’s Gemini 2.5 Pro. Another researcher noted that the model’s internal “traces”—the step-by-step reasoning it generates before answering—also resemble Gemini’s. While such overlap isn’t proof, it raises questions about the data used to train the model.

Past Concerns Over Data Use. Earlier versions of DeepSeek’s models sometimes identified themselves as ChatGPT, suggesting they may have been trained on OpenAI outputs. OpenAI has said it found evidence linking DeepSeek to distillation—a technique in which one model is trained on the outputs of another. Distillation is common in the industry, but OpenAI’s terms of service prohibit using its outputs to build competing models.

Industry Response and Security. AI experts note that overlapping language is increasingly common because so much online text is now AI-generated, but some suspect DeepSeek generated synthetic training data from top models like Gemini to compensate for its limited access to compute. In response, leading labs have tightened protections against unauthorized use: OpenAI now requires ID verification for access to certain advanced models, and Google has begun summarizing the reasoning traces its models expose, making them harder to imitate.