Albert David Bangura, Graduate Teaching and Research Assistant @ Bahcesehir Cyprus University
Nicosia, Cyprus
Why Is Data the New Gold?
<p>Recently, the Chinese AI company DeepSeek released DeepSeek-R1, a groundbreaking large language model (LLM) that powers its chatbot. According to the paper DeepSeek's researchers published, DeepSeek-R1 is an improvement on DeepSeek-R1-Zero, a model trained purely with reinforcement learning.</p><p>Reinforcement learning is a type of machine learning in which a model learns by interacting with an environment and receiving rewards for its actions, rather than by training on labeled examples. DeepSeek-R1-Zero was trained this way, without a preliminary supervised fine-tuning step. However, the DeepSeek researchers noted that, because it relied on pure reinforcement learning, DeepSeek-R1-Zero struggled with challenges such as poor readability and language mixing.</p><p>To address these issues, they developed a new multi-stage training pipeline, which produced DeepSeek-R1 with much better performance. The pipeline starts by collecting thousands of cold-start examples (long chain-of-thought samples) and fine-tuning the base model on them, which then serves as the starting point for reinforcement learning.</p><p>Their aim was to explore how incorporating a small amount of high-quality data as a cold start affects the model's reasoning performance. The DeepSeek researchers reported that the cold-start data contributed to DeepSeek-R1's performance gains, which supports the claim that “data is the new gold.”</p>
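The value of a cold start can be illustrated with a toy reinforcement-learning problem. This is an analogy only, not DeepSeek's training method: it assumes a hypothetical `run_bandit` function implementing an epsilon-greedy multi-armed bandit, where an agent learns purely from the rewards it receives. The hypothetical `prior` argument stands in for a small amount of high-quality cold-start data by seeding the agent's value estimates before it starts interacting with the environment, and `prior_weight` (also hypothetical) sets how many pseudo-observations that prior is worth.

```python
import random
from typing import List, Optional, Tuple


def run_bandit(
    true_means: List[float],
    steps: int,
    epsilon: float,
    prior: Optional[List[float]] = None,
    prior_weight: int = 1,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Toy epsilon-greedy agent on a Bernoulli multi-armed bandit.

    Returns the total reward collected and the final value estimates.
    `prior` is a hypothetical stand-in for cold-start data that seeds the
    estimates; `prior_weight` is how many pseudo-observations it counts as.
    """
    rng = random.Random(seed)
    n = len(true_means)
    if prior is None:
        # Pure RL: start with no knowledge about any arm.
        estimates = [0.0] * n
        counts = [0] * n
    else:
        # Cold start: begin from estimates derived from a little good data.
        estimates = list(prior)
        counts = [prior_weight] * n

    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try a random arm
        else:
            # Exploit: pick the arm with the highest estimate (first one on ties).
            arm = max(range(n), key=lambda a: estimates[a])
        # The environment pays 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        total += reward
        # Incremental-mean update of the chosen arm's estimate.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return total, estimates


if __name__ == "__main__":
    arms = [0.0, 1.0]  # arm 1 always pays out, arm 0 never does
    print(run_bandit(arms, steps=100, epsilon=0.0))
    # → (0.0, [0.0, 0.0])
    print(run_bandit(arms, steps=100, epsilon=0.0, prior=[0.0, 1.0], prior_weight=5))
    # → (100.0, [0.0, 1.0])
```

With no prior and no exploration (`epsilon=0.0`), the agent locks onto the first arm and never discovers the rewarding one, while seeding it with accurate estimates lets it exploit the better arm from the first step. Real systems also rely on exploration (`epsilon > 0`), but the example shows how a little good starting data can steer what an agent learns from rewards alone.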