Assuming the AIs have already sucked up a huge chunk of the data on the internet, new human-made data gets produced far more slowly than the generated stuff. So it's not a stretch to imagine the AIs spending more and more time verifying and rejecting an ever-larger share of incoming data, while adding only a small chunk to the knowledge base. In other words: exponentially more power consumption for limited gains, the classic diminishing-returns conundrum?
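To put rough numbers on that intuition, here's a toy back-of-envelope model (every figure is made up, it's just to show the shape of the curve): scraped volume keeps doubling, every document still has to be checked, but the share that's genuinely new shrinks as the corpus grows, so compute per unit of new knowledge keeps climbing.

```python
# Toy diminishing-returns model (made-up numbers, not from any paper).
corpus = 1.0                          # existing knowledge base, arbitrary units
for year in range(1, 6):
    incoming = 2.0 ** year            # scraped volume keeps growing
    accept_rate = min(1.0 / corpus, 1.0)  # novelty gets rarer as the corpus grows
    verify_cost = incoming            # every document still has to be checked
    gained = incoming * accept_rate
    corpus += gained
    print(f"year {year}: compute ~{verify_cost:5.1f}, "
          f"new knowledge ~{gained:4.2f}, compute per gain ~{verify_cost / gained:4.1f}")
```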
You could say that's a problem DeepSeek solved last year. One of their biggest insights was using a lot of AI compute to sift through the whole internet for really, really good initial training data (as opposed to generating it synthetically).
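For a rough idea of what "using compute to sift" can look like, here's a minimal sketch: score every scraped document with some quality model and keep only the top slice. This is purely illustrative, not DeepSeek's actual pipeline, and `quality_score` here is a placeholder heuristic standing in for whatever classifier or perplexity filter you'd actually run at scale.

```python
# Minimal data-curation sketch: score every candidate document, keep the best slice.
def quality_score(doc: str) -> float:
    # Placeholder heuristic: favour longer, link-light text.
    words = doc.split()
    if not words:
        return 0.0
    link_ratio = sum(w.startswith("http") for w in words) / len(words)
    return min(len(words) / 500, 1.0) * (1.0 - link_ratio)

def curate(corpus: list[str], keep_fraction: float = 0.1) -> list[str]:
    """Rank scraped documents by score and keep only the top fraction."""
    ranked = sorted(corpus, key=quality_score, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]

docs = [
    "A long, careful explanation of transformer attention. " * 50,
    "click here http://spam.example http://spam.example buy now",
]
print(curate(docs, keep_fraction=0.5))
```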
Yannic Kilcher did a great breakdown that includes details of this aspect: [GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
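For anyone who doesn't want to watch the whole video: the core GRPO trick described in that paper is, roughly, to sample a group of answers per prompt and score each one relative to its own group, instead of training a separate critic model. A minimal sketch of that advantage computation (clipping and the KL penalty omitted):

```python
# Rough sketch of GRPO's group-relative advantage, per the DeepSeekMath paper:
# each sampled answer's advantage is its reward normalized against its own group.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward against the group mean and standard deviation."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    sigma = sigma or 1.0  # avoid division by zero when all rewards are identical
    return [(r - mu) / sigma for r in rewards]

# e.g. 4 sampled solutions to one math problem, graded 1 = correct, 0 = wrong
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # correct ones get positive advantage
```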