this post was submitted on 09 Apr 2026
Science Memes
Ok, but who is making those "open weight" models though? Individuals don't really have the resources to run these huge scraping operations, so they're often still corporate releases with fake open source branding.
Corporate, for now.
Thing is, once they’re out there, they’re free utilities, and they can’t be taken back.
Also, they don’t really need to aggressively scrape the internet. There are many good public datasets now, and the Chinese are already making excellent use of synthetic dataset generation on (relative) shoestring budgets. Several nations and other large organizations are also funding open model efforts; they just haven’t had the opportunity to catch up yet.
They come from corporations, but you can at least run them without any kind of analytics or censorship, and fine-tune them on consumer hardware.
Consumers aren't in the best position right now though, especially with the price hikes.
There are huge public datasets that are often used for pretraining. Common Crawl and C4 are probably the most prominent, but there are others.
There are also big public datasets available for fine-tuning and instruction tuning.
The open weight models are getting pretty powerful, thanks to some Chinese labs.