this post was submitted on 08 Feb 2026

Fuck AI


"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.

founded 2 years ago
[–] yucandu@lemmy.world 1 points 17 hours ago (1 children)

It is not trained on open source code on GitHub.

But I can use it to analyze a datasheet and generate a library for an obscure module that I can then upload to GitHub and contribute to the community.

[–] jaredwhite@humansare.social 1 points 14 hours ago (1 children)

Apertus is most certainly trained on source code hosted on GitHub. It is laid out here in their technical report:

https://github.com/swiss-ai/apertus-tech-report

It uses a large dataset called The Stack, among others.

[–] yucandu@lemmy.world 1 points 3 hours ago* (last edited 3 hours ago) (1 children)

StarCoderData. A large-scale code dataset derived from the permissively licensed GitHub collection The Stack (v1.2) (Kocetkov et al., 2022), which applies deduplication and filtering of opted-out files. In addition to source code, the dataset includes supplementary resources such as GitHub Issues and Jupyter Notebooks (Li et al., 2023).

That's not random GitHub accounts or "delicensing" anything. People had to opt IN to be part of "The Stack". Apertus isn't training itself from community code.

[–] jaredwhite@humansare.social 1 points 3 hours ago

I'm tired of arguing with you about this, and you're still wrong. It was opt-out, not opt-in, based initially on a GitHub crawl of 137M repos and 52B files before filtering & dedup.
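
For anyone following along, here is a rough sketch of what the opt-out vs. opt-in difference looks like in a filtering pipeline. This is not the actual code behind The Stack or Apertus. The license allowlist, `SourceFile`, and both function names are made up for illustration, and exact-hash dedup stands in for whatever dedup the real pipeline applies.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical allowlist, for illustration only; not the real license list.
PERMISSIVE = {"mit", "apache-2.0", "bsd-3-clause"}


@dataclass(frozen=True)
class SourceFile:
    owner: str
    license: str
    content: str


def build_opt_out_dataset(files, opted_out_owners):
    """Keep permissively licensed files unless the owner asked to be removed,
    then drop exact duplicates by content hash."""
    seen = set()
    kept = []
    for f in files:
        if f.license.lower() not in PERMISSIVE:
            continue
        # Opt-out: everyone is included by default, removal needs a request.
        if f.owner in opted_out_owners:
            continue
        digest = hashlib.sha256(f.content.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(f)
    return kept


def build_opt_in_dataset(files, opted_in_owners):
    """Contrast case: only owners who explicitly agreed are considered."""
    consenting = [f for f in files if f.owner in opted_in_owners]
    return build_opt_out_dataset(consenting, opted_out_owners=set())
```

With the same crawl, the opt-out version keeps every permissively licensed file whose owner never filed a removal request, while the opt-in version keeps nothing from owners who never responded at all.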