Entertainment
News from all around the entertainment industry. Less focused on the celebrity side of things.
Main areas:
- TV Shows
- Movies
- Books
- Music
Rules
1. English only
Posts and comments have to be in English.
2. Use original link
The post URL should be the original link (even if paywalled), with archived copies left in the post body. This helps avoid duplicate posts when cross-posting.
3. Respectful communication
All communication has to be respectful of differing opinions, viewpoints, and experiences.
4. Inclusivity
Everyone is welcome here regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.
5. Off-topic tangents
Stay on topic. Keep it relevant.
6. Instance rules may apply
If something is not covered by community rules but is against lemmy.zip instance rules, those rules will be enforced.
If someone is interested in moderating this community, message @brikox@lemmy.zip.
If Disney rolls their own model (or finetune), that's not really an issue for them specifically? They have plenty of access to their own IP, stuff they already license, openly licensed data, and massive tooling for synthetic data generation.
...If they just wrap Sora, the irony would be tremendous, yeah. That's the absolute quickest and laziest thing to do and they could namedrop 'OpenAI' in earnings calls, so there's a good chance they'll do that.
In that case, yeah, I fully agree, and that's an interesting argument. Like with Bezos' new AI initiative, Amazon would have an immense pool of their own data to pull from, and Disney certainly owns a hell of a lot of properties. I do think it's naive to assume that's what's going on, though, and that Disney wouldn't be doing what every other multinational corporation engaging in AI training is doing: scraping any and all datasets they can get access to regardless of propriety, since arguably ALL data is useful. Could be I'm just cynical, but fastest, laziest profit turns out to be plan A in almost every case these days.
There are actually very few 'big' model trainers, or at least trainers worth anything.
OpenAI, Anthropic, xAI, and Google (and formerly Meta) are the big names to investors. You have Mistral in the EU, LG in Korea, the 'Chinese Dragons' like Alibaba and DeepSeek, a few enterprise niches like Palantir, Cohere, or AI21, Perplexity and such for search, and...
That's it, mostly?
The vast, vast majority of corporations don't even finetune. They just use APIs of others and say they're making 'AI.' And you do have a few niches pursuing, say, TTS or imagegen, but the training sets for that are much more specialized.
...And actually, a lot of research and 'new' LLMs are largely mixes of public datasets (so no need to scrape), synthetically generated data, outputs of other LLMs, and/or more specifically formatted stuff. Take this one, which uses 5.5T completely synthetic tokens:
https://old.reddit.com/r/LocalLLaMA/comments/1p20zry/gigachat3702ba36bpreview/
That, and the rumor on the street is that the Chinese govt provides the Chinese trainers with a lot of data (since their outputs/quirks are so suspiciously similar).
Hence, 'scraping the internet' is not actually the trend folks think it is. On the contrary, Meta seems to have discredited the 'quantity over quality' data approach with how hard their Llama 4 models flopped vs. how well DeepSeek did. It's not very efficient, training models is generally not profitable, and it's done less than you think.
The point I'm making, along with just dumping my thinking, is that Disney is a special case.
Their focus is narrow: they want to generate TikTok-style images/videos of their characters, and only their characters. Not code, not long chats, not spam articles, just that. They have no financial incentive to 'scrape all the internet' beyond the excellent archives that already exist; the only temptation is the 'quick and dirty' solution of using Sora instead of properly making something themselves.
I appreciate the well thought out response.