this post was submitted on 10 Apr 2025
36 points (95.0% liked)

Fediverse


cross-posted from: https://lemmy.world/post/28011368

So I started by doing research, and by research I mean watching two YouTube videos about basic recommendation algorithms.

I did watch a 30-minute video of a Netflix software engineer talking about using machine learning, complex matrix methods, and bandit-style algorithms to recommend TV shows and movies. The basic conclusion was that all these complex techniques gave a 50% improvement over their baseline measurement, where "baseline" means traditional, pre-neural-network algorithms.

The way I interpret it, the basics take you a long way, and the basics amount to representing every PeerTube video as a vector and every person watching as a vector too. Videos that are similar to each other would make good recommendations: if a watcher watched one of them, recommend the others; if they didn't like it, don't recommend videos similar to it. Once the videos are vectorized, the watcher's vector can be updated in a basic way: more watch time means it's more of what they want, a like gives it a boost, and a comment could apply a boost multiplier.
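A minimal sketch of that basic idea, assuming each video is already a fixed-length list of numbers in an agreed dimension order. The dimension names, weights, and function names (`cosine`, `recommend`, `update_watcher`) are hypothetical, not an existing PeerTube feature. Videos are ranked by cosine similarity to the watcher's vector, and each viewing nudges the watcher's vector toward the video, or away from it on a dislike:

```python
import math

# Hypothetical dimension order: [cooking, sports, music, livestream_vod]
Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity in [-1, 1]; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(watcher: Vector, videos: dict[str, Vector], k: int = 3) -> list[str]:
    """IDs of the k videos most similar to the watcher's vector."""
    return sorted(videos, key=lambda vid: cosine(watcher, videos[vid]), reverse=True)[:k]


def update_watcher(watcher: Vector, video: Vector, watch_fraction: float,
                   liked: bool = False, disliked: bool = False, commented: bool = False,
                   rate: float = 0.1, like_boost: float = 0.5,
                   comment_multiplier: float = 1.5) -> Vector:
    """Move the watcher's vector toward the video, or away from it on a dislike.

    watch_fraction is 0.0-1.0: more watch time means a stronger pull.
    All weights are illustrative guesses, not tuned values.
    """
    if disliked:
        signal = -1.0  # negative signal pushes the watcher away from this video
    else:
        signal = watch_fraction + (like_boost if liked else 0.0)
        if commented:
            signal *= comment_multiplier
    return [w + rate * signal * (v - w) for w, v in zip(watcher, video)]


if __name__ == "__main__":
    videos = {"pasta": [1, 0, 0, 0], "football": [0, 1, 0, 0], "concert": [0, 0.2, 1, 0]}
    watcher = update_watcher([0.1] * 4, videos["pasta"], watch_fraction=1.0, liked=True)
    print(recommend(watcher, videos))  # ['pasta', 'concert', 'football']
```

Cosine similarity is used here because it compares direction rather than magnitude, so a watcher who has watched many videos is not automatically "closer" to everything.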

I'd say the watcher's vector can be stored locally while the video vectors are public. It will take a while to figure out a function/algorithm that adapts to the watcher. Do the watcher's tastes change? Do they like multiple things? Should the algorithm adapt quickly or slowly as new videos come in (novelty vs. consistency)? I don't expect this problem to be solved anytime soon, but recommendation algorithms will simply evolve and split so that each has its own unique benefits and drawbacks.
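The "adapt fast or slow" question can be made concrete with an exponential moving average. This is one common blending technique assumed here for illustration, not something the post settles on, and `blend`/`steps_to_follow` are hypothetical names. A single parameter `alpha` trades novelty against consistency; the sketch counts how many sports videos it takes for a cooking fan's locally stored profile to follow the change in taste:

```python
def blend(profile: list[float], video: list[float], alpha: float) -> list[float]:
    """Exponential moving average: alpha near 1 adapts fast (novelty),
    alpha near 0 adapts slowly (consistency)."""
    return [(1 - alpha) * p + alpha * v for p, v in zip(profile, video)]


def steps_to_follow(profile: list[float], new_taste: list[float], dim: int,
                    alpha: float, threshold: float = 0.9, max_steps: int = 1000) -> int:
    """Number of consecutive new_taste videos until profile[dim] reaches threshold,
    or -1 if it never does within max_steps."""
    for step in range(1, max_steps + 1):
        profile = blend(profile, new_taste, alpha)
        if profile[dim] >= threshold:
            return step
    return -1


if __name__ == "__main__":
    cooking_fan = [1.0, 0.0]   # [cooking, sports]
    sports_video = [0.0, 1.0]
    for alpha in (0.5, 0.1):
        print(alpha, steps_to_follow(cooking_fan, sports_video, dim=1, alpha=alpha))
    # 0.5 4
    # 0.1 22
```

With `alpha=0.5` the profile follows the new interest after 4 videos; with `alpha=0.1` it takes 22, so old tastes linger much longer.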

The starting foundation is a standard for the video vector. A video can be described both quantitatively and qualitatively. There are only a few measurable quantities, like video length and existing views. Qualitative attributes, like "is it a cooking tutorial?", "is it sports commentary?", or "is it a livestream VOD?", will require storing the vector in a format that can adapt to the growing number of dimensions (qualities) a PeerTube video can have. The next issue is mapping qualities to actual numbers: for something sports-related or sports-adjacent, would 1 mean yes, and would 0 mean neutral/agnostic, or no?
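One way to handle both the growing number of dimensions and the "what does 0 mean" question is a sparse vector: a few fixed quantitative fields plus a dictionary of named qualities. The scale below (+1 yes, -1 no, 0 or missing for neutral/unknown) is an assumption for illustration, not an agreed standard, and `VideoVector` is a hypothetical name:

```python
from dataclasses import dataclass, field

# One possible scale (an assumption, not an agreed standard):
#   +1.0 = the video clearly has this quality
#   -1.0 = it clearly does not
#    0.0 or missing = neutral / unknown
# Qualities live in a dict keyed by name, so new dimensions can be added
# without breaking vectors written before those dimensions existed.


@dataclass
class VideoVector:
    length_s: float                      # measurable quantity
    views: int                           # measurable quantity
    qualities: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for name, value in self.qualities.items():
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"quality {name!r}={value} is outside [-1, 1]")

    def quality(self, name: str) -> float:
        """Unrated qualities read as 0.0 (neutral/unknown)."""
        return self.qualities.get(name, 0.0)


def quality_similarity(a: VideoVector, b: VideoVector) -> float:
    """Dot product over only the qualities both vectors actually mention."""
    shared = a.qualities.keys() & b.qualities.keys()
    return sum(a.qualities[k] * b.qualities[k] for k in shared)


if __name__ == "__main__":
    pasta = VideoVector(600, 1200, {"cooking_tutorial": 1.0, "sports": -1.0})
    recap = VideoVector(900, 50000, {"sports": 1.0, "sports_commentary": 0.8})
    print(pasta.quality("livestream_vod"), quality_similarity(pasta, recap))
```

Treating a missing key as 0.0 means an old vector that predates a new quality simply reads as "unknown" for it, so no vector ever has to be rewritten when the standard grows.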

The last, and simplest, issue would be distributing the algorithm that updates the watcher's vector, since that can be done via updates from a PeerTube server or GitHub.

top 12 comments
[email protected] 1 points 1 day ago* (last edited 1 day ago)

GL! We need one (or a better one).

Right now, the best algo is going to [email protected] (or its sister communities), trying to find videos/creators you like, and then subscribing.

Or, if you're on PieFed, go to https://piefed.social/f/fediversevideos, which has a huge number of creators.

[email protected] 3 points 2 days ago

We shouldn't fear these algorithms; the key, though, is that everything has to be under the user's control. A simple way is just tags, which are widely used almost everywhere. If users have their own panel where they can block tags or words, or give any keyword or tag a rating from +10 to -10, you can accomplish a hell of a lot with no black-box fuckery.
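A minimal sketch of that user-controlled tag panel, assuming each video carries a title and a list of tags. The function names are hypothetical; the -10..+10 range and the block list come from the comment. Every score is just a sum the user can inspect:

```python
from typing import Optional


def clamp(rating: int) -> int:
    """Keep user ratings in the -10..+10 range."""
    return max(-10, min(10, rating))


def score(title: str, tags: list[str], ratings: dict[str, int],
          blocked: set[str]) -> Optional[int]:
    """None if any tag or title word is blocked, else the sum of the tag ratings.

    ratings and blocked are assumed to use lowercase keys.
    """
    tag_set = {t.lower() for t in tags}
    words = set(title.lower().split())
    if (tag_set | words) & blocked:
        return None
    return sum(clamp(ratings.get(t, 0)) for t in tag_set)


def rank(videos: dict[str, tuple[str, list[str]]], ratings: dict[str, int],
         blocked: set[str]) -> list[str]:
    """Video IDs, best score first (ties broken by ID); blocked videos are dropped."""
    scored = []
    for vid, (title, tags) in videos.items():
        s = score(title, tags, ratings, blocked)
        if s is not None:
            scored.append((s, vid))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [vid for _, vid in scored]


if __name__ == "__main__":
    videos = {
        "v1": ("Easy pasta", ["cooking"]),
        "v2": ("Match recap", ["sports"]),
        "v3": ("Gambling tips", ["finance"]),
    }
    print(rank(videos, {"cooking": 8, "sports": -5}, blocked={"gambling"}))
    # ['v1', 'v2']
```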

[email protected] 2 points 2 days ago

I like that idea for a stupid-simple algorithm. Ironically, I plan for there to be a variety of algorithms, both user-only and aggregate. Really, I'm trying to pin down a standardized video vector that can describe any video to any level of detail.

[email protected] 1 points 1 day ago

Another thing I think is fine is "because you watched ____" and suggestions based on that. For example, if we see that the user watched a whole video or liked it, we then find videos similar to that one. I don't think that's very invasive, and it may be helpful.
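A "because you watched" row can be sketched as item-to-item similarity: pick the videos the user finished or liked as seeds, then list the most similar videos they haven't seen. The 90% "watched the whole video" cutoff, the vector format, and the names are assumptions for illustration; the history stays on the user's device:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def because_you_watched(history: list[tuple[str, float, bool]],
                        videos: dict[str, list[float]],
                        k: int = 3, min_fraction: float = 0.9) -> dict[str, list[str]]:
    """Map each fully watched or liked video to its k most similar unseen videos.

    history: (video_id, watch_fraction, liked) tuples, kept on the device.
    videos:  video_id -> vector.
    """
    seen = {vid for vid, _, _ in history}
    seeds = [vid for vid, frac, liked in history if liked or frac >= min_fraction]
    unseen = [vid for vid in videos if vid not in seen]
    return {
        seed: sorted(unseen, key=lambda v: cosine(videos[seed], videos[v]), reverse=True)[:k]
        for seed in seeds
    }


if __name__ == "__main__":
    videos = {"pasta": [1, 0], "risotto": [0.9, 0.1], "football": [0, 1], "tennis": [0.1, 0.9]}
    history = [("pasta", 1.0, False), ("football", 0.2, False)]
    print(because_you_watched(history, videos, k=1))  # {'pasta': ['risotto']}
```

Because each row is labeled with the seed video, the user can see exactly why a suggestion appeared and can turn it off.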

As long as it's transparent and the user has control over whether to use it, I think that's the key. The fediverse, however, is enormously sensitive about privacy and corporate practices, so I can see it being a tight line to walk.

[email protected] 2 points 1 day ago

Really, I'm thinking about what data is okay to share and what data should stay with the user. Basically, I determined that the video's description is the only thing that can be public, and it's okay to share who (person or bot) wrote that description, e.g. associating their channel with a description they made for a specific video. The watcher's device can then collect video metadata to find suggestions.
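The public/private split described above can be expressed as two data types: a public, attributed description that may be federated, and a private profile that only ever gets serialized for local storage. The class and field names are hypothetical; the ranking in `local_candidates` runs on the device using only the public descriptions:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VideoDescription:
    """Public: may be federated, attributed to the person or bot who wrote it."""
    video_id: str
    described_by: str              # hypothetical: handle of the describing channel
    vector: dict[str, float]


@dataclass
class WatcherProfile:
    """Private: kept on the watcher's device, never sent to an instance."""
    vector: dict[str, float] = field(default_factory=dict)
    watched: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize for local storage (a file or app storage on the device)."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "WatcherProfile":
        return cls(**json.loads(text))


def share(desc: VideoDescription) -> str:
    """The only kind of data this sketch ever serializes for other servers."""
    return json.dumps(asdict(desc))


def local_candidates(profile: WatcherProfile,
                     descriptions: list[VideoDescription]) -> list[str]:
    """Runs on the device: public descriptions in, private ranking out."""
    def affinity(desc: VideoDescription) -> float:
        return sum(profile.vector.get(k, 0.0) * v for k, v in desc.vector.items())

    unseen = [d for d in descriptions if d.video_id not in profile.watched]
    return [d.video_id for d in sorted(unseen, key=affinity, reverse=True)]
```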

[email protected] 2 points 2 days ago

Looks cool. It's pretty similar to how I thought of doing this as well. Do you have a prototype? You might be able to adjust the algorithm if you receive feedback from some PeerTube veterans.

[email protected] 3 points 2 days ago

I haven't made anything yet; I just wanted to articulate that a basic algorithm can be done ethically. The instance, the watcher, or the fediverse in general can make a vector to define a video, that vector can be shared via ActivityPub, and the user can have a vector for themselves and even their own algorithm to sift through videos.

I'm just starting, and right now I have to figure out how to format the video vector: do I want .json, .csv, or .xml?

[email protected] 1 points 2 days ago* (last edited 2 days ago)

Why would the video vectors be calculated and stored on the server (the instance), though, let alone federated? These things can be calculated instantly on the device.

[email protected] 1 points 2 days ago

It would be better to store the video vector on an instance so that watchers can retrieve it; it's just logistics. The video vector (element) can be calculated anywhere and then communicated to an instance; the idea is to be flexible. The ActivityPub protocol has made the decision easy: the video vector has to be a JSON element in the video's JSON data.
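ActivityPub objects are JSON-LD, and a video can be represented as an ActivityStreams `Video` object, so one way to carry a vector is an extra property on that object. In this sketch the `videoVector` property and its namespace URL are hypothetical; a real extension would need an agreed, published JSON-LD context. The `"@type": "@json"` term definition (JSON-LD 1.1) tells JSON-LD processors to keep the vector as a plain JSON value:

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def attach_vector(video_object: dict, qualities: dict[str, float]) -> dict:
    """Return a copy of an ActivityStreams Video object with a video vector added.

    'videoVector' and https://example.org/ns# are hypothetical placeholders.
    """
    obj = dict(video_object)                      # shallow copy; input untouched
    context = obj.get("@context", AS_CONTEXT)
    context = context if isinstance(context, list) else [context]
    # JSON-LD 1.1: "@type": "@json" keeps the value as an opaque JSON literal
    obj["@context"] = context + [{
        "videoVector": {"@id": "https://example.org/ns#videoVector", "@type": "@json"}
    }]
    obj["videoVector"] = dict(qualities)
    return obj


if __name__ == "__main__":
    video = {"@context": AS_CONTEXT, "type": "Video",
             "id": "https://example.org/videos/1", "name": "Easy pasta"}
    print(json.dumps(attach_vector(video, {"cooking_tutorial": 1.0}), indent=2))
```

Servers that don't understand the extra property can ignore it, which keeps the vector optional for instances that don't use it.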

It would be better to store the results of a calculation to avoid repeating it. I'm looking into music classification: the entire video could be sent off and parsed to see whether it's music, its tempo, its genre, and so on. I'd assume that would be fairly costly to calculate, whereas the instance can just send a video vector that states all that information.
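Storing the result instead of recomputing it is plain memoization. In this sketch the costly analysis is a stand-in (`fake_music_classifier` is NOT a real classifier), and results are keyed by a SHA-256 hash of the video bytes; both choices are assumptions for illustration:

```python
import hashlib
from typing import Callable

Vector = dict[str, float]


class VectorCache:
    """Run the expensive analysis once per unique video, then reuse the result."""

    def __init__(self, compute: Callable[[bytes], Vector]) -> None:
        self._compute = compute
        self._store: dict[str, Vector] = {}
        self.computations = 0        # how many times the costly step actually ran

    def get(self, video_bytes: bytes) -> Vector:
        key = hashlib.sha256(video_bytes).hexdigest()   # same file -> same key
        if key not in self._store:
            self._store[key] = self._compute(video_bytes)
            self.computations += 1
        return self._store[key]


def fake_music_classifier(video_bytes: bytes) -> Vector:
    """Placeholder for a costly tempo/genre analysis; NOT a real classifier."""
    return {"music": 1.0 if b"beat" in video_bytes else -1.0}
```

An instance would more likely key by the video's ID or URL; hashing the content instead lets identical uploads share one stored result.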

[email protected] 1 points 2 days ago

What would these deltas look like? Wouldn't they be hard to anonymise and protect from abuse?

Personally I'd be happy to share my likes and watch times, but I know some people worry about that.

[email protected] 1 points 2 days ago

I'm not at the data-aggregation stage yet, but you can just put a random ID on a data set.
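Putting a random ID on a data set can be sketched as pseudonymization: each user ID is swapped for a random one that stays consistent within the data set, and the mapping is thrown away afterwards. The event format and function name are hypothetical:

```python
import uuid


def pseudonymize(events: list[tuple[str, str, float]]) -> list[tuple[str, str, float]]:
    """Swap each user ID for a random one that is consistent within this data set.

    events: (user_id, video_id, watch_fraction) tuples.
    The user -> random-ID mapping is discarded when the function returns.
    """
    mapping: dict[str, str] = {}
    out = []
    for user, video, watch_fraction in events:
        if user not in mapping:
            mapping[user] = uuid.uuid4().hex
        out.append((mapping[user], video, watch_fraction))
    return out
```

A random ID removes the direct identifier but not the pattern: a distinctive watch history can still point back to a person, which is the concern raised in the parent comment.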

[email protected] 1 points 2 days ago

I'm not technical enough to know what the details mean, but I'm excited about the idea of a simple algorithm for PeerTube.