cross-posted from: https://lemmy.world/post/28011368
So I started by doing research, and by research I mean watching two videos on YouTube about basic recommendation algorithms.
I also watched a 30-minute video of a Netflix software engineer talking about using machine learning, complex matrix methods, and bandit-style algorithms to recommend TV shows and movies. The basic conclusion I took from it is that all of these complex techniques give roughly a 50% improvement over their baseline measurement, where baseline means the traditional, pre-neural-network algorithms.
The way I interpret it is that the basics take you a long way, and all the basics amount to is representing each PeerTube video as a vector and each watcher as a vector as well. The idea is that videos that are similar to each other make good recommendations: if a watcher watched one of them, recommend the others, and if they didn't like it, don't recommend anything similar. Once the videos are vectorized, the watcher's vector can be updated in a basic way: more watch time means the video is more of what they want, a like gives it a boost, and a comment could act as a boost multiplier.
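To make the "similar vectors" idea concrete, here is a minimal sketch that ranks videos by cosine similarity to a watcher's vector. Cosine similarity is my assumption for what "similar" means; the three dimensions, the video ids, and the function names are all made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(watcher, videos, top_n=2):
    """Return the ids of the top_n videos whose vectors best match the watcher's."""
    ranked = sorted(videos, key=lambda vid: cosine(watcher, videos[vid]), reverse=True)
    return ranked[:top_n]

# Hypothetical dimensions: [cooking, sports, livestream VOD]
videos = {
    "pasta-basics": [1.0, 0.0, 0.0],
    "match-recap": [0.0, 1.0, 0.0],
    "cook-along-stream": [0.8, 0.0, 0.6],
}
watcher = [0.9, 0.1, 0.2]  # mostly watches cooking

print(recommend(watcher, videos))
```

With these made-up numbers the two cooking videos rank above the sports one, which is the behavior described above.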
I'd say the watcher's vector can be stored locally while the videos' vectors are public. It will take a while to figure out a function/algorithm that adapts to the watcher. Does the watcher's taste change? Do they like multiple things? Should the algorithm adapt fast or slow as new videos come in? How should it balance novelty against consistency? I don't expect this problem to be solved anytime soon, but I expect recommendation algorithms will simply evolve and split, each with its own unique benefits and drawbacks.
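One simple way to frame the fast-versus-slow question is a single rate parameter, as in this sketch. It assumes an exponential-moving-average style update that would run on the watcher's own device; the weight constants (0.5 for a like, 1.5 for a comment, -1.0 for a dislike) and both function names are placeholders I made up, not tuned values.

```python
def engagement_weight(watch_fraction, liked=False, commented=False, disliked=False):
    """Turn one viewing into a signed weight. All constants are illustrative guesses."""
    if disliked:
        return -1.0                 # push the watcher's vector away from this video
    weight = watch_fraction         # 0.0 (skipped) to 1.0 (watched to the end)
    if liked:
        weight += 0.5               # a like gives a flat boost
    if commented:
        weight *= 1.5               # a comment multiplies the signal
    return weight

def update_watcher(watcher, video, weight, rate=0.1):
    """Move the watcher's vector toward (or away from) a video's vector.

    rate near 0 adapts slowly (consistency); rate near 1 chases the latest video (novelty).
    """
    step = max(-1.0, min(1.0, rate * weight))  # clamp so one video cannot overshoot
    return [w + step * (v - w) for w, v in zip(watcher, video)]

watcher = [0.0, 0.0]
cooking_video = [1.0, 0.0]
weight = engagement_weight(1.0, liked=True)

slow = update_watcher(watcher, cooking_video, weight, rate=0.05)
fast = update_watcher(watcher, cooking_video, weight, rate=0.5)
print(slow, fast)
```

The same watched video moves the "fast" watcher much further than the "slow" one, so picking the rate is really picking how quickly old tastes are forgotten.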
The foundation to start with is a standard for the video vector. A video can be described by quantities and by qualities. There are only a few measurable quantities, like video length and existing views. Qualitative attributes like "is it a cooking tutorial", "is it a sports commentary", or "is it a livestream VOD" will require the vector to be stored in a format that can adapt to an expanding number of dimensions, since there's no fixed limit on the qualities a PeerTube video can have. The next issue is turning qualities into actual numbers: if something is sports related or sports adjacent, would a 1 mean yes, and would a 0 mean neutral/agnostic or no?
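One possible answer, sketched below, is to store qualities as a sparse map from tag name to a number, so new dimensions can be added without touching old records. The convention here (1 means yes, -1 means no, 0 or a missing tag means neutral) is just one option I'm assuming, and none of the field names come from PeerTube.

```python
import json
import math

def sparse_cosine(a, b):
    """Cosine similarity for dict vectors; tags missing from a dict count as 0 (neutral)."""
    dot = sum(value * b.get(tag, 0.0) for tag, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical record layout: quantities are raw measurements, qualities are in [-1, 1].
video = {
    "quantities": {"length_seconds": 754, "views": 1200},
    "qualities": {"cooking_tutorial": 1.0, "sports": -1.0, "livestream_vod": 0.0},
}
# An older record written before the "sports" and "livestream_vod" tags existed.
older_video = {
    "quantities": {"length_seconds": 300, "views": 40},
    "qualities": {"cooking_tutorial": 1.0},
}

# The record survives a round trip through JSON, so it can be published as plain text.
decoded = json.loads(json.dumps(video))
similarity = sparse_cosine(decoded["qualities"], older_video["qualities"])
print(round(similarity, 3))
```

Because missing tags are treated as neutral, the older record still compares sensibly against the newer one even though it has fewer dimensions.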
The last and simplest issue would be communicating the algorithm that updates the watcher's vector, since that can be done via updates from the PeerTube server or GitHub.
Another thing I think is fine is "because you watched ____" and suggestions based on that. Say we see that the user watched a whole video or liked it, and then we find videos similar to it. I don't think that's very invasive, and it may be helpful.
As long as it's transparent and the user has control over whether to use it, I think that's the key. The fediverse, however, is enormously sensitive about privacy and corporate influence, so I can see that it's a fine line to walk here.
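A rough sketch of the "because you watched ____" idea: seed on videos the user finished or liked, and for each seed list the most similar videos they haven't seen. The 0.95 "watched the whole video" threshold, the history fields, and the catalog layout are all assumptions for illustration, and the whole thing is meant to run on the user's device against their local history.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def because_you_watched(history, catalog, per_seed=1, min_fraction=0.95):
    """Map each finished-or-liked video id to its most similar unwatched video ids."""
    watched = {entry["id"] for entry in history}
    suggestions = {}
    for entry in history:
        if entry["watch_fraction"] < min_fraction and not entry["liked"]:
            continue  # not a strong enough signal to seed suggestions
        seed = catalog[entry["id"]]
        candidates = [vid for vid in catalog if vid not in watched]
        candidates.sort(key=lambda vid: cosine(seed, catalog[vid]), reverse=True)
        suggestions[entry["id"]] = candidates[:per_seed]
    return suggestions

# Hypothetical dimensions: [cooking, sports, livestream VOD]
catalog = {
    "pasta-basics": [1.0, 0.0, 0.0],
    "knife-skills": [0.9, 0.0, 0.1],
    "match-recap": [0.0, 1.0, 0.0],
    "transfer-news": [0.1, 0.9, 0.0],
}
history = [
    {"id": "pasta-basics", "watch_fraction": 1.0, "liked": False},
    {"id": "match-recap", "watch_fraction": 0.3, "liked": False},
]

print(because_you_watched(history, catalog))
```

Keeping the seed video id as the key is what makes it transparent: the user can see exactly which video caused each suggestion.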
Really, I'm thinking about what data is okay to share and what data should be kept with the user. Basically, I determined that the description of the video is the only thing that can be public, and that it's okay to share which people/bots are describing it (like associating their channel with the description they make for a specific video), while the watcher's device can collect video metadata to find suggestions.