RoundSparrow

joined 3 years ago

There was a bug in Lemmy where servers hosting communities with large numbers of subscribers ran a SQL query that overloaded them. Lemmy.ml and Lemmy.world were heavily impacted, given how many new instances have been added with people who subscribe to communities on those big servers!

From Lemmy.world:

I have been spot checking the comment replication on recent postings and messages are flowing far better.

 

Lemmy.world's public announcement that this is installed on their production server: https://lemmy.world/post/1061471

submitted 2 years ago* (last edited 2 years ago) by RoundSparrow@lemmy.ml to c/newworld@lemmy.ml
 

I don't like Lemmy saying 1.51 years is 2 years ;)

 

linked to original post given replication problems with lemmy.ml

 

I spent several hours tracing in production (updating the code a dozen times with extra logging) to identify the actual path the lemmy_server code uses for outbound federation of votes to subscribed servers.

Major popular servers (Beehaw, Lemmy.world, Lemmy.ml) have a large number of instance servers subscribing to their communities to get copies of every post/comment. Comment votes/likes are the most common activity, and it is proposed that, during the PERFORMANCE CRISIS, these overwhelmed servers turn off outbound vote/like sharing.

pull request for draft:

https://github.com/LemmyNet/lemmy/compare/main...RocketDerp:lemmy_comment_votes_nofed1:no_federation_of_votes_outbound0

EDIT: LEMMY_SKIP_FEDERATE_VOTES environment variable
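
To make the idea concrete, here is a minimal sketch of how an environment switch like this could gate outbound vote federation. It is not the code from the draft branch: only the variable name LEMMY_SKIP_FEDERATE_VOTES comes from the post, while the accepted values ("1", "true", "yes"), the `Activity` enum, and both helper functions are assumptions made for illustration.

```rust
use std::env;

/// Hypothetical helper: treat "1", "true" or "yes" (any case) as "on".
/// Which values the real draft accepts is an assumption here.
fn flag_enabled(value: Option<&str>) -> bool {
    match value {
        Some(v) => matches!(v.trim().to_ascii_lowercase().as_str(), "1" | "true" | "yes"),
        None => false,
    }
}

/// Reads LEMMY_SKIP_FEDERATE_VOTES from the process environment.
fn skip_federate_votes() -> bool {
    flag_enabled(env::var("LEMMY_SKIP_FEDERATE_VOTES").ok().as_deref())
}

/// Hypothetical activity kinds; the real lemmy_server types differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Activity {
    Post,
    Comment,
    Vote,
}

/// Decide whether an activity is sent to subscribed instances.
/// Only votes are dropped when the switch is on; posts and comments still flow.
fn should_send_outbound(activity: Activity, skip_votes: bool) -> bool {
    !(skip_votes && activity == Activity::Vote)
}
```

A caller would evaluate `skip_federate_votes()` once at startup and pass the result along, so the environment is not re-read for every vote.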

 

Grep the lemmy_server code for "= LocalSite::read" and you find that even for a single vote by an end-user, it runs an SQL query against the local site settings to see if downvotes are disabled.

Can some Rust programmers chime in here? Can we cache this in RAM and not fetch from SQL every time?

PostgreSQL is telling me that the 2nd most run query on my system, which is receiving incoming federation post/comment/votes, is this:

SELECT "local_site"."id", "local_site"."site_id", "local_site"."site_setup", "local_site"."enable_downvotes", "local_site"."enable_nsfw", "local_site"."community_creation_admin_only", "local_site"."require_email_verification", "local_site"."application_question", "local_site"."private_instance", "local_site"."default_theme", "local_site"."default_post_listing_type", "local_site"."legal_information", "local_site"."hide_modlog_mod_names", "local_site"."application_email_admins", "local_site"."slur_filter_regex", "local_site"."actor_name_max_length", "local_site"."federation_enabled", "local_site"."captcha_enabled", "local_site"."captcha_difficulty", "local_site"."published", "local_site"."updated", "local_site"."registration_mode", "local_site"."reports_email_admins" FROM "local_site" LIMIT $1
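
As a starting point for that question, here is a rough sketch of what caching the settings in RAM could look like, using only the Rust standard library. Everything in it is hypothetical: `LocalSiteSettings` stands in for the real row with just two of the columns from the query above, `CachedLocalSite` is a made-up name, and the `load` closure stands in for the actual LocalSite::read call.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the local_site row (two columns only).
#[derive(Debug, Clone, PartialEq)]
struct LocalSiteSettings {
    enable_downvotes: bool,
    federation_enabled: bool,
}

/// Keeps one copy of the settings in RAM for at most `ttl`.
struct CachedLocalSite {
    ttl: Duration,
    slot: Mutex<Option<(Instant, LocalSiteSettings)>>,
}

impl CachedLocalSite {
    fn new(ttl: Duration) -> Self {
        Self { ttl, slot: Mutex::new(None) }
    }

    /// Returns the cached copy while it is fresh; otherwise calls `load`
    /// (the stand-in for the SQL read) and stores the result.
    fn get<F>(&self, load: F) -> LocalSiteSettings
    where
        F: FnOnce() -> LocalSiteSettings,
    {
        let mut slot = self.slot.lock().unwrap();
        if let Some((fetched_at, settings)) = slot.as_ref() {
            if fetched_at.elapsed() < self.ttl {
                return settings.clone();
            }
        }
        let fresh = load();
        *slot = Some((Instant::now(), fresh.clone()));
        fresh
    }

    /// Drops the cached copy, e.g. right after an admin saves new site settings.
    fn invalidate(&self) {
        *self.slot.lock().unwrap() = None;
    }
}
```

With a TTL of a few seconds, thousands of incoming votes would share one read of local_site instead of issuing one query each. The trade-off is that a settings change can take up to the TTL to be seen unless `invalidate` is called on save. The sketch also holds the lock while loading, which is simple but would need rethinking in async code.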

 

"I believe that it is responsible for major increases in CPU and RAM usage and client errors. Because now there are up to millions of async tasks active which are doing nothing but sleeping, and this likely messes up the scheduler. I will rework this for 0.18.2."

 

Drastic scaling performance times call for drastic solutions.
