kbin.earth meta



A magazine dedicated to the kbin.earth Mbin instance. Official announcements and maintenance updates are made here, but also feel free to ask questions or request support.

If you'd like, you can also join the kbin.earth Matrix chat for real-time discussion of support, moderation decisions, or just general chatting.

founded 2 years ago

MODERATORS

jwr1@kbin.earth

Addressing kbin.earth stability issues (kbin.earth)

submitted 3 days ago by jwr1@kbin.earth to c/kbinEarth@kbin.earth


As I'm sure y'all have noticed, the server has been having stability issues the past week. Every once in a while, the server seems to "crash"; it will time out on all network requests for about 5 minutes, happening several times within an hour. Additionally, there have been several times over the past week when the server queues got backed up, resulting in incoming and outgoing federation traffic being delayed.

At first, I assumed the server hardware had simply become insufficient to handle all of the network and federation traffic we have been receiving, potentially due to a DDoS attack. I was honestly starting to think I would need to upgrade the server hardware, which would be ridiculous, as the VPS is already quite powerful and is certainly expensive enough.

Luckily, @green_copper helped with debugging and noticed the server timeouts only occur when the microblog section is viewed (and sometimes when the combined section is viewed). So rather than a lack of hardware being the issue, I believe it actually has to do with the recent Mbin upgrade. My guess is that an SQL query was changed in one of the commits between Mbin v1.9.1 and v1.10.0, which resulted in a wildly inefficient query, causing the server to freeze up and time out on requests.

I have been investigating the issue and will continue this week. I will start by analyzing which database queries are causing the timeouts, but if needed, I will roll back to Mbin v1.9.1 and try to figure out which specific Mbin commit introduced the bug. Additionally, I may restrict access to signed-in users while I'm working on this.
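One possible starting point for that query analysis, as a rough sketch only: PostgreSQL's `pg_stat_activity` view lists the statements currently running, so filtering it by runtime can show which queries are stuck during a freeze. The function names, the database name `mbin`, and the 30-second threshold are all assumptions for illustration, not details of the kbin.earth setup.

```shell
#!/bin/bash
# Sketch: list PostgreSQL statements that have been running longer than a threshold.
# Assumptions (not from the post): psql is installed, the database is named "mbin",
# and 30 seconds is a sensible cutoff. Adjust all of these for a real server.

# Print the SQL text. Kept separate so it can be inspected without a database.
long_query_sql() {
  min_seconds="${1:-30}"
  # Accept only whole numbers, since the value is interpolated into the SQL.
  case "$min_seconds" in
    ''|*[!0-9]*) echo "threshold must be a whole number of seconds" >&2; return 1 ;;
  esac
  cat <<SQL
SELECT pid, now() - query_start AS runtime, state, left(query, 120) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '${min_seconds} seconds'
ORDER BY runtime DESC;
SQL
}

# Run the query with psql (-X skips any local psqlrc file).
list_long_queries() {
  db="${1:-mbin}"
  sql="$(long_query_sql "${2:-30}")" || return 1
  psql -d "$db" -X -c "$sql"
}
```

Running `list_long_queries mbin 30` while the microblog view is hanging would, under these assumptions, print the longest-running statements first.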

In the meantime, if you guys can refrain from viewing microblogs (or the combined view), that would keep the server accessible for everyone and help me work on it faster, as I can't do anything while the server is frozen. If you want any real-time updates on the server issue (or anything else related to kbin.earth), feel free to join the Matrix room.

Thanks for sticking around and for your patience!

top 10 comments


[–] celeste@kbin.earth 5 points 2 days ago

Went in and switched my default view off combined. Good luck! This sounds like a complicated issue, but I hope it actually turns out to be simple

[–] green_copper@kbin.earth 15 points 3 days ago (1 children)

"Thanks for sticking around and for your patience!"

And thank you for dedicating your time and money providing us with a nice corner of the internet ^^

[–] Maeve@kbin.earth 8 points 3 days ago

Absolutely. We do appreciate you.

[–] Skua@kbin.earth 6 points 2 days ago

While attempting to find where to change my default home page view off of the combined view, I mis-clicked and pressed the home button and then the server crashed. So I apologise for being a moron. Thank you for all your work with the server

[–] Oofnik@kbin.earth 5 points 2 days ago

You rock @jwr1@kbin.earth

[–] HarkMahlberg@kbin.earth 6 points 3 days ago (1 children)

Good luck! If it's truly a SQL issue, it seems you've got your work cut out for you. There are query builders (QB), conditional string concatenation, big preformatted blocks of nested subqueries, the works. It will likely be hard to test.

I do wonder if it has to do with the move to cursors for pagination; there are quite a number of changes related to that.

https://github.com/MbinOrg/mbin/pull/1985 https://github.com/MbinOrg/mbin/pull/2041

[–] HarkMahlberg@kbin.earth 2 points 3 days ago

Oh, perhaps the included monitoring system will help?

https://github.com/MbinOrg/mbin/pull/1960

[–] Maeve@kbin.earth 3 points 3 days ago

The prime24/nordpublications jackasses have switched tactics and are still creating accounts, but without posting their vacuous rubbish. Watch new accounts; I'm not saying it's them, but they're highly suspect, imo.

[–] Auster@thebrainbin.org 3 points 3 days ago (1 children)

@BentiGorlich@gehirneimer.de, maybe related to the RSS and php-fpm process situations at The/Brain/Bin?

[–] BentiGorlich@gehirneimer.de 2 points 14 hours ago

That was actually mostly a pgbouncer issue. I have no idea why it is happening, though...

I just have a script that checks the last line of the pgbouncer log for query_wait_timeout and restarts php-fpm if necessary:

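#!/bin/bash

# Count query_wait_timeout matches in the most recent pgbouncer log line (0 or 1).
lineCount=$(tail -n 1 /var/log/postgresql/pgbouncer.log | grep -c query_wait_timeout)

if [ "$lineCount" -gt 0 ]; then
  # Restart php-fpm and record the restart in the system log.
  systemctl restart php8.3-fpm.service
  logger "Restarting php because pgbouncer has query_wait_timeout messages"
else
  echo "No"
fi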
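
A possible variant of the script above, offered only as a sketch: it wraps the check in a function and looks at the last few log lines instead of only the very last one, so a timeout followed by an unrelated log line is still noticed. The function name `has_recent_timeout` and the default window of 20 lines are assumptions for illustration; the log path is the one from the script above.

```shell
#!/bin/bash
# Sketch: report whether the tail of the pgbouncer log mentions query_wait_timeout.
# The function name and the 20-line default window are hypothetical choices.
# Exit status: 0 = timeout found, 1 = not found, 2 = log file not readable.
has_recent_timeout() {
  log_file="${1:-/var/log/postgresql/pgbouncer.log}"
  window="${2:-20}"
  [ -r "$log_file" ] || return 2
  # The function's status is grep's status: 0 on a match, 1 otherwise.
  tail -n "$window" "$log_file" | grep -q query_wait_timeout
}
```

It could be used as `has_recent_timeout && systemctl restart php8.3-fpm.service`. One trade-off to keep in mind: with a wider window, an old timeout message stays in view until newer lines push it out, so repeated runs on a quiet log could trigger repeated restarts.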