this post was submitted on 10 Apr 2025
182 points (98.4% liked)

Hi All,

As some of you may have realised, the planned upgrade sort of crashed everything, and we had our longest period of downtime since the site began.

This is partly because I had to go to sleep (thanks to a newborn and a job).

The good news is that the backup process worked! We've restored to seconds before the upgrade took the site offline.

The bad news is that federation is likely to be.. wonky.. for a little while. The site may also go up and down while I undo some of the fixes I tried.

Ultimately the issue came down to the upgrade failing (I am not sure why - will be digging into this now that the priority is no longer getting the site up) and then the containers not talking to each other, so the UI wouldn't talk to lemmy, and lemmy wouldn't talk to the database.

I rebuilt the containers, restored the backup, restarted everything, and it's all come back up (admittedly not perfect right now).

Importantly, I want to issue an apology. This isn't what I want for Lemmy.zip, and I should've handled it way better. I'm always learning, but this took way longer than it should've, and while I take some solace in the fact that the backup process worked and has been proven to work in production, the delay in getting the site back up is entirely my fault and frankly unacceptable.

I'll be working to document this outage, the steps it took to get it back up, and some form of repeatable plan so a repair can be replicated in the future if I'm not available.

In terms of upgrading to 0.19.11 - I will have to try again soon as it's got some security fixes we desperately need to implement.

Thanks

Demigodrick

[–] [email protected] 57 points 2 weeks ago (1 children)

Importantly, I want to issue an apology

Way I see it, family and mental health always come before internet randos. Thanks for working hard for everyone.

[–] [email protected] 11 points 2 weeks ago

Lots of Internet randos have been very nice and supportive, so I feel a debt to the community to make this place the best it can be.

But thank you ❤️

[–] [email protected] 46 points 2 weeks ago (1 children)

I will try and reply to each comment - but you've all been really kind and that means so much ❤️

If you're interested, this graph will show you how far behind we are. We should eventually catch up, but things will likely be very delayed for up to 12 hours.

The status page did not work as expected - and I'll try and link a few more places where I post updates. If you haven't yet, definitely join the matrix space and you'll get minute by minute panic updates 🫠

[–] [email protected] 7 points 2 weeks ago (1 children)

That graph is really kind of neat, but it seems to only be synchronizing with a single instance at a time from what I can tell. I saw the world line has dropped significantly, but the other lines don't look like they've fallen yet.

[–] [email protected] 12 points 2 weeks ago (1 children)

Yes, the lemmy.world admins kindly manually reset the timer for their instance so it started updating straight away!

If an instance goes down, other instances slowly back off sending retries of activities, so as not to waste them on dead instances.

You can use this tool to see this info. It links lemmy.world but you can search for any instance, and then look up lemmy.zip either under failed or lagging instances. You'll see on the far right the "next send try" time and date. Looks like a lot will try again around 9pm (although I'm not entirely sure on the timezone there) - so over the next few hours instances will send another try, see that lemmy.zip is back up, and then start federation with us again :)
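
For anyone curious what that backing off might look like, here is a minimal sketch of generic exponential backoff with a cap. It is not Lemmy's actual schedule: the function names, the 60-second base and the one-day cap are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def retry_delay(fail_count: int,
                base_seconds: float = 60.0,
                max_seconds: float = 86400.0) -> timedelta:
    """Delay before the next attempt after `fail_count` consecutive failures.

    Assumed policy (not taken from Lemmy): double the wait on every
    failure, starting at `base_seconds`, and never exceed `max_seconds`.
    """
    if fail_count <= 0:
        return timedelta(0)
    # Clamp the exponent so very large fail counts cannot overflow a float.
    exponent = min(fail_count - 1, 32)
    seconds = min(base_seconds * (2 ** exponent), max_seconds)
    return timedelta(seconds=seconds)


def next_send_try(last_attempt: datetime, fail_count: int) -> datetime:
    """Earliest time the sending instance would try again."""
    return last_attempt + retry_delay(fail_count)


if __name__ == "__main__":
    last = datetime(2025, 4, 10, 12, 0, tzinfo=timezone.utc)
    for fails in (1, 2, 5, 20):
        print(fails, next_send_try(last, fails).isoformat())
```

With these made-up numbers, an instance that has failed twice in a row would be retried two minutes after the last attempt. The "next send try" column in the tool is conceptually similar to what `next_send_try` returns here.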

[–] [email protected] 5 points 2 weeks ago

It's cool to see that there's logic built-in that keeps instances from sending Federation requests to dead instances. But when an instance comes back online, they will re-synchronize themselves. An instance may drop out of the Federation, but when it comes back, it will get everything it missed. Eventually.

[–] [email protected] 40 points 2 weeks ago* (last edited 2 weeks ago) (3 children)

I've been there. But it is my honor to bestow upon you this award to commemorate the accomplishment

[–] [email protected] 19 points 2 weeks ago (1 children)

Ah yes. I still wear my 25 year old “deleted a prod database” badge with honor

[–] [email protected] 17 points 2 weeks ago

It's a bittersweet honour to have. My personal fail was being too cocky updating a 'handful' of product descriptions.

(15398 row(s) affected)

[–] [email protected] 28 points 2 weeks ago (1 children)

Thanks for all your hard work. We missed .zip while it was gone.

[–] [email protected] 27 points 2 weeks ago* (last edited 2 weeks ago) (5 children)

Thanks for the update.

I was a bit worried for your mental health as the hours of downtime continued :)

Awesome that the backup restore procedure worked that well.

One thing I have been wondering is, why status.lemmy.zip stayed all green during all of this.

[–] [email protected] 8 points 2 weeks ago (2 children)

Because it was technically working, it's just that "UI wouldn't talk to lemmy, and lemmy wouldn't talk to the database". So they were operating, but not communicating with each other.
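
One way a status page could catch this kind of failure is to probe each link in the chain instead of only checking that every service is running. A rough sketch, assuming hypothetical probe functions (nothing here reflects how status.lemmy.zip is actually set up):

```python
from typing import Callable, Dict


def check_chain(probes: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run every probe and record "up" or "down" for each link.

    A probe is any zero-argument callable returning True on success.
    In practice one probe might fetch a page through the UI and another
    might run a trivial database query via the backend; both are
    assumptions here, so plain callables stand in for them.
    """
    results: Dict[str, str] = {}
    for name, probe in probes.items():
        try:
            ok = bool(probe())
        except Exception:
            # A probe that raises (timeout, refused connection) counts as down.
            ok = False
        results[name] = "up" if ok else "down"
    return results


def overall_status(results: Dict[str, str]) -> str:
    """The site is only "up" if every link in the chain is up."""
    if results and all(state == "up" for state in results.values()):
        return "up"
    return "down"


if __name__ == "__main__":
    # Stand-in probes: containers run, but the backend cannot reach the database.
    probes = {
        "ui -> lemmy": lambda: True,
        "lemmy -> database": lambda: False,
    }
    results = check_chain(probes)
    print(results, overall_status(results))
```

The point of the design is that a green result requires the whole path to work, so three healthy but disconnected containers would still show as down.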

[–] [email protected] 17 points 2 weeks ago (1 children)

entirely my fault and frankly unacceptable.

You're providing a service out of your own time, pocket and energy; you don't owe anyone.
It's the other way around, we owe you.

So thank you.
Learn from your mistakes and carry on. 👍

[–] [email protected] 16 points 2 weeks ago

Thank you for this post. Don't be so harsh on yourself, everyone can make a mistake!

Good to see Lemmy.zip back up!

[–] [email protected] 16 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

Dude, you're being wayyyy too harsh on yourself!
You run this awesome instance for free while caring for a newborn, you don't owe anybody nothing.
Forget the delay, forget apologies and "unacceptable". Real life comes before social media, don't beat yourself up for the outage.

People who can't stand downtime should practice personal redundancy by creating backup accounts on other instances ;)

[–] [email protected] 15 points 2 weeks ago (2 children)

Dude, keeping this running with a job and a newborn? You're headed for sainthood.

If you don't have one, you could start an out of band chat during updates, just in case you need some eyes on things or just some moral support. I'm sure we have at least a few subject matter experts around if you can stand us :)

[–] [email protected] 11 points 2 weeks ago
[–] [email protected] 13 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

No worries! Make sure you're getting enough rest!

[–] [email protected] 13 points 2 weeks ago (1 children)

No need to apologize as you have been doing a stellar job. Your family needs to always take priority no matter what. I don't care if it is down for a week as your health and kid are far more important.

One thing I will say is that I think Lemmy.zip could really benefit from an external way of communicating announcements. It doesn't need to be complicated, and you could reuse your existing mastodon account to post updates when things go wrong. It could also allow users to give advice on how to fix issues.

[–] [email protected] 13 points 2 weeks ago (1 children)

Things gotta fuck up sometimes, tis how we figure shit out and learn things! You got this.

[–] [email protected] 12 points 2 weeks ago (1 children)

As a new parent myself, I'm stunned you managed to find the time to restore it at all. Good on ya, fella!

[–] [email protected] 11 points 2 weeks ago (1 children)

Appreciate the honesty and transparency. Thank you for your hard work maintaining the site, and hopefully you're able to restore everything to a fully working state!

[–] [email protected] 11 points 2 weeks ago (1 children)

Thank you for all your hard work, as an IT guy I know the feeling when production doesn't work as it should, and the feeling of relief when the backups are actually being restored and working.

Take care and make sure to take a break if you need to, we'll still be here.

[–] [email protected] 11 points 2 weeks ago (1 children)

I appreciate the transparency and frankly couldn't ask for more. Shit happens and this is a one-person operation. Thanks for all your effort!

[–] [email protected] 11 points 2 weeks ago (1 children)

thanks for the effort and also explanation

[–] [email protected] 11 points 2 weeks ago (2 children)

Thanks for the update! I figured it must have had something to do with the baby and the busy life + the update not working as expected, so I was patient.

After 12hrs or so I did go on mastodon to look for an update (just an 'everything crashed, working on the backup' kinda message), so if this ever happens again that might be an idea?

Thanks for working so hard on getting everything back up and don't forget to rest!

[–] [email protected] 10 points 2 weeks ago (1 children)

Don't worry. Newborn is a trump card, but even without it you literally are a volunteer.

Anyone complaining about you volunteering your time, especially with a newborn, is not a parent, and is ignoring that you're doing this for free. Thanks for your time and effort, and I'm just happy it's back up :).

Social media that can go down for a day or two is way better than a shit hole of advertising and manipulation that is Facebook, reddit, and all the rest.

[–] [email protected] 10 points 2 weeks ago (1 children)

Thank you for the transparent announcement, and don't sweat it!

[–] [email protected] 10 points 2 weeks ago (1 children)

Hard agree with all of the other comments here. No apologies needed, you do a great job of keeping this instance going and the transparency is appreciated.

I temporarily switched over to an alt account and was back browsing Lemmy after figuring out .zip was offline, absolutely no big deal.

[–] [email protected] 10 points 2 weeks ago (1 children)

Hey man, you've got absolutely nothing to worry about. The fact that you have this service for us at all is quite frankly amazing and we thank you for it. As another commenter said below, I'd rather have a day worth of downtime than to be on big corporate social media and have everything fixed quicker. Because I know that I'm not the product here. When it did not come back, I checked the status page and it said it was working. So I just figured something broke and decided I'd wait until it came back.

Actually... disregard everything I said above. I'm so fucking mad right now. I could bite holes in bricks. I mean, how dare you notice that there's a problem and not get it fixed absolutely immediately. /s

[–] [email protected] 10 points 2 weeks ago (1 children)

Hey, you’ve done a ton for all of us and I can’t thank you enough for the work and dedication. Don’t be too hard on yourself, your child and well-being are both important, it’s fine. I’d rather have some downtime than lose you as admin. Pretty sure most on the server would agree.

[–] [email protected] 10 points 2 weeks ago (1 children)

We appreciate you very much! Take all the rest you need. 🫡

[–] [email protected] 9 points 2 weeks ago (1 children)

Don't fret about it, things happen. You run a great service for us. A job and a new family already add up to being more than two full-time obligations. Managing Zip along with all that is a lot. Thanks for doing it.

[–] [email protected] 8 points 2 weeks ago (1 children)

How dare you interrupt my ability to look at memes and see the same news article posted in 17 places at once!

Jokes aside I appreciate the work y'all do to keep this sorta thing running without any pay or thanks for the most part.

I am grateful.

[–] [email protected] 8 points 2 weeks ago (1 children)

I didn't realize it was a single person keeping it all running! Tech sometimes goes wonky, good job getting it back online!

[–] [email protected] 8 points 2 weeks ago (1 children)

Trial by fire. At least it was interesting(!)

Praise be to the backup strategy 🙂

[–] [email protected] 8 points 2 weeks ago (1 children)

Thanks for your hard work! Remember your mental health always has priority though. Cheers mate.

[–] [email protected] 8 points 2 weeks ago (1 children)

I think you handled it very well. Not sure how it could’ve been handled better tbh. I figured something didn’t go as planned and I didn’t have any problems waiting for you to find a solution. No apologies needed.

[–] [email protected] 8 points 2 weeks ago
[–] [email protected] 7 points 2 weeks ago

Been there, done that, with my Friendica instance. 2 days of downtime while rebuilding a corrupted database, while people are tapping their feet waiting for all to return. I'm with you in spirit, my friend.

Thanks for all your hard work keeping the dream alive! And for keeping good backups

[–] [email protected] 6 points 2 weeks ago* (last edited 2 weeks ago) (3 children)

Thanks for the hard work! Glad the server is back online.

A suggestion: Post a message on status.lemmy.zip when there is maintenance. That was where I thought to check when I found that the main site was not working. This time, though, it was reporting the site was fine while it was unavailable.

Oh, and congratulations on the newborn!

[–] [email protected] 6 points 2 weeks ago

You're good bro. Sort of assumed something went wrong with the upgrade.

[–] [email protected] 6 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

No worries. You’re doing a great job even though things are hard from time to time.

Thanks for your efforts. ❤️

[–] [email protected] 6 points 2 weeks ago (1 children)

Thank you for this post. Don't be too harsh on yourself, everyone can make a mistake!

Glad to have lemmy.zip up and running again!

[–] [email protected] 5 points 2 weeks ago

Unfortunately these things can and do happen. I'm glad you were able to get things functional with a restoration. Best of luck troubleshooting and repairing the leftover gremlins.

Thanks for all you do to support Lemmy.

[–] [email protected] 5 points 2 weeks ago

Congratulations on the baby. We should thank you for making us go touch grass.
