this post was submitted on 19 Jul 2024
12 points (100.0% liked)

Technology

All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It's all very exciting, personally, as someone not responsible for fixing it.

Apparently caused by a bad CrowdStrike update.

Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We'll see if that changes over the weekend...

all 44 comments
[–] [email protected] 1 points 8 months ago (2 children)

The amount of servers running Windows out there is depressing to me

[–] [email protected] 1 points 8 months ago

I dunno, but doesn't like a quarter of the internet kinda run on Azure?

[–] [email protected] 0 points 8 months ago (1 children)

I've had my PC shut down for updates three times now, while using it as a Jellyfin server from another room. And I've only been using it for this purpose for six months or so.

I can't imagine running anything critical on it.

[–] [email protected] 1 points 8 months ago

Windows server, the OS, runs differently from desktop windows. So if you're using desktop windows and expecting it to run like a server, well, that's on you. However, I ran windows server 2016 and then 2019 for quite a few years just doing general homelab stuff and it is really a pain compared to Linux which I switched to on my server about a year ago. Server stuff is just way easier on Linux in my experience.

[–] [email protected] 1 points 8 months ago (4 children)

Reading into the updates some more... I'm starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don't see how they survive this. Just absolutely catastrophic on all fronts.

[–] [email protected] 1 points 8 months ago (1 children)

If all the computers stuck in boot loop can't be recovered... yeah, that's a lot of cost for a lot of businesses. Add to that all the immediate impact of missed flights and who knows what happening at the hospitals. Nightmare scenario if you're responsible for it.

This sort of thing is exactly why you push updates to groups in stages, not to everything all at once.
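
For anyone wondering what "in stages" looks like in practice, here's a rough sketch. It is not how CrowdStrike or any particular vendor does it; the stage sizes, the failure threshold, and the `deploy` callback are all made-up assumptions for illustration. The idea is just that each wave is small enough to survive, and a bad wave stops the rollout before it reaches everyone.

```python
import math
import random


def staged_rollout(hosts, deploy, stage_fractions=(0.01, 0.10, 0.50, 1.0),
                   max_failure_rate=0.02, seed=0):
    """Push an update in growing waves and halt if a wave fails too often.

    hosts            -- iterable of host names
    deploy           -- hypothetical callback: deploy(host) -> True if healthy
    stage_fractions  -- cumulative share of the fleet covered after each wave
    max_failure_rate -- failure share within one wave that stops the rollout
    """
    order = list(hosts)
    random.Random(seed).shuffle(order)  # avoid always hitting the same hosts first

    updated, failed = [], []
    done = 0
    for fraction in stage_fractions:
        target = math.ceil(len(order) * fraction)
        batch = order[done:target]
        if not batch:
            continue

        batch_failed = []
        for host in batch:
            if deploy(host):
                updated.append(host)
            else:
                batch_failed.append(host)
        failed.extend(batch_failed)
        done = target

        # Stop before the next, larger wave if this one went badly.
        if len(batch_failed) / len(batch) > max_failure_rate:
            return {"halted": True, "updated": updated,
                    "failed": failed, "untouched": order[done:]}

    return {"halted": False, "updated": updated,
            "failed": failed, "untouched": []}
```

With a fleet of 1000 and an update that breaks every host it touches, this stops after the first 1% wave, so roughly 10 machines are down instead of all of them.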

[–] [email protected] 1 points 8 months ago (1 children)

Looks like the laptops are able to be recovered with a bit of finagling, so fortunately they haven't bricked everything.

And yeah staged updates or even just... some testing? Not sure how this one slipped through.

[–] [email protected] 1 points 8 months ago

Not sure how this one slipped through.

I'd bet my ass this was caused by terrible practices brought on by suits demanding more "efficient" releases.

"Why do we do so much testing before releases? Have we ever had any problems before? We're wasting so much time that I might not even be able to buy another yacht this year"

[–] [email protected] 1 points 8 months ago* (last edited 8 months ago)

Testing in production will do that

[–] [email protected] -1 points 8 months ago (2 children)

Don't we blame MS at least as much? How does MS let an update like this push through their Windows Update system? How does an application update make the whole OS unable to boot? Blue screens on Windows have been around for decades, why don't we have a better recovery system?

[–] [email protected] 2 points 8 months ago

CrowdStrike runs at ring 0, effectively as part of the kernel. Like a device driver. There are no safeguards at that level. Extreme testing and diligence are required, because these are the consequences for getting it wrong. This is entirely on CrowdStrike.

[–] [email protected] 1 points 8 months ago* (last edited 8 months ago)

This didn't go through Windows Update. It went through the CrowdStrike software directly.

[–] [email protected] 1 points 8 months ago (2 children)

>Make a kernel-level antivirus
>Make it proprietary
>Don't test updates... for some reason??

[–] [email protected] 0 points 8 months ago (2 children)

I mean I know it's easy to be critical but this was my exact thought, how the hell didn't they catch this in testing?

[–] [email protected] 1 points 8 months ago

Completely justified reaction. A lot of the time tech companies and IT staff get shit for stuff that, in practice, can be really hard to detect before it happens. There are all kinds of issues that can arise in production that you just can't test for.

But this... This has no justification. An issue this immediate, this widespread, would have instantly been caught with even the most basic of testing. The fact that it wasn't raises massive questions about the safety and security of CrowdStrike's internal processes.

[–] [email protected] 0 points 8 months ago (1 children)

From what I've heard, and to play devil's advocate, it coincided with Microsoft pushing out a security update at basically the same time, and that is what caused the issue. So it's possible they had no way to test it properly, because they didn't have the update in hand before it rolled out. So the fault wasn't only a bug in the CS driver, but in the driver's interaction with the new Windows update, which they didn't have.

[–] [email protected] 1 points 8 months ago (1 children)

How sure are you about that? Microsoft very dependably releases updates on the second Tuesday of the month, and their release notes show if updates are pushed out of schedule. Their last update was on schedule, July 9th.
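
The "second Tuesday of the month" rule is easy to check yourself. A quick sketch (the function name is mine, nothing official):

```python
from datetime import date, timedelta


def second_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


print(second_tuesday(2024, 7))  # 2024-07-09
```

July 1st, 2024 was a Monday, so the first Tuesday is the 2nd and the second Tuesday is the 9th, ten days before the outage.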

[–] [email protected] 2 points 8 months ago

I'm not. I vaguely remember seeing it in some posts and comments, and it would explain it pretty well, so I kind of took it as a likely outcome. In hindsight, you are right, I shouldn't have been spreading hearsay. Thanks for the wakeup call, honestly!

[–] [email protected] -1 points 8 months ago

Lots of security systems are kernel level (at least partially) this includes SELinux and AppArmor by the way. It's a necessity for these things to actually be effective.

[–] [email protected] 1 points 8 months ago

I see a lot of hate ITT on kernel-level EDRs, which I wouldn't say they deserve. Sure, for your own use, an AV is sufficient and you don't need an EDR, but they make a world of difference. I work in cybersecurity doing red teaming, so my job is mostly about bypassing such solutions and building malware/actions within the network that avoid detection as much as possible, and ever since EDRs started getting popular, my job got several leagues harder.

The advantage of EDRs in comparison to AVs is that they can catch 0-days. An AV will just look for signatures, i.e. known pieces or snippets of malware code. An EDR, on the other hand, looks for sequences of actions a process performs, by scanning memory and logs and hooking syscalls. So, if for example you made an entirely custom program that allocates memory as Read-Write-Execute, then loads a crypto DLL, decrypts something into that memory, and then calls a thread-spawn syscall to spawn a thread in another process that runs it, an EDR would correlate those actions and get suspicious, while to a regular AV the code would probably look OK. Some EDRs even watch network packets and can catch suspicious communication, such as port scanning, large data extraction, or C2 communication.
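
To make the signature-vs-behavior difference concrete, here's a toy sketch. It is nowhere near how a real AV or EDR is built: the signature bytes, the event names, and the "chain" are all invented for illustration, and real products look at far richer telemetry. It only shows why a never-before-seen binary can pass a signature check and still get flagged for what it does.

```python
# Hypothetical known-bad byte pattern; a real AV ships a large signature database.
KNOWN_SIGNATURES = [b"EVIL_PAYLOAD_V1"]

# Hypothetical ordered chain of actions, mirroring the example in the text.
SUSPICIOUS_CHAIN = (
    "alloc_rwx",
    "load_crypto_dll",
    "decrypt_into_memory",
    "create_remote_thread",
)


def signature_scan(binary: bytes) -> bool:
    """AV-style check: True if the file contains any known signature."""
    return any(sig in binary for sig in KNOWN_SIGNATURES)


def behavior_scan(events):
    """EDR-style check: return the pids whose actions contain the chain in order.

    events -- iterable of (pid, action) tuples in the order they were observed.
    Unrelated actions in between do not reset a process's progress.
    """
    progress = {}   # pid -> how many steps of the chain have been seen so far
    flagged = set()
    for pid, action in events:
        step = progress.get(pid, 0)
        if step < len(SUSPICIOUS_CHAIN) and action == SUSPICIOUS_CHAIN[step]:
            progress[pid] = step + 1
            if progress[pid] == len(SUSPICIOUS_CHAIN):
                flagged.add(pid)
    return flagged
```

A custom loader with no known bytes in it comes back clean from `signature_scan`, but if its process walks through all four steps, `behavior_scan` flags its pid even when other, harmless actions are mixed in.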

Sure, in an ideal world, you would have users who never run malware and a network that is impenetrable. But on average you still get a few % of people running random binaries that came from phishing attempts, or around 50% of people in your company falling for vishing attacks. Having an EDR increases your chances of avoiding such an attack almost exponentially, and I would say the advantage EDRs get from being kernel-level is well worth it.

I'm not defending CrowdStrike, they did mess up to the point where I bet that the amount of damage they caused worldwide is nowhere near the amount of damage that all the cyberattacks they prevented would have caused in total. But hating on kernel-level EDRs in general isn't warranted here.

Kernel-level anti-cheat, on the other hand, can go burn in hell, and I hope that something similar will eventually happen with one of them. Fuck kernel level anti-cheats.

[–] [email protected] 1 points 8 months ago* (last edited 8 months ago)

Honestly kind of excited for the company blogs to start spitting out their ~~disaster recovery~~ crisis management stories.

I mean - this is just a giant test of ~~disaster recovery~~ crisis management plans. And while there are absolutely real-world consequences to this, the fix almost seems scriptable.

If a company uses IPMI (~~Called~~ Branded AMT and sometimes vPro by Intel), and their network is intact/the devices are on their network, they ought to be able to remotely address this.
But that’s obviously predicated on them having already deployed/configured the tools.
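
On "scriptable": the workaround that was widely circulated at the time (it comes from CrowdStrike's public guidance, not from this thread) was to boot into Safe Mode or the recovery environment and delete the channel file matching `C-00000291*.sys` from the CrowdStrike driver directory. A minimal sketch of just the delete step, assuming you already have a booted environment with Python and access to the volume (which is the hard part, especially with BitLocker):

```python
from pathlib import Path

# Path and pattern as given in the publicly circulated workaround;
# treat both as assumptions and verify against current vendor guidance.
DEFAULT_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def remove_bad_channel_files(driver_dir=DEFAULT_DRIVER_DIR, dry_run=True):
    """Find (and optionally delete) channel files matching the bad pattern.

    Returns the list of matching paths. With dry_run=True nothing is deleted,
    so the matches can be reviewed first.
    """
    matches = sorted(Path(driver_dir).glob(CHANNEL_FILE_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Run it once with the default `dry_run=True` to see what would go, then again with `dry_run=False`. Other channel files in the same directory are left alone.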

[–] [email protected] 0 points 8 months ago (1 children)

Why do people run windows servers when Linux exists, it’s literally a no brainer.

[–] [email protected] 0 points 8 months ago (1 children)

Because all software runs from Linux right...

[–] [email protected] 2 points 8 months ago* (last edited 8 months ago)

It could if more people just used Linux

[–] [email protected] 0 points 8 months ago (1 children)

never do updates on a Friday.

[–] [email protected] 0 points 8 months ago (1 children)

CrowdStrike sent a corrupt file with a software update for Windows servers. This caused a blue screen of death on all the Windows servers globally for CrowdStrike clients, even people in my company. Luckily I shut off my computer at the end of the day and missed the update. It's not an OTA fix. They have to go into every data center and manually fix all the computer servers. Some of these servers have encryption. I see a very big lawsuit coming...

[–] [email protected] 0 points 8 months ago* (last edited 8 months ago) (1 children)

they have to go into every data center and manually fix all the computer servers

Do they not have IPMI/BMC for the servers? Usually you can access KVM over IP and remotely power-off/power-on/reboot servers without having to physically be there. KVM over IP shows the video output of the system so you can use it to enter the UEFI, boot in safe/recovery mode, etc.

I've got IPMI on my home server and I'm just some random guy on the internet, so I'd be surprised if a data center didn't.
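
As a rough idea of what the remote power control looks like from a script, here's a sketch that only builds an `ipmitool` command line rather than running it. The host name and user are placeholders, and it assumes the BMC speaks IPMI-over-LAN (`lanplus`) and that the password is supplied through the `IPMI_PASSWORD` environment variable (that's what `-E` is for). The KVM/console part is vendor-specific and not covered here.

```python
ALLOWED_POWER_ACTIONS = {"status", "on", "off", "cycle", "reset"}


def ipmi_power_command(bmc_host: str, user: str, action: str = "status"):
    """Build an ipmitool argv list for a chassis power action on one BMC.

    bmc_host and user are placeholders for your own environment.
    The command is returned, not executed.
    """
    if action not in ALLOWED_POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 over LAN
        "-H", bmc_host,    # address of the BMC, not of the server OS
        "-U", user,
        "-E",              # read the password from IPMI_PASSWORD
        "chassis", "power", action,
    ]
```

The returned list can be handed to `subprocess.run(...)` once you've checked it against your own BMC setup.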

[–] [email protected] 0 points 8 months ago (1 children)

Yeah my plans of going to sleep last night were thoroughly dashed as every single windows server across every datacenter I manage between two countries all cried out at the same time lmao

[–] [email protected] 0 points 8 months ago (1 children)

I always wondered who even used windows server given how marginal its marketshare is. Now i know from the news.

[–] [email protected] 0 points 8 months ago (1 children)

Marginal? You must be joking. A vast amount of servers run on Windows Server. Where I work alone we have several hundred and many companies have a similar setup. Statista put the Windows Server OS market share over 70% in 2019. While I find it hard to believe it would be that high, it does clearly indicate it's most certainly not a marginal percentage.

[–] [email protected] 0 points 8 months ago (1 children)

I'm not getting an account on Statista, and I agree that its marketshare isn't "marginal" in practice, but something is up with those figures, since overwhelmingly internet hosted services are on top of Linux. Internal servers may be a bit different, but "servers" I'd expect to count internet servers...

[–] [email protected] 1 points 8 months ago

Most servers aren't Internet-facing.

[–] [email protected] 0 points 8 months ago (1 children)

This is going to be a Big Deal for a whole lot of people. I don't know all the companies and industries that use Crowdstrike but I might guess it will result in airline delays, banking outages, and hospital computer systems failing. Hopefully nobody gets hurt because of it.

[–] [email protected] 0 points 8 months ago (1 children)

Big chunk of New Zealand's banks apparently run it, cos 3 of the big ones can't do credit card transactions right now

[–] [email protected] -1 points 8 months ago

cos 3 of the big ones can’t do credit card transactions right now

Bitcoin still up and running perhaps people can use that

[–] [email protected] -1 points 8 months ago (1 children)

This is why you create restore points if using windows.

[–] [email protected] 0 points 8 months ago (1 children)

Those things never worked for me... Problems always persisted or it failed to apply the restore point. This is from the XP and Windows 7 days, never bothered with those again. To Microsoft's credit, both W7 and W10 were a lot more stable negating the need for it.

[–] [email protected] -1 points 8 months ago* (last edited 8 months ago)

I can't say about XP or 7 but they've definitely saved my bacon on Win10 before on my home system. And the company I work for has them automatically created and it made dealing with the problem much easier as there was a restore point right before the crowdstrike update. No messing around with the file system drivers needed.

I'd really recommend at least creating one at a state when your computer is working OK; it doesn't hurt anything even if it doesn't work for you for whatever reason. It's just important to understand that it's not a cure-all, it's only designed to help with certain issues (primarily botched updates and file system trouble).

[–] [email protected] -1 points 8 months ago

play stupid games win stupid prizes