massive IT outage hits the world

  • PHK
    PHK Posts: 1,769 Forumite
    Photogenic First Anniversary First Post Name Dropper
    prowla said:
    The mistake is in considering a PC software supplier to be a global enterprise service provider.
    Their cloud offering, "Azure", had an outage a few years back, where an Active Directory update propagated globally; it was almost like one of those disaster movies where you see a map of the world and the lights gradually go out.
    This time it looks like they embedded a 3rd party piece into their services such that it comprised a Single Point Of Failure (SPOF) across their entire systems, so when it had a bug it went everywhere.
    The days of having controlled roll-outs, QA testing, contingency plans, redundant services, resilient systems, and so on are long gone.
    Some might say that replication is a resiliency feature, but the risk is that a service which relies on replication suffers from the pitfall that it will also replicate errors.
    As for customers, Microsoft showed that 80% of the product sold cheaply is what sells; if you view the tech as a cost rather than an enabler and have a "that'll do" mentality, then you're putting yourself in a risky position.


    That's not what happened. There was a brief Microsoft outage around midnight our time. Then, just after 4am, an update to customers of Crowdstrike caused PCs to crash. By 5:30am Crowdstrike had identified and solved the issue. But:

    Some news outlets (especially the BBC) reported this as a Microsoft outage until about 9am, even though the exact cause was known much earlier.

    The fix is either to restart the PC up to fifteen times or go in and delete a file. 

    The problem is that all 24,000 of Crowdstrike's customers are big organisations with many thousands of PCs and servers each, all of which will need fixing essentially manually (I exclude virtual machines here because they can be rolled back remotely). Some of these PCs are embedded and it will take days to get around to each one.


    The problem here isn't a single point of failure but putting Compliance ahead of risk assessment. Compliance insists that systems like Falcon are in place - that box is ticked but the organisation doesn't do a risk assessment and so there's no plan to swiftly resolve issues. 

    This isn't the first time it's happened: in 2010 a similar McAfee update knocked out thousands of PCs. But organisations didn't learn.
  • Le_Kirk
    Le_Kirk Posts: 22,885 Forumite
    First Anniversary First Post Photogenic Name Dropper
    Fortunately the pub was able to accept my contactless payment last night but I had a back-up plan - folding money.
  • victor2
    victor2 Posts: 7,790 Ambassador
    I'm a Volunteer Ambassador First Anniversary Name Dropper First Post
    Le_Kirk said:
    Fortunately the pub was able to accept my contactless payment last night but I had a back-up plan - folding money.

    Ditto. And my second level of backup is a case of beer in the garage.

    I’m a Forum Ambassador and I support the Forum Team on the In My Home MoneySaving, Energy and Techie Stuff boards. If you need any help on these boards, do let me know. Please note that Ambassadors are not moderators. Any posts you spot in breach of the Forum Rules should be reported via the report button, or by emailing forumteam@moneysavingexpert.com. 

    All views are my own and not the official line of MoneySavingExpert.

  • prowla
    prowla Posts: 13,563 Forumite
    Name Dropper First Anniversary First Post
    PHK said:
    prowla said:
    The mistake is in considering a PC software supplier to be a global enterprise service provider.
    Their cloud offering, "Azure", had an outage a few years back, where an Active Directory update propagated globally; it was almost like one of those disaster movies where you see a map of the world and the lights gradually go out.
    This time it looks like they embedded a 3rd party piece into their services such that it comprised a Single Point Of Failure (SPOF) across their entire systems, so when it had a bug it went everywhere.
    The days of having controlled roll-outs, QA testing, contingency plans, redundant services, resilient systems, and so on are long gone.
    Some might say that replication is a resiliency feature, but the risk is that a service which relies on replication suffers from the pitfall that it will also replicate errors.
    As for customers, Microsoft showed that 80% of the product sold cheaply is what sells; if you view the tech as a cost rather than an enabler and have a "that'll do" mentality, then you're putting yourself in a risky position.


    That's not what happened. There was a brief Microsoft outage around midnight our time. Then, just after 4am, an update to customers of Crowdstrike caused PCs to crash. By 5:30am Crowdstrike had identified and solved the issue. But:

    Some news outlets (especially the BBC) reported this as a Microsoft outage until about 9am, even though the exact cause was known much earlier.

    The fix is either to restart the PC up to fifteen times or go in and delete a file. 

    The problem is that all 24,000 of Crowdstrike's customers are big organisations with many thousands of PCs and servers each, all of which will need fixing essentially manually (I exclude virtual machines here because they can be rolled back remotely). Some of these PCs are embedded and it will take days to get around to each one.


    The problem here isn't a single point of failure but putting Compliance ahead of risk assessment. Compliance insists that systems like Falcon are in place - that box is ticked but the organisation doesn't do a risk assessment and so there's no plan to swiftly resolve issues. 

    This isn't the first time it's happened: in 2010 a similar McAfee update knocked out thousands of PCs. But organisations didn't learn.

    Well yes and no...
    I did mention "controlled roll-outs, QA testing, contingency plans, redundant services, resilient systems, and so on", which goes to your compliance comments and highlights the procedural issues.
    But I take your point that it is the 3rd party software at fault.
    However, the underlying issue is the buying into the Microsoft centralised/automated replication model, where I said "the risk is that a service which relies on replication suffers from the pitfall that it will also replicate errors".
    You mention the McAfee issue and I mentioned the Azure outage; they're both symptoms of the same problem.


  • victor2
    victor2 Posts: 7,790 Ambassador
    I'm a Volunteer Ambassador First Anniversary Name Dropper First Post
    Interesting that China was hardly hit at all - mainly because they use their own systems as a lot of the western world boycotts their products and won't sell them advanced technology. It's largely foreign companies operating in China that were hit.
    That must be setting minds working in China regarding ways to fight a technology war with the West, as if they haven't already...

  • MouldyOldDough
    MouldyOldDough Posts: 2,075 Forumite
    First Anniversary First Post Photogenic Name Dropper
    victor2 said:
    Interesting that China was hardly hit at all - mainly because they use their own systems as a lot of the western world boycotts their products and won't sell them advanced technology. It's largely foreign companies operating in China that were hit.
    That must be setting minds working in China regarding ways to fight a technology war with the West, as if they haven't already...


    Russia as well - or so they claim ...
  • HillStreetBlues
    HillStreetBlues Posts: 4,114 Forumite
    Homepage Hero First Anniversary Photogenic First Post
    There is certainly more incentive not to cause a major muck up in either Russia or China.
    Let's Be Careful Out There
  • MattMattMattUK
    MattMattMattUK Posts: 9,278 Forumite
    First Anniversary First Post Name Dropper
    TELLIT01 said:
    The wonders of modern technology.  I wonder how those who brag about never carrying cash are getting on.
    Totally fine. I have not used cash since 2019 and used less than £200 in the two years before that. I went to Waitrose even though they had been reported in the media (and social media) as not being able to accept cards. I checked when I got there and the staff said cards were working fine. I spoke to one of the assistants when I was at the self checkout and they said that there was about half an hour where card payments took a couple of minutes to go through, but that they never stopped working.

    Many of the places hit hardest were not accepting cash either; they just locked the doors as the entire till system was offline.

    There was a certain amount of hysteria and hyperbole thrown around as well: the talk of people "starving" if unable to buy food, how would people cope, etc. In a worst-case scenario people could just not eat for a day, or buy food elsewhere, but there was certainly no risk of starvation.
  • Sea_Shell
    Sea_Shell Posts: 9,685 Forumite
    First Anniversary Photogenic Name Dropper First Post
    My laptop wants to do an update.  Windows.

    I've ignored it for a couple of days, but I hope this is not connected in any way and no reason I shouldn't run it.
    How's it going, AKA, Nutwatch? - 12 month spends to date = 2.50% of current retirement "pot" (as at end August 2024)
  • victor2
    victor2 Posts: 7,790 Ambassador
    I'm a Volunteer Ambassador First Anniversary Name Dropper First Post
    Sea_Shell said:
    My laptop wants to do an update.  Windows.

    I've ignored it for a couple of days, but I hope this is not connected in any way and no reason I shouldn't run it.

    Doubt it. You'd have to be running Crowdstrike software, and they've fixed the problem anyway.
