ITP Techblog

Brought to you by IT Professionals NZ

Griffin on Tech: Big outages show the fragility of our internet

Peter Griffin, Editor. 08 October 2021, 2:25 pm

This week's outage, which saw Facebook, Instagram and WhatsApp go down at the same time, possibly ranks as one of the biggest disruptions to web services in history.

Around 1.9 billion active users of the world's largest social network provider spent up to half a day without access. It's embarrassing and costly for Facebook, which generates around US$330 million a day in advertising revenue from its platforms. Facebook's explanation for the outage effectively boils down to a software bug taking offline the global backbone connecting its data centres.

For small businesses, which make up most of the advertisers on Facebook and Instagram, it was also a warning of the perils of relying too much on social media platforms for reaching customers.

Had it been just Instagram that had gone down, it wouldn't have been such a major event. But Facebook's acquisition of Instagram and WhatsApp and integration of them onto the same platform and servers as its core Facebook product has created a major vulnerability. 


A server upgrade gone wrong can have disastrous results. It's not the first time this year we have seen the consequences of highly concentrated web-based service delivery. We've seen Fastly and Akamai, major content delivery network (CDN) providers, suffer failures that saw some of the world's most-visited websites go dark. Cloudflare, another major CDN operator, had a similar outage last year.

Sometimes it's a misconfigured domain name server (DNS), sometimes a software update that has unintended consequences. Thankfully, so far anyway, it usually isn't the result of a devastating cyber attack. But it serves as a major warning for anyone involved in delivering web services.

The internet has served us well since its creation over 30 years ago. It's a decentralised network of networks with no formal ownership structure. Large companies, telecoms providers, universities and government departments make up the infrastructure that is the internet, with the Los Angeles-based not-for-profit Internet Corporation for Assigned Names and Numbers (ICANN) administering the domain name system that allows addresses to be found on it.

We have done things to accommodate the proliferation of new services. Content delivery networks are one example, allowing rich media services to be reliably delivered to millions of users around the world simultaneously.

But the consolidation of social media platforms and content providers, as well as diluting competition, is also watering down the resilience of the internet to serious outages. A small handful of players now own most of the infrastructure that constitutes the internet. The problem is only going to intensify.

Four big cloud computing platforms, AWS, Microsoft Azure, Google Cloud and Alibaba Cloud, host a huge swathe of the world's data between them. Amazon alone controls around 50% of US e-commerce. Alibaba's share of the Chinese market is more like 60%.

Cascading effects

Economies of scale have driven the consolidation of internet players and web services. But 2021's disruption highlights the problems of having too many critical services reliant on a small number of providers. It shows that major players like Facebook and Fastly don't fully understand the complexity and vulnerabilities in their own networks. This means that innocuous software upgrades can have a cascading effect. It will be particularly worrying to Facebook that the outage was prolonged because the global backbone outage also prevented its own engineers from accessing key systems and even disabled swipe card access to facilities for some of them.

What's the answer? Logically, it's not to put all of your eggs in one basket. Diversity of providers and redundancy are more important than ever, even if the economics favour consolidating on one platform. At least there's a motivating factor here - the reputational damage and lost revenue from being down for hours is huge for Big Tech.
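For anyone delivering web services, the "more than one basket" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Facebook, Fastly or any other provider actually handles failover: the provider names, the fetch_with_failover function and the simulated_fetch stand-in are all hypothetical, and no real network calls are made.

```python
from typing import Callable, Iterable, Tuple


class AllProvidersDown(Exception):
    """Raised when every configured provider fails."""


def fetch_with_failover(
    providers: Iterable[str],
    fetch: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider, content) from the first that works."""
    errors = {}
    for name in providers:
        try:
            return name, fetch(name)
        except Exception as exc:  # a real system would catch narrower errors
            errors[name] = exc
    raise AllProvidersDown(f"no provider responded: {sorted(errors)}")


# Simulated outage: the hypothetical primary CDN is down, the secondary is up.
def simulated_fetch(provider: str) -> str:
    if provider == "primary-cdn":
        raise ConnectionError("primary-cdn unreachable")
    return f"page served by {provider}"


if __name__ == "__main__":
    print(fetch_with_failover(["primary-cdn", "secondary-cdn"], simulated_fetch))
```

The point of the sketch is the ordering: a second, independent provider only helps if the fallback path does not itself depend on the one that failed.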

The increasing focus on edge computing may also serve to reduce the potential for disruptive service outages as ever more content is distributed to locations closer to you, a more granular version of what content caching and CDNs offer. You may not get all of the content you need when a Google Cloud or Netflix goes down, but you may get enough to keep you entertained or productive during an outage.

These lessons are particularly important as we build smart cities and IoT networks and connect physical infrastructure to the internet. It's one thing to lose Spotify for an afternoon, another entirely for your local water treatment plant to go offline. Diversity is what made the internet great.

We've lost sight of that and risk more frequent disruption to internet services as a result.


