Recent high-profile outages at companies like Greggs, Apple, and Meta highlight an alarming trend: IT outages are becoming more frequent. What is behind this trend, and what challenges does it pose?

Increased complexity

The internet is a complex system with many layers. Each new feature and regulation adds to that complexity, and with it the risk of something going wrong. Brennen Smith, vice-president of technology at Downdetector’s parent company Ookla, attributes the rise in outages to this growing complexity.

Faster innovation, higher risks

The tech industry’s focus on rapid innovation can come at the cost of stability. Companies now roll out new features more quickly, and each release can introduce bugs or unforeseen issues that lead to outages.

Vulnerability of cloud services

Many companies have shifted from managing their own servers to relying on cloud service providers. While this allows for faster scaling, an outage at a major cloud provider can have a cascading effect, impacting numerous businesses.
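
To illustrate the cascading effect, here is a minimal Python sketch. The service names and the dependency map are hypothetical, invented purely for illustration. The function walks the dependency graph to find every service that is affected, directly or indirectly, when a single provider fails.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on.
DEPENDS_ON = {
    "shop-website": ["payments-api", "cloud-provider"],
    "payments-api": ["cloud-provider"],
    "mobile-app": ["shop-website"],
    "internal-wiki": ["own-servers"],
}


def affected_by(failed, depends_on):
    """Return the services that directly or indirectly depend on `failed`."""
    # Invert the map: for each dependency, record which services rely on it.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the failed component.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


print(sorted(affected_by("cloud-provider", DEPENDS_ON)))
# prints ['mobile-app', 'payments-api', 'shop-website']
```

In this toy example the mobile app never talks to the cloud provider directly, yet it still goes down because the website it relies on does. Only the wiki, hosted on the company's own servers, is unaffected.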

Outdated infrastructure

Core internet protocols such as the Border Gateway Protocol (BGP), which routes traffic between networks, date back decades. This “technical debt” makes the system more prone to failures caused by misconfigurations or unexpected events.

The human factor

Sudden spikes in demand and staffing shortages, particularly on weekends and holidays, can also contribute to outages. With fewer people monitoring systems, problems take longer to detect and fix.

The impact

IT outages can cause significant disruption for businesses and users alike. They can lead to lost productivity, lost revenue, and frustrated customers.

The way forward

The industry needs to find a balance between innovation and stability. Investing in robust infrastructure, improving communication during outages, and building in redundancy are all crucial steps. As our reliance on online services grows, so too does the need for a more resilient internet.
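
As a rough illustration of why redundancy helps, the Python sketch below estimates the availability of a service that runs on several copies at once. It rests on a strong assumption that is stated here and in the code: the copies fail independently of one another. The 99% figure is an arbitrary example, not a measured value.

```python
HOURS_PER_YEAR = 365 * 24


def combined_availability(single, replicas):
    """Chance that at least one of `replicas` copies is up.

    Assumes the copies fail independently, which is optimistic in practice.
    """
    return 1 - (1 - single) ** replicas


# Example figure only: each copy is assumed to be up 99% of the time.
for n in (1, 2, 3):
    availability = combined_availability(0.99, n)
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    print(f"{n} copies: {availability:.4%} available, "
          f"about {downtime_hours:.2f} hours down per year")
```

Under these assumptions, every extra copy cuts expected downtime by a factor of 100. The benefit shrinks sharply when the copies share a common dependency, such as the same cloud provider, because they then tend to fail together.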