What have we learned from last year’s Optus outage?



Sydney, NSW, Australia, February 18th 2024. People walk past the Optus mobile store with a glass facade. Image: Istock/Joe Morris

Interestingly, I started writing this article on the Friday afternoon the global CrowdStrike outage occurred. The similarities between the two outages are striking: both occurred because of a rogue software update, and both showed our lack of resilience and ICT diversity. Issues like this need to be addressed in our underlying economic system, where low costs and short-term profits are prioritised over what are now becoming national and international existential issues.

So, this is an opportune time to review what has happened since I reported on the Optus outage that occurred on 8 November 2023. Some of the outcomes of that review are also relevant to the CrowdStrike issue.
As was foreshadowed at the time, the government ordered an inquiry into this disaster. Mr Richard Bean, former Deputy Chair of the Australian Communications and Media Authority (ACMA), led this review.

As we reported at the time, a robust telecommunications system is crucial for Australian society. The Optus network suffered a national outage, disrupting emergency services, hospitals, businesses, and transport networks. This outage highlighted vulnerabilities and caused significant distress and economic loss. It became abundantly clear that, as technology has advanced, the importance of telecom networks has grown, supporting essential services for government, healthcare, businesses, and individuals.

The Government asked Richard Bean to examine lessons from the incident, focusing on emergency calls, customer communications, complaints handling, and compensation processes, as well as the role of government in managing and responding to national service outages. The review was specifically instructed not to apportion blame.

Findings and Recommendations
The report issued by Richard Bean in March this year emphasised that while no network can be fully immune to faults, it is crucial to strengthen resilience.
The review acknowledged that Australia’s emergency calling system generally functions well but identified areas for improvement. The recommendations include:

1. Triple Zero Functionality:

  • Establish mandatory requirements for network operators to ensure emergency calls reach Triple Zero.
  • Implement end-to-end testing every six months to ensure continuous access to Triple Zero during outages.
  • Create a Triple Zero custodian to oversee the system’s performance.
  • Share real-time network information with emergency services during outages.

2. Government Role:

  • Improve the Protocol for Notification of Major Service Disruptions, ensuring clear communication and collaboration between government, carriers, and emergency services.
  • Align the protocol with the Australian Government Crisis Management Framework.

3. Customer Communication:

  • Develop a standard for carriers to communicate specific outage information to customers.
  • Enhance public education initiatives to prepare for and recover from major network outages.

4. Complaints and Compensation:

  • Amend the Complaints Handling Standard to address network outage impacts.
  • Implement a standardised approach to compensation for customers affected by large-scale outages, similar to the Customer Service Guarantee framework.

5. Resilience and Interdependencies:

  • Pursue work on temporary roaming during outages, referencing international practices.
  • Establish mutual assistance arrangements between telecommunications providers.
  • Require network operators to maintain remote access and network redundancy tools for core network outages.
  • Review government operations’ arrangements to ensure telecommunications redundancy for critical services.

Conclusion
As always, the proof of improvement will lie in the execution of the recommendations. They are all sensible and doable, but some of the costs are significant at a time when telecoms revenues and profits are under pressure. There is certainly a lot of discussion going on, but hopefully the recommendations will not be watered down. The review clearly highlighted the need for a structured approach to managing telecommunications outages, focusing on testing, resilience, and effective communication. Implementing these recommendations should improve the reliability of Australia’s telecom networks and ensure the robust functioning of the Triple Zero emergency service.

By addressing these recommendations, Australia can better prepare for future outages, minimise their impact, and maintain the integrity of essential services. The government’s acceptance of all 18 recommendations underscores its commitment to enhancing the resilience of the telecommunications infrastructure, ultimately safeguarding the public and the economy from similar incidents in the future.

 



