Telco reveals cause of network failure, but experts say questions remain unanswered


Optus has revealed the cause of last week’s massive network outage, blaming a routine software update gone wrong, but some experts say key questions remain unanswered.

The telco, which admitted it took longer than it would have liked to investigate the cause, said routers disconnected from its core network after the maintenance.

“At around 4.05am Wednesday morning, the Optus network received changes to routing information from an international peering network following a routine software upgrade,” an Optus spokesperson said.
A software update gone wrong has been revealed as the cause of last week’s massive Optus outage. (Photo by Brendon Thorne / Getty Images)

“These routing information changes propagated through multiple layers in our network and exceeded preset safety levels on key routers which could not handle these.

“This resulted in those routers disconnecting from the Optus IP Core network to protect themselves.”

However, two experts say Optus hasn’t fully explained the cause of the downtime, and believe several questions remain unanswered.

“The cause identified by Optus for the national outage last Wednesday morning was human error,” RMIT associate professor Mark Gregory said.

“The Optus statement is poorly worded, but it appears that a routine software upgrade to one or more key routers was the cause of the outage.

“Optus has not explained what went wrong with the test process that should have occurred before the routing software upgrade occurred.

“Also, there is no explanation as to why there appears to have been a lack of redundancy of the key routers, so that if there was a problem the key routers would swap to the redundant routers, which you would expect to be running the previous iteration of software.

“There remains a number of open questions that Optus has failed to explain, but we now know that the Optus outage was not hardware failure and not related to a cyberattack.”

Experts say Optus’ explanation of the cause of last week’s outage has left some questions unanswered. (Getty)

Mark Stewart, a research fellow at the Centre for Defence Communications and Information Networking at The University of Adelaide, agreed.

“A major telco should have a disaster recovery plan which is more sophisticated than your average corporate network,” he said.

“At a minimum, they should have had a plan to revert the changes, or remotely reboot their systems.

“The statement from Optus in no way clarifies how this event was exceptional, or what preventative measures they had in place to mitigate the impact.”

Optus CEO Kelly Bayer Rosmarin has faced significant criticism following the outage. (Michael Quelch)

The Optus spokesperson said some of the routers had to be manually reset, which is why the 13-hour outage took so long to fix.

“The restoration required a large-scale effort of the team and in some cases required Optus to reconnect or reboot routers physically, requiring the dispatch of people across a number of sites in Australia,” the spokesperson said.

“This is why restoration was progressive over the afternoon.

“Given the widespread impact of the outage, investigations into the issue took longer than we would have liked as we examined several different paths to restoration.

“The restoration of the network was at all times our priority and we subsequently established the cause working together with our partners.

“We have made changes to the network to address this issue so that it cannot occur again.”

Millions of Optus customers around the country were left in the lurch by the outage. (Dominic Lorrimer)

The outage on Wednesday left many of Optus’ roughly 10 million customers without internet and phone connectivity, and impacted their ability to contact emergency services.

The government has announced an inquiry into the outage, and Optus said it will cooperate.

“We are committed to learning from what has occurred and continuing to work with our international vendors and partners to increase the resilience of our network,” the spokesperson said.

“We will also support and will fully cooperate with the reviews being undertaken by the government and the Senate.”


