The telco, which admitted it took longer than it would have liked to investigate the cause, said routers disconnected from its core network after the maintenance.
“These routing information changes propagated through multiple layers in our network and exceeded preset safety levels on key routers which could not handle these.
“This resulted in those routers disconnecting from the Optus IP Core network to protect themselves.”
However, two experts say Optus hasn’t fully explained the cause of the downtime, and believe several questions remain unanswered.
“The cause identified by Optus for the national outage last Wednesday morning was human error,” RMIT associate professor Mark Gregory said.
“The Optus statement is poorly worded, but it appears that a routine software upgrade to one or more key routers was the cause of the outage.
“Optus has not explained what went wrong with the test process that should have occurred before the routing software upgrade occurred.
“Also, there is no explanation as to why there appears to have been a lack of redundancy of the key routers, so that if there was a problem the key routers would swap to the redundant routers, which you would expect to be running the previous iteration of software.
“There remains a number of open questions that Optus has failed to explain, but we now know that the Optus outage was not hardware failure and not related to a cyberattack.”
Mark Stewart, a research fellow at the Centre for Defence Communications and Information Networking at The University of Adelaide, agreed.
“A major telco should have a disaster recovery plan which is more sophisticated than your average corporate network,” he said.
“At a minimum, they should have had a plan to revert the changes, or remotely reboot their systems.
“The statement from Optus in no way clarifies how this event was exceptional, or what preventative measures they had in place to mitigate the impact.”
The Optus spokesperson said some of the routers had to be manually reset, which is why it took so long to restore service during the 13-hour outage.
“The restoration required a large-scale effort of the team and in some cases required Optus to reconnect or reboot routers physically, requiring the dispatch of people across a number of sites in Australia,” the spokesperson said.
“This is why restoration was progressive over the afternoon.
“Given the widespread impact of the outage, investigations into the issue took longer than we would have liked as we examined several different paths to restoration.
“The restoration of the network was at all times our priority and we subsequently established the cause working together with our partners.
“We have made changes to the network to address this issue so that it cannot occur again.”
The outage on Wednesday left many of Optus’ roughly 10 million customers without internet and phone connectivity, and impacted their ability to contact emergency services.
The government has announced an inquiry into the outage, and Optus said it will cooperate.
“We are committed to learning from what has occurred and continuing to work with our international vendors and partners to increase the resilience of our network,” the spokesperson said.
“We will also support and will fully cooperate with the reviews being undertaken by the government and the Senate.”