company reveals what caused last week’s failure

“At around 4.05am Wednesday morning, the Optus network received changes to routing information from an international peering network following a routine software upgrade,” the statement said.

Routers in a network automatically share these settings, which determine the most efficient route to move internet traffic around, so the error spread through Optus’ infrastructure.

“These routing information changes propagated through multiple layers in our network and exceeded preset safety levels on key routers which could not handle these. This resulted in those routers disconnecting from the Optus IP Core network to protect themselves.”

That phenomenon, in which routers are overwhelmed by more routing updates than they can handle, is called a “flood” and is very rare on a large scale.
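
As a rough illustration of the safeguard Optus describes, the Python sketch below models a router that accepts route announcements until a preset limit is exceeded, then disconnects to protect itself. The `Router` class, the limit of five routes and the addresses are all hypothetical; Optus has not published its actual thresholds or configuration.

```python
class Router:
    """Toy model of a router with a preset limit on learned routes."""

    def __init__(self, name, max_routes):
        self.name = name
        self.max_routes = max_routes  # hypothetical safety threshold
        self.routes = set()
        self.connected = True

    def receive(self, announced_routes):
        """Accept announced routes; disconnect if the limit is exceeded."""
        if not self.connected:
            return False  # an offline router ignores further updates
        self.routes.update(announced_routes)
        if len(self.routes) > self.max_routes:
            # Self-protection: drop all learned routes and leave the network.
            self.routes.clear()
            self.connected = False
        return self.connected


router = Router("core-1", max_routes=5)
normal = [f"10.0.{i}.0/24" for i in range(3)]   # routine update
flood = [f"10.1.{i}.0/24" for i in range(10)]   # sudden surge of changes

print(router.receive(normal))  # True: still within the limit
print(router.receive(flood))   # False: limit exceeded, router disconnects
```

In this toy model a disconnected router stays offline until something resets it, which is loosely analogous to Optus having to send staff to sites to reconnect or reset its routers.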

Matt Tett, the managing director of technology testing business Enex TestLab, said there had only been three or four similar incidents in the three decades the technology had been in use.

“I’d say [Optus] would be having a few words with their partners,” said Mr Tett. But he said Optus could also reduce the risk by changing the way it accepted or reviewed software updates.

“You rely on your third-party providers to do their checks and balances as well,” he said.

Telstra compensated broadband customers who were offline for multiple days in 2016 with credits of between $25 and $50, which suggests Optus could have to pay millions in credits. It has already promised its customers 200 GB of free data.

Optus said it had made changes to its network to prevent a recurrence, but did not detail them in its statement or name the contractor that issued the faulty settings.

“We apologise sincerely for letting our customers down and the inconvenience it caused,” its statement said.

Mr Tett said greater redundancy in Optus’ network likely would not have helped prevent the outage because of the way router settings are automatically shared between devices.

Entirely segmenting the network between phone and internet services, he said, was also unfeasible because voice calls were converted to internet data for transmission. He compared building an entirely separate backup system to asking a government to construct redundant roads.

“They’re not going to duplicate every highway just in case there’s an accident on one,” Mr Tett said.

Optus dispatched staff to “a number of sites in Australia”, including one on each side of Melbourne, to physically reconnect or reset its routers. “This is why restoration was progressive over the afternoon,” it said.

It said the widespread failure meant working out how to fix the fault had taken longer than the company would have liked.

Australia’s second-largest telecommunications company faces a government review, a review by the Australian Communications and Media Authority, and a Senate inquiry. Politicians have lashed the company’s communications about the outage.

Optus said it would co-operate with all of them and learn from the outage.


