Two weeks since the Optus outage, documents show backroom scrambling and urgent meetings occurred as the emergency played out



In the two weeks since Optus suffered a devastating nationwide outage of its mobile and internet services, the company has shifted the blame to other organisations, attempted to calm its irate customers, and seen its CEO resign.

As Australia’s second-largest telco and its Singaporean parent company Singtel attempt to mend Optus’s reputation, submissions to a Senate inquiry into the outage have told us more about the backroom scrambles and urgent meetings which unfolded as Optus tried to relieve its major headache.

The company said 150 engineers and technicians formed the “core group of personnel” who worked to fix the outage, while 250 other workers and five international companies also provided support.

Let’s take a look at how it all unfolded on Wednesday, November 8, and the aftermath that followed.

All times are in Australian Eastern Daylight Time.

4:05am: The Optus network crashes

On Wednesday, November 8, around 10 million Optus customers and 400,000 businesses began their mornings without mobile or internet service.

Optus said its post-incident analysis had found the loss of connectivity began at around 4:05am.

At that time, the company did not know what had caused the outage.

From 4:05am: Engineers try to find the issue

Optus said its engineers worked with several hypotheses to try to restore its network as soon as the outage began.

The company said this work included, among other things:

  • Rolling back earlier changes to the network, to confirm they weren’t the cause
  • Checking if Optus had been intentionally overloaded by a cyber attack (it appeared it had not)
  • Examining whether the issue was due to problems with network authentication

Optus said it also began to examine media inquiries from 4:27am.

As engineers continued their work, some customers attempted to contact Optus by going to their local store, or contacting Optus helplines if they had access to other service providers.

Some customers demanded answers from staff at Optus stores during the outage. (ABC News: Cathy Border)

6:33am: Optus releases its first statement

The company’s first statement said it was aware of the outage and was working to restore its services as quickly as possible.

Further messages were also posted on Optus social media accounts and the organisation’s website.

7am: The Department of Home Affairs contacts Optus

The Australian government’s Department of Home Affairs, the regulator for telecommunications safety, said some of its senior officials reached out to Optus via encrypted messaging app Signal to offer their assistance.

The department said its national cybersecurity coordinator remained “engaged”, in case Optus eventually discovered that a cyber attack had in fact occurred.

7:02am: Optus notifies the ACMA about emergency call issues

The Australian Communications and Media Authority (ACMA), which regulates broadcasting, radio and telecommunications, and whose remit includes access to triple-0 calls, said it received a notification from Optus at 7:02am that its network outage was “adversely affecting the carriage of emergency calls over the Optus network”.

Optus released statements saying that triple-0 services could not be contacted using Optus landlines, and that some mobile devices had issues when they tried to use other networks to access triple-0 services.

Optus says such a major outage is a rare occurrence. (ABC News: Daniel Irvine)

9:30am: The federal government schedules a meeting

The Department of Home Affairs said it sent an email to Optus shortly after 9:30am, inviting it to provide an update on the outage to Australian state and territory governments.

Some of the states and territories were experiencing communications issues within their departments, including some local health and transport authorities.

The meeting with Optus would be coordinated through something called the National Coordination Mechanism (NCM), which was implemented during the COVID-19 pandemic, and was scheduled to take place at 2pm.

10:21am: Optus appears to find the problem, and fixes begin

Optus said it was at this time that its leading hypothesis — a sudden flood of new internet routing information, which had caused some parts of its network to disconnect themselves — was identified as the likely cause of the network crash.

The company said it set about resetting parts of the network by both remotely and physically rebooting and reconnecting some parts.

This system reset is said to have included more than 100 devices in 14 sites across the country.

Optus said it then began “carefully and methodically re-introducing traffic onto the mobile data and voice core to avoid a signalling surge on the network”.

10:30am: The Optus CEO speaks on ABC Radio

Optus CEO Kelly Bayer Rosmarin was interviewed in a voice-over-internet call with ABC Radio Sydney, using WhatsApp.

She said there was “no indication” of a cyber incident, and the company was working on “a number of hypotheses” to find the cause of the outage.

“The teams are trying many different angles and we will not rest until the service is back up for our customers,” she said.

Ms Bayer Rosmarin added that it was “highly unlikely” that the outage was caused by an overnight software update.

The chief executive took part in 10 other media interviews on the day.

10:38am: Customers slowly begin to come back online

Optus said services began to come back for some customers about 20 minutes after the apparent problem was discovered and fixes began to be implemented.

11am: The communications minister gives a press conference

Communications Minister Michelle Rowland held a press conference in Sydney, and told waiting media that she understood the cause of the outage was “deep in the core” of the Optus network.

“The core network basically encompasses everything from routing to electronics. So it is a fault that is quite fundamental to the network,” she said.

“But my understanding, having just recently spoken again to the CEO, is that a number of problems have been identified and that Optus continues to work on this.”

11:32am: Optus updates the Telecommunications Ombudsman

The Telecommunications Industry Ombudsman (TIO), which deals with phone and internet disputes for consumers and small businesses, said it had reached out to Optus for updates at 8:12am and received a reply at 11:32am which included, among other things, details of some contact avenues for consumers.

Optus would later set up a team which the TIO could refer consumers to if they wanted to escalate their complaints.

12pm: Half of Optus network sites are back online

Optus said that by 12pm, 56 per cent of its Radio Access Network sites had been restored.

These sites allow people’s individual devices to connect to the wider Optus network, and the internet.

Optus says half of its network sites were back online by 12pm AEDT on the day of the outage. (Supplied: Optus)

2pm: Optus attends the meeting with federal and state governments

Optus representatives attended the meeting set up by the Department of Home Affairs under the National Coordination Mechanism. The ACMA, which regulates access to triple-0 calls, was also in attendance.

Home Affairs said Optus briefed officials on what it knew about the outage, but “a number of details remained unclear” about the exact cause of the incident.

By this time, Optus said 98 per cent of its customers had been reconnected to its network.

4pm: Optus declares the outage over

Optus declared at 4pm that its network outage had ended, and said a technical fault was to blame.

More than 99.72 per cent of its network had been restored by this time, but there were still some connectivity issues for around 9,000 NBN customers during the evening.

The ACMA said it monitored Optus as the telco started to carry out welfare checks on people who were unable to connect their triple-0 calls during the outage, which providers are required to do following a major incident.

Ms Bayer Rosmarin would later tell a Senate inquiry that 228 triple-0 calls were unable to go through on November 8, and Optus had conducted welfare checks on all of those callers.

“Thankfully, everybody is OK,” she said.

Optus said it continued to review its network and make necessary changes in the days following the outage, until the afternoon of Monday, November 13.

November 13: Optus says a software upgrade was the issue

On Monday, November 13, five days after the outage, Optus released a statement saying the network crash had been caused by “changes to routing information from an international peering network” which occurred “following a routine software upgrade”.

“These routing information changes propagated through multiple layers in our network and exceeded preset safety levels on key routers which could not handle these,” the company said.

November 15: Optus appears to shift the blame

Two days later, an Optus spokesperson told ABC News that an internet exchange operated by the telco’s parent company Singtel was to blame for the outage, as it sent changes to internet routing information which triggered the crash.

It came after a report by Nine quoted an unnamed insider who reportedly said Singtel may have been responsible. Optus had initially declined to comment on that report.

November 17: Optus faces a Senate inquiry, shifts the blame again

Two days later, when Optus chief executive Ms Bayer Rosmarin faced questions from a Senate inquiry in Canberra, she said any reports blaming Singtel for the outage were based on a misunderstanding.

She instead blamed routers used by Optus, which were built by American technology company Cisco, for shutting down after Singtel carried out a software update at one of its internet exchanges.

“[The root cause of the issue] was that Cisco routers hit a fail-safe mechanism, which meant that each one of them independently shut down. That was triggered by the upgrade on the Singtel international peering network,” she said.

“That was misinterpreted by media as the root cause being the Singtel upgrade. But the trigger was the Singtel upgrade, and the root cause was the routers.”

Ms Bayer Rosmarin also avoided commenting on rumours about her potentially resigning.

At the time, Cisco said it was continuing to support Optus and provide it with technical advice.

The Optus executives faced a barrage of questions from senators. (ABC News: Simon Beardsell)

In its written submission to the Senate inquiry, Optus said approximately 90 routers were involved, and their default settings led them to self-isolate in order to protect themselves from an “unexpected overload of Internet Protocol routing information” which occurred “after a software upgrade at one of the Singtel internet exchanges (known as STiX) in North America”.
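
For readers curious how a fail-safe of this kind can take down many routers at once, the short Python sketch below models the general idea. It is a toy illustration only: the `Router` class, the `max_routes` limit of 1,000, the router names and the address ranges are all hypothetical, and nothing here reflects the actual settings used by Optus or Cisco.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy model of a router with a preset limit on accepted routes.

    The limit and the shut-down-on-breach behaviour are illustrative
    assumptions, not the real Optus or Cisco configuration.
    """
    name: str
    max_routes: int                      # hypothetical preset safety level
    routes: set = field(default_factory=set)
    isolated: bool = False

    def receive(self, prefixes):
        """Accept route announcements; self-isolate if the limit is exceeded."""
        if self.isolated:
            return  # an isolated router ignores further updates
        self.routes.update(prefixes)
        if len(self.routes) > self.max_routes:
            # Fail-safe: drop all routes and stay disconnected until reset.
            self.routes.clear()
            self.isolated = True

    def reset(self):
        """Manual reboot and reconnect, as an engineer would perform."""
        self.routes.clear()
        self.isolated = False


def propagate(routers, prefixes):
    """Send the same announcements to every router; return the isolated ones."""
    for router in routers:
        router.receive(prefixes)
    return [r.name for r in routers if r.isolated]


routers = [Router(f"core-{i}", max_routes=1000) for i in range(3)]
normal = {f"10.0.{i}.0/24" for i in range(200)}
flood = {f"10.{i // 256}.{i % 256}.0/24" for i in range(5000)}

print(propagate(routers, normal))  # within the limit: no router isolates
print(propagate(routers, flood))   # over the limit: each router isolates on its own
```

Because every router applies the same limit to the same flood of announcements, each one trips independently, and each stays offline until someone resets it. That matches the broad shape of what Optus described, although the real mechanism is considerably more involved.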

In a statement, Cisco said: “We can confirm that Cisco routers performed as configured and we continue to advise the customer and provide relevant support.”

The Senate inquiry was also told that no-one at Optus foresaw any risks that the routers could cause issues.

November 20: The Optus CEO resigns

Three days later, in a press release issued on the morning of Monday, November 20, Singtel announced Optus CEO Ms Bayer Rosmarin had resigned from her role “in the best interest of Optus”.

She said it “had been an honour to serve” but that “now was an appropriate time to step down”.

Yuen Kuan Moon, the chief executive of Singtel, said the company recognised “the need for Optus to regain customer trust and confidence as the team works through the impact and consequences of the recent outage and continues to improve”.

He said Optus’s priority was “setting on a path of renewal for the benefit of the community and customers”.

Australia’s National Anti-Corruption Commission chief Paul Brereton has since questioned the usefulness of Ms Bayer Rosmarin’s resignation, claiming a “blame culture” has searched for scapegoats rather than solutions to known problems.

Multiple investigations are now underway

Optus is now facing multiple investigations and calls for class action lawsuits over its November 8 outage.

Communications Minister Michelle Rowland has announced a government review into the incident.

The Department of Home Affairs said it was considering “whether this incident highlights any potential failures by Optus to comply with its legislative obligations”.

The ACMA has also opened a formal investigation into Optus’s compliance with its regulatory obligations.

Greens Senator Sarah Hanson-Young, who is chairing the Senate inquiry into the outage, said on Tuesday that Optus needed to restore trust with Australians.

“They just were not prepared, and they were terrible at communicating with their customers,” she said.

The Senate inquiry’s report is due to be handed down by Saturday, December 9.
