“In keeping with our Advancing Reliability blog series, which highlights key updates and initiatives to improve the reliability of the Azure platform and services, today we're turning our focus to Azure Active Directory (Azure AD). As part of this series, we outlined the key availability principles of Azure AD back in 2019, so I asked Nadim Abdo, Corporate Vice President, Engineering, to share the latest on the work our engineering teams are doing to ensure the reliability of our identity and access management services, which are so important to customers and partners.” - Mark Russinovich, CTO, Azure
The main promise of our identity services is to ensure that every user can access the apps and services they need without interruption. We have strengthened this promise with a multilayered approach, leading to our improved commitment of 99.99% authentication availability for Azure Active Directory (Azure AD). Today I am pleased to share a deep dive into a generally available technology that enables Azure AD to achieve an even higher level of resilience.
The Azure AD Backup Authentication Service transparently and automatically processes authentications for supported workloads when the primary Azure AD service is unavailable. It adds an extra layer of resilience on top of the multiple layers of redundancy in Azure AD. You can think of it as a backup generator or uninterruptible power supply that offers additional fault tolerance while remaining completely transparent and automatic to you. This system runs in the Microsoft cloud, but on separate and decorrelated systems and network paths from the primary Azure AD system. This means it can continue to operate through service, network, or capacity problems that affect many Azure AD and dependent Azure services.
What workloads does the service cover?
This service has been protecting Outlook Web Access and SharePoint Online workloads since 2019. Earlier this year, we completed backup support for applications running on desktops and mobile devices, or “native” apps. All native Microsoft apps, including Office 365 and Teams, as well as non-Microsoft and custom apps that run natively on devices, are now covered. No special action or configuration changes are required to maintain backup authentication coverage.
Starting at the end of 2021, we will begin rolling out support for more web-based applications. We will phase in apps that use OpenID Connect, starting with Microsoft web apps like Teams Online and Office 365, followed by custom web apps that use OpenID Connect and Security Assertion Markup Language (SAML).
How does the service work?
When a failure of the primary Azure AD service is detected, the backup authentication service is automatically activated so that the user’s applications can continue to operate. When the primary service is restored, authentication requests are redirected back to the primary Azure AD service. The backup authentication service works in two modes:
- Normal mode: The backup service stores important authentication data under normal operating conditions. Successful authentication responses from Azure AD to dependent apps generate session-specific data that is securely stored by the backup service for up to three days. The authentication data is specific to a device-user-app-resource combination and represents a snapshot of a successful authentication at a specific point in time.
- Failure mode: Every time an authentication request fails unexpectedly, the Azure AD gateway automatically forwards it to the backup service. It then authenticates the request, verifies the validity of the artifacts presented (such as refresh tokens and session cookies), and looks for a strict session match in the previously stored data. It then sends an authentication response to the application that matches what the primary Azure AD system would have generated. After recovery, traffic is dynamically redirected back to the primary Azure AD service.
Routing to the backup service is automatic and its authentication responses are the same as those that would normally come from the primary Azure AD service. This means that the protection is activated without application changes or manual intervention.
Note that the priority of the backup authentication service is to maintain user productivity when accessing an app or resource for which authentication was recently granted. Such requests make up the majority of requests to Azure AD: 93 percent, in fact. “New” authentications beyond the three-day storage window, where access was not recently granted on the user's current device, are currently not supported during outages, but most users access their most important applications from a consistent device on a daily basis.
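The two modes and the three-day window described in this section can be illustrated with a minimal sketch. It is not the service's implementation: `BackupSessionStore`, `record_success`, and `can_serve` are hypothetical names, the store is a plain in-memory dictionary, and the validation of refresh tokens or session cookies is left out entirely. It only shows the idea of a strict device-user-app-resource match inside a retention window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Tuple

# The article states that session data is kept "for up to three days".
RETENTION = timedelta(days=3)

# Hypothetical key shape: (device, user, app, resource).
SessionKey = Tuple[str, str, str, str]


@dataclass
class SessionSnapshot:
    issued_at: datetime  # when the primary service last authenticated this key


class BackupSessionStore:
    """Hypothetical in-memory stand-in for the backup service's session data."""

    def __init__(self) -> None:
        self._snapshots: Dict[SessionKey, SessionSnapshot] = {}

    def record_success(self, key: SessionKey, now: datetime) -> None:
        # Normal mode: remember a successful primary authentication.
        self._snapshots[key] = SessionSnapshot(issued_at=now)

    def can_serve(self, key: SessionKey, now: datetime) -> bool:
        # Failure mode: require an exact key match inside the retention window.
        snapshot = self._snapshots.get(key)
        if snapshot is None:
            return False
        return now - snapshot.issued_at <= RETENTION


store = BackupSessionStore()
t0 = datetime(2021, 11, 1, 9, 0)
known_key = ("laptop-1", "alice", "teams", "graph")
store.record_success(known_key, t0)
```

In this sketch, a request from the same device, user, app, and resource one day later would be served, while a request after four days, or from a device that has no stored snapshot, would not.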
How are security policies and access compliance enforced during an outage?
The backup authentication service continuously monitors security events that affect user access to keep accounts safe, even if those events are detected just before an outage. It uses continuous access evaluation to ensure that sessions that are no longer valid are revoked immediately. Examples of security events that would cause the backup service to restrict access during an outage are changes in device status, account deactivation, account deletion, revocation of access by an administrator, or the detection of a high user risk event. A user affected by such a security event can regain access only after the primary authentication service has been restored.
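As a rough illustration of this rule, the sketch below refuses backup authentication for any user with a recorded security event. The event names mirror the examples listed above, but `SecurityEventKind`, `SecurityEvent`, and `backup_may_serve` are hypothetical names, and keying every event by user (including device status changes) is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class SecurityEventKind(Enum):
    DEVICE_STATE_CHANGED = "device_state_changed"
    ACCOUNT_DISABLED = "account_disabled"
    ACCOUNT_DELETED = "account_deleted"
    ACCESS_REVOKED_BY_ADMIN = "access_revoked_by_admin"
    HIGH_USER_RISK = "high_user_risk"


@dataclass(frozen=True)
class SecurityEvent:
    user: str  # simplification: every event is attributed to a user
    kind: SecurityEventKind


def backup_may_serve(user: str, events: List[SecurityEvent]) -> bool:
    """Return False if any recorded security event targets this user."""
    return all(event.user != user for event in events)


events = [SecurityEvent(user="bob", kind=SecurityEventKind.ACCOUNT_DISABLED)]
```

Here `bob` would be denied by the backup path until the primary service is restored, while a user with no recorded events would be unaffected.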
In addition, the Backup Authentication Service enforces Conditional Access policies. Policies are reevaluated by the backup service before access is granted during an outage to determine which policies apply and whether the necessary controls for applicable policies such as multi-factor authentication (MFA) have been met. If the backup service receives an authentication request and a control such as MFA has not been met, that authentication will be blocked.
Conditional Access policies based on conditions such as user, application, device platform, and IP address are enforced using real-time data obtained by the Backup Authentication Service. However, certain policy conditions (for example, sign-in risk and role membership) cannot be evaluated in real time and are evaluated based on the resilience settings. The default resilience settings allow Azure AD to safely maximize productivity if a condition (such as group membership) is not available in real time during an outage. The service evaluates a policy assuming that the condition has not changed since the last access just before the outage.
While we strongly recommend that customers leave the default resilience settings enabled, there may be some scenarios where administrators prefer to block access during an outage when a Conditional Access condition cannot be evaluated in real time. In these rare cases, administrators can disable the default resilience settings per policy within Conditional Access. If the default resilience settings are disabled for a policy, the Backup Authentication Service will not process requests that are subject to real-time policy conditions, which means that these users might be blocked during a primary Azure AD outage.
The Azure AD Backup Authentication Service helps users stay productive in the unlikely scenario of a primary Azure AD authentication failure. It gives our service another transparent layer of redundancy on a decorrelated Microsoft cloud and network path. In the future, we will continue to expand protocol support, scenario support, and coverage beyond public clouds, and to expand the visibility of the service for our advanced customers.
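The interplay between required controls and the resilience settings can be sketched as a small decision function. This is an assumption-laden illustration, not the Conditional Access engine: `Policy`, `Request`, `Decision`, and `evaluate_during_outage` are hypothetical names, and a single boolean stands in for "this policy depends on a condition that cannot be evaluated in real time".

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"            # a required control such as MFA was not met
    NOT_SERVED = "not_served"  # backup service does not process the request


@dataclass
class Policy:
    requires_mfa: bool
    # True if the policy depends on a condition that is unavailable in
    # real time during an outage (for example, sign-in risk or membership).
    uses_non_realtime_condition: bool
    resilience_defaults_enabled: bool = True


@dataclass
class Request:
    mfa_satisfied: bool


def evaluate_during_outage(policy: Policy, request: Request) -> Decision:
    # With resilience defaults disabled, requests that need a non-real-time
    # condition are not processed by the backup service at all.
    if policy.uses_non_realtime_condition and not policy.resilience_defaults_enabled:
        return Decision.NOT_SERVED
    # Otherwise the condition is assumed unchanged since the last access,
    # and the policy's controls are still enforced.
    if policy.requires_mfa and not request.mfa_satisfied:
        return Decision.BLOCK
    return Decision.ALLOW


strict = Policy(requires_mfa=True, uses_non_realtime_condition=True,
                resilience_defaults_enabled=False)
default = Policy(requires_mfa=True, uses_non_realtime_condition=True)
```

With the `default` policy, a request that already satisfied MFA is allowed and one that did not is blocked; with the `strict` policy, neither request is processed by the backup path during the outage.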
Thank you for your continued trust and partnership.