Minimizing Business Impact during Outages
Authors: Bailey Bercik & Swetha Rai
It has been said that nothing is certain except death and taxes, but whoever said this clearly wasn’t working in IT, or they would have included system outages in that list. According to the Uptime Institute, a third of data center and IT managers who reported an outage in the past three years said it cost their business $1 million or more. Optimizing your identity infrastructure for business continuity, cost reduction, and modern security practices can help mitigate some of the risk that comes with using any cloud service. Our names are Swetha Rai and Bailey Bercik. We work in the Azure AD product group and have helped multiple Fortune 500 customers with their deployments. While we do work for Microsoft, these recommendations can be applied to any Identity-as-a-Service (IDaaS) provider. The recommendations we’ll be covering in this post fall into three key categories: reducing dependencies, building resilience in your applications, and improving your credential strategy.
Our first recommendation is to reduce dependencies. Where possible, adopt IDaaS offerings to mitigate the risks of systems where you own and manage the resilience of the authentication infrastructure yourself, such as on-premises federation infrastructure. Any on-premises authentication component in your infrastructure is a potential point of failure, so plan for redundancy wherever such components remain.
Our second recommendation is to build resilience into your applications. Most identity providers, especially Customer Identity and Access Management (CIAM) providers, allow you to call external systems through APIs during account sign-up or first authentication. Be mindful that each external API introduces a risk to your users’ ability to sign in, and plan for resilience around it. Implement appropriate error handling so that your application fails gracefully and tells the end user what happened. Avoid external API calls in pre-authenticated paths to mitigate the risk of Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks against your application. To increase infrastructure resilience for your business-critical applications, consider monitoring sign-in health so that you receive an alert when a significant change in an app’s authentication patterns indicates a potential incident.
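As a rough illustration of failing gracefully, the sketch below wraps an external API call made during sign-up in a short timeout and falls back to a degraded but successful result when the call fails. The endpoint URL, the `fetch_enrichment` and `complete_sign_up` names, the two-second timeout, and the response shape are all assumptions made for this example, not part of any specific IDaaS product.

```python
import json
import urllib.error
import urllib.request

# Hypothetical endpoint: substitute the external system your sign-up flow calls.
ENRICHMENT_URL = "https://example.invalid/profile-enrichment"


def fetch_enrichment(user_id, timeout_seconds=2.0, opener=urllib.request.urlopen):
    """Call the external API; return None instead of raising when it fails."""
    request = urllib.request.Request(
        ENRICHMENT_URL,
        data=json.dumps({"userId": user_id}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout_seconds) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection errors, HTTP errors, and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return None


def complete_sign_up(user_id, opener=urllib.request.urlopen):
    """Finish sign-up even if the optional external call is unavailable."""
    enrichment = fetch_enrichment(user_id, opener=opener)
    if enrichment is None:
        return {
            "signed_up": True,
            "profile_complete": False,
            "message": "Your account is ready. Some profile details will be added later.",
        }
    return {"signed_up": True, "profile_complete": True, "message": "Welcome!"}
```

The design choice here is that the external call is treated as optional: a failure degrades the profile rather than blocking the sign-up. If the external system is genuinely required, the same structure could instead return a clear error message and a retry option.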
Our third recommendation, improving your credential strategy, helps you avoid single points of failure. For example, ensuring your users have multiple ways to satisfy strong authentication prevents reliance on any single one of the dependencies pictured in the diagram below, each of which can experience a service outage. An even better approach is to encourage the use of passwordless authentication methods. Passwordless authentication has fewer dependencies than strong authentication using separate factors and is therefore more resilient.
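To make the dependency argument concrete, here is a minimal sketch that filters a user’s registered methods down to those still usable during an outage. The method names, the service names, and the mapping between them are hypothetical and chosen only for illustration; real dependencies vary by provider.

```python
# Hypothetical mapping from authentication method to the external services
# it is assumed to rely on. These names are illustrative, not a provider's API.
METHOD_DEPENDENCIES = {
    "fido2_security_key": set(),
    "authenticator_app_push": {"push_notification_service"},
    "sms_code": {"telecom_carrier"},
    "voice_call": {"telecom_carrier"},
}


def usable_methods(registered_methods, unavailable_services):
    """Return the registered methods whose dependencies are all available.

    Methods missing from METHOD_DEPENDENCIES are skipped, because their
    dependencies are unknown.
    """
    return [
        method
        for method in registered_methods
        if method in METHOD_DEPENDENCIES
        and METHOD_DEPENDENCIES[method].isdisjoint(unavailable_services)
    ]
```

Under these assumptions, a user registered only for `sms_code` is locked out when `telecom_carrier` is unavailable, while a user who also registered `authenticator_app_push` can still sign in.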
In addition, by reducing how often users see strong authentication prompts, you can further mitigate your risk of impact. One way to do this is to leverage trusted devices. By trusting devices you’d expect a user to be coming from, especially corporate-owned devices, you can challenge the user for MFA less frequently, because the risk of a bad actor physically compromising a device is lower than the risk of a bad actor using compromised credentials on a new device. Another way to reduce these prompts is to use an emerging standard, the Continuous Access Evaluation Protocol (CAEP), which is being proposed through the Shared Signals and Events Working Group. This protocol allows a user’s access to be revoked in response to context or policy updates and can further reduce your need to prompt users.
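The trusted-device idea can be sketched as a simple decision function: prompt for MFA only when the time since the last successful MFA exceeds an interval that depends on device trust. The `SignInContext` type, the `mfa_required` function, and both interval values are assumptions for illustration; the right lifetimes depend on your own risk tolerance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Illustrative values only; choose intervals that match your own policy.
TRUSTED_DEVICE_MFA_INTERVAL = timedelta(days=30)
UNTRUSTED_DEVICE_MFA_INTERVAL = timedelta(hours=1)


@dataclass
class SignInContext:
    device_is_trusted: bool
    last_mfa_at: Optional[datetime]  # None if MFA was never completed here


def mfa_required(context: SignInContext, now: datetime) -> bool:
    """Decide whether this sign-in should trigger an MFA prompt."""
    if context.last_mfa_at is None:
        return True  # No prior strong authentication on record.
    interval = (
        TRUSTED_DEVICE_MFA_INTERVAL
        if context.device_is_trusted
        else UNTRUSTED_DEVICE_MFA_INTERVAL
    )
    return now - context.last_mfa_at >= interval
```

With these values, a user who completed MFA two days ago is not prompted again on a trusted device but is prompted on an untrusted one, which also means fewer sign-ins depend on an MFA service being reachable.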
In summary, of the recommendations we covered, the quickest and most actionable steps you can take right away to mitigate some of the risks of a potential outage are:
- Make sure your users have multiple options to satisfy strong authentication requirements.
- Configure sign-in health monitoring and alerts for critical applications.
- Apply more lenient session policies on trusted and familiar devices to reduce how frequently users are prompted for authentication on these devices.
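For the sign-in health monitoring item in the list above, one possible alert rule compares the current sign-in failure rate with a recent baseline and fires when the rate rises well beyond normal variation. The function name, the three-sigma threshold, and the minimum-sigma floor are assumptions for this sketch; a production system would more likely rely on the monitoring features of your identity provider.

```python
from statistics import mean, pstdev


def sign_in_failure_alert(
    baseline_failure_rates,
    current_failure_rate,
    threshold_sigmas=3.0,
    min_sigma=0.01,
):
    """Return True when the current failure rate is far above the baseline.

    baseline_failure_rates: recent per-interval failure rates (0.0 to 1.0).
    min_sigma: floor on the standard deviation so that a very stable
    baseline does not turn tiny fluctuations into alerts.
    """
    baseline_mean = mean(baseline_failure_rates)
    sigma = max(pstdev(baseline_failure_rates), min_sigma)
    return (current_failure_rate - baseline_mean) > threshold_sigmas * sigma
```

This rule only watches for a rise in failures. A sudden drop in total sign-in volume can also signal an incident, and the same comparison could be applied to that metric.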