Scattered Clouds: A Global Identity Infrastructure
It is no industry secret that Netflix moved early and earnestly to build a global streaming platform on top of Amazon’s public cloud infrastructure. It’s a lesser-known fact that this also extended to all corners of the enterprise, including our workforce and partner Identity infrastructure.
A cloud-only approach to Identity infrastructure necessarily brings along expectations to meet the same high standards of availability as the Netflix streaming service. Relying on a single region’s availability guarantee is not enough. Streaming traffic can be swung out of a struggling region at a moment’s notice, and our Identity services must be able to do the same. In some scenarios, Identity’s migration may even need to precede dependent services in order to provide federation for mission-critical management tools when they follow suit.
However, for a workforce and partner Identity platform composed of both in-house and off-the-shelf services, this level of flexibility and availability does not have a straightforward solution. But it is possible. The key is to choose extensible pillars for your Identity platform which will allow you to relentlessly customize and optimize on top of the foundation they provide. In Netflix’s case, these pillars are Google for our employee datastore and PingFederate for our federation services, though these are not the only means to this end.
Meeting our availability goals means leveraging a data plane distributed across multiple global regions by default. It means writing custom adapters for our user datastores. It means adding levels of indirection at every possible opportunity. It means simulating failures, observing the results, making adjustments, and trying again. It means working with our providers to influence their product roadmaps, so that they add cloud-native features addressing the built-in limitations that can’t otherwise be designed around (they do exist, but there are fewer than you might think).
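As a rough illustration of what a level of indirection and a simulated failure can look like, here is a minimal Python sketch. It is not Netflix's implementation: the `RegionRouter` class, the region list, and the `example.com` hostname pattern are all assumptions made up for this example. The idea is only that callers ask the router for the current Identity endpoint instead of hard-coding a region, so marking a region as failed redirects them without any client change.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RegionRouter:
    """Hypothetical indirection layer between callers and regional endpoints."""

    regions: List[str]                                  # in order of preference
    unhealthy: Set[str] = field(default_factory=set)    # regions marked as failed

    def active(self) -> str:
        # Pick the first region, in preference order, not marked unhealthy.
        for region in self.regions:
            if region not in self.unhealthy:
                return region
        raise RuntimeError("no healthy region available")

    def endpoint(self) -> str:
        # The hostname pattern is invented for illustration only.
        return f"https://sso.{self.active()}.example.com"

    def fail(self, region: str) -> None:
        # Record a real outage, or inject a simulated one during a drill.
        self.unhealthy.add(region)

    def recover(self, region: str) -> None:
        self.unhealthy.discard(region)


router = RegionRouter(["us-east-1", "us-west-2", "eu-west-1"])
before = router.endpoint()
router.fail("us-east-1")        # simulated regional failure
after = router.endpoint()
print(before, "->", after)
```

In a real deployment the same role might be played by DNS, a load balancer, or a service registry; the sketch only shows why callers that resolve the endpoint through one shared lookup can be moved all at once.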
Of course, this focus on availability must not come at the expense of flexibility. The platform must still deliver on the promise to provide data streams used for modeling user access patterns, which in turn drive adaptive, step-up authentication. The ability to recompose and customize our Identity flows, when requirements inevitably shift, is key.
Today, this approach results in an architecture that maintains little to no session state and allows us to swing the entirety of the Netflix Identity platform to a new region. Currently, this takes just under 2 minutes and incurs little to no perceivable service disruption.
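One common way to keep server-side session state small is to put the session claims in a signed token that any region can verify on its own. The Python sketch below illustrates that general technique under stated assumptions; it is not a description of how Netflix or PingFederate handle sessions. The shared key, the claim names, and the `issue` and `validate` helpers are all hypothetical, and a production system would more likely use a standard token format and managed keys.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key, assumed to be replicated to every region.
SHARED_KEY = b"demo-key-replicated-to-every-region"


def _sign(payload: bytes) -> bytes:
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()


def issue(user: str, region: str, ttl: int = 300, now: Optional[float] = None) -> str:
    """Create a self-contained token; the issuing region stores nothing."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": user, "iss": region, "exp": now + ttl}).encode()
    parts = (base64.urlsafe_b64encode(payload), base64.urlsafe_b64encode(_sign(payload)))
    return ".".join(p.decode() for p in parts)


def validate(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Verify signature and expiry using only the token and the shared key."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return None  # malformed token
    if not hmac.compare_digest(sig, _sign(payload)):
        return None  # signature mismatch: forged or corrupted
    claims = json.loads(payload)
    return claims if claims["exp"] > now else None


# Issued in one region, accepted in another without a session lookup.
token = issue("alice", "us-east-1", now=1000.0)
claims = validate(token, now=1100.0)
```

Because validation needs only the token and the key, a region that has just taken over traffic can accept sessions it never created, which is the property that makes a fast regional swing plausible.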
There is still much to be done in the pursuit of a completely fault-tolerant, zero-disruption service. We invite the rest of the Identerati community out there to help us evolve the perfect cloud Identity architecture.
By Will Rose, IAM Enterprise Architect, Netflix, Inc.