The five laws of cloud-native authorization
Nov 16th, 2022
In a microservices world, each service needs to verify that a subject (user or machine) has permission to perform an operation on a resource that the service manages. But in an agile environment where each team owns the implementation of their application or service, authorization models are quick to diverge. Since roles and permissions need to be enforced consistently across the organization’s services and applications, this divergence makes it hard to evolve the authorization model in a holistic way. Agility and security suffer.
Managing these cross-cutting concerns is exactly the reason for the rise of platform engineering teams. And not surprisingly, authorization has become a hot topic in platform engineering circles.
Google, Airbnb, Intuit, Netflix, Carta, and others have described the systems they’ve built to standardize authorization across their different services and applications. While the details are different, these systems share many of the same characteristics. In particular, there are five architectural patterns that are critical to consider in any modern authorization system. We call these the “Laws of Authorization.”
Before we dive into the laws, let’s first articulate the challenges that teams face:
- Each service defines, manages, and enforces permissions differently
- Coarse-grained roles don’t provide enough resolution over fine-grained resources
- Authorization context flows via access tokens that embed stale permissions
- Each service implements authorization as a set of “if”/”switch” statements
- Services don’t consistently log their authorization decisions
Each service defines, manages, and enforces permissions differently
Agile teams own the design and implementation of their own service, but authorization is a cross-cutting concern. If each service defines, manages, and enforces its own permissions, there is no central point of control for application administrators to manage assigning roles and permissions to users or groups.
Coarse-grained roles don’t provide enough resolution
Roles are a common abstraction for rolling up a set of permissions. But services and applications often expose finer-grained resources, such as projects, teams, lists, folders, or even individual items. Customers often require (or demand) the ability to assign certain users or groups permissions for a subset of these resources. The authorization system must support these scenarios.
Authorization via permissions in access tokens
A naive implementation of authorization is to have an identity provider “bake in” a set of scopes into the access token; the relying party (service) checks that the token contains the required scopes, and treats this as proof that the user has access to perform an operation. This approach suffers from two fundamental flaws.
First, scopes are not fine-grained enough to allow distinguishing between resources (does the “read:document” scope mean the user has read access to all documents?).
Second, permissions thus embedded in access tokens are very difficult to revoke: while the user has a signed, unexpired access token, they continue to have that permission, even if the underlying data used to compute that permission has changed.
Authorization “spaghetti code”
When authorization is implemented by each service, it often takes the shape of a set of “if/switch” statements that are embedded in the application logic. This approach makes it hard to add permissions, change the assignment of permissions to roles, and next to impossible to reason about when you’re trying to understand how access flows through an application.
As a consequence, security engineering teams must resort to reactive measures like SAST/DAST scanners, as opposed to being able to proactively define the authorization policy for the application.
Inconsistent decision logs
In today’s environment, compromised identities and breaches are not a question of “if”, but “when”. When a security team discovers a compromised identity, it’s essential to understand what operations an attacker was able to perform. When each service implements these differently (if at all), this type of forensic analysis is next to impossible.
The Five Laws of Authorization
Platform engineering teams that want to tackle these challenges holistically should consider five architectural patterns that flow from these challenges:
- Unified authorization service with a distributed systems architecture
- Real-time access checks
- Fine-grained authorization
- Policy-based access management
- Centralized decision logs
Unified authorization service with a distributed systems architecture
The first step in moving to a cloud-native authorization model is to extract authorization out of each service, and into a separate concern. This is exactly what the technology organizations mentioned earlier have done. This helps platform engineering teams standardize and centralize the design of their authorization system.
However, the authorization service must not become a choke point for the entire system. Authorization needs to happen in a small number of milliseconds, and be 100% available to callers. To achieve predictable latency and availability, the authorization service should be deployed as close to the application / service as possible. This often means deploying an authorizer service as a sidecar, or as a microservice running right next to the application pods.
This leads to a distributed system, or a decentralized architecture, for authorization.
Real-time access checks
As mentioned, relying on scopes embedded in access tokens, or permission checks that were performed upstream, can result in an explosion in token size, not to mention security holes. Instead, each service should perform its own just-in-time authorization check before allowing a user to access a protected resource. This practice corresponds to the zero-trust principle of continuous verification.
In order to perform real-time access checks utilizing the decentralized architecture described earlier, we need a smart caching layer. Indeed, systems like Google’s Zanzibar and Intuit’s AuthZ are optimized for computing access checks with local data that is cached at the edge. Changes to the inputs (subjects, objects, and relationships) are performed centrally and then relayed to the edge, so that the locally cached data is invalidated and refreshed.
As successful applications gain more users, it’s almost certain that sophisticated customers will ask for finer-grained access control over the data that the application is managing. For example, a multi-tenant SaaS application may initially get away with having admin, member, and viewer roles that extend across the entire tenant. But every successful application will invariably need to support these roles for finer-grained “nouns” - applications, projects, teams, lists, folders, items, and even fields.
Fine-grained access control is critical for building a secure system, because it follows the principle of least privilege. The more we can lock down the permissions for each user, the smaller the “blast radius” in case that identity is compromised.
The most promising model for fine-grained authorization is the relationship-based access control (ReBAC) model described in Google’s Zanzibar system. With ReBAC, one can describe the domain model of the application by articulating triples that contain subjects (e.g. users, groups, machine accounts); objects (e.g. organizations, teams, projects, folders, items); and relations between them (e.g. owner, member, viewer).
These relations feel like roles in RBAC models, but can be applied over fine-grained resources. Finally, permissions can be associated with each relation, which allows the authorization system to evaluate whether a subject has a permission on an object by walking the relationship graph.
Some organizations prefer an attribute-based access control (ABAC) approach to fine-grained authorization. In this model, attributes on a subject or an object are used as part of the authorization decision. Environmental attributes such as date/time or location can likewise be useful in situations where services can only call other services within the same geographic location (for either performance or compliance purposes).
An authorization service that is flexible enough to support the RBAC, ABAC, and ReBAC models will be able to scale to any organizational requirements.
Policy-based access management
We’ve already determined that authorization should be pulled out of the application logic and into its own service. But if we just lift authorization spaghetti code from one place to another, we’re not taking advantage of an important opportunity to simplify and streamline our decision logic.
Policy-based access management describes the practice of defining an authorization policy in a domain-specific language, and storing / versioning that policy just like code. This is also known as “policy-as-code”, following the best-practices of configuration-as-code and infrastructure-as-code.
The Open Policy Agent (OPA) project from the Cloud Native Computing Foundation is the most promising policy-as-code decision engine in the cloud native ecosystem. By using the OPA DSL (Rego) to define authorization policies, we can follow another important security principle - separation of duties. We empower the security team to own and evolve the authorization policy, while relieving the application from the burden of having to touch this code anytime the security team requires a change.
Centralized decision logs
Following a policy-as-code approach also gives us a natural place to gather decision logs for every authorization decision. In order to provide the security team with the operational system to perform forensic analysis on potential breaches, we need to centralize the decision log stream in our logging system. In addition, many organizations require this level of fine-grained decision logs for compliance purposes.
Finally, centralizing decision logs allows us to follow the zero-trust principle of comprehensive monitoring. Anomalies in granted access can be detected and investigated much more quickly when the decision log data stream is gathered and centralized consistently.
No matter how sophisticated of an authorization service you build, if application developers don’t use it, you haven’t achieved much. Therefore, your service should be exposed as standard APIs (e.g. REST, gRPC, GraphQL). You should also have SDKs / bindings for each language / framework that is in use by the application teams.
Integrates easily into your environment
No service lives on an island - the three most important integration points for an authorization service are:
- Identity Providers: the authorization service needs to continuously retrieve user identities and attributes from the IDP.
- Source code / artifact registries: when policies are stored in source control and built into immutable images that are stored in artifact registries, the authorization service should enable a CI/CD experience for building, testing, and deploying authorization policies.
- Logging systems: decision logs should be centralized in the organization’s logging infrastructure.
Cloud-native and open
Building an authorization system is no small task, and where possible, you should use open source technologies as the underpinning of your system. Open ecosystems bring with them a larger pool of expertise, and help de-risk the investment.
The Topaz open source project was built with these goals in mind. It uses the Open Policy Agent as its decision engine, incorporates a directory modeled after Zanzibar, and is a good place to start when building out a flexible authorization system that follows the Laws of Authorization. Indeed, the Aserto authorization service is built on top of Topaz.
Engineering teams are increasingly called upon to provide a standard authorization capability across their microservices and applications. The Laws of Authorization describe the architectural patterns that ensure a secure, flexible, high-performance authorization system which will scale with the requirements of the organization.