Scaling Okta to 50 Billion Users
Last updated: March 2023
A paradigm shift in scale for Identity and Access Management.
50 Billion users? No, that’s not a typo.
Identity and Access Management is not about maintaining a profile for every person on the planet. It's about an identity — for each employee, customer, or partner — for every organization in the world. How many separate organizations maintain data about your identity? 20? 50? 100?
When Okta thinks about scale and where we need to be in the future, we think about it in the 10s of billions of users. Our vision is to connect every user, application, organization, and device. From the start, we designed Okta's platform architecture to support this broad vision.
Our approach brings a sea change from the on-premises Identity platforms of the past and from the build-it-yourself approach. On-premises platforms are expensive, time-consuming to set up, and hard to maintain. They are deployed per company, and the onus for scaling them falls on the individual customer. Okta’s architecture, however, is designed to scale dynamically and system-wide. With proprietary techniques on top of today’s leading cloud infrastructure technology, we have designed a platform with the potential for limitless scale.
At Okta, we understand that Identity is mission critical. Organizations depend on their Identity services to protect and manage access to their most critical applications and data. Identity holds the keys to the kingdom and is, therefore, one of those core cloud infrastructure services to evaluate based on its architecture, feature set, availability, security, and scalability.
Those last three tenets form the three pillars of why Okta’s Workforce Identity Cloud is:
- Always Available: The service must be always available and architected for extreme resiliency, with zero downtime or maintenance windows.
- Secure: The service should be more secure than anything you could build and operate on your own, including world-class, innovative security controls, features, and tools built with a Zero-Trust security posture.
- Scalable: The service needs to scale automatically and seamlessly, up and down, to meet your needs and business growth.
Uptime
This is only one part of the equation. Today’s users expect a secure, seamless experience while IT and development teams adapt to increasing demand. Interruptions, downtime, and security incidents can hinder an organization’s productivity.
Okta guarantees 99.99% uptime and zero planned downtime by maximizing isolation in a multi-tenant architecture. We actively report our service availability on https://trust.okta.com, along with information on performance, security, and compliance. We achieved 99.9958% uptime in 2021 and 99.9969% uptime in 2022, even as we’ve scaled authentications over 300% per month.
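To make these percentages concrete, the short sketch below converts an uptime figure into the downtime it implies over a year. It is plain arithmetic on the numbers quoted above; the function name and the choice of a 365-day year are our own assumptions for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year (assumption)


def downtime_minutes(uptime_percent: float,
                     period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the downtime, in minutes, implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_minutes


# Uptime figures quoted in the text above.
for label, pct in [("99.99% guarantee", 99.99),
                   ("2021 actual", 99.9958),
                   ("2022 actual", 99.9969)]:
    print(f"{label}: about {downtime_minutes(pct):.1f} minutes per year")
```

Under these assumptions, the 99.99% guarantee allows roughly 53 minutes of downtime per year, while the 2021 and 2022 results correspond to roughly 22 and 16 minutes respectively.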
We believe we’re at just the beginning of our journey to 50 billion users, but along the way, we have always architected Okta for more usage than needed. As we onboard customers with greater scale requirements, we have the team and technology in place to get us to even greater levels.
Our proven ability to rapidly scale
We built the Okta Identity Cloud on the industry’s most reliable, secure, and scalable platform. From day one, we knew we had to be more reliable than anything we connected to, and today we’re proud to have a proven track record.
Our authentication volume continues to grow, as does our customer base. We serve over 17,000 customers, and an increasing percentage come from large-scale organizations with high daily transaction volumes and several customization requirements. Okta customers get just what they need as their organization demands it.
We encourage you to learn more about current Okta customers at https://okta.com/customers.
Behind the Okta platform
At its core, the Okta platform was built for scalability, resiliency, and security. Let's break each of these down in more detail.
Scalability:
We authenticate several million users per hour. The Okta platform receives hundreds of millions of web requests per day across API calls, HTTP requests, and content delivery network (CDN) requests, and it allows customers to exceed default rate limits by 5x to 1000x when necessary. These are mission-critical authentications across customers, partners, and employees. They include logins to core collaboration apps, multi-factor authentication triggered by adaptive policies, minting of OAuth 2.0 and OIDC access and identity tokens, provisioning of newly onboarded users to downstream apps, and real-time access deprovisioning.
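As a rough illustration of what a raised rate limit means in practice, the sketch below applies a per-tenant multiplier to a default limit using a simple fixed-window counter. This is not Okta's implementation: the class name, the 600-requests-per-minute default, and the fixed-window technique are all assumptions chosen to keep the example small.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Allow at most default_limit * multiplier requests per time window."""

    def __init__(self, default_limit: int, multiplier: float = 1.0,
                 window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = int(default_limit * multiplier)
        self.window_seconds = window_seconds
        self.clock = clock              # injectable so the limiter can be tested
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        """Return True if the request fits in the current window."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The window has expired: start a new one and reset the counter.
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


# Hypothetical default of 600 requests per minute, raised 5x for one tenant.
standard = FixedWindowLimiter(default_limit=600)
raised = FixedWindowLimiter(default_limit=600, multiplier=5)
print(standard.limit, raised.limit)
```

A production limiter would more likely use a sliding window or token bucket and shared state across nodes; the point here is only that the multiplier scales the ceiling while the enforcement logic stays the same.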
Our engineering team continually tests the platform for massive increases in current loads. They have successfully run controlled tests for individual customer tenants to hold 100 million users with corresponding increases in authentication volume.
Since we aim for 10s of billions of users and authentications, we’ve continued optimizing our 100% cloud architecture for extreme scale.
Resiliency:
Okta architecture overview
Resiliency is the foundation of our architecture. We deploy our service in a cell-based architecture, allowing the platform to provide isolation boundaries in case of failures and reduce architectural risks. Operating in a load-balanced deployment across regional availability zones, each cell contains hundreds of automated components, which gives us several advantages:
- Risk mitigation: Any fault in infrastructure is contained within a cell using a redundant High Availability (HA) architecture across multiple zones. Even if an entire data center goes down, another AWS Region in a different geography can take ownership of affected accounts within an hour.
- Staged deployment and rollback: We can roll out code from one node to one cell at a time or roll back on one cell instead of the entire service. This decreases the surface area of potential issues arising from a code update.
- Workload tuning: We’ve broken many Okta services into different tiers so we can tune them to our customers’ various access patterns. For example, we segment big jobs and back-end service processing, run our complex hashing algorithm, protect the database from chatty apps, and separately process large volumes of API calls, interactive user requests, and AD/LDAP agent requests.
- Horizontal and vertical scalability: By adding a cell, we can quickly increase capacity. We can also split a cell to double its capacity for tenants on the original cell. In addition, not all tenants are hosted on the same cell, so we avoid the point of diminishing returns on performance.
- Geography: Okta maintains cells in multiple countries and geographic regions. This helps customers manage their data residency needs.
- Zero planned downtime: The architecture prevents the Okta platform from disconnecting or becoming unstable during a deployment process or update.
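Two of the cell properties listed above, staged rollout with per-cell rollback and splitting a cell to add capacity, can be sketched in a few lines. This is a simplified model under our own assumptions, not Okta's code: the `Cell` type, the version strings, the tenant names, and the health-check callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Cell:
    """A hypothetical isolation unit hosting a subset of tenants."""
    name: str
    region: str
    version: str = "v1"
    tenants: Set[str] = field(default_factory=set)


def staged_rollout(cells: List[Cell], new_version: str,
                   is_healthy: Callable[[Cell], bool]) -> List[str]:
    """Upgrade one cell at a time; on failure, roll back that cell and stop."""
    upgraded: List[str] = []
    for cell in cells:
        previous = cell.version
        cell.version = new_version
        if not is_healthy(cell):
            cell.version = previous   # only the failing cell is rolled back
            break                     # remaining cells are never touched
        upgraded.append(cell.name)
    return upgraded


def split_cell(cell: Cell, new_name: str) -> Cell:
    """Move half of a cell's tenants to a new cell in the same region."""
    ordered = sorted(cell.tenants)
    moved = set(ordered[len(ordered) // 2:])
    cell.tenants -= moved
    return Cell(new_name, cell.region, cell.version, moved)


cells = [
    Cell("cell-1", "us", tenants={"acme", "globex", "initech", "umbrella"}),
    Cell("cell-2", "eu", tenants={"hooli"}),
]
# Simulate a release that fails its health check on cell-2 only.
done = staged_rollout(cells, "v2", is_healthy=lambda c: c.name != "cell-2")
new_cell = split_cell(cells[0], "cell-3")
print(done, cells[1].version, sorted(new_cell.tenants))
```

In this model a bad release affects at most the tenants of one cell, and splitting a cell leaves each half with fewer tenants competing for the same resources.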
Okta also provides compliant cell availability with region-specific architectures designed to meet customer regulatory and performance needs.
Beyond Okta’s proprietary cell architecture, we’ve built extreme redundancy at each layer of the technology stack. Even if a SaaS, PaaS, or IaaS offering used by Okta goes down, Okta remains available for its customers. This strategy extends to redundant monitoring and alerting across our infrastructure. The totality of this approach has enabled Okta to remain on and functional when entire AWS availability zones or systems go offline.
Security:
An architecture built for scale and resiliency is necessary, but equally so is security. Every Okta customer benefits from our investment in world-leading security capabilities. We are committed to supporting best-of-breed security tools and evolving the native security features built into the identity cloud. When building our security, we considered the way users access Okta, our tenant keys at rest, tenant data at rest, and more.
Our third-party certifications are also an important validation of our effort to align with stringent security protocols. We take pride in complying with a range of industry-standard certifications and authorizations. These range from our becoming the first identity provider to achieve Level 2 CSA STAR Attestation to our most recent certification, EU Cloud Code of Conduct Level 2. (You can learn more about these certifications and others at https://trust.okta.com/compliance.)
Our strategies surrounding these three pillars enable Okta to deliver the most reliable, scalable, and secure Identity service to our customers.
Okta’s vision and focus on customer success
As described above, Okta has seen a remarkable increase in customers and usage over the last several years. Meeting these demands isn’t simple. It takes the right team, architecture, processes, and — probably most of all — a complete focus on customer success.
That’s why customer success is Okta’s number one priority as a company. As we’ve scaled up and brought on new customers who have pushed us to new volumes, we have worked closely with them to ensure their deployments and go-live dates are as smooth and uneventful as possible. We collaborate with our customers and partners to be ready to respond and react to events that are impossible to predict. In turn, we’ve learned from our customers to optimize how we prioritize identity event processing to build the most secure, seamless experience for all Okta users.
Okta’s architecture is markedly different from the old approach of a separately scaled infrastructure for each customer. It’s far more powerful, resilient, and scalable, but the right team must manage it. This combination of technology and people makes Okta a leader in Identity and Access Management for Workforce and Customer Identity use cases.
Jon Todd, Chief Architect @ Okta