Okta’s Build & Release System: Rapid Innovation & High Quality via Automation
One of the things we are very proud of here at Okta is our ability to deliver constant innovation through weekly releases while keeping quality high. To achieve that, we maintain a very high level of automation across our build, test, and release processes. Denali wrote a post in October about our Selenium test automation; in this post we cover the broader build and release system.
The Okta build and release system automates the building, testing, and deployment of our product. Okta’s product spans a heterogeneous set of artifacts: the service consists of web servers, databases, browser plugins, LDAP agents, and Windows services.
Our system automates the creation, testing, versioning and deployment of all these artifacts.
When we set out to build this system, we focused on a few key capabilities with one overarching goal: catching bugs earlier, when they are cheaper to fix, by running extensive end-to-end tests with high code coverage on every commit.
The key capabilities of the system are:
- Everything can be built from source control; no special libraries or other out-of-band items are needed to build any of the artifacts. Our automated build system runs on EC2 and S3, making it a complete cloud solution.
- Every commit to any branch triggers an extensive test suite. If the suite doesn't meet an aggressive code coverage requirement, the build fails.
- The versioning system archives every artifact, and this archive provides the basis for all regression testing. It also lets us map any shipped artifact back to the exact commit in source control it was built from.
Build Process
The build cluster polls our SCM (git) for new commits. When a commit is detected on a topic branch, a full build and test cycle starts on one of the build slave servers. After the branch's artifacts are built and tested, they are promoted to the artifact repository, which is organized as a <branch>, <artifact>, <version> hierarchy. This lets developers inspect all the artifacts created from their branch.
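To make the <branch>, <artifact>, <version> hierarchy concrete, here is a minimal in-memory sketch in Python. The `ArtifactRepository` class, its method names, and the dotted numeric version format are hypothetical illustrations, not Okta's actual repository API; a real repository would store files in something like S3 rather than in a dictionary.

```python
from __future__ import annotations

from collections import defaultdict


def version_key(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '1.10.2' into a sortable tuple.

    Assumes purely numeric components (a hypothetical version format).
    """
    return tuple(int(part) for part in version.split("."))


class ArtifactRepository:
    """Hypothetical in-memory store laid out as branch -> artifact -> version."""

    def __init__(self) -> None:
        # branch -> artifact -> version -> location of the built file
        self._store: dict[str, dict[str, dict[str, str]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def promote(self, branch: str, artifact: str, version: str, location: str) -> None:
        """Record an artifact that has been built and has passed its tests."""
        self._store[branch][artifact][version] = location

    def list_artifacts(self, branch: str) -> dict[str, list[str]]:
        """Return every artifact built from a branch with its versions, oldest first."""
        return {
            artifact: sorted(versions, key=version_key)
            for artifact, versions in self._store.get(branch, {}).items()
        }

    def latest(self, branch: str, artifact: str) -> str | None:
        """Return the highest version of an artifact on a branch, or None."""
        versions = self._store.get(branch, {}).get(artifact, {})
        if not versions:
            return None
        return max(versions, key=version_key)
```

Sorting on parsed integer tuples rather than raw strings keeps `1.10` ahead of `1.9`, which matters when a developer asks for the latest build of their branch, for example to deploy it to a mini stack.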
Most artifacts have their own SCM repository, though some share a repository with other artifacts. Every repository follows the same branching strategy: any work in a repository is done on a topic branch, and each repository has two long-lived branches, “master” and “live”.
Master: The current product, or main line of code, into which all topic branches are merged.
Live: Contains the code for all artifacts pushed to production. Before a release, master is merged into Live.
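The branching rules above can be expressed as a small merge-policy check. This is a hypothetical sketch: it assumes branch names are lowercase `master` and `live`, treats every other branch as a topic branch, and only polices merges into the two long-lived branches. The function names are illustrative, not part of Okta's tooling.

```python
MASTER = "master"
LIVE = "live"


def is_topic_branch(branch: str) -> bool:
    """Assume every branch other than master and live is a topic branch."""
    return branch not in (MASTER, LIVE)


def merge_allowed(source: str, target: str) -> bool:
    """Return True if merging `source` into `target` follows the branching strategy.

    - topic branches are merged into master
    - master is merged into live before a release
    Merges into topic branches are not restricted by this sketch.
    """
    if target == MASTER:
        return is_topic_branch(source)
    if target == LIVE:
        return source == MASTER
    return True
```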
Network Layout
The build cluster continuously monitors the SCM repository, and there is one build cluster per SCM repository. Any commit to any branch kicks off a full build and unit/functional test run: the master machine polls the SCM repository and directs one of the build machines in its farm to build the change committed to the developer’s branch and run the full test suite against it. The system then checks that all code coverage metrics are met; if coverage falls below a threshold, the build is considered broken. Developers are responsible for making sure any code they write has tests that satisfy the coverage thresholds.
The system gives developers feedback on their changes and encourages them to work on their own personal or team topic branches. As a result, we know that any change merged into master has already been fully vetted by our automated tests.
After a branch is fully built and tested, all the artifacts created for it are promoted to our artifact repository, where they are stored in a branch and version hierarchy. A developer can then deploy the latest fully vetted build artifacts for their branch to a mini Okta stack, which has all the components needed to run the system, and interact with that mini stack directly.
The monitored SCM repositories form a heterogeneous environment, but the same process is followed for Windows and Unix artifacts, and each repository has its own build cluster.
A separate process monitors the “Live” branch. Merges into Live kick off the build of the final artifacts and packages that will be deployed to production. These are signed with Okta’s certificates and keys.
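The coverage gate described above can be sketched as a comparison of each measured metric against its threshold. The metric names and threshold values below are hypothetical; the post does not say which metrics or numbers Okta uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CoverageResult:
    passed: bool
    failures: list[str]


# Hypothetical thresholds, expressed as percentages.
THRESHOLDS = {"line": 80.0, "branch": 70.0}


def check_coverage(
    measured: dict[str, float], thresholds: dict[str, float] = THRESHOLDS
) -> CoverageResult:
    """Mark the build broken if any metric is missing or below its threshold."""
    failures = []
    for metric, minimum in thresholds.items():
        value = measured.get(metric)
        if value is None:
            failures.append(f"{metric}: not reported")
        elif value < minimum:
            failures.append(f"{metric}: {value:.1f}% < {minimum:.1f}%")
    return CoverageResult(passed=not failures, failures=failures)
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a build that silently stops reporting coverage should not pass the gate.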
Versioning
Each deployed artifact is versioned independently, and a package (component) is made up of one or more artifacts. Any artifact deployed to production must have a unique version number, which maps directly to a tag on the main line of code in that artifact’s repository.
Deployment versioning
The build system uses the version tags to determine whether an artifact will produce a new package for deployment.
Before any development can start on an artifact, an <artifact>-<version>-begin tag has to be created for it. This tag tells all downstream build and deploy processes that the artifact’s version is <artifact>-<version>. The final step of a production deployment is creating the final <artifact>-<version> tag in the code repositories of every artifact included in that deployment.
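The tag convention can be illustrated with a function that, given a repository's tag names, reports the version currently under development for an artifact: one that has an <artifact>-<version>-begin tag but no final <artifact>-<version> tag yet. This is a hypothetical sketch that assumes versions are dotted numbers; in practice the tag list might come from `git tag --list`, and the function name is illustrative.

```python
from __future__ import annotations

import re


def pending_version(artifact: str, tags: list[str]) -> str | None:
    """Return the version of `artifact` that has a begin tag but no final tag.

    Hypothetical tag convention: '<artifact>-<version>-begin' marks the start of
    development and '<artifact>-<version>' marks the shipped release, where
    <version> is dotted numbers such as '2.3.1'.
    """
    # Anchoring the whole tag and requiring a numeric version keeps
    # 'ldap-agent-1.0' from being read as a tag for an artifact named 'ldap'.
    pattern = re.compile(re.escape(artifact) + r"-(\d+(?:\.\d+)*)(-begin)?")
    begun: set[str] = set()
    released: set[str] = set()
    for tag in tags:
        match = pattern.fullmatch(tag)
        if match is None:
            continue  # tag belongs to another artifact or another convention
        version, begin_suffix = match.groups()
        (begun if begin_suffix else released).add(version)

    open_versions = begun - released
    if len(open_versions) > 1:
        # Two unfinished versions would make the artifact's version ambiguous.
        raise ValueError(f"{artifact} has several open versions: {sorted(open_versions)}")
    return next(iter(open_versions), None)
```

Under this convention, an artifact produces a new deployment package exactly when `pending_version` returns a version, and creating the final tag at the end of a production deployment closes it.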
Version error checking
The build system checks every commit to a branch and ensures that no changes are made to an artifact that has not been designated for upgrade. If the <artifact>-<version>-begin tag is not present in the repository, the build fails, because someone is trying to edit an artifact that has not been designated for upgrade. This guarantees that any change to an artifact is accompanied by an increment to the artifact’s version.
This safety mechanism ensures that no modified artifact ships with the same version as a previously shipped artifact, so any version reported by a customer or client maps to exactly one piece of code (tag) in the repository.
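The version error check amounts to two rules for every artifact touched by a commit: it must have a begin tag, and at least one begun version must not have shipped yet. Here is a hypothetical sketch that assumes the build already knows which artifacts a commit changed (for example, by mapping changed file paths to artifacts) and has the repository's tag names; the function name and tag format are illustrative.

```python
from __future__ import annotations

import re


def check_versioned_changes(changed_artifacts: set[str], tags: set[str]) -> list[str]:
    """Return one error per changed artifact that may not be modified.

    An empty list means the build may proceed. Uses a hypothetical tag
    convention: '<artifact>-<version>-begin' opens a version and
    '<artifact>-<version>' records that the version has shipped.
    """
    errors = []
    for artifact in sorted(changed_artifacts):
        pattern = re.compile(re.escape(artifact) + r"-(\d+(?:\.\d+)*)-begin")
        matches = (pattern.fullmatch(tag) for tag in tags)
        begun = {m.group(1) for m in matches if m}
        # A begun version is still open only if its final tag does not exist yet.
        open_versions = {v for v in begun if f"{artifact}-{v}" not in tags}
        if not begun:
            errors.append(f"{artifact}: no begin tag; artifact is not designated for upgrade")
        elif not open_versions:
            errors.append(f"{artifact}: every begun version has already shipped")
    return errors
```

Failing on a begun version that has already shipped is what prevents a modified artifact from reusing a released version number.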
Deployment
Each environment has a deployment server responsible for deploying artifacts into that environment. When artifacts are ready for deployment, the server pulls them from the deployment repository and makes them available to the clusters in that environment. All database upgrades are also run from the deployment servers.
Access to the deployment servers and repositories is controlled by ACLs provided by AWS, so only people with access to an environment can deploy artifacts to it.
Lessons Learned
- Build on every commit.
- Enforce high code coverage for unit and functional tests.
- Build every developer’s branch. This keeps people from collaborating on the main line of code: if a change isn’t ready, it doesn’t get merged into the main line.
- Automate the testing of all the database migrations.
- Automate and test the full deployment orchestration on a staging/test cluster.
- Make sure that every “version” that is deployed to a customer can be traced to a build and tag on our “live” branch.
- Make it easy to spin up clusters for ad-hoc requests from sales and marketing.
The high level of automation in our development process has been critical to maintaining our weekly cadence of high-quality releases, and it has helped us refine and improve our process end to end. The result is a service that delivers new functionality to customers rapidly while maintaining the quality and reliability that enterprise IT organizations need. I hope this post has been useful to you as well!