Automate Active Directory cleanup with Okta Workflows and Open AI

From the beginning, Okta's vision has been to enable anyone to safely use any technology. With the rise of technology like AI, that vision now includes leveraging your choice of AI model on your data using retrieval-augmented generation (RAG) and semantic search.

Many organizations use Active Directory (AD) as their Identity directory. For decades, users and their group memberships have been managed ad hoc with no governance, leading to a large number of groups that are unused, have excessive standing privileges, and lack insight into the membership of the groups. 

Until now, AD administrators have had to add users to groups based on similar roles and then add additional group memberships based on one-off requests with no governance in place. Essentially, these group memberships are clusters of users with similar attributes, like job code, title, department, or location. However, you’ll also encounter outliers that reflect the one-off assignments. This manual and error-prone process is prime for automation, and more importantly, helps AD admins ensure confidence in the security and governance of their membership setups.

Summarize group data with Retrieval Augmented Generative AI 

Using a RAG AI approach to summarize group data and create group descriptions is an innovative application of AI technology to solve this legacy challenge. With Okta Identity Governance features, teams can streamline certification campaigns for AD groups and automate birthright access, helping organizations modernize and secure their application access. 

A modern AI-based approach to automate birthright access can be accomplished elegantly as follows:

  1. Produce documentation using RAG AI and Semantic Search: Two stages come into play, creating and tuning the model and using the model to generate data. This is the key takeaway below. 
  2. Launch Okta Identity Governance user campaigns with manager certifications to remove unneeded group memberships. The manager uses the documentation generated by AI as a guide to approve or revoke group memberships. 
  3. Automate the birth-right access using Okta Identity Governance features of Entitlement Management, Lifecycle Management, and Workflows.
     

Getting started

To get started, a dataset has to be created with relevant user and group attributes. Okta Workflows helps by creating a simple CSV file with the identity data. The next step is to break the generated data into chunks. The chunk size is a hyperparameter of the model that can be tuned, and different chunking will yield different results. 

An embedding is a vector, or list, of floating point numbers. The distance between two vectors measures their relatedness. Small distances reflect high relatedness and large distances reflect low relatedness. To obtain an embedding using OpenAI, send each chunk to the OpenAI embeddings API endpoint with the embedding model name (e.g. text-embedding-3-small). The embedding model chosen is a hyperparameter of the tunable model. At the time of this writing, text-embedding-3-small and text-embedding-3-large are OpenAI’s newest and most performant embedding models that are available, with lower costs, higher multilingual performance, and new parameters to control the overall size.

 

Vector database example

 

The next “execution” stage to generate documentation for each group uses Okta Workflows to iterate through all the groups. In the first step, for each group, a request is made to the OpenAI Embeddings API endpoint to generate a list of vectors, with the ready-to-use Open AI connector.

 

Making request to OpenAI embeddings API endpoint

 

The next step is to retrieve the top matches from a vector database semantic search. 

 

Retrieve the top matches from a vector database semantic search

 

With the convenience of the prebuilt OpenAI connector, the final step seamlessly matches results from the vector database to a summarized description using the OpenAI Chat Completion API.

 

OpenAI Chat Completion API documentation

 

The generated documentation summarizes the key information about the users who are members of the group providing certifiers valuable information to make an approve/revoke decision.

Traditionally, organizations have depended on governance vendors for analysis, limiting their ability to adjust or tailor the underlying mathematical models to their specific needs. Okta’s innovative approach with Workflows and Open AI revolutionizes this process by allowing customers to harness AI and select the Large Language Model and vector database technology that best suit their requirements.

This flexibility enables organizations to optimize their RAG AI models to gain deep insight and data sanitization within Active Directory. Furthermore, Okta’s advanced governance features automate birthright access, significantly streamlining access management and enhancing overall efficiency.

For a detailed demomore information, watch our video: Build a Retrieval-Augmented Generation (RAG) AI Flow with Okta Workflows | Online Meetup.