Keycloak: A case for open-source IDP
During a project, the DevOps team was tasked with finding the right Identity Provider (IDP) for an uplift of an existing customer environment. Fewer than 50 users were expected to access the environment. There was also a requirement to authenticate via SAML or OpenID Connect (OIDC) through a lightweight, in-house IDP portal, as opposed to a subscription-based Software-as-a-Service provider such as Okta or Auth0.
Active Directory (AD) with Active Directory Federation Services (ADFS) was used in the legacy environment but was considered high maintenance, and a lighter-weight, native Linux alternative was preferred as the replacement.
A typical AD plus ADFS setup includes the following components:
- Primary Domain Controller (DC)
- Secondary Domain Controller
- Active Directory Federation Services Server
- ADFS Proxy (optional)
- Offline CA Server
- Subordinate CA Server
Managing these separate components involves a great deal of BAU overhead. They required regular patching, either by setting up a Windows Update Server or by running patching through AWS Systems Manager (SSM). Each patch cycle involved outages and engineering effort. As an engineering team, we want to avoid unnecessary hands-on activities and prefer to rely on automation where possible. AWS Systems Manager Patch Manager was chosen as it provides this level of automation: a patch baseline defines which patches are applied, and a schedule (the Maintenance Window) defines when patching runs. Static AD instances would remain static, meaning that if one failed catastrophically, recovery, especially in a Windows environment, could be a long and drawn-out process.
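To make the baseline-plus-schedule idea concrete, here is a minimal sketch of how such a setup could be expressed with the boto3 SSM client's `create_patch_baseline` and `create_maintenance_window` calls. The names, the seven-day approval delay, the patch classifications and the Sunday 02:00 schedule are illustrative assumptions, not the values used in the project.

```python
def patch_baseline_params(name: str) -> dict:
    """Keyword arguments for the SSM create_patch_baseline call."""
    return {
        "Name": name,
        "OperatingSystem": "WINDOWS",
        "ApprovalRules": {
            "PatchRules": [
                {
                    "PatchFilterGroup": {
                        "PatchFilters": [
                            {
                                "Key": "CLASSIFICATION",
                                # Assumed selection of patch classes.
                                "Values": ["CriticalUpdates", "SecurityUpdates"],
                            }
                        ]
                    },
                    # Assumed delay before a released patch is auto-approved.
                    "ApproveAfterDays": 7,
                }
            ]
        },
    }


def maintenance_window_params(name: str) -> dict:
    """Keyword arguments for the SSM create_maintenance_window call."""
    return {
        "Name": name,
        "Schedule": "cron(0 2 ? * SUN *)",  # assumed: Sundays at 02:00
        "Duration": 3,  # window length in hours
        "Cutoff": 1,    # stop starting new tasks 1 hour before the end
        "AllowUnassociatedTargets": False,
    }


def apply_patch_config(ssm_client) -> tuple:
    """Create the baseline and the window; returns their IDs.

    ssm_client is expected to be boto3.client("ssm") or a compatible stub.
    """
    baseline = ssm_client.create_patch_baseline(
        **patch_baseline_params("legacy-ad-baseline")  # hypothetical name
    )
    window = ssm_client.create_maintenance_window(
        **maintenance_window_params("legacy-ad-patching")  # hypothetical name
    )
    return baseline["BaselineId"], window["WindowId"]
```

Even with this level of automation, every window still means reboots and verification across each of the servers listed above, which is the overhead the team wanted to remove.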
With all this overhead in mind, we sought a lighter, more scalable solution. Enter Keycloak. Why Keycloak? Out of the box, it met our requirements in the following ways:
- Keycloak provides OIDC and SAML connections for apps to authenticate against
- Keycloak allows for user management via Role-Based Access Control (RBAC)
- Keycloak is lightweight and quick to deploy
This meant we could reduce the overhead from up to 6 servers down to one without losing any functionality. Out of the box, however, Keycloak lacked high availability and scaling. It ships with an H2 database, written in Java; having this bundled on the same server as Keycloak was a key system risk, similar to the AD design we were replacing.
On further investigation, Keycloak did offer the option of migrating to MySQL. Moving the database to a separate service, such as AWS Relational Database Service (RDS), meant we could scale the Keycloak front end and have a fault-tolerant implementation.
RDS Aurora MySQL takes away most of the management overhead and provides quick recovery in case of failure. If the database were to fail, a new replica could be spun up quickly to replace it, without the need to set up a separate cluster.
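As a rough illustration of pointing Keycloak at an external MySQL database, the sketch below builds the environment variables a Keycloak container could be started with. It assumes a recent Quarkus-based Keycloak distribution, which reads `KC_DB`, `KC_DB_URL`, `KC_DB_USERNAME` and `KC_DB_PASSWORD`; older WildFly-based releases configure the datasource differently. The hostname, database name and credentials are hypothetical placeholders.

```python
def keycloak_db_env(host: str, user: str, password: str,
                    database: str = "keycloak", port: int = 3306) -> dict:
    """Environment variables that point Keycloak at an external MySQL database.

    Assumes a Quarkus-based Keycloak; the database name and port defaults
    are illustrative.
    """
    return {
        "KC_DB": "mysql",
        "KC_DB_URL": f"jdbc:mysql://{host}:{port}/{database}",
        "KC_DB_USERNAME": user,
        "KC_DB_PASSWORD": password,
    }


# Hypothetical Aurora cluster endpoint; in practice the password would
# come from a secrets store rather than being written inline.
env = keycloak_db_env("keycloak-db.example.internal", "keycloak", "change-me")
```

Because every Keycloak instance receives the same settings, any number of front-end instances can share one database, which is what makes scaling the front end possible.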
So now we had decoupled the front end and back end, making autoscaling and fault tolerance possible. These came by way of an Auto Scaling Group (ASG) and an Application Load Balancer (ALB). The ASG gave the required fault tolerance and scaling: if a health check based on a specified instance metric (e.g. CPU or a metric of our choosing) fails, the ASG automatically creates a new instance and terminates the old one.
The ALB, which is the inbound gateway to the Keycloak world, then diverts traffic to the new instance once it is in a healthy state (reference: https://docs.aws.amazon.com/autoscaling/ec2/userguide/attach-load-balancer-asg.html).
We went from managing 5-6 static instances to:
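The replace-on-failure behaviour can be pictured with a small toy model. This is not the AWS API, only a simplified simulation under assumed names: the group drops unhealthy instances, launches replacements until the desired capacity is met, and the load balancer routes only to healthy instances.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


@dataclass
class AutoScalingGroup:
    """Toy stand-in for an ASG: keeps desired_capacity healthy instances."""
    desired_capacity: int
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def _launch(self) -> Instance:
        # Hypothetical ID format, for readability only.
        return Instance(f"i-{next(self._ids):04d}")

    def reconcile(self) -> None:
        # Terminate instances that failed their health check...
        self.instances = [i for i in self.instances if i.healthy]
        # ...and launch replacements until capacity is restored.
        while len(self.instances) < self.desired_capacity:
            self.instances.append(self._launch())


def alb_targets(group: AutoScalingGroup) -> list:
    """The load balancer only sends traffic to healthy instances."""
    return [i.instance_id for i in group.instances if i.healthy]


group = AutoScalingGroup(desired_capacity=2)
group.reconcile()                        # initial launch of two instances
group.instances[0].healthy = False       # simulate a failed health check
during_failure = alb_targets(group)      # traffic avoids the failed instance
group.reconcile()                        # failed instance replaced
after_recovery = alb_targets(group)
```

In this model `during_failure` contains only the surviving instance and `after_recovery` contains the survivor plus its replacement, with no manual rebuild step in between.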
- a dynamic front end via an ASG
- a managed autoscaling AWS Application Load Balancer
- a managed database service with AWS RDS
This all meant that our IDP, Keycloak, was a reliable, low-overhead, secure and easy-to-manage solution, with a running cost lower than the ADFS setup.
Windows EC2 instances carry a licence cost on top of the general run-time cost and, at a minimum, would cost 220-260 USD a month, based on 6 x t3.medium instances running (reference: https://calculator.aws/).
By contrast, our solution, which required no Microsoft licensing, was a fair bit cheaper to run. Remembering our use case of fewer than 50 users, our hosting solution used a t3.small Linux instance (any distro besides SUSE/Red Hat, because of licensing costs), an ALB, and an Aurora MySQL RDS instance on a db.t3.small instance class, which, when priced up, came in at just under 100 USD a month (reference: https://calculator.aws/).
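To put the two estimates side by side, the short calculation below works out the saving from the figures quoted above. It treats "just under 100 USD" as an upper bound of 100 USD for the Keycloak stack, so the results are slightly conservative.

```python
# Monthly estimates quoted in the article, in USD.
LEGACY_MONTHLY_USD = (220, 260)  # 6 x t3.medium Windows instances
KEYCLOAK_MONTHLY_USD = 100       # "just under 100", taken as an upper bound


def monthly_saving(legacy: float, replacement: float) -> float:
    return legacy - replacement


def saving_percent(legacy: float, replacement: float) -> float:
    return 100 * (legacy - replacement) / legacy


monthly = tuple(monthly_saving(c, KEYCLOAK_MONTHLY_USD) for c in LEGACY_MONTHLY_USD)
annual = tuple(12 * m for m in monthly)
percent = tuple(round(saving_percent(c, KEYCLOAK_MONTHLY_USD)) for c in LEGACY_MONTHLY_USD)

print(f"Monthly saving: {monthly[0]}-{monthly[1]} USD")
print(f"Annual saving:  {annual[0]}-{annual[1]} USD")
print(f"Reduction:      {percent[0]}-{percent[1]} %")
```

On these figures the saving is at least 120-160 USD a month, or 1,440-1,920 USD a year, a reduction of roughly 55-62 %.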
This article also highlights the benefits of Continuous Improvement (CI) while constantly delivering results for our customers. In this particular case, the team had the opportunity to take a different angle and research options outside of AWS-native and traditional models, to deploy a lower-cost, elastic and low-maintenance IDP solution.
In line with this Continuous Improvement mindset, further iterations of the solution will see Amazon EC2 Image Builder add automation to the Keycloak image-building process, and the solution containerized with Docker, ready to deploy into Amazon ECS and Fargate, further reducing costs and increasing portability, scalability and ease of development. But that's a post for another time.
This blog post was written by Bojan Zivic, DevOps Engineer at AC3.