For many companies utilising Microsoft products and solutions, Active Directory is an integral part of the Windows infrastructure stack and a requirement for system access and management. Static workloads have always been easy to manage: each new instance joins the AD domain and remains connected for the server's lifetime. When the instance needs to be recycled or removed from the domain, the AD computer account is disabled and should then be removed from AD, which is typically a manual process. But what happens when your workloads become more dynamic and still require domain access? Manually managing the domain membership of instances in this scenario becomes impractical, so an automated solution is required. In this blog post, we will look at a serverless implementation for managing the joining and removal of Windows instances in a domain.
The solution should provide:
- Seamless AD domain joining when new instances are launched, for both managed and native AD
- Automatic removal from the AD Domain when the instance is stopped or terminated
- Protection for sensitive information – the identity of the account that is used for joining the domain or removing a computer account from the domain
- Logging to facilitate troubleshooting
Limitations
The solution assumes that the security groups attached to the launched EC2 instance allow outbound traffic on all ports required for domain communication.
After removal, the computer object that was joined to the domain remains in Active Directory in a disabled state, and further clean-up of the object is required using an appropriate domain policy.
Without proper clean-up of the associations used with SSM State Manager, you may reach the limit of 2000 associations per AWS account per Region. See https://docs.aws.amazon.com/general/latest/gr/ssm.html
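As a rough illustration of the association clean-up mentioned above, the sketch below picks out domain-join associations whose target instances no longer exist. It is not part of the original solution. The function name `stale_association_ids` is hypothetical, and the input is assumed to be a list of dictionaries shaped like the entries returned by the SSM `list_associations` call (`Name`, `AssociationId`, `InstanceId`, `Targets`).

```python
from typing import Iterable, List, Set

# Document name taken from the post; everything else here is illustrative.
JOIN_DOCUMENT = "AWS-JoinDirectoryServiceDomain"


def stale_association_ids(associations: Iterable[dict],
                          live_instance_ids: Set[str]) -> List[str]:
    """Return IDs of domain-join associations that target no live instance.

    Assumes each association dict looks like an entry from the SSM
    list_associations response (an assumption, not verified here).
    """
    stale = []
    for assoc in associations:
        if assoc.get("Name") != JOIN_DOCUMENT:
            continue  # leave unrelated associations alone
        targeted = set()
        if assoc.get("InstanceId"):
            targeted.add(assoc["InstanceId"])
        for target in assoc.get("Targets", []):
            if target.get("Key") == "InstanceIds":
                targeted.update(target.get("Values", []))
        # Stale only if it names specific instances and none of them is live.
        if targeted and not (targeted & live_instance_ids):
            stale.append(assoc["AssociationId"])
    return stale
```

The returned IDs could then be passed one at a time to the SSM `delete_association` call. Keeping the selection logic separate from the API calls makes it easy to review which associations would be removed before deleting anything.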
Solution Overview
Joining an AD domain, whether native or managed, can be achieved by placing a PowerShell script that performs the domain join into the User Data section of the EC2 instance launch configuration. While it is possible to implement the removal of a computer account with an on-shutdown script configured through local Group Policy, that approach requires additional logic to decide when the script should run: the domain join itself restarts the computer, which would trigger the on-shutdown script and remove the instance from the domain it has just joined. A better approach is to use Amazon EventBridge along with a custom SSM document and Auto Scaling lifecycle hooks to automate the domain removal.
Both the domain join and domain unjoin scripts require a security context that allows these operations to be performed on the domain, usually achieved by providing credentials for a user account with the corresponding rights. In the proposed implementation, both scripts obtain the account credentials from AWS Secrets Manager under the protection of security policies and roles, so that no credentials are stored in the scripts. Both scripts generate a detailed log of their operation, which is stored in Amazon CloudWatch Logs.
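To make the credential lookup concrete, here is a minimal sketch of reading a domain account from Secrets Manager at run time. The solution's own scripts are PowerShell, so this Python version only illustrates the idea. It assumes the secret is stored as a JSON string with `username` and `password` keys, which the post does not specify, and `load_domain_credentials` is a hypothetical helper name.

```python
import json
from typing import NamedTuple


class DomainCredentials(NamedTuple):
    username: str
    password: str


def load_domain_credentials(secrets_client, secret_id: str) -> DomainCredentials:
    """Fetch the domain account from Secrets Manager instead of hard-coding it.

    `secrets_client` is expected to behave like boto3.client("secretsmanager").
    The JSON keys "username" and "password" are assumptions for illustration.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return DomainCredentials(secret["username"], secret["password"])
```

Because the client is passed in, the instance role only needs permission to read that one secret, and the function can be exercised with a stub client without touching AWS.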
The diagram below represents the serverless solution for performing the AD domain joining and removal for instances within an Auto Scaling group. The solution utilises a combination of PowerShell scripts and SSM associations and documents, triggered by EC2 User Data and Amazon EventBridge events. In this example, we are using AWS Managed Directory Service, but this solution will also work for Simple AD and Native AD implementations.
Process
- Upon instance launch, the lifecycle hook places the instance into the Pending:Wait state
- User Data runs during the launch phase and creates an SSM association with the AWS-JoinDirectoryServiceDomain document, which joins the instance to the AWS Directory Service domain
- Once the instance is joined to the domain, the User Data signals the Auto Scaling group to move the instance into the InService state
- Upon termination, the lifecycle hook places the instance into the Terminating:Wait state, and this transition is captured by Amazon EventBridge
- The event triggers an EventBridge rule which invokes an SSM Automation running a custom SSM Document. This document uses PowerShell to remove the instance from the domain and delete the SSM association made by the AWS-JoinDirectoryServiceDomain document
- After the instance is removed from the domain, the terminating lifecycle hook is signalled and the instance is removed from the Auto Scaling group
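The termination half of the process can be sketched as follows. This is an illustration under stated assumptions, not the post's actual implementation, which performs these steps from an SSM document using PowerShell. The helper names `termination_event_pattern` and `complete_termination` are hypothetical. The pattern fields and event detail keys follow the documented shape of Auto Scaling lifecycle events delivered to EventBridge.

```python
from typing import Any, Dict


def termination_event_pattern(asg_name: str) -> Dict[str, Any]:
    """EventBridge pattern matching terminate lifecycle actions for one group."""
    return {
        "source": ["aws.autoscaling"],
        "detail-type": ["EC2 Instance-terminate Lifecycle Action"],
        "detail": {"AutoScalingGroupName": [asg_name]},
    }


def complete_termination(autoscaling_client, event: Dict[str, Any]) -> Dict[str, str]:
    """Signal the terminating lifecycle hook once the domain removal has finished.

    `autoscaling_client` is expected to behave like boto3.client("autoscaling").
    Call this only after the instance has been removed from the domain.
    """
    detail = event["detail"]
    params = {
        "AutoScalingGroupName": detail["AutoScalingGroupName"],
        "LifecycleHookName": detail["LifecycleHookName"],
        "LifecycleActionToken": detail["LifecycleActionToken"],
        # CONTINUE lets the Auto Scaling group proceed with termination.
        "LifecycleActionResult": "CONTINUE",
    }
    autoscaling_client.complete_lifecycle_action(**params)
    return params
```

The dictionary from `termination_event_pattern` can be serialised with `json.dumps` and supplied as the event pattern of the EventBridge rule. If the unjoin step fails, the lifecycle hook's timeout and default result decide what happens to the instance, so it is worth setting those deliberately.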