First, a couple of things to agree on. Public cloud environments change frequently, they contain far more than virtual machines, and their network rules are typically restrictive. These traits pose challenges for the traditional ServiceNow Discovery process, which scans hosts from a central location on a schedule. To be successful, Discovery needs host credentials and line-of-sight connectivity to the SNMP, SSH and WMI ports. The results are far from satisfactory when trying to use this method in the cloud.
Setting out to populate the CMDB without defined goals can lead to a long-running project with little perceived benefit at the end, if it ends at all.
For this scenario, I'm setting three high-value and achievable goals as follows.
- gain visibility of what is in the cloud
- understand its value to the business
- assign ownership for when things go wrong
I will also assume for our scenario that we have a sturdy base of foundation data. If you doubt this applies in your environment then you should review your foundation data. What we are not aiming for is to populate every attribute on a configuration item. Neither are we seeking to have detailed dependency trees of our services and infrastructure. These are great aspirations that should be on your roadmap, but they require significant effort to achieve. I believe much value is gained in a short period of time from realising those three goals.
Foundation
The key to realising these goals is to have a tagging policy for your cloud resources that everyone is aware of and, importantly, follows. A web search on this topic will return many articles advising this as best practice for managing cloud workloads. You could therefore expect your Cloud Ops teams to have already designed a tagging policy. Surprisingly, there have been a few times when I've encountered teams that lack any tagging policy or, more often, individual teams that have created their own tags with no consistency across teams. If this is the case, then some additional effort is required to adopt a standard for cloud tags, but the rewards extend beyond just the ServiceNow CMDB. I recommend discussing with Cloud Ops how our three goals will benefit them and what role their tagging policy plays in that. Then go into the detail of the actual tag keys and values needed to achieve those goals.
For our purposes, we need three tags applied to all cloud resources: the Application that is using the resource, the Environment it belongs to (e.g. Development, Test, Production) and the Business Unit that supports or owns it. The next step is to configure ServiceNow's Cloud Discovery with a service account to collect data. Since this works via the cloud provider's management API, it does not depend on network visibility or credentials for each host. It relies on the permissions granted to the service account, so check these if the Discovery appears incomplete. Importantly, Cloud Discovery will fetch tags and return details of our PaaS resources, such as containers, otherwise known as "things that are not virtual machines". The job takes significantly less time than normal Discovery, so it can be run frequently to retrieve the latest inventory. A better approach, though, is to enable alert-driven Cloud Discovery, where the cloud provider notifies ServiceNow of any lifecycle changes as they occur, bringing near real-time Discovery of cloud assets. This is essential for capturing ephemeral assets such as auto-scaling EC2 instances.
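To make the three-tag policy concrete, here is a minimal sketch of a compliance check that could be run over an inventory export before the tags are relied upon. The tag keys (`Application`, `Environment`, `BusinessUnit`), the allowed environment values and the `CloudResource` shape are all assumptions for illustration; use whatever your own tagging policy defines. This is plain Python, not a ServiceNow or cloud provider API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed tag keys and values; substitute whatever your tagging policy defines.
REQUIRED_TAGS = ("Application", "Environment", "BusinessUnit")
ALLOWED_ENVIRONMENTS = {"Development", "Test", "Production"}


@dataclass
class CloudResource:
    """A cloud resource as it might appear in an inventory export (hypothetical shape)."""
    resource_id: str
    tags: Dict[str, str] = field(default_factory=dict)


def tag_violations(resource: CloudResource) -> List[str]:
    """Return readable problems with a resource's tags; an empty list means compliant."""
    problems = []
    for key in REQUIRED_TAGS:
        # A tag that is absent or blank counts as missing.
        if not resource.tags.get(key, "").strip():
            problems.append(f"missing tag: {key}")
    environment = resource.tags.get("Environment", "").strip()
    if environment and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unexpected Environment value: {environment}")
    return problems


if __name__ == "__main__":
    inventory = [
        CloudResource("vm-001", {"Application": "Payroll",
                                 "Environment": "Production",
                                 "BusinessUnit": "Finance"}),
        CloudResource("fn-002", {"Application": "Payroll", "Environment": "prod"}),
    ]
    for resource in inventory:
        for problem in tag_violations(resource):
            print(f"{resource.resource_id}: {problem}")
```

A report like this gives Cloud Ops a short, actionable list of resources to fix, which tends to be an easier conversation than asking for "better tagging" in the abstract.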
Gain visibility
After a Cloud Discovery job is run, the CMDB will be nicely populated. We have achieved the first goal and provided visibility of what is in the cloud to a wider audience. The inventory is no longer visible only to the Cloud Ops teams: ServiceNow users can use lists, reports and dashboards to mine that data for information that supports the work they do. With the tendency for organisations to have multiple cloud accounts, and a move by some to multi-cloud deployments, having this inventory in one place is already a big win.
A word of caution: a common pitfall at this stage is that there are now too many CIs to choose from when raising an Incident or a Change. On top of that, we now have CIs with short lifespans due to scalable architectures. This is where filtering by CI class and lifecycle state is important. It is easily done with a reference qualifier on fields such as Affected CI, but it does need agreement from those involved in these processes. You might also consider a scheduled job to remove decommissioned configuration items after a period if there is no value in keeping them. For instance, I would argue a virtual machine or container that was spun up temporarily and never attached to a task has little value.
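The two filters described above can be sketched in a few lines. This is a plain-Python illustration of the logic only, not a reference qualifier or ServiceNow scheduled job script. The `ConfigItem` fields, the class names in `SELECTABLE_CLASSES`, the state labels and the 30-day retention period are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Set


@dataclass
class ConfigItem:
    """Hypothetical, simplified view of a configuration item."""
    name: str
    ci_class: str
    lifecycle_state: str          # assumed labels: "Operational" or "Retired"
    retired_on: Optional[date]    # None while the CI is still in use
    task_count: int               # Incidents, Changes, etc. that reference the CI


# Assumed list of classes that process owners agreed should be selectable.
SELECTABLE_CLASSES: Set[str] = {"Virtual Machine", "Database", "Load Balancer"}


def selectable_cis(cis: List[ConfigItem],
                   allowed_classes: Set[str] = SELECTABLE_CLASSES) -> List[ConfigItem]:
    """CIs worth offering on a field such as Affected CI."""
    return [ci for ci in cis
            if ci.ci_class in allowed_classes and ci.lifecycle_state == "Operational"]


def purge_candidates(cis: List[ConfigItem], today: date,
                     retention_days: int = 30) -> List[ConfigItem]:
    """Retired CIs past the retention period that were never attached to a task."""
    cutoff = today - timedelta(days=retention_days)
    return [ci for ci in cis
            if ci.lifecycle_state == "Retired"
            and ci.retired_on is not None
            and ci.retired_on <= cutoff
            and ci.task_count == 0]
```

Keeping the "never attached to a task" condition in the purge logic means anything with an audit trail survives, which is usually what the process owners care about most.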
The next two goals are achieved by creating Application Services to understand the value to the business and Technical Service Offerings to determine who supports it. Both are derived from the tags collected by Cloud Discovery. Application Services are formed using tag-based Service Mapping to group cloud resources by their Application and Environment tags, whereas Technical Service Offerings are formed using Dynamic CI Groups based on the Business Unit tag. This mapping of business and technical services has been lifted directly from the ServiceNow CSDM, which provides prescriptive guidance for building out the ServiceNow CMDB.
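As a rough illustration of how the same tags yield both groupings, the sketch below groups resources by one or more tag keys. It only mimics the idea in plain Python; it is not how tag-based Service Mapping or Dynamic CI Groups are implemented. The tag keys and the `group_by_tags` helper are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_tags(resources: Dict[str, Dict[str, str]],
                  keys: Tuple[str, ...]) -> Dict[Tuple[str, ...], List[str]]:
    """Group resource ids by the values of the given tag keys.

    Resources missing any of the keys are skipped, which is one reason
    untagged resources never show up in a service.
    """
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for resource_id, tags in resources.items():
        if all(tags.get(key) for key in keys):
            groups[tuple(tags[key] for key in keys)].append(resource_id)
    return dict(groups)


if __name__ == "__main__":
    payroll_prod = {"Application": "Payroll", "Environment": "Production",
                    "BusinessUnit": "Finance"}
    resources = {
        "vm-1": payroll_prod,
        "db-1": payroll_prod,
        "vm-2": {"Application": "Payroll", "Environment": "Test",
                 "BusinessUnit": "Finance"},
        "fn-1": {"Application": "Intranet"},  # incomplete tags, so it is skipped
    }
    # One group per Application + Environment pair (the Application Service idea).
    print(group_by_tags(resources, ("Application", "Environment")))
    # One group per Business Unit (the Technical Service Offering idea).
    print(group_by_tags(resources, ("BusinessUnit",)))
```

Note how the incompletely tagged resource drops out of both groupings, which is why the tagging policy has to come first.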
Know the business value
Here is the magic piece of the puzzle. Our application service records define business criticality and business owners. These are key data points for understanding the business value of configuration items: they feed reports and dashboards, and they can drive workflows for risk and impact assessments, approvals and notifications. Many ServiceNow applications use the relationship between CI and Service out of the box; ITOM Event Management and SecOps Vulnerability Response are two that come immediately to mind. You will likely need to incorporate these links between CI and Service into your own workflows and business rules. ServiceNow makes this easy via an m2m table (svc_ci_assoc) that is synchronised with the service maps. This removes the need to navigate the CI relationships table in your scripts and replaces it with simple dot-walking.
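To show why a flat association table is easier to work with than walking relationships, here is a small plain-Python sketch of the lookup. The list of (service, CI) pairs stands in for rows of an m2m table; the `ApplicationService` fields, the criticality labels and the `services_for_ci` helper are hypothetical and do not reflect the actual columns of svc_ci_assoc.

```python
from typing import Dict, List, NamedTuple, Tuple


class ApplicationService(NamedTuple):
    """Hypothetical, simplified application service record."""
    name: str
    business_criticality: str   # assumed labels such as "High" or "Low"
    owner: str


def services_for_ci(ci_id: str,
                    associations: List[Tuple[str, str]],
                    services: Dict[str, ApplicationService]) -> List[ApplicationService]:
    """Resolve a CI to its services with a single pass over (service, CI) pairs."""
    return [services[service_name]
            for service_name, associated_ci in associations
            if associated_ci == ci_id and service_name in services]


if __name__ == "__main__":
    services = {
        "Payroll - Production": ApplicationService("Payroll - Production", "High", "A. Owner"),
        "Payroll - Test": ApplicationService("Payroll - Test", "Low", "A. Owner"),
    }
    associations = [
        ("Payroll - Production", "vm-1"),
        ("Payroll - Production", "db-1"),
        ("Payroll - Test", "vm-2"),
    ]
    for service in services_for_ci("db-1", associations, services):
        print(service.name, service.business_criticality, service.owner)
```

The point of the design is that one lookup gets you from a CI to the criticality and owner of its service, with no recursive traversal of parent and child relationships.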
Assign ownership
For the last objective, the Technical Service Offering records define the operational teams responsible for tasks raised against configuration items. ServiceNow can synchronise three Technical Service Offering fields (Approval Group, Support Group and Change Group) with every configuration item contained by the offering. This removes the effort of managing those fields per item while still allowing simple business logic in assignment rules and workflows.
The trick to defining these groups from Cloud tags is to use CMDB Query Builder to construct a query against the Key Value (cmdb_key_value) table. The query needs to return configuration items filtered by specific tags. In our example, we would use the Business Unit tag. The query is used to define a CI Group, which is then chosen as the source to create a Dynamic CI Group. The final piece is to create a 'managed by' relationship between the Technical Service Offering and the Dynamic CI Group.
Once this is constructed, any change made to a group on the Technical Service Offering is pushed down to all of its configuration items. That sounds like a lot, so I have included a diagram to show how it hangs together. Now it is just a matter of leveraging those fields on the Configuration Item in your scripts and workflows.
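The chain described above (tag query, CI group, offering, push-down) can be sketched in plain Python to make the data flow easier to follow. This only mirrors the behaviour conceptually; it does not use CMDB Query Builder or any ServiceNow API. The (CI, key, value) row shape, the `BusinessUnit` tag key and the field names on the CI records are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class TechnicalServiceOffering:
    """Hypothetical, simplified offering record holding the three groups."""
    name: str
    support_group: str
    change_group: str
    approval_group: str


def cis_with_tag(rows: Iterable[Tuple[str, str, str]], key: str, value: str) -> Set[str]:
    """Return the ids of CIs carrying the given tag, from (ci_id, key, value) rows."""
    return {ci_id for ci_id, row_key, row_value in rows
            if row_key == key and row_value == value}


def push_down_groups(offering: TechnicalServiceOffering,
                     member_ids: Set[str],
                     ci_records: Dict[str, Dict[str, str]]) -> List[str]:
    """Copy the offering's groups onto each member CI; return the ids updated."""
    updated = []
    for ci_id in sorted(member_ids):
        record = ci_records.get(ci_id)
        if record is None:
            continue  # tagged resource with no matching CI record
        record["support_group"] = offering.support_group
        record["change_group"] = offering.change_group
        record["approval_group"] = offering.approval_group
        updated.append(ci_id)
    return updated


if __name__ == "__main__":
    rows = [("vm-1", "BusinessUnit", "Finance"),
            ("db-1", "BusinessUnit", "Finance"),
            ("vm-9", "BusinessUnit", "Marketing")]
    ci_records = {"vm-1": {}, "db-1": {}, "vm-9": {}}
    offering = TechnicalServiceOffering("Finance Cloud Support", "Finance Ops",
                                        "Finance CAB", "Finance Approvers")
    members = cis_with_tag(rows, "BusinessUnit", "Finance")
    print(push_down_groups(offering, members, ci_records))
    print(ci_records["vm-1"]["support_group"])
```

Changing a group on the offering and re-running the push-down is all it takes to update every member, which is the maintenance saving this pattern is after.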
At AC3 we have successfully helped customers along this journey, from providing best-practice and real-world advice on tagging policies, to implementing them against existing and new cloud workloads, and onward to using the tags to drive automation in ServiceNow. If you need assistance here, we can help.