With a mature Configuration Management Database (CMDB) in ServiceNow, building resilience into operations is imperative, not only for moving to a proactive mode of operational delivery but also for improving operational stability. At a time when resourcing costs are soaring and the cost of doing business continues to increase, harnessing smart technology solutions to support robust, proactive delivery has never been more critical.
A mature CMDB is in many respects a prerequisite for integrating monitoring solutions with ServiceNow, so that alerts and events can flow from monitoring toolsets according to pre-defined monitoring rules. The CMDB is pivotal because, when alerts and events generated by monitoring solutions reach ServiceNow, they should ideally correlate to an existing configuration item (CI) in the CMDB. When an event or alert does not correlate to an existing CI, it carries only the context generated by the source system and is not enriched with supporting data from the CMDB. This can still occur occasionally even with a mature CMDB dataset, and business rules can be established to create a CI record in the CMDB from the context provided by the source monitoring tool. When that happens, review and analysis should be undertaken to either uplift the CI or understand why ITOM is not picking up the CI when scanning the environment.
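The correlation flow described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not ServiceNow code: the `Event` and `CI` classes, the `bind_event` function, and all field names are hypothetical, and matching on a lower-cased node name is an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CI:
    name: str
    ci_class: str
    business_service: Optional[str] = None
    needs_review: bool = False  # set on CIs created from event data only


@dataclass
class Event:
    source: str
    node: str
    description: str
    ci: Optional[CI] = None
    enrichment: dict = field(default_factory=dict)


def bind_event(event: Event, cmdb: Dict[str, CI], create_missing: bool = True) -> Event:
    """Bind an event to a CI by node name (hypothetical matching rule)."""
    key = event.node.lower()
    ci = cmdb.get(key)
    if ci is None and create_missing:
        # No match: create a placeholder CI from the source tool's context
        # and flag it so someone checks why discovery did not find it.
        ci = CI(name=event.node, ci_class="unknown", needs_review=True)
        cmdb[key] = ci
    if ci is not None:
        event.ci = ci
        if not ci.needs_review:
            # Only a pre-existing CI adds context beyond the source system.
            event.enrichment = {
                "ci_class": ci.ci_class,
                "business_service": ci.business_service,
            }
    return event


cmdb = {"db01": CI("db01", "database_server", "Payments")}
matched = bind_event(Event("monitoring-tool", "DB01", "High CPU"), cmdb)
unmatched = bind_event(Event("monitoring-tool", "web99", "Disk full"), cmdb)
print(matched.enrichment)
print(unmatched.ci.needs_review)
```

In this sketch the matched event picks up its CI class and business service, while the unmatched event keeps only its source context and leaves behind a flagged placeholder CI for later review.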
In the context of Event Management, digital resilience improves when operational delivery moves towards proactive issue resolution, before the business or external customers raise an incident due to service or performance degradation. This deflects calls so that teams can focus on items that require attention, improves the overall stability of the technical environment, and improves the perception of the IT teams who underpin business services.
AC3 recommendations:
- Start with a single monitoring capability to establish a repeatable process for onboarding new alert and event rules.
- Focus first on critical alerts, which have the highest risk of impacting business services.
- Implement auto incident creation logic to remove manual touchpoints when certain alerts are known to cause business impact. This will mobilise Major Incident Management Processes sooner.
- Leverage ServiceNow’s Service Graph Connectors to increase the speed at which monitoring sources can be onboarded.
- Ensure Platform Teams have the reporting and dashboards they need to maintain real-time visibility and underpin delivery.
- Establish an ongoing operational process for fine-tuning event management rules as platforms mature and evolve.
- Continue to move further down the priority list (P1, P2, P3, etc.) as Event Management Processes mature.
- Activate Health Log Analytics to support prediction and prevention of outages. Leverage this capability to shorten mean-time-to-resolve by gaining insight and data-driven advice on how to fix an issue.
- Automate wherever you can to support remediation efforts and remove manual overhead from platform teams.
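The recommendations on starting with critical alerts and auto-creating incidents could be expressed as a simple rule. The following is a hedged sketch in plain Python rather than ServiceNow configuration: the `Alert` type, the severity scale (1 = critical), the `AUTO_INCIDENT_MAX_SEVERITY` threshold, and the incident fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical scale: 1 = critical, 2 = major, 3 = minor.
AUTO_INCIDENT_MAX_SEVERITY = 1  # start with critical only; raise as rules mature


@dataclass
class Alert:
    ci_name: str
    severity: int
    business_service: Optional[str]  # None when no known service is affected


def should_auto_create(alert: Alert, max_severity: int = AUTO_INCIDENT_MAX_SEVERITY) -> bool:
    """Auto-create only for severe alerts tied to a business service."""
    return alert.severity <= max_severity and alert.business_service is not None


def triage(alerts: List[Alert], max_severity: int = AUTO_INCIDENT_MAX_SEVERITY) -> List[dict]:
    """Return incident stubs for qualifying alerts, most severe first."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.severity):
        if should_auto_create(alert, max_severity):
            incidents.append({
                "short_description": (
                    f"Auto-created: {alert.ci_name} impacting {alert.business_service}"
                ),
                "priority": alert.severity,
            })
    return incidents


alerts = [
    Alert("web99", 3, "Intranet"),
    Alert("db01", 1, "Payments"),
    Alert("lab-vm", 1, None),
]
incidents = triage(alerts)
print(incidents)
```

Raising `max_severity` over time corresponds to moving further down the priority list as Event Management processes mature.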