By Gautam Kshatriya, AC3 Head of Service Management
A large part of IT security involves preventing disruptions such as breaches and outages, but it is impossible to prevent every incident. An incident can be defined as an unplanned interruption to a service, or a reduction in the agreed quality of a service, that affects customer experience. More succinctly, it is anything that impacts an organisation's business outcomes.
The immediate or ongoing impact of such incidents, however, depends very much on the strength of an organisation’s incident management protocols.
The incident management practice is responsible for restoring services as quickly as possible, in line with the terms of a service level agreement (SLA). Alongside problem, knowledge and change management, incident management is one of the most pivotal and visible elements of an IT support organisation, and it is often the yardstick by which that organisation is judged.
Significance
IT Service Management (ITSM) is the set of systems and practices an organisation uses to improve the way its IT is managed, and incident management is one of the most important practices in any ITSM implementation.
One of the most common frameworks offering best practice for implementing IT Service Management is ITIL® (formerly an acronym for Information Technology Infrastructure Library). Think of ITIL® as a playbook.
When designed and implemented well, the incident management practice and its supporting practices (such as problem management, knowledge management and change management) ensure incidents are managed effectively while the agreed quality of service is maintained.
Potential Impact
A poorly managed incident response carries many risks, not all of them immediately apparent. They include:
- Loss of valuable data
- Reduced productivity
- Loss of revenue
- Damage to brand or reputation, and
- Disruption of business operations, information security and IT systems.
On the other hand, an incident management practice that restores services to the agreed quality as quickly as possible can minimise adverse impacts on an organisation's business outcomes. As a result, its value is understood and appreciated both externally, by customers, and internally, by employees and the other practices within IT service management.
Other benefits, taken together, can have a significant positive effect on the bottom line:
- Business productivity is maximised by a shortened incident life cycle and decreased downtime.
- Collaboration is improved between all stakeholders.
- Duplication of effort and waste are reduced, thanks to well-defined roles, responsibilities and consistent practice.
- Data collected from incidents and their management becomes a valuable source of both quantitative and qualitative information.
This final point is significant, as the data gathered can provide insights that help prevent similar incidents in future, while also promoting a proactive rather than reactive culture within an IT support organisation.
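To illustrate how logged incident data can feed this kind of insight, here is a minimal Python sketch. It assumes a hypothetical `Incident` record holding only a category and open/restore timestamps (real ITSM tools capture far more). It computes mean time to restore service, a common downtime measure, and flags categories that recur often enough to be worth handing to problem management. The threshold of three is an arbitrary assumption.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical minimal record; field names are illustrative only.
    category: str
    opened: datetime
    restored: datetime


def mean_time_to_restore(incidents):
    """Average time from an incident being opened to service being restored."""
    if not incidents:
        raise ValueError("no incidents to measure")
    total = sum((i.restored - i.opened for i in incidents), timedelta())
    return total / len(incidents)


def recurring_categories(incidents, threshold=3):
    """Categories seen at least `threshold` times: candidates for problem management."""
    counts = Counter(i.category for i in incidents)
    return sorted(cat for cat, n in counts.items() if n >= threshold)


# Usage with made-up sample data.
t0 = datetime(2024, 1, 1, 9, 0)
log = [
    Incident("email", t0, t0 + timedelta(hours=1)),
    Incident("email", t0, t0 + timedelta(hours=3)),
    Incident("vpn", t0, t0 + timedelta(hours=2)),
    Incident("email", t0, t0 + timedelta(hours=2)),
]
print(mean_time_to_restore(log))  # 2:00:00
print(recurring_categories(log))  # ['email']
```

Tracking a downtime figure over time shows whether the incident life cycle is actually getting shorter, while the recurring-category list is one simple way to turn reactive fixes into proactive problem records.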
Best Practice
Effective incident management responds to, analyses and documents incidents promptly, while keeping customer satisfaction paramount and prioritising clear, consistent communication.
Before the incident response itself, four preliminary steps must be addressed:
- Identification – usually occurs through a report from an end user, or through a system-generated notification, which may be referred to as an event.
- Logging – ensures a full, accurate historical record of each issue is captured, including the date and time.
- Categorisation – sorting incidents into classes makes it possible to track similar incidents in future.
- Prioritisation – like triage in a hospital A&E department, this stage assesses the impact and potential degree of damage and, from these, uses a priority matrix to determine how urgently the incident must be handled.
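The four preliminary steps can be sketched as a single logged record. This is a minimal Python illustration, not any particular tool's data model: the `IncidentRecord` fields, the three-level `Level` scale and the 1 (most urgent) to 5 (least urgent) outcome are all assumptions. Priority matrices are commonly expressed as a lookup from impact and urgency to a priority level, but the exact levels and mappings vary between organisations.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Level(IntEnum):
    # Illustrative three-point scale; organisations define their own.
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Hypothetical priority matrix: (impact, urgency) -> priority, 1 = most urgent.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): 1,
    (Level.HIGH, Level.MEDIUM): 2,
    (Level.HIGH, Level.LOW): 3,
    (Level.MEDIUM, Level.HIGH): 2,
    (Level.MEDIUM, Level.MEDIUM): 3,
    (Level.MEDIUM, Level.LOW): 4,
    (Level.LOW, Level.HIGH): 3,
    (Level.LOW, Level.MEDIUM): 4,
    (Level.LOW, Level.LOW): 5,
}


@dataclass
class IncidentRecord:
    description: str
    source: str     # identification: e.g. "end user" or "event" (system-generated)
    category: str   # categorisation: class used to track similar incidents
    impact: Level
    urgency: Level
    priority: int = field(init=False)
    # Logging: capture the date and time the record was created (UTC).
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        # Prioritisation: look up the triage outcome in the matrix.
        self.priority = PRIORITY_MATRIX[(self.impact, self.urgency)]


rec = IncidentRecord("Email down for finance team", "end user", "email",
                     Level.HIGH, Level.MEDIUM)
print(rec.priority)  # 2
```

An explicit lookup table is used rather than a formula so the matrix can be edited when an organisation's policy does not follow a neat pattern.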
The incident response itself can then be broken into five stages:
- Diagnosis – first-level (primary) troubleshooting.
- Incident escalation – if the primary support team cannot resolve the issue, the incident is escalated to more advanced support. Escalation may be functional or hierarchical.
- Investigation – if no known solution applies and deeper investigation is required, a problem record is created and responsibility for correcting the underlying error passes to the Problem Management team.
- Resolution and recovery – a solution is implemented to restore normal service, and the end user or customer confirms that the outcome is satisfactory.
- Closure – the incident is formally closed and the incident register is updated with its final status.
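The five response stages can be modelled as a small state machine. In this minimal Python sketch, the `Stage` names, the `TRANSITIONS` table and the `IncidentLifecycle` class are hypothetical, and the allowed transitions are an assumption about one reasonable flow: diagnosis either resolves the incident or escalates it, escalation may lead to investigation with a problem record, and closure is only allowed once the customer has confirmed the resolution.

```python
from enum import Enum, auto


class Stage(Enum):
    DIAGNOSIS = auto()
    ESCALATION = auto()
    INVESTIGATION = auto()   # problem record raised for Problem Management
    RESOLUTION = auto()      # resolution and recovery
    CLOSURE = auto()


# Assumed allowed transitions between the five stages.
TRANSITIONS = {
    Stage.DIAGNOSIS: {Stage.ESCALATION, Stage.RESOLUTION},
    Stage.ESCALATION: {Stage.INVESTIGATION, Stage.RESOLUTION},
    Stage.INVESTIGATION: {Stage.RESOLUTION},
    Stage.RESOLUTION: {Stage.CLOSURE},
    Stage.CLOSURE: set(),
}


class IncidentLifecycle:
    def __init__(self):
        self.stage = Stage.DIAGNOSIS
        self.history = [Stage.DIAGNOSIS]  # audit trail for the incident register
        self.problem_record = False
        self.customer_confirmed = False

    def advance(self, to):
        # Validate everything before changing state, so a rejected move has no effect.
        if to not in TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move from {self.stage.name} to {to.name}")
        if to is Stage.CLOSURE and not self.customer_confirmed:
            raise ValueError("closure requires confirmation from the end user or customer")
        if to is Stage.INVESTIGATION:
            self.problem_record = True  # hand-off to Problem Management
        self.stage = to
        self.history.append(to)

    def confirm_resolution(self):
        # The end user or customer confirms a satisfactory conclusion.
        if self.stage is not Stage.RESOLUTION:
            raise ValueError("can only confirm an incident in RESOLUTION")
        self.customer_confirmed = True


inc = IncidentLifecycle()
inc.advance(Stage.ESCALATION)
inc.advance(Stage.RESOLUTION)
inc.confirm_resolution()
inc.advance(Stage.CLOSURE)
print([s.name for s in inc.history])
# ['DIAGNOSIS', 'ESCALATION', 'RESOLUTION', 'CLOSURE']
```

Rejecting invalid moves, such as closing an incident the customer has not signed off, is one way a tool can enforce consistent practice, and the recorded history doubles as the historical record that logging is meant to provide.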
Finally
Organisations can make their incident management practice more efficient by adopting software-based ITSM tools, choosing options that are customisable and flexible enough to match current IT requirements while being able to scale as and when required.