The Language of Incident Management


The language used across the high-technology ecosystem is dynamic, to say the least. Nowhere else can you find technical jargon so seamlessly intertwined with references from science fiction, mythology, pop culture, literature, and more.

While this makes conversations in technical environments colorful and engaging, it also makes communication allegorical and metaphorical, which opens it to varying interpretations.

When communication is relaxed, this style of conversation is engaging and playful. However, when incidents happen, the level of severity shifts and a different language appears altogether. Given the potentially massive impact of IT incidents on business operations, the language of incident management must be technically precise, actionable, and leave no room for misinterpretation.

Why is this shift in communication necessary? Because modern IT operations are the nexus of business operations. If a system goes down, the impact is immediate and significant, and it can cost tens to hundreds of thousands of dollars for every minute of downtime.

With this level of severity, it makes sense that many of the terms used in IT incident management today are taken directly from those widely adopted by disaster response teams. Terms that are clear and understandable in chaotic environments. Terms that help teams work together to remediate an incident as quickly as possible.

Here's a brief glimpse at some of the terms you will see included in our upcoming white paper: The Language of Incident Management.

 Acknowledge / Ack

 

An alert action that notifies other alert recipients that the alert has been seen and is being worked on.

 

 Actionable Alert

 

An alert which clearly describes an issue, is routed to the right people at the right time, and communicates not only the urgency, but the issue's scope of impact.

 

 Active Monitoring

 

Method of understanding the current status, or changes in status, of a service via regular checks.

 

 Escalation

The method used to notify responders of an incident or alert according to a pre-configured order and timeline.
- Functional: an escalation method in which the alert or incident is transferred to an individual with more expertise for assistance.
- Hierarchical: an escalation method in which the alert or incident is transferred to a more senior individual for assistance.
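To make "a pre-configured order and timeline" concrete, here is a minimal sketch of an escalation policy in Python. Everything in it is an assumption made for illustration: the `EscalationStep` type, the responder names, and the delay values are hypothetical and do not reflect any particular product's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    delay_minutes: int  # minutes after the alert fires, if still unacknowledged
    responder: str      # hypothetical responder name
    kind: str           # "initial", "functional", or "hierarchical"


# Hypothetical policy: the order and timeline are fixed before any incident occurs.
POLICY = [
    EscalationStep(0, "on-call engineer", "initial"),
    EscalationStep(10, "database specialist", "functional"),    # more expertise
    EscalationStep(30, "engineering manager", "hierarchical"),  # more seniority
]


def notified_so_far(policy, minutes_unacknowledged):
    """Return the responders who should have been notified by this point."""
    ordered = sorted(policy, key=lambda step: step.delay_minutes)
    return [
        step.responder
        for step in ordered
        if step.delay_minutes <= minutes_unacknowledged
    ]


print(notified_so_far(POLICY, 15))
```

In this sketch, an alert that sits unacknowledged for 15 minutes has reached the on-call engineer and the specialist, but not yet the manager. Acknowledging the alert would stop the clock.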

 

 ChatOps

 

Leveraging chat and collaboration tools for incident management, especially to automate actions and the retrieval of supporting information.

 

 Configuration Management System

 

A system for organizing all of the information used to support the services and products a company uses and provides. It maintains operational information for configuration items, along with design details, incident records, and any other relevant data.

 

 Downtime/Outage


A period during which, or an occurrence in which, a service is not performing or not available as expected.

 


 Fault Tolerance

 

The ability of a service to continue operating even if some configuration item or part fails.

 

 Hotfix

 

An update applied to software to fix a specific problem or bug, often to resolve a particular customer's issue.

 

 Incident Lifecycle

 

The series of changes an alert/incident undergoes from creation to resolution.
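One way to picture a lifecycle is as a small state machine. The sketch below is illustrative only: the state names ("created", "acknowledged", "resolved") and the allowed transitions are assumptions chosen to match the terms above, not a definition taken from the white paper or from any specific tool.

```python
# Hypothetical lifecycle: which states an incident may move to from each state.
ALLOWED_TRANSITIONS = {
    "created": {"acknowledged", "resolved"},
    "acknowledged": {"resolved"},
    "resolved": set(),  # terminal state in this sketch
}


class Incident:
    def __init__(self, title):
        self.title = title
        self.state = "created"
        self.history = ["created"]  # every state the incident has passed through

    def transition(self, new_state):
        """Move to new_state, rejecting changes the lifecycle does not allow."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state!r} to {new_state!r}")
        self.state = new_state
        self.history.append(new_state)


incident = Incident("checkout service unavailable")
incident.transition("acknowledged")
incident.transition("resolved")
print(incident.history)
```

Recording the history alongside the current state keeps a trail of when responders acted, which is useful when reviewing an incident afterwards.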

 

Fast and effective incident management is the lifeblood of any organization’s IT operations. With high costs at stake, it is easy to see why the most successful teams adopt a “wartime” posture during an incident. It’s also easy to see why communication during wartime must be specific, direct, and actionable: it has to make clear what the issue is, what actions need to be taken, and by whom.

As our technology platforms become ever more complex and intertwined, the frequency and severity of incidents will only continue to grow. By focusing on the language used during an incident, teams can collaborate better and reach a resolution faster.

For additional reading on modern incident management, visit the Opsgenie Resource Library: https://www.opsgenie.com/resource-library