Using Logs In Incident Response

by Karine Margaryan, Feb 8, 2017

A few months back, we held a webinar with Logentries where we demonstrated “2 Use Cases for Using Logs in Incident Management.” Watch it now here. In the webinar, we examined integrations between traditional log management and IT alerting tools, along with the future of incident management with enriched alerts. If you’re already an OpsGenie or Logentries user, or you use monitoring and collaboration systems like New Relic or Slack, this blog post should be of interest to you. You will learn how to improve your alert notification processes and incident response times.

WHY IS INCIDENT RESPONSE IMPORTANT?

According to a recent report by Stephen Elliot of IDC, the average cost of a critical application failure is $500,000 to $1 million per hour for a Fortune 100 company. Wow! On average, that ends up costing these companies about $1.25 billion to $2.5 billion a year. Back in August 2016, Delta Air Lines had a 5-hour computer outage, which cost them $150 million. These outages are extremely costly and can damage your company’s brand reputation. To protect the continued success of any organization, it is imperative to prevent such incidents where possible and to respond to them as quickly as possible.


TRADITIONAL LOG MANAGEMENT AND IT ALERTING TOOLS

When an issue arises, anomalous log data triggers an alert that notifies an individual or team for further investigation and remediation. OpsGenie integrates seamlessly with over 100 different monitoring and incident management systems, such as SolarWinds, New Relic, Atlassian and Datadog. These include tools that check the health, availability, and performance of applications, servers, and networks, as well as ticketing and collaboration tools where reported incidents should be acknowledged as fast as possible. Log management tools are also among OpsGenie’s primary integrations. One of them is Logentries, which works with large volumes of data and can identify where problems arise.


OPSGENIE + LOGENTRIES = IMPROVED INCIDENT RESPONSE TIME

Integrating Logentries and OpsGenie creates an ideal solution for both incident identification and response. When an alert comes to OpsGenie from the Logentries integration, you can design actionable alerts by using default or custom fields. The more thoroughly you design the content of your alerts, the easier they are to understand and the clearer the steps to solve each issue become, which ultimately reduces the time spent researching the alert.
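As a rough illustration of what designing an alert's content can look like, the sketch below turns a log event into an alert payload with custom fields. The field names (`message`, `description`, `tags`, `details`, `priority`) are modeled on OpsGenie's Alert API, but treat them as assumptions to verify against the current documentation. The `build_alert` helper and the shape of the log event are hypothetical and are not part of either product.

```python
def build_alert(log_event: dict) -> dict:
    """Build an alert payload from a log event (hypothetical event shape)."""
    level = log_event["level"]
    return {
        # Short, scannable title; truncated so it fits a typical title limit.
        "message": f"[{level}] {log_event['summary']}"[:130],
        # The raw log line gives the responder immediate context.
        "description": log_event.get("line", ""),
        "tags": ["logentries", level.lower()],
        # Custom fields: anything that saves the responder a lookup.
        "details": {
            "host": log_event.get("host", "unknown-host"),
            "log": log_event.get("log", ""),
        },
        # Assumed mapping: only CRITICAL events page at the highest priority.
        "priority": "P1" if level == "CRITICAL" else "P3",
    }
```

A responder reading this alert sees the severity, the affected host, and the offending log line without opening a second tool, which is the point of investing in alert content.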

USING LOG DATA TO ENRICH ALERTS: SITUATIONAL INTELLIGENCE

Using log data in incident response offers you a 360° view of everything happening in your IT environment. It improves situational intelligence: a new approach that uses log data to enrich alerts from monitoring systems and key components of your infrastructure, shortening the time it takes to identify and correct issues.

In the past, Dev and Ops teams spent excessive time fixing things manually to keep systems up. Now that DevOps processes and tools have matured, many automated features have been released. Automation is an indispensable part of DevOps, and according to data from IDC’s Fortune 100 survey, 60% of respondents name automation as the highest-demand initiative they are looking to implement.

Logentries has a rich set of APIs for reviewing and analyzing logs, managing usage and tags, exporting, and otherwise interacting with log data. Try OpsGenie to test the integration with Logentries.

HOW INTEGRATION WITH LOGENTRIES TECHNICALLY WORKS

OpsGenie places high importance on incident response, and we try to understand it by standing in our customers’ shoes. That’s why we care about the types of alerts they receive and how our customers react to and deal with certain alerts. When you get an alert, the first thing you do is assess what kind of alert it is. Is it urgent? What caused it? Does it need to be escalated or can it wait until morning?

To answer all of these questions, you have to thoroughly review the logs. OpsGenie helps reduce the time spent looking at logs so that users can get to the root of an issue quickly.

That’s why OpsGenie implemented a layer using AWS Lambda, which lets us run code in the cloud. This is a perfect solution for SaaS. When an alert is created and matches the predefined criteria, it triggers a webhook (an outbound HTTPS request), which in turn invokes code in Lambda through API Gateway. This Lambda script has access to all alert data, whether the alert came from Logentries or another source. The key value here is that you can use Logentries to understand where the alert was generated. With the alert data, the script can perform relevant queries through the Logentries REST API, create an HTML file (or a file of another type), and attach it back to the alert in OpsGenie.
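The flow described here can be sketched as a small Lambda handler. This is a minimal illustration under stated assumptions, not OpsGenie's actual code: the webhook body is assumed to be JSON with an `alert` object containing `alertId` and `message`, and the two network steps (querying Logentries and attaching the file to the alert) are passed in as the hypothetical callables `query_logs` and `attach_file` so that no real API signature is implied. The event and response shapes follow the API Gateway Lambda proxy convention (`body` in, `statusCode` and `body` out).

```python
import json
from html import escape
from typing import Callable, Iterable


def render_report(alert: dict, log_lines: Iterable[str]) -> str:
    """Render the matching log lines as a small HTML report."""
    rows = "\n".join(f"<li><code>{escape(line)}</code></li>" for line in log_lines)
    title = escape(alert.get("message", ""))
    return f"<html><body><h1>{title}</h1>\n<ul>\n{rows}\n</ul></body></html>"


def make_handler(
    query_logs: Callable[[dict], Iterable[str]],    # hypothetical: alert -> log lines
    attach_file: Callable[[str, str, str], None],   # hypothetical: (alert id, file name, content)
):
    """Build a Lambda handler around the two injected network steps."""

    def handler(event: dict, context=None) -> dict:
        # API Gateway proxy integration delivers the webhook body as a string.
        payload = json.loads(event.get("body") or "{}")
        alert = payload.get("alert", {})
        if not alert.get("alertId"):
            return {"statusCode": 400, "body": json.dumps({"error": "no alert in payload"})}

        lines = list(query_logs(alert))                   # 1. query the log service
        report = render_report(alert, lines)              # 2. build the HTML file
        attach_file(alert["alertId"], "logs.html", report)  # 3. attach it to the alert
        return {"statusCode": 200, "body": json.dumps({"attached": len(lines)})}

    return handler
```

Injecting the two callables keeps the handler testable offline; in a real deployment they would wrap authenticated HTTPS calls to the Logentries and OpsGenie REST APIs.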

OpsGenie and Logentries have produced a proof-of-concept integration that provides situational awareness by using the Logentries Query API to automatically query for log data in response to an alert received by OpsGenie. With this new integration, we are automating the investigation process involved in responding to an incident.
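One way to picture the automated investigation is as deriving a log query from the alert itself: look a few minutes before and after the moment the alert fired, and filter on whatever the alert says about its origin. The sketch below is illustrative only. The alert fields `createdAt` (epoch milliseconds) and `source`, the `build_log_query` helper, and the returned keys are all hypothetical and would need to be mapped onto the actual parameters of the Logentries Query API.

```python
def build_log_query(alert: dict, before_min: int = 10, after_min: int = 2) -> dict:
    """Derive a time-bounded log query from an alert (hypothetical field names)."""
    created_ms = int(alert["createdAt"])  # assumed: epoch milliseconds
    return {
        # Look further back than forward: the cause usually precedes the alert.
        "from_ms": created_ms - before_min * 60_000,
        "to_ms": created_ms + after_min * 60_000,
        # Narrow the search to the component that raised the alert, if known.
        "keyword": alert.get("source", ""),
    }
```

The asymmetric window is a design choice, not a rule: widening `before_min` pulls in more context at the cost of more noise for the responder to read.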

To get a deeper understanding of how to use logs in incident response, watch our webinar here.