ChatOps with Slack and OpsGenie

Mar 12, 2014 by Berkay Mollamustafaoğlu

OpsGenie has supported direct integration with the popular chat services HipChat and Campfire for quite some time via our callbacks, where OpsGenie forwards alert activity to chat rooms.

Read more »

Fighting Alert Fatigue - Notification rules

Jan 19, 2014 by Berkay Mollamustafaoğlu

This post continues the series of blog posts that take a deeper look at how OpsGenie can be used to alleviate alert fatigue. One of the key capabilities of OpsGenie is enabling users to control how they are notified for different alerts at different times.

Read more »

We've got news

Jan 10, 2014 by Berkay Mollamustafaoğlu

Just a quick announcement about the recently launched “news” site. OpsGenie gets improvements every week, and we wanted to have a medium to share these improvements, even the small ones.

Read more »

Fighting alert fatigue - mute and bulk actions

Dec 29, 2013 by Berkay Mollamustafaoğlu

This post continues the series of blog posts that take a deeper look at how OpsGenie can be used to alleviate alert fatigue. The mute, acknowledge all, and close all actions were designed specifically for situations where excessive alerting can hinder operations.

Read more »

Fighting Alert Fatigue - Alert Deduplication

Dec 27, 2013 by Berkay Mollamustafaoğlu

The concept of alert fatigue is well known in industries such as healthcare, and awareness is increasing in IT operations as well. Fighting alert fatigue has been a key design objective for OpsGenie since the beginning. The previous post summarized some of the key capabilities OpsGenie provides that can be used to alleviate alert fatigue. In a series of posts, I will discuss in more detail how these features can be used to improve the alert signal-to-noise ratio.

Read more »

AWS re:Invent 2013

Nov 8, 2013 by Berkay Mollamustafaoğlu

OpsGenie is a sponsor of Amazon’s re:Invent conference, and we’re excited to be part of it. Looking at the list of sponsors and the list of sessions, this is going to be a very high-quality event.


Read more »

Improving Signal to Noise Ratio

Oct 21, 2013 by Berkay Mollamustafaoğlu

Alerting is largely a signal-to-noise ratio problem: catching critical problems while trying not to drown in a sea of data. Put another way, we don’t want to miss any critical problems, and we don’t want too many alert notifications.

OpsGenie strives to improve the lives of the alert recipients. So, let’s take a look at how OpsGenie does its part to tackle this formidable challenge:

Read more »

DevOpsDC, alerts and notices

Oct 16, 2013 by Berkay Mollamustafaoğlu

At the last DevOpsDC meetup, the speaker was Robert Treat (@robtreat2), COO of OmniTI, and the subject was “Less alarming alerts”. OmniTI is an interesting company, as they both implement large-scale solutions and operate Circonus, a monitoring-as-a-service solution; hence the presentation was bound to be interesting, and it did not disappoint.

Read more »

The promise of an integrated monitoring and alerting solution

Jun 14, 2013 by Berkay Mollamustafaoğlu

Erik Budin of ScienceLogic has a great blog post that describes the integration of ScienceLogic with (our competitor) PagerDuty. Kudos to both parties for coming up with a well thought out, bi-directional integration that goes well beyond the alerting integration supported by many of the monitoring solutions in the market! We believe that to be able to truly enable operations teams to work effectively, monitoring and alerting integration needs to be much richer than just forwarding alerts. Hence, it’s good to see this type of effort implemented and described in detail. Erik starts the blog post with a real-world scenario that has become possible with the integrated solution:

Read more »

OpsGenie iPhone and Android app updates

Jun 6, 2013 by Berkay Mollamustafaoğlu

OpsGenie client apps were long overdue for an update. The latest release (version 1.5) of the OpsGenie apps (iPhone/iPad/Android/HTML5) includes many usability improvements based on the feedback OpsGenie users have been providing. Here is a list of some of the more visible updates:

Read more »

Tech conferences are broken

May 20, 2013 by Berkay Mollamustafaoğlu

In universities around the world, teachers spend most of their time in the classroom doing what amounts to a monologue. Sure, the students may ask questions, and there may be some interaction, but most students don’t. And even when they do, the time available for questions and discussion is often very limited.

Read more »

OpsGenie and Hubot

May 8, 2013 by Berkay Mollamustafaoğlu

In 2013, we announced the Campfire integration via callbacks. Campfire callbacks allow OpsGenie users to push alert activity to Campfire chatrooms as messages.

Read more »

OpsGenie and Campfire

May 7, 2013 by Berkay Mollamustafaoğlu

A couple of weeks ago, we announced direct integration with HipChat. We’ve been continuing to work on extending OpsGenie callback capabilities.

Read more »

Heartbeat Monitoring as a Service

May 1, 2013 by Berkay Mollamustafaoğlu

In operations, most of the time no news is good news. If we’re not receiving alerts from monitoring systems about problems, we tend to assume that all is well with the world. But what if we’re not receiving alerts because some part of our monitoring solution has not been working for days or even weeks? If you’ve ever found out about a problem with the monitoring systems after being asked why there was no alert for a particular problem, you know what I’m talking about. If you’re supporting a web based application or service, chances are you’re employing a monitoring service to monitor the availability of your application from the outside, preferably from multiple locations. At OpsGenie we do take advantage of external services to monitor the availability of the OpsGenie web UI, as well as the API end points. External web monitoring enables us to find out quickly when there is a problem with OpsGenie. In addition, OpsGenie has supported what we call “heartbeat monitoring” since the beginning. Heartbeat monitoring enables OpsGenie users to send OpsGenie periodic heartbeat messages. Heartbeat monitoring serves multiple purposes:

Read more »
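
The idea behind periodic heartbeat messages can be illustrated with a small sketch. It assumes, as one plausible use rather than anything the post states, that a source is considered unhealthy once it has been silent for longer than an allowed interval. The class name, method names, and source names are hypothetical and are not part of any OpsGenie API.

```python
from datetime import datetime, timedelta


class HeartbeatMonitor:
    """Hypothetical tracker: remembers the last heartbeat per source."""

    def __init__(self, max_silence):
        self.max_silence = max_silence  # longest tolerated gap between heartbeats
        self.last_seen = {}  # source name -> time of the most recent heartbeat

    def record(self, source, at):
        """Store the time of the latest heartbeat received from a source."""
        self.last_seen[source] = at

    def expired(self, now):
        """Return the sources that have been silent for longer than max_silence."""
        return sorted(
            source
            for source, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )


monitor = HeartbeatMonitor(max_silence=timedelta(minutes=10))
start = datetime(2013, 5, 1, 9, 0)
monitor.record("nagios-primary", start)
monitor.record("backup-job", start)
monitor.record("nagios-primary", start + timedelta(minutes=8))

# Twelve minutes in, only the source that stopped reporting is flagged.
print(monitor.expired(start + timedelta(minutes=12)))
```

In this sketch a missing heartbeat is what triggers attention, which is the reverse of ordinary alerting, where silence is taken as good news.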

Your on-call duty is starting

Apr 23, 2013 by Berkay Mollamustafaoğlu

 

Read more »

OpsGenie and HipChat

Apr 22, 2013 by Berkay Mollamustafaoğlu

 

Read more »

OpsGenie Webhook Callbacks

Apr 20, 2013 by Berkay Mollamustafaoğlu

OpsGenie is fundamentally an alert router for operations teams. It receives alerts from operations management systems via email or API, and notifies the right people using the defined rules. OpsGenie also supports "callbacks", and can forward alert activity to external systems via webhooks. Every time an alert is created, acknowledged, commented on, or closed, or when an action is executed by a user, OpsGenie makes a web request to the URL specified in the webhook configuration. The web request includes a subset of the alert data in the body of the request in JSON format. Passed data includes the alert messages, as well as the alertId and the alias fields that can be used to retrieve the rest of the alert data via the OpsGenie Alert API. OpsGenie users can configure callbacks to be triggered for all alert data or can define matching rules to forward only a subset of alerts. Webhooks provide a very flexible way to export the alert data that is aggregated in OpsGenie, and are used in many different ways. Some example uses we’ve seen include:

Read more »
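
As a rough illustration of consuming such a callback, the sketch below turns a JSON request body into a one-line message suitable for a chat room. Only the alertId, alias, and message fields come from the excerpt above; the "action" key, the nesting under "alert", and the function name are assumptions made for the example and may not match the real payload.

```python
import json


def summarize_callback(body):
    """Turn a callback request body (a JSON string) into a one-line summary."""
    payload = json.loads(body)
    # ASSUMPTION: the "action" key and the "alert" nesting are illustrative only.
    action = payload.get("action", "unknown")
    alert = payload.get("alert", {})
    alert_id = alert.get("alertId", "?")
    alias = alert.get("alias") or alert_id  # fall back to the id if no alias is set
    message = alert.get("message", "")
    return f"[{action}] {alias}: {message} (alertId={alert_id})"


sample = json.dumps({
    "action": "Acknowledge",
    "alert": {
        "alertId": "abc-123",
        "alias": "disk-full-web01",
        "message": "Disk usage above 90%",
    },
})
print(summarize_callback(sample))
```

A receiver built this way would keep the alertId so that the rest of the alert data can be fetched later if the summary is not enough.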

Different notifications for different alerts at different times

Mar 26, 2013 by Berkay Mollamustafaoğlu

Not all alerts are created equal, nor should they be treated as such! Some alerts are critical and urgent and we want to receive notifications immediately using any and all notification methods, and others can wait till the morning, or an email may be sufficient, etc. We find that it is as important for an alert notification system NOT to wake you up unnecessarily as it is to ensure you wake up when it’s necessary. OpsGenie now puts the user in full control. Users can decide how to get notified for different alerts based on the alert data and the time of day.

Read more »

Managing the schedule exceptions

Mar 26, 2013 by Berkay Mollamustafaoğlu

Schedules and escalations are out of the beta

After a two-month beta period, the on-call schedules, rotations and escalations features have come out of beta and are available to all Pro and Enterprise level subscribers. Several usability improvements have been rolled out based on the feedback we’ve received during the beta process. Thanks for all the feedback!

Read more »

Email integration, even easier and more powerful

Mar 5, 2013 by Berkay Mollamustafaoğlu

It is safe to say that monitoring tools and services universally support sending email alerts. Hence, not surprisingly, creating alerts in OpsGenie via email is the most common integration method used by OpsGenie users. Based on the feedback we’ve received from OpsGenie users, we’ve enhanced email integration capabilities to make it both easier and more flexible.

Read more »

Complex systems, IT operations and learning from others

Feb 6, 2013 by Berkay Mollamustafaoğlu

 

Read more »

Who to notify when - can I do that with OpsGenie?

Feb 4, 2013 by Berkay Mollamustafaoğlu

 

Read more »

Reducing alert noise using escalations

Jan 30, 2013 by Berkay Mollamustafaoğlu

We’ve recently added support for “escalations” in OpsGenie. Escalations typically refer to notifying different users at different times until the alert is seen and processed (acknowledged) by someone, or the problem is resolved and the alert is closed. If the user who gets notified first resolves the problem, or determines the problem is not urgent, etc., other users don’t have to be notified. Since escalations allow notifying only a subset of the users for alerts initially, they can be quite useful in reducing “alert (notification) noise” while still ensuring alerts don’t fall through the cracks. OpsGenie supports both “rules based” and “ad-hoc” escalations. You can create escalation rules that specify who should be notified when. You can then use the escalation rule as the recipient of an alert, instead of specifying users or groups directly. For example, the following escalation rule would notify user “fili” as soon as the alert is created, and if the alert is not acknowledged within 10 minutes, OpsGenie would notify the members of the “web_team” group.

Read more »
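
The escalation behaviour described in the excerpt above can be modelled in a few lines. This is only a sketch of the idea, not how OpsGenie implements or configures escalation rules: the class and function names are hypothetical, and the only details taken from the post are the “fili” user, the “web_team” group, and the 10 minute delay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    delay_minutes: int  # minutes after alert creation at which this step fires
    recipient: str      # user or group to notify (hypothetical model)


def notified_so_far(steps, elapsed_minutes, acknowledged_at=None):
    """Return recipients whose step fired before the alert was acknowledged."""
    fired = []
    for step in sorted(steps, key=lambda s: s.delay_minutes):
        if step.delay_minutes > elapsed_minutes:
            break  # this step is not due yet
        if acknowledged_at is not None and step.delay_minutes >= acknowledged_at:
            break  # the alert was acknowledged first, so escalation stops
        fired.append(step.recipient)
    return fired


rule = [EscalationStep(0, "fili"), EscalationStep(10, "web_team")]

print(notified_so_far(rule, elapsed_minutes=5))                      # only the first step is due
print(notified_so_far(rule, elapsed_minutes=12))                     # unacknowledged, so it escalates
print(notified_so_far(rule, elapsed_minutes=12, acknowledged_at=4))  # acknowledged early
```

The third call shows the noise reduction the post describes: once the first recipient acknowledges the alert, the group is never notified.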

Annual payment option and reduced international SMS prices

Jan 29, 2013 by Berkay Mollamustafaoğlu

 

Read more »

Escalations and On-call Schedules with Rotations

Jan 28, 2013 by Berkay Mollamustafaoğlu

 

Read more »

Monitoring for troubleshooting problems vs alerting

Jan 14, 2013 by Berkay Mollamustafaoğlu

Data generated by monitoring systems can be used to support operational support processes in different ways, and I think it’s useful to know the distinction between the two core uses:

Read more »

Role of alert notifications in IT Operations

Jan 8, 2013 by Berkay Mollamustafaoğlu

Mathias (@roidrage) of Travis CI has an excellent blog post on the operations of a hosted product and the role of alerting. It’s a good read for anyone who is in operations or would like to understand operations better. In the post, he describes not only what they are currently doing but also the challenges they face, as well as his thoughts on what they will need to do to improve.

At OpsGenie, our goals are highly relevant to the topics discussed in the post. We provide alert & notification management tools to enable ops teams to manage the entire alert life cycle: what happens after an alert is generated until the problem is resolved. Since we also operate a hosted service that needs to be up and running at all times, and deal with many of the same challenges mentioned in the post, I wanted to add my 3.1415 cents as well:

Read more »

Nagios and OpsGenie, Yin and Yang

Jan 2, 2013 by Berkay Mollamustafaoğlu

Nagios is an open source IT infrastructure monitoring tool that offers monitoring and alerting for servers, switches, applications, and services. OpsGenie is an alert and notification management service that is highly complementary to Nagios. OpsGenie Nagios integration leverages the Nagios notification system to forward alerts to OpsGenie (either via email or API) and notify users via iPhone/Android push notifications, email, SMS, and phone calls. There are already many OpsGenie users taking advantage of the integration. So what does OpsGenie have to offer for Nagios users?

Read more »

Alert life cycle management in OpsGenie

Dec 27, 2012 by Berkay Mollamustafaoğlu

Most operations teams use a number of disparate monitoring tools (and services) to monitor the technology infrastructure, network, systems, applications, etc. These monitoring tools all have some degree of alerting. They can generate alerts when they detect problems and can send alert notifications via email, etc. Yet alerting, particularly what happens after an alert is generated, differs significantly between tools.

Read more »

Librato alerts on your mobile devices

Dec 11, 2012 by Berkay Mollamustafaoğlu

Operations folks at Etsy said it best with “measure anything, measure everything”. Metric (aka time series) data collection, visualization, and alerting are essential operations management capabilities. We need to be able to track not only systems metrics such as CPU and memory utilization, but also (even more so) application and business metrics such as response times, number of transactions, etc.

Read more »

Monitoring applications on the cloud - Part Zero

Nov 8, 2012 by Berkay Mollamustafaoğlu

I’ve been thinking about the impact of “cloudification” of technology infrastructure on IT operations management, and particularly on monitoring. Unfortunately, every time I want to write about something, I feel like I need to write about a lot of other things first, just to provide the context. Monitoring as a discipline covers a surprisingly vast area. What I wanted to write about was the management/monitoring capabilities needed to manage production applications running on (private or public) server instances provided as a service (aka IaaS). I’ll refer to this as “managing applications on the cloud” for brevity, and hope that it does not cause too much confusion.

Read more »

Notifications and working with Netcool from your smartphones

Oct 29, 2012 by Berkay Mollamustafaoğlu

IBM Tivoli Netcool is the most common event (alerts in OpsGenie terminology) management solution used by operations, particularly in large enterprises and service providers. Since Netcool is used to collect and consolidate events from many event sources into a central repository, it makes sense to integrate OpsGenie with Netcool to add the capability to notify users for events that are important to them.

Read more »

Get Rackspace cloud monitoring alerts via OpsGenie

Oct 19, 2012 by Berkay Mollamustafaoğlu

 

Read more »

Overwriting quiet hours for critical alerts

Oct 11, 2012 by Berkay Mollamustafaoğlu

OpsGenie empowers users to control how they are notified. One of the available features is quiet hours. If the user specifies quiet hours, OpsGenie does not send notifications to the user during these hours. This feature is typically used by users who’d normally like to be notified when something goes wrong but do not want to be woken up in the middle of the night unless they have to be. But what if, for some alerts, they do want to be notified at any time?

Read more »
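
The quiet hours behaviour described above, together with an exception for selected alerts, can be sketched as a simple time check. How OpsGenie actually decides which alerts bypass quiet hours is not covered in the excerpt, so the `overrides_quiet_hours` flag and both function names are assumptions made for illustration.

```python
from datetime import time


def in_quiet_hours(now, start, end):
    """True if `now` falls inside the quiet window, which may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # e.g. 23:00 to 07:00


def should_notify(now, quiet_start, quiet_end, overrides_quiet_hours=False):
    """Suppress notifications during quiet hours unless the alert overrides them."""
    if overrides_quiet_hours:  # ASSUMPTION: a per-alert flag set by some rule
        return True
    return not in_quiet_hours(now, quiet_start, quiet_end)


quiet_start, quiet_end = time(23, 0), time(7, 0)

print(should_notify(time(2, 30), quiet_start, quiet_end))        # suppressed at night
print(should_notify(time(2, 30), quiet_start, quiet_end, True))  # critical alert gets through
print(should_notify(time(12, 0), quiet_start, quiet_end))        # daytime is unaffected
```

The wrap-around branch matters because the most common quiet window starts in the evening and ends the next morning.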

Get notified for OpenNMS events

Oct 5, 2012 by Berkay Mollamustafaoğlu

 

Read more »

Notification methods - which one to use when

Oct 1, 2012 by Berkay Mollamustafaoğlu

OpsGenie provides multiple notification methods (email, SMS, iPhone/Android push notifications, voice calls, etc.) to users for a number of reasons:

  • Timely delivery of notifications via methods like email and SMS is not guaranteed. Carriers offer SMS delivery as “best effort” and delivery times can vary. OpsGenie allows users to use multiple methods so that they are not dependent on a single method. Note that this does not mean users will get multiple notifications, since once the user views the alert, OpsGenie stops sending notifications for that alert through other notification methods.
  • A combination of these methods ensures the widest coverage, enabling OpsGenie to notify anyone who has a computer or a phone.
  • Different notification methods have different strengths and weaknesses.
Read more »

OpsGenie Email Integration - Creating alerts and notifying users just got easier

Sep 27, 2012 by Berkay Mollamustafaoğlu

 

Read more »

AWS CloudWatch alarms on your SmartPhones with OpsGenie

Sep 4, 2012 by Berkay Mollamustafaoğlu

Amazon CloudWatch provides monitoring for Amazon Web Services (AWS) and the applications that make use of AWS. There are many alternatives for collecting resource utilization metrics from EC2 instances; however, when AWS services like ELB, RDS, DynamoDB, SQS, etc. are used, CloudWatch metrics play a critical role in the monitoring of the applications running on the AWS cloud. One of the key capabilities of the CloudWatch service is alarms. A CloudWatch alarm can watch a single metric over a specified time period and execute automated actions based on the value of the watched metric and a given threshold. The automated action may be sending emails, or calling HTTP/S end points, etc.

Read more »

Zapier, another way to integrate with OpsGenie

Aug 31, 2012 by Berkay Mollamustafaoğlu

As Software as a Service (SaaS) solutions continue to make inroads into the enterprise, integration among disparate SaaS solutions is becoming necessary, as has been the case with on-premise applications. Zapier, itself a SaaS offering, is tackling this problem. Zapier provides a platform and an intuitive web based user interface to integrate various web applications. There are already almost 90 applications that can be integrated via Zapier, and we’ve already found a number of use cases to integrate various tools such as Trello and HipChat.

Read more »

Why use mobile apps for IT management alerts?

Aug 20, 2012 by Berkay Mollamustafaoğlu

IT Ops folks have been using electronic devices for notifications for decades. It started with pagers on our belts, and pagers got more sophisticated over time.

Alphanumeric pagers followed numeric ones that could only display a phone number, and two-way pagers with tiny keyboards followed them. Pagers still get used by some operations folks but have largely been replaced by mobile phones, thanks to the text messaging capabilities available on almost any mobile phone. IT operations processes largely use email as the main communications method to notify users when an action is required and rely on short text messages (SMS) when there is some urgency.

Read more »

Smarts notifications on mobile devices

Aug 3, 2012 by Berkay Mollamustafaoğlu

EMC Smarts (Ionix) Service Assurance Manager (SAM) “tools” enable operators to execute custom actions from Smarts console interactively, and “escalation policies” enable implementation of automated responses to problems detected by Smarts root-cause analysis engines. OpsGenie is a cloud based service that provides rich alert notifications and mobile response capabilities.

Leveraging Smarts tools and escalation policies, OpsGenie extends Smarts’ root cause analysis capabilities to mobile users. When Smarts detects a critical problem that requires attention, OpsGenie notifies the users through multiple notification channels (SMS, mobile push, voice, etc.), and enables the recipients to view the alert directly from their mobile devices. Here is how it works:

Read more »

Splunk alerts on your iPhone with OpsGenie

Jul 19, 2012 by Berkay Mollamustafaoğlu

Starting with version 4.2, Splunk provides alerting not only by polling and running searches on a scheduled basis but also in real-time. In the previous blog post, I had discussed the benefits of integrating Splunk and OpsGenie. In this post, I'll go over the use case of sending Splunk alerts to iPhone via push notifications as an example. Here are the steps:

Read more »

Splunking When You Are Mobile

Jul 16, 2012 by Berkay Mollamustafaoğlu

Splunk is fast establishing itself as one of the must-have tools for IT operations. Organizations use Splunk to consolidate machine data into a single searchable repository. Splunk provides an easy to use interface that allows users to analyze and correlate the collected data. And with the latest release, Splunk now has alerting capabilities where alerts can be generated for saved searches in real-time.

OpsGenie leverages Splunk alerting and extends Splunk's capabilities into mobile devices, making operational insights driven from Splunk available to users even when they are mobile. When Splunk detects an incident that requires attention, OpsGenie notifies the users through multiple notification channels, and enables users to view the alert directly from their mobile devices. Here is how it works:

Read more »

Lamp command line utility released!

Jul 6, 2012 by Berkay Mollamustafaoğlu

OpsGenie has a simple Web API to interact with OpsGenie from any programming language that can make web requests. Today, we've released lamp, a command line utility to do the same. Lamp uses the OpsGenie Web API under the hood and provides capabilities to create & close alerts, attach files, etc. easily from shell scripts. Lamp is a Java application, and hence works on any platform that has a JVM.

Read more »