Monitoring statuspages

There are many statuspages that you might want to keep an eye on if you are into monitoring and want to inform your users when certain sites have issues and do not work as expected. I am using Check MK 1.6 to do most of my monitoring. In most cases I am monitoring local servers to make sure they are up and running and perform well. This works great out of the box.

However, there are also many cloud services that you might use, like Dropbox, Slack, Egnyte and others. In nearly all cases the company provides a statuspage at an address like status.slack.com. This is good news, but often you will have to visit that site just to find out what might be wrong. In many cases they provide Atom/RSS feeds or Slack notifications, but not always.

But what if you could get the status information right into your monitoring system? Good news, you can.

The market for statuspages is not enormous. There are some big players, some smaller ones and also some open source projects. Of the sites I have checked, most seem to use statuspage.io, sorryapp.com, freshstatus or a few others.

If you want to monitor Slack to make sure it is up and running, pinging status.slack.com will not work: this page is always online, so a ping tells you nothing about what is actually wrong. You might want to try to ping some other IP addresses or URLs as well, but most likely they will change quite often.

You can use an online tool like https://builtwith.com/status.dropbox.com to check what statuspage a site is using. In this example you will see that Dropbox is using statuspage.io.

Below I will show you how to check Slack. I have not been able to figure out what software they are using, but it does not matter as you can get this working anyway.

Start by opening the statuspage and viewing the page source. Find a service name, like Messaging, and its status, No issues, in the text. We will use a regex that matches the service name, then a fixed number of arbitrary characters, then the status, so you will have to count the characters between the two, including spaces. I have used the plugin TextFX and Notepad++ to do this.

I have noticed that this gives you an indication but most often it will not be exact. So use this to get a number to start with. If you are running Check MK you can log in as your site user and run this command:

~/lib/nagios/plugins/check_http --ssl --onredirect=follow -l -r 'Messaging.{38}No issues' -H status.slack.com --sni status.slack.com

Try some numbers until you get the correct response.
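
If you have saved the page source to a file, the counting can also be done with a few lines of Python instead of by hand. This is only a minimal sketch: the helper names measure_gap and build_pattern and the sample markup are made up for illustration, and it assumes the service and status texts contain no regex special characters. The page that check_http receives may differ from what your browser shows, so treat the result as a starting number as well.

```python
import re


def measure_gap(page_source, service, status):
    """Count the characters between the first `service` text and the
    next `status` text after it. Returns None if either is missing."""
    start = page_source.find(service)
    if start == -1:
        return None
    gap_start = start + len(service)
    end = page_source.find(status, gap_start)
    if end == -1:
        return None
    return end - gap_start


def build_pattern(service, gap, status):
    """Build a regex of the form used above: service, exactly `gap`
    arbitrary characters, then status."""
    return f"{service}.{{{gap}}}{status}"


# Invented sample markup, NOT Slack's real page source.
sample = '<div class="service">Messaging</div>\n  <span class="ok">No issues</span>'

gap = measure_gap(sample, "Messaging", "No issues")
pattern = build_pattern("Messaging", gap, "No issues")
print(pattern)

# re.DOTALL lets "." match line breaks, which is my understanding of
# what the -l (linespan) option does for check_http.
print(bool(re.search(pattern, sample, re.DOTALL)))
```

When the status text changes, for example to an incident message, the pattern no longer matches and the check fails, which is exactly what we want.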

For Slack, this regex works:
Regex: Messaging.{38}No issues

Now create a host called status.slack.com in Check MK or Nagios. Add an HTTP rule to check the site with http. Make sure it is like in the example below.

You could skip the command-line step, but then you would have to try different numbers in the rule until you get an OK response. Now clone this rule and make rules for all services you might want to check. You will end up with something like this.

Now continue to add more hosts. I have used status.lifesizecloud.com, status.meistertask.com and status.egnyte.com.

Lifesize and Egnyte use https://www.statuspage.io
Regex: Sync.{73}Operational

There are some issues with lifesizecloud as they are using name ids in the code. I have not really figured this out yet.

Meistertask is using sorryapp.com
Regex: Web Application.{538}Operational
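
To test several hosts from the command line before creating the rules, the host and regex pairs can be kept in one list and turned into check_http command lines. This is a sketch under a few assumptions: the name build_command is hypothetical, the flags simply mirror the Slack command shown earlier, and I am assuming here that the Sync regex belongs to status.egnyte.com.

```python
import shlex

# Path as seen by the Check MK site user (taken from the command above).
PLUGIN = "~/lib/nagios/plugins/check_http"

# Host and regex pairs from this post.
# Assumption: the Sync regex is the one for Egnyte.
CHECKS = [
    ("status.slack.com", "Messaging.{38}No issues"),
    ("status.egnyte.com", "Sync.{73}Operational"),
    ("status.meistertask.com", "Web Application.{538}Operational"),
]


def build_command(host, regex):
    """Return a shell command line with the same flags as the Slack example."""
    args = ["--ssl", "--onredirect=follow", "-l", "-r", regex,
            "-H", host, "--sni", host]
    # shlex.quote puts single quotes around the regex so the shell
    # passes the braces and spaces through unchanged.
    return PLUGIN + " " + " ".join(shlex.quote(arg) for arg in args)


for host, regex in CHECKS:
    print(build_command(host, regex))
```

Paste each printed line into a shell as the site user and adjust the number in the regex until the check returns OK.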

There are some downsides when you use this method. If the company decides to change its statuspage provider, the check will stop working. The same will happen if they move anything around on the site, as the number of characters may change. However, my experience is that the main statuspage seems to be quite static.
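
One way to make the check a little less fragile is to allow a range of characters instead of an exact count. I have not tested this against the real sites; the sketch below only shows the idea in Python, with a hypothetical helper tolerant_pattern and an arbitrary slack of 10 characters. It assumes check_http accepts a {low,high} range the same way it accepts {38}.

```python
import re


def tolerant_pattern(service, gap, status, slack=10):
    """Like the exact pattern, but accept gap - slack .. gap + slack
    characters between the service name and the status text."""
    low = max(0, gap - slack)
    high = gap + slack
    return f"{service}.{{{low},{high}}}{status}"


pattern = tolerant_pattern("Messaging", 38, "No issues")
print(pattern)

# Invented page where the markup grew by two characters.
page = "Messaging" + "x" * 40 + "No issues"

# The exact pattern no longer matches, the tolerant one still does.
print(bool(re.search("Messaging.{38}No issues", page, re.DOTALL)))
print(bool(re.search(pattern, page, re.DOTALL)))
```

Keep the range small. If it is wide enough to reach the status text of the next service on the page, the check could report OK while the service you care about has an issue.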

In Check MK 1.6 they introduced a graphical display of business intelligence. It will visualize any aggregations you are using. This is quite cool and I will show you how I have used the statuspage integration to keep an eye on whether your services are up and running. You can read more about setting this up here. First start by creating an aggregation and later on add the status hosts.

The visualization will look like this. Right now everything is green, but if Slack has problems with their API, the main aggregation Spaces will turn red. I can also set rules here so that Spaces will not turn red unless 3 or more child services are down.
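
To make the "3 or more" rule concrete, here is the logic written out in Python. This is not how Check MK implements its aggregation functions, only an illustration. The function name aggregate is made up, and I am assuming that fewer than 3 failing children should leave the parent OK.

```python
# Nagios-style state codes.
OK, WARN, CRIT = 0, 1, 2


def aggregate(child_states, crit_at=3):
    """Return CRIT only when at least `crit_at` children are CRIT.

    Assumption for this sketch: with fewer failing children the
    parent stays OK instead of turning yellow.
    """
    down = sum(1 for state in child_states if state == CRIT)
    return CRIT if down >= crit_at else OK


# Two failing child services: the parent stays green.
print(aggregate([OK, CRIT, OK, CRIT, OK]))

# Three failing child services: the parent turns red.
print(aggregate([CRIT, CRIT, OK, CRIT, OK]))
```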

I will go into more detail on this in a later post.