Integrating API Monitoring into Your Product’s Operational Workflow

You’ve got a product that requires many nines of uptime. If an API (external or internal) that your product depends on is down, your own product is down, or key aspects of it are down.

You are monitoring the APIs that are key for your product, and you have alerts configured so that your development and QA teams are notified when a critical issue arises. But when an alert arrives, the data available for analyzing the problem is limited by the normal cadence of your API monitoring: the latest information your team has could be as old as the interval between checks.

Is there a better solution? Yes, if your API monitoring provider offers an API through which you can automatically update your monitors based on the results they report. The API Science API provides this capability.

Evaluating API Monitor Checks Using the API Science API

Your API monitoring team can be alerted when an API that’s critical for your product goes down using API Science’s alerts capability, which can notify your team about problems via email, a URL, Slack, PagerDuty, and HipChat. These methods can be extended using tools like IFTTT (If This Then That) so that your monitor alerts can be relayed to virtually any other platform (for example, Todoist or Facebook).

While notifying your team is important when a critical situation arises, you can also use the API Science platform to develop custom tools that automate analysis of your monitor results. For example, the Checks component of the API allows you to request information about a monitor’s checks. The API reference describes checks as follows:

Checks represent a single run of your monitor. They contain detailed information about the actual API calls that took place.

A shell script that issues a curl command can be used to request information about monitor checks. For example:

curl 'https://api.apiscience.com/v1/monitors/1572022/checks.json' \
    -H 'Authorization: Bearer MY_API_KEY'

Here, I’m requesting the checks information for my “br_Ireland” monitor, which queries the World Bank Countries API for information about Brazil, with the query running from a server located in Ireland. This is an HTTP GET request to the World Bank API endpoint:

http://api.worldbank.org/countries/br

“br” is the World Bank country code for Brazil.
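
Incidentally, you can reproduce what the monitor does by issuing the same GET request yourself:

curl 'http://api.worldbank.org/countries/br'

The endpoint returns an XML document containing summary information about Brazil (the response body appears in the check data below).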

Using the API Science Checks API, I can gather the results of checks for my br_Ireland monitor, and analyze them using my own custom software.
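
The pagination links at the bottom of the responses below suggest that the checks endpoint accepts count and page query parameters. Assuming that is the case, a script that only needs the most recent check could request a single result:

curl 'https://api.apiscience.com/v1/monitors/1572022/checks.json?count=1' \
    -H 'Authorization: Bearer MY_API_KEY'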

A query to the Checks API for a successful check for my br_Ireland monitor produces a JSON response that looks like this:

{
  "meta": {
    "status": "success",
    "numberOfResults": 1
  },
  "data": [
    {
      "id": 118234601,
      "href": "https://api.apiscience.com/v1/checks/118234601.json",
      "status": "success",
      "statistics": {
        "resolve": 140.09,
        "connect": 75.38,
        "processing": 85.44,
        "transfer": 0.13,
        "total": 301.04,
        "downloadSize": 680
      },
      "monitor": {
        "href": "https://api.apiscience.com/v1/monitors/1572022.json",
        "name": "br Ireland"
      },
      "calls": [
        {
          "id": 138054314,
          "template": {
            "id": 1324667,
            "href": "https://api.apiscience.com/v1/monitors/1572022/templates/1324667.json",
            "name": null
          },
          "statistics": {
            "resolve": 140.09,
            "connect": 75.38,
            "processing": 85.44,
            "transfer": 0.13,
            "total": 301.04,
            "downloadSize": 680
          },
          "request": {
            "verb": "GET",
            "url": "http://api.worldbank.org/countries/br",
            "headers": [

            ],
            "body": "",
            "params": [

            ]
          },
          "response": {
            "contentType": "text/xml; charset=UTF-8",
            "stausCode": 200,
            "headers": [
              {
                "Date": "Thu, 23 Mar 2017 21:33:03 GMT"
              },
              {
                "Content-Type": "text/xml; charset=UTF-8"
              },
              {
                "Content-Length": "680"
              },
              {
                "Connection": "keep-alive"
              },
              {
                "X-Powered-By": "ASP.NET"
              },
              {
                "Set-Cookie": "TS01fa65e4=01359ee976b447c7e9670af2f161386dbb4d4d96bbd953b4ce5bc82c29722819bbd9b2f38e; Path=/"
              },
              {
                "Server": "Apigee Router"
              }
            ],
            "body": "\r\n\r\n  \r\n    BR\r\n    Brazil\r\n    Latin America & Caribbean \r\n    Latin America & Caribbean (excluding high income)\r\n    Upper middle income\r\n    IBRD\r\n    Brasilia\r\n    -47.9292\r\n    -15.7801\r\n  \r\n"
          },
          "validationFailures": [

          ],
          "createdAt": "2017-03-23T21:33:03.000Z",
          "updatedAt": "2017-03-23T21:33:04.000Z"
        }
      ],
      "alerts": [

      ],
      "createdAt": "2017-03-23T21:33:04.000Z",
      "updatedAt": "2017-03-23T21:33:04.000Z"
    }
  ],
  "pagination": {My last two posts described how you can use the API Science API to evaluate API checks and detect failed checks. 
    "last": "https://api.apiscience.com/v1/monitors/1572022/checks.json?start=118234601&page=173&count=1",
    "next": "https://api.apiscience.com/v1/monitors/1572022/checks.json?start=118234601&page=2&count=1"
  }
}

Interpreting These Results

This is a lot of data relating to a single check performed by my br_Ireland monitor. Let’s take a look at what it’s showing us by walking through what the primary JSON data elements in this API response represent.

The “meta” section at the top shows that the request to the API Science Checks API was successful, and it produced one check result for my br_Ireland monitor.

The “data” section contains the information about this specific br_Ireland check. This includes summary details such as the curl timing statistics and the status of the check. In this example, the check was successful.

The “monitor” section identifies which monitor my request interrogated (i.e., “br_Ireland”).

The subsequent “calls” section shows the details of the br_Ireland call that was made during this check. We see timing statistics, the HTTP “verb” (“GET”), the URL that was called, and any headers or body information included in the API call to the br_Ireland API.

The “response” subsection provides the details of what was returned by the request. The “body” section shows the data that was returned. In this example, the “validationFailures” field indicates that none of my API Science validations for this monitor failed. The “alerts” section shows that no alerts were sent out to my API monitoring team due to the results of this check.

Thus, the “calls” section presents all data that was sent to the World Bank API in this particular br_Ireland API check, along with the resultant response data.

With this data in hand, we can proceed with developing customized responses to events that occur in the APIs that are crucial for our own product.
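
For example, here is a minimal shell sketch of such an analysis, using the jq JSON processor to pull the check status and timing statistics out of a response like the one above. (Assumptions: jq is installed, and the monitor ID, API key, and count parameter are placeholders based on this post’s examples.)

#!/bin/sh
# Fetch the most recent check for a monitor and print its status
# and timing statistics. MONITOR_ID and API_KEY are placeholders.
MONITOR_ID=1572022
API_KEY=MY_API_KEY

curl -s "https://api.apiscience.com/v1/monitors/${MONITOR_ID}/checks.json?count=1" \
    -H "Authorization: Bearer ${API_KEY}" |
jq -r '.data[0] | "status: \(.status)  total: \(.statistics.total) ms  download: \(.statistics.downloadSize) bytes"'

Run against the successful check above, this would print “status: success  total: 301.04 ms  download: 680 bytes”.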

Detecting Failed API Monitor Checks Using the API Science API

My “br_Ireland” monitor has a validation setting that requires an HTTP response code of “200” in order for the API check to be considered successful.

Thus, if a monitor check does not return an HTTP response code of 200, the check is considered to have failed, and the API is considered down with respect to its utility for our product. If this is a critical API for our product, then our own product will appear down to our customers.

Here is an example JSON response for a call to the Checks API for a failed “br Ireland” check (the “body” section is truncated to improve readability):

{
  "meta": {
    "status": "success",
    "numberOfResults": 1
  },
  "data": [
    {
      "id": 121102116,
      "href": "https://api.apiscience.com/v1/checks/121102116.json",
      "status": "failure",
      "statistics": {
        "resolve": 50.06,
        "connect": 0.96,
        "processing": 27.14,
        "transfer": 0.95,
        "total": 79.11,
        "downloadSize": 1565
      },
      "monitor": {
        "href": "https://api.apiscience.com/v1/monitors/1572022.json",
        "name": "br Ireland"
      },
      "calls": [
        {
          "id": 141355405,
          "template": {
            "id": 1324667,
            "href": "https://api.apiscience.com/v1/monitors/1572022/templates/1324667.json",
            "name": null
          },
          "statistics": {
            "resolve": 50.06,
            "connect": 0.96,
            "processing": 27.14,
            "transfer": 0.95,
            "total": 79.11,
            "downloadSize": 1565
          },
          "request": {
            "verb": "GET",
            "url": "http://api.worldbank.org/countries_api/br",
            "headers": [

            ],
            "body": "",
            "params": [

            ]
          },
          "response": {
            "contentType": "text/html; charset=UTF-8",
            "stausCode": 404,
            "headers": [
              {
                "Date": "Tue, 04 Apr 2017 03:57:09 GMT"
              },
              {
                "Content-Type": "text/html; charset=UTF-8"
              },
              {
                "Content-Length": "1565"
              },
              {
                "Connection": "keep-alive"
              },
              {
                "X-Powered-By": "ASP.NET"
              },
              {
                "Set-Cookie": "TS019266c8=017189f947e2cb40b23c9c04eec31cf2f670ff2827dd4f5157e821f795b176569408155f5b; Path=/"My last two posts described how you can use the API Science API to evaluate API checks and detect failed checks. 
              },
              {
                "Server": "Apigee Router"
              }
            ],
            "body": ... 
"Service ...
Endpoint not found." ...
          },
          "validationFailures": [
            {
              "kind": "Response Code",
              "message": "Response code is not 200"
            }
          ],
          "createdAt": "2017-04-04T03:57:09.000Z",
          "updatedAt": "2017-04-04T03:57:09.000Z"
        }
      ],
      "alerts": [

      ],
      "createdAt": "2017-04-04T03:57:09.000Z",
      "updatedAt": "2017-04-04T03:57:09.000Z"
    }
  ],
  "pagination": {
    "last": "https://api.apiscience.com/v1/monitors/1572022/checks.json?start=121102116&page=179&count=1",
    "next": "https://api.apiscience.com/v1/monitors/1572022/checks.json?start=121102116&page=2&count=1"
  }
}

The response has the same structure as the result I discussed above. The “meta” section shows that the call to the Checks API was successful. The “data” section includes statistics about this particular “br Ireland” check. The “status” in this case was “failure”, whereas the “status” for the previous result was “success”.

This reveals the possibility for your team to develop automated procedures based on the results that calls to the API Science Checks API return. Assume your product depends on the information the World Bank Countries API returns for Brazil. You have an API monitor checking this API at some cadence (every 15 minutes, for example), and that monitor can send alerts to your team when the critical API goes down. But, using the Checks API response as your input data, you can develop custom software that will automatically react to critical API outages.
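
As a sketch of how that reaction might begin (again assuming jq is available, with placeholder monitor ID and API key), a script can fetch the latest check, test its status, and surface any validation failures:

#!/bin/sh
# Detect a failed check and print its validation-failure messages.
# MONITOR_ID and API_KEY are placeholders.
MONITOR_ID=1572022
API_KEY=MY_API_KEY

CHECK=$(curl -s "https://api.apiscience.com/v1/monitors/${MONITOR_ID}/checks.json?count=1" \
    -H "Authorization: Bearer ${API_KEY}")

if [ "$(echo "$CHECK" | jq -r '.data[0].status')" != "success" ]; then
    echo "Monitor ${MONITOR_ID}: latest check failed"
    echo "$CHECK" | jq -r '.data[0].calls[].validationFailures[] | "\(.kind): \(.message)"'
    # React here: notify the team, or re-tune related monitors (see below)
fi

For the failed check above, this would print “Response Code: Response code is not 200”.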

For example, you could create an application that parses the Checks API response data to determine the significance of an outage for your product (is the API completely down? is it not returning a complete response? is it stating that the information you requested is not currently available?). Or, you could use the response data to assess lags in the response times for important internal or external APIs: if you guarantee your customers a response within 3 seconds, but a critical API has frequently taken 30 seconds to respond over the past few hours, action on your part may be needed, because your customers will think your app is failing.
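
The response-time case can be sketched the same way: average the “total” timing statistic over recent checks and compare it to your budget. (The 20-check window and 3000 ms budget below are illustrative assumptions.)

#!/bin/sh
# Warn if the average total response time of recent checks exceeds a budget.
MONITOR_ID=1572022
API_KEY=MY_API_KEY

AVG=$(curl -s "https://api.apiscience.com/v1/monitors/${MONITOR_ID}/checks.json?count=20" \
    -H "Authorization: Bearer ${API_KEY}" |
    jq '[.data[].statistics.total] | add / length')

echo "$AVG 3000" | awk '{ if ($1 > $2) print "WARNING: average response " $1 " ms exceeds " $2 " ms budget" }'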

The API Science Checks API endpoint provides data that enables your team to develop custom software to analyze the performance of critical external or internal APIs that determine how your customers view your product.

Using Tags to Group API Monitors

Many online products require gathering information from many different data sources. If your product queries multiple APIs (external or internal), then it is vital for your team to monitor those APIs in order to assess whether your product appears up, partially up, or down.

Now, I’ll describe API monitor tags and how they can fit into your overall performance analysis strategy.

API monitor tags:

are optional, user-defined labels that can be associated with any Monitor. Tags allow the sorting, grouping, and running of mass actions on monitors. Each monitor can have any number of tags.

For example, assume your product will be seen as “down” by your customers, should various calls to external or internal APIs not produce a valid result. In this case, you might want to adjust your settings for monitoring those APIs so you receive new updates at an increased cadence, testing the APIs at smaller time intervals to better assess the status of your product as it appears to users.

API Science provides the capability to link different monitors using API monitor tags. Using these tags, your software can execute a batch update of all similarly tagged API monitors via the API Science API.
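
For instance, assuming the monitors endpoint accepts the same tags query parameter used by the PATCH example later in this post, you could list every monitor that shares a tag:

curl 'https://api.apiscience.com/v1/monitors.json?tags=br' \
    -H 'Authorization: Bearer MY_API_KEY'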

To tag a monitor, go to your Dashboard and click the name of the monitor you’d like to tag. For example, I can click on my “br Ireland” monitor in my Dashboard’s “NAME” column.

On my “br Ireland” monitor page, I click the “Edit” button.

Near the bottom of this page there is a “Tags” box. Here, you can enter one or more text items that will identify this monitor as belonging to a certain category that is of interest for your platform. This provides a way to group monitors that are relevant to your overall product, or a component of your product.

Here, I define this monitor as having the tag “br”.

Assume my product requires access to the World Bank’s Countries API, and I want to be assured that my customers, wherever they reside, are finding my product to be available. If this is not the case, then it is beneficial to alter my monitoring of multiple APIs at once.

The API Science monitor tagging facility enables me to tag any number of monitors with the same tag. I can apply the same “br” tag to all monitors that are critical for this product.

Tagging multiple monitors with the same tag name makes it possible to execute batch alterations to multiple monitors using the API Science API.

Modifying Groups of API Monitors Using Tags

If you’ve integrated the monitoring of your product’s performance and availability using the API Science API, you can also take the next step of modifying your monitors based on the conditions your integration detects. That is, the API Science API not only lets you monitor the APIs that are critical for your product to be seen as “up” by your customers; it also lets you respond to outages programmatically, rather than solely via messages forwarded to your quality assurance and/or development teams.

Consider a situation where one or more APIs that provide key information for your product become unresponsive. You might be monitoring these APIs on an hourly basis, because normally they are quite reliable. If you’re using the API Science API to detect failures, your software will notice that something has changed: one or more APIs had a failed check.

Using the API Science notifications facility, you can convey this information to your team. However, your software can also immediately respond to the failure and potentially provide your team with valuable additional information by utilizing API Science’s apply actions to multiple monitors capability.

When something changes that could affect how your customers perceive your product, there are multiple possibilities as to why that happened. There could have been a short-term outage for one or more critical APIs that your product depends on. In this case, your product will soon return to its normal state.

But if there is a lingering outage for your product’s critical APIs, then you’ll want to monitor the situation more closely. You’ll likely want to increase the frequency with which you monitor these critical APIs, so you can determine when their outages cease.

During the interval when the APIs are experiencing outages or incomplete performance, your product’s software must inform your customers that some information that’s normally a component of your product is either unavailable or out of date. By increasing the frequency with which you monitor the critical APIs, you can enable your software to detect when the situation is resolved, to the benefit of both your customers and your 24/7 QA/Developer team.

Altering API Monitor Frequencies

If a problem is detected, you may want to increase the frequency of your API monitoring for critical APIs. If you’ve configured your related API monitors with tags, then your software can alter the frequency with which multiple monitors initiate checks using a curl command like this:

curl 'https://api.apiscience.com/v1/monitors?tags=your_tag' -X PATCH \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer NN_your_bearer_code' \
    --data-binary '{ "frequency" : 10 }'

Here, we are using curl to send a command to the API Science API that will alter the frequency of all API monitors tagged with “your_tag”.
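
Putting the pieces together, here is a sketch of the full loop: when the latest check for a critical monitor fails, tighten the cadence for every monitor sharing its tag; when it succeeds again, restore the normal cadence. (Assumptions: jq is available; the tag, monitor ID, API key, and frequency values are placeholders, with frequency taken to be in minutes.)

#!/bin/sh
# Tighten or relax the check cadence of all monitors sharing a tag,
# based on the latest check result of one critical monitor.
MONITOR_ID=1572022
TAG=br
API_KEY=MY_API_KEY

set_frequency() {
    curl -s "https://api.apiscience.com/v1/monitors?tags=${TAG}" -X PATCH \
        -H 'Content-Type: application/json' \
        -H "Authorization: Bearer ${API_KEY}" \
        --data-binary "{ \"frequency\" : $1 }"
}

STATUS=$(curl -s "https://api.apiscience.com/v1/monitors/${MONITOR_ID}/checks.json?count=1" \
    -H "Authorization: Bearer ${API_KEY}" | jq -r '.data[0].status')

if [ "$STATUS" = "failure" ]; then
    set_frequency 5     # outage: check every 5 minutes
else
    set_frequency 60    # healthy: return to the hourly cadence
fi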

The API Science API enables you to programmatically assess the state of the APIs that are critical for your product to be perceived as “up” by your customers. You can automatically alter your monitoring frequencies and other parameters, providing your QA/developer teams with the information they most need to address new performance issues as they occur.

Conclusion

Integrating your product’s operational software with the API Science platform facilitates increased responsiveness to downtime issues: the cadence of your API monitoring can be increased automatically as problems occur, providing your developer and QA teams with substantially more timely and detailed information about issues relating to your product’s critical APIs as they unfold.

–Kevin Farnham