Assessing the Results of Monitor Timeout Validations

In my last post, I configured the World Bank BRIC API monitor so that a monitor step fails if its call to the World Bank Countries API takes more than 350 milliseconds to complete. The goal is to assess how often the World Bank Countries API responds slowly. Rather than make our customers wait for seconds, it might be better for our own product to respond promptly with a message like “Information for Brazil is not currently available.”
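To make the "respond promptly" idea concrete, here is a minimal Python sketch of a client-side timeout with a fallback message. The endpoint URL, the `fetch_country` and `country_info` names, and the choice of `urllib` are assumptions for illustration; they are not taken from the monitor configuration or from our product's code.

```python
import urllib.request

TIMEOUT_S = 0.35  # 350 ms, mirroring the monitor's threshold


def fetch_country(code, timeout_s=TIMEOUT_S):
    """Fetch country data as text. The URL below is an assumed endpoint."""
    url = f"https://api.worldbank.org/v2/country/{code}?format=json"
    # Note: urllib applies the timeout to each blocking socket operation
    # (connect, each read), not to the request as a whole.
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


def country_info(name, code, fetch=fetch_country):
    """Return the API response, or a fallback message if the call fails."""
    try:
        return fetch(code)
    except OSError:
        # TimeoutError and urllib.error.URLError are both OSError subclasses,
        # so this covers slow responses as well as connection failures.
        return f"Information for {name} is not currently available."
```

Calling `country_info("Brazil", "br")` would return either the response text or the fallback message. Passing the fetch function as a parameter is a design choice that lets the fallback path be exercised without touching the network.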

Not long after I set the monitor running, its summary page showed two consecutive failed validations:

wb-bric-two-failures

Note that the World Bank Countries API returned HTTP Status Code 200 (OK) for all four calls. That is, the request was received, and the appropriate response data was sent back to the client. However, our monitor’s status is “Failure” in both cases. And when we scroll to the top of the page, we see that the monitor’s current status is “Down”:

wb-bric-down

Clicking the time stamps in the “Recent Checks” table brings us to the detail page for each check. Here, we can see the results of each of the four calls to the World Bank Countries API.

For the check performed at 15:59:47, the timings for the individual monitor steps were:

  1. Brazil – 185.99 msec (Pass)
  2. Russia – 469.92 msec (Failed)
  3. India – 482.91 msec (Failed)
  4. China – 469.79 msec (Failed)

Here are the details for Step 4 (China):

wb-bric-china-failed

The timing details for the check that occurred at 15:54:33 are:

  1. Brazil – 487.13 msec (Failed)
  2. Russia – 477.82 msec (Failed)
  3. India – 476.09 msec (Failed)
  4. China – 804.58 msec (Failed)
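The pass/fail results in both lists follow directly from the 350 msec threshold. As a rough sketch, the rule can be written in Python as follows. The function names are hypothetical, and the rule that one failed step fails the whole check is an assumption inferred from the results shown here, not a documented API Science behavior.

```python
THRESHOLD_MS = 350.0  # the timeout validation configured for each step

# Step timings (msec) copied from the two checks above
checks = {
    "15:59:47": {"Brazil": 185.99, "Russia": 469.92, "India": 482.91, "China": 469.79},
    "15:54:33": {"Brazil": 487.13, "Russia": 477.82, "India": 476.09, "China": 804.58},
}


def step_status(elapsed_ms, threshold_ms=THRESHOLD_MS):
    """A step fails only when it takes more than the threshold."""
    return "Pass" if elapsed_ms <= threshold_ms else "Failed"


def check_status(timings):
    """Assumed rule: a check is a failure if any of its steps failed."""
    failed = any(step_status(ms) == "Failed" for ms in timings.values())
    return "Failure" if failed else "Pass"


for timestamp, timings in checks.items():
    print(timestamp, check_status(timings))
    for country, ms in timings.items():
        print(f"  {country}: {ms:.2f} msec ({step_status(ms)})")
```

Under this rule, the 15:59:47 check is a failure even though the Brazil step passed, which matches the monitor's reported status.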

The timing breakdown for the 804.58 msec call for the information on China was as follows:

  • Resolve Time: 3.47 msec
  • Connect Time: 81.53 msec
  • Processing Time: 719.5 msec
  • Transfer Time: 0.09 msec

wb-bric-china-timing-breakdown

This shows that the slow response was due to something happening in the World Bank’s data center. The monitor resolved the API endpoint URL in 3.47 msec. Establishing a connection between the monitor client and the World Bank’s server took 81.53 msec, which is typical. The transfer time, during which the World Bank’s server streamed its response text back to the client, was 0.09 msec (a very fast transfer).

But, for whatever reason, once the connection with the World Bank’s server was established, it took 719.5 msec for the server to analyze the request, locate the required information, create the response text, and signal the client that it was about to send the first byte of response data (all of this makes up “Processing Time”).
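To see how much of the call each phase accounts for, the breakdown can be summed and converted to shares. This is only an illustrative Python calculation on the four figures listed above; the variable names are my own.

```python
# Timing breakdown (msec) for the slow China call, from the list above
breakdown_ms = {
    "Resolve": 3.47,
    "Connect": 81.53,
    "Processing": 719.5,
    "Transfer": 0.09,
}

total_ms = sum(breakdown_ms.values())

# Fraction of the total spent in each phase
shares = {phase: ms / total_ms for phase, ms in breakdown_ms.items()}

for phase, ms in breakdown_ms.items():
    print(f"{phase:<10} {ms:7.2f} msec  {shares[phase]:6.1%}")
print(f"{'Total':<10} {total_ms:7.2f} msec")
```

The four phases sum to 804.59 msec, within 0.01 msec of the reported 804.58 (presumably a rounding difference), and processing accounts for roughly 89% of the total.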

A calling client has no information about why the World Bank Countries API took an unusually long time to process requests during this period. A more typical processing time for these calls is about 92 msec. In any case, if your application uses this data, then you’ll want to know when delays like this occur, no matter what time it is or what day of the week.

API Science Alerts

One way to do this is to use API Science alerts. At the bottom of the summary page for a monitor, there is an “Alert Rules” section that lets you configure alerts that are issued when an API monitor fails (and, optionally, when the checks return to “Pass” status).

Here’s the status of the alert I configured for the World Bank BRIC monitor:

wb-bric-alerts

In addition to email alerts, the API Science platform can send an alert to a URL or to a PagerDuty account.
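If you choose the URL option, something has to be listening at that URL. Below is a minimal Python sketch of such a receiver using only the standard library. It assumes the alert arrives as an HTTP POST; I have not verified the request method or the body format API Science uses, so the handler simply records whatever body it receives. All names here (`record_alert`, `AlertHandler`, `serve`) are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_alerts = []  # in-memory log; a real service would persist or forward these


def record_alert(raw_body: bytes) -> dict:
    """Store an alert body, parsing it as JSON when possible."""
    text = raw_body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        payload = {"raw": text}  # keep unparsed bodies rather than drop them
    if not isinstance(payload, dict):
        payload = {"raw": payload}  # wrap JSON arrays or scalars
    received_alerts.append(payload)
    return payload


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the alert is delivered as a POST with a Content-Length header
        length = int(self.headers.get("Content-Length", 0))
        record_alert(self.rfile.read(length))
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def serve(port: int = 8080) -> None:
    """Listen for alerts until interrupted."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Calling `serve()` starts the listener on port 8080. Keeping the parsing in `record_alert`, separate from the HTTP handler, makes it easy to adapt once the actual alert format is known.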

–Kevin Farnham