How One Slow External API Can Bring Your Product Down

APIs receive a request from a user, gather data and/or perform some processing based on the request, then return the requested information (or other result) to the user. This sounds simple. But a lot can go wrong, and even when everything succeeds, the time it takes to complete the process can be highly variable.

The more complex the processing behind a request, the greater the chance that something will go wrong or that a performance bottleneck will occur somewhere along the processing path. When this happens, your customer is stuck waiting for a response. If the wait is too long and happens too often, they will decide your product doesn’t work and move on to your competitors.

“But wait!” you say. “It wasn’t our API’s fault. It was the XYZ API we use as a data source; they’re the ones experiencing the problem!” That argument falls on deaf ears, because the customer has already pointed their device elsewhere.

This is why you need to monitor the critical calls to both the external and internal APIs involved in processing your most important and frequent customer requests. In particular, watch for performance aberrations in the individual API calls that make up your processing: if one call in the sequence takes too long, it is likely better to return a message such as “this particular information is not available at present” than to leave your customer waiting in frustration.
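The fallback idea can be sketched in Python with `asyncio.wait_for`, which cancels an awaited call that exceeds its timeout. This is a minimal sketch under stated assumptions, not a prescribed implementation: `fetch_country_data` is a hypothetical stand-in that simulates upstream latency with `asyncio.sleep`, and the timeout values are arbitrary examples.

```python
import asyncio

# Message returned when an upstream call is too slow (wording taken from the post).
FALLBACK = "this particular information is not available at present"


async def fetch_country_data(delay_s: float) -> str:
    """Hypothetical stand-in for a call to an external API.

    The delay simulates upstream latency; a real version would make an HTTP request.
    """
    await asyncio.sleep(delay_s)
    return "country data"


async def fetch_with_fallback(delay_s: float, timeout_s: float) -> str:
    """Return the upstream result, or FALLBACK if it takes longer than timeout_s."""
    try:
        return await asyncio.wait_for(fetch_country_data(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the slow call; answer the customer now.
        return FALLBACK


if __name__ == "__main__":
    print(asyncio.run(fetch_with_fallback(delay_s=0.01, timeout_s=0.5)))  # country data
    print(asyncio.run(fetch_with_fallback(delay_s=2.0, timeout_s=0.1)))   # the fallback message
```

In a real service, the timeout would come from your own latency budget, and the fallback would typically replace only the slow piece of the response rather than the whole reply.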

In a recent blog post, I described how to create a multi-step API monitor that gathers information about the BRIC nations (Brazil, Russia, India, China) from the World Bank Countries API. Here’s the current summary view for that monitor:

[Screenshot: summary view of the World Bank BRIC monitor, 06/21/16]

The most recent check is an anomaly. The previous nine API checks took around 700-900 msec to complete, but the most recent check took about three times as long. The color bars indicate that the delay occurred in the “Processing” phase: the time between when the server receives the request and when it signals that it is ready to send the first response byte.
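One simple way to flag a check like this automatically is to compare it with the median of recent checks. The sketch below assumes a hypothetical threshold of twice the median; the history values are illustrative numbers in the 700-900 msec range, not the monitor’s actual readings, while 2388 msec is the latest check’s total time.

```python
from statistics import median
from typing import List


def is_anomalous(previous_ms: List[float], latest_ms: float, factor: float = 2.0) -> bool:
    """Flag latest_ms if it exceeds `factor` times the median of previous checks.

    factor=2.0 is a hypothetical threshold; tune it to your own traffic.
    """
    if not previous_ms:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(previous_ms)
    return latest_ms > factor * baseline


# Illustrative history inside the 700-900 msec band (NOT the monitor's real data).
history = [710, 760, 800, 850, 890, 720, 780, 830, 870]

print(is_anomalous(history, 2388))  # True: median is 800, so the limit is 1600
print(is_anomalous(history, 900))   # False
```

A median baseline keeps a single earlier outlier from dragging the threshold upward, which a plain average would not.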

Because the World Bank BRIC monitor has four steps, the next stage of the investigation is to find out which step was slow. Scroll down the page and click the timestamp for the latest check:

[Screenshot: list of recent checks for the World Bank BRIC monitor]

This brings us to the detail page for the check that occurred at 06/21/16 14:00. Since this is a multi-step monitor, the page uses tabs: each tab provides the low-level details for one of the steps:

[Screenshot: check detail, Step 1 (Brazil), 06/21/16 14:00]

The total time for Step 1, which retrieves the data for Brazil, was 278 msec. Since this is a small portion of the 2388 msec total for the full multi-step sequence, the problem must have occurred in another step.

Clicking “Step 2” shows us that the time for getting the data for Russia was 832 msec:

[Screenshot: check detail, Step 2 (Russia), 06/21/16 14:00]

Clicking “Step 3” shows that the total time for retrieving the data for India was 472 msec. Getting the data for China (Step 4) took 806 msec.
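The four step times add up exactly to the 2388 msec total. A short Python sketch that ranks the steps by their share of that total makes the slow calls stand out; the numbers are the ones reported on the detail page, and `breakdown` is a hypothetical helper name used only for illustration.

```python
from typing import Dict, List, Tuple


def breakdown(steps_ms: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Return (step, msec, share of total) tuples, slowest step first."""
    total = sum(steps_ms.values())
    ranked = sorted(steps_ms.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, ms, ms / total) for name, ms in ranked]


# Per-step times for the 06/21/16 14:00 check, as shown on the detail page.
check_1400 = {
    "Step 1 (Brazil)": 278,
    "Step 2 (Russia)": 832,
    "Step 3 (India)": 472,
    "Step 4 (China)": 806,
}

print("Total:", sum(check_1400.values()), "msec")  # Total: 2388 msec
for name, ms, share in breakdown(check_1400):
    print(f"{name}: {ms} msec ({share:.0%})")
```

This lists Step 2 (35%) and Step 4 (34%) first, so those two calls account for roughly 69% of the check’s total time.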

Conclusion

The data shows what happened: Steps 2 (Russia) and 4 (China) together took 1638 of the 2388 msec. We cannot know why the World Bank’s Countries API responded slowly to those calls on 06/21/16 just after 14:00, but it did.

The question that remains is: what should you do about it? Will a roughly 2.4-second delay annoy your customers? What if the delay approaches 10 seconds? That happened with the same monitor earlier in the day:

[Screenshot: detail view of an earlier check on 06/21/16 that took close to 10 seconds]

The API Science platform provides tools, including response-time checks, for validating each step in an API monitor’s processing sequence. I’ll go into the details of how to validate your API Science monitor calls in upcoming posts.
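As a generic illustration of per-step validation (not API Science’s actual feature set or API), the sketch below times one step and checks its status code, its response time, and whether the body parses as JSON. `StepResult`, `run_step`, `validate_step`, and the 500 msec limit are all hypothetical; `fetch` is any callable that returns a status code and a body, so the sketch runs without network access.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    """Hypothetical record of one monitor step's outcome."""
    name: str
    status: int
    elapsed_ms: float
    body: str


def run_step(name: str, fetch: Callable[[], Tuple[int, str]]) -> StepResult:
    """Call fetch() once and record its status, body, and elapsed time in msec."""
    start = time.perf_counter()
    status, body = fetch()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return StepResult(name, status, elapsed_ms, body)


def validate_step(result: StepResult, max_ms: float) -> List[str]:
    """Return human-readable failures for one step; an empty list means it passed."""
    failures = []
    if result.status != 200:
        failures.append(f"{result.name}: HTTP status {result.status}")
    if result.elapsed_ms > max_ms:
        failures.append(
            f"{result.name}: {result.elapsed_ms:.0f} msec exceeds {max_ms:.0f} msec limit"
        )
    try:
        json.loads(result.body)
    except json.JSONDecodeError:
        failures.append(f"{result.name}: body is not valid JSON")
    return failures


if __name__ == "__main__":
    # A stubbed fetch stands in for a real HTTP call.
    quick = run_step("stub", lambda: (200, '{"ok": true}'))
    print(validate_step(quick, max_ms=500))  # []
    # Step 2's reported time from the 14:00 check against a hypothetical 500 msec limit.
    russia = StepResult("Step 2 (Russia)", 200, 832, "[]")
    print(validate_step(russia, max_ms=500))
```

Returning a list of failures rather than stopping at the first one lets a single check report every problem with a step at once.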

–Kevin Farnham