Analyzing the Causes of Slow API Call Performance

In recent posts I’ve been analyzing the data for a week of API monitoring from four different locations around the globe (Washington, DC; Oregon; Ireland; and Japan). The hourly performance for calls over the week looks like this:

Since the World Bank Countries API that’s being called is located in Washington, DC, it’s not surprising that calls from Washington take the least time to complete. Meanwhile, calls from Japan regularly take longer to complete than calls from locations closer to Washington.

The data in the plot is based on metrics gathered by the curl Internet data transfer utility. This data is stored by API Science and is available for download using the API Science Performance Report API. See the post A Graphical View of API Performance Based on Call Location for more details.

What is plotted above is curl’s time_total metric. This is the total number of milliseconds between when curl received its URL and when it received the API’s response. Curl computes additional metrics that break down the performance into components (see the post What Do Curl Timings Mean?). API Science uses the curl timing data to divide each call’s performance into the following categories:

  • Resolve Time: the time (msec) it took to look up the URL’s Internet Protocol (IP) address
  • Connect Time: the time (msec) it took to contact the server and establish a connection
  • Processing Time: the time (msec) it took for the API server to receive the request and create a response
  • Transfer Time: the time (msec) it took for the API server to transfer the response bytes to the calling system
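
As a rough illustration of how such a breakdown could be derived, here is a minimal Python sketch. It assumes the four categories map onto curl’s cumulative write-out timings (time_namelookup, time_connect, time_starttransfer, time_total), which curl reports in seconds. That mapping is my assumption, not a description of how API Science actually computes its values, and the function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Breakdown:
    """Per-phase durations in milliseconds."""
    resolve: float
    connect: float
    processing: float
    transfer: float


def breakdown_from_curl(namelookup, connect, starttransfer, total):
    """Split curl's cumulative timings (seconds) into per-phase msec values.

    Assumed mapping (not confirmed by API Science):
      resolve    = time_namelookup
      connect    = time_connect - time_namelookup
      processing = time_starttransfer - time_connect
      transfer   = time_total - time_starttransfer
    """
    to_ms = 1000.0
    return Breakdown(
        resolve=namelookup * to_ms,
        connect=(connect - namelookup) * to_ms,
        processing=(starttransfer - connect) * to_ms,
        transfer=(total - starttransfer) * to_ms,
    )
```

Under this mapping, any TLS handshake time on an HTTPS call would be counted as part of Processing Time, so the split should be read as approximate.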

Here’s an example plot of data for the past 10 API monitor checks:

Curl metrics accessed from the API Science API can be used to analyze performance issues. For example, by examining data for the highest peak for each color in the first figure, we can determine which curl component measure occupied most of the call time.

Here’s the curl data for the highest peak for the four calling locations:

Location (times in msec)   Resolve   Connect   Processing   Transfer     Total
Washington DC               195.5       0.36       10.68       0.18     206.61
Oregon                      133.7      79.3       867.64       0.14    1080.67
Ireland                      85.8      74.59     2979.49       0.02    3139.9
Japan                       196.93    163.94     1063.87       0.16    1424.89

The Washington DC row shows that Resolve Time occupied 195.5 msec of the 206.61 msec it took to complete the call. A slow resolve may often be the dominant component when a call from Washington DC to Washington DC takes much longer than normal, since the connect, processing, and transfer steps all take place within that city.

Meanwhile, for the Oregon, Ireland, and Japan locations, Processing Time is the dominant factor in how long it took to complete the call.
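
The same comparison can be made programmatically. The following short Python sketch uses the component values copied from the table of highest peaks; the helper name dominant_component is hypothetical. It reports which component is largest for each location and what share of the summed components it represents.

```python
COMPONENTS = ("Resolve", "Connect", "Processing", "Transfer")

# Component values (msec) copied from the table of highest peaks.
PEAKS = {
    "Washington DC": (195.5, 0.36, 10.68, 0.18),
    "Oregon": (133.7, 79.3, 867.64, 0.14),
    "Ireland": (85.8, 74.59, 2979.49, 0.02),
    "Japan": (196.93, 163.94, 1063.87, 0.16),
}


def dominant_component(timings):
    """Return (name, share) for the largest of the four components.

    `share` is the component's fraction of the sum of all four components.
    """
    total = sum(timings)
    idx = max(range(len(timings)), key=lambda i: timings[i])
    return COMPONENTS[idx], timings[idx] / total


for location, timings in PEAKS.items():
    name, share = dominant_component(timings)
    print(f"{location}: {name} ({share:.0%} of call time)")
```

The share is computed against the sum of the four components rather than the reported Total column, since the two differ slightly in some rows.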

This dichotomy raises questions that can only be answered through analysis of additional data. For example, is slow Processing Time the primary cause of slow API call performance in general? After all, it was the outlier curl component in three of the four cases. Might Washington DC simply not have experienced a very slow Processing Time during the week studied?

That might be the case, if Internet timings are far more consistent than the time it takes for an API server to process a request. Making an educated guess regarding this would require studying the curl component metric data for all locations over an extended period of time. The data is available in the API Science API. All that’s needed is a program to make the study.
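
One possible starting point for such a program is sketched below in Python. It assumes the monitoring results have already been downloaded and parsed into a list of dictionaries with a "location" key and one numeric value (msec) per component. That record shape and the function name are my own assumptions, not the format returned by the API Science API. The sketch computes, for each location and component, the mean and the coefficient of variation (standard deviation divided by mean), so the consistency of network timings can be compared with that of Processing Time.

```python
from collections import defaultdict
from statistics import mean, pstdev

COMPONENTS = ("resolve", "connect", "processing", "transfer")


def variability_by_location(checks):
    """Return {location: {component: (mean_msec, coefficient_of_variation)}}.

    `checks` is an iterable of dicts with a "location" key plus one numeric
    value (msec) per name in COMPONENTS. This record shape is hypothetical.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for check in checks:
        for comp in COMPONENTS:
            grouped[check["location"]][comp].append(check[comp])

    summary = {}
    for location, series in grouped.items():
        summary[location] = {}
        for comp, values in series.items():
            avg = mean(values)
            # Guard against division by zero when a component is always 0.
            cv = pstdev(values) / avg if avg else 0.0
            summary[location][comp] = (avg, cv)
    return summary
```

If the guess above is right, the coefficient of variation for processing would come out noticeably higher than for resolve, connect, and transfer at every location.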

–Kevin Farnham