Why Does Tail Latency Matter?

It’s a complicated question, actually: latency versus throughput. Throughput is how many responses you can give to your varied customers in a certain amount of time. Latency is the amount of time it took for an individual customer to receive the response to their request. If your product is going to succeed, obviously you need both excellent throughput and low latency: you need to be able to simultaneously provide your product to a large customer base and ensure that your customers receive what they’ve requested as quickly as possible.

The “quickly as possible” part of the equation involves trade-offs, and it also involves unpredictable matters that are beyond any company’s control. If the network path between your server and a user’s device is temporarily blocked or slowed by issues outside your control (for example, delays on the links between your servers and the many third-party servers that supply data your product depends on), then your users may sometimes experience a very slow response when they run your app.

Here’s a graph of the actual performance of a call to an API as measured by one of my API Science monitors, sorted by percentile:

The orange line is the mean performance.

Now, there are a lot of positives here. The mean response time is 257 milliseconds, and more than 80% of the calls complete faster than that. Typically, your customer sees a sufficiently fast response.

Sometimes, though, your customer sees a much slower response; and once in a while they see a very, very slow response (more than an order of magnitude above the average). This is what is termed your app’s “tail latency”… and it can be (or become, especially if you are fortunate enough to experience a huge surge in users) a very big problem.

The plot above illustrates that calls to this API exhibit quite a bit of tail latency. Let’s take a closer look at what’s happening at the high percentiles:

It’s easy to see that 90% of the time, or perhaps even 97% of the time, the performance is pretty close to the mean. However, that means that up to 3% of the time the performance might feel slow to a user. And at the very tail, the top percentile, performance is terrible: an order of magnitude above the average, a delay that is sure to be noticed by a user.
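This is why averaging hides the problem: the mean can look healthy while the high percentiles are painful. A small sketch makes the point with synthetic latency samples (the numbers below are illustrative, not the monitor data from the graph above):

```python
import math
import random
import statistics

random.seed(1)

# Synthetic latency samples (milliseconds): mostly fast responses,
# plus a small fraction of very slow "tail" outliers.
samples = [random.gauss(230, 40) for _ in range(970)] + \
          [random.uniform(1500, 3000) for _ in range(30)]

def percentile(data, p):
    """Return the p-th percentile using the nearest-rank method."""
    ordered = sorted(data)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

print(f"mean: {statistics.mean(samples):.0f} ms")
print(f"p50:  {percentile(samples, 50):.0f} ms")
print(f"p97:  {percentile(samples, 97):.0f} ms")
print(f"p99:  {percentile(samples, 99):.0f} ms")
```

Run this and the median lands near the mean while the 99th percentile is several times larger, which is exactly the shape of the curve above: a flat body and a steep tail.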

But we also have to look at what a frequent user experiences. A frequent user will become accustomed to your app’s typical performance. But up to 3% of the time, the response will be unexpectedly slow, and in some cases annoyingly slow. Annoying your customer is clearly not a good practice.

So what can you do when a critical API occasionally has slow response times? There are multiple options.

You could have your software just wait for the slow API to respond. Let your users be sometimes annoyed, and hope for the best. But what if that API goes completely down for an extended period of time? Your app needs to be up even if some component it depends on is down. Therefore, you must always have back-up logic in place for the case where data your app uses is temporarily unavailable. You might be able to use an archived version of the data; or your app might just indicate to users that the data isn’t currently available.
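That back-up logic can be kept separate from any particular API call. Here’s a minimal sketch of the idea, assuming a hypothetical weather endpoint (the URL and field names are illustrative, not a real service):

```python
import json
import urllib.request

# Hypothetical endpoint; the URL is illustrative only.
WEATHER_API_URL = "https://api.example.com/weather"

def with_fallback(fetch, cache):
    """Call fetch(); on any failure, fall back to the last archived
    response if one exists, else return an 'unavailable' marker."""
    try:
        data = fetch()
        cache["last_good"] = data  # archive for future failures
        return {"data": data, "stale": False}
    except Exception:
        # Deliberately broad: any failure mode triggers the back-up.
        if "last_good" in cache:
            return {"data": cache["last_good"], "stale": True}
        return {"data": None, "stale": True, "unavailable": True}

def fetch_live():
    with urllib.request.urlopen(WEATHER_API_URL, timeout=2.0) as resp:
        return json.load(resp)

cache = {}
# result = with_fallback(fetch_live, cache)
```

The `stale` flag lets the UI decide how to present archived data, and the `unavailable` marker covers the case where there is nothing archived to fall back on.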

Whatever strategy you employ for the case when an API is unavailable can also be applied when an API takes too long to respond. Your software can time-out the API request and revert to your back-up strategy for that particular API. Then your customers will not experience a large performance delay, though your app will not necessarily be showing entirely up-to-date information.
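One way to impose that time-out is to give every call a deadline and abandon the wait once it passes. A sketch using a worker-thread pool (the names and the 0.2-second deadline are illustrative assumptions):

```python
import concurrent.futures
import time

# One shared pool; a slow call keeps running in the background
# after we stop waiting for it.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def call_with_deadline(fn, deadline_s, fallback):
    """Wait at most deadline_s seconds for fn(); past the deadline
    (or on any error), return fallback() instead."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=deadline_s)
    except Exception:
        return fallback()

# Usage sketch: a call that would take 1 s is cut off after 0.2 s,
# and the archived data is served instead.
def slow_api_call():
    time.sleep(1)
    return "live data"

result = call_with_deadline(slow_api_call, 0.2, lambda: "archived data")
```

The user gets a prompt (if slightly stale) response every time; the one slow call in a hundred no longer drags the whole page with it.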

The message is: you cannot control the API world you depend on for your livelihood; but you can control your response to the present API reality.

–Kevin Farnham