Where you Rate-Limit APIs Matters

Seriously, let’s talk about this because architecture is a pretty important piece of the scalability puzzle.

 

Rate limiting is not a new concept. We used to call it “quality of service” to make it sound nicer, but the reality is that when you limited bandwidth availability based on application port or protocol, you were rate limiting.

Today we have to apply that principle to APIs (which are almost always RESTful HTTP requests) because, just as there was limited bandwidth in the network back in the day, there are limited resources available to any given server. This is not computer science 301, so I won’t dive into the system-level details of why TCP sockets are tied to file handles and thus limit the number of concurrent connections available. Nor will I digress into the gory details of memory management and CPU scheduling algorithms that ultimately support the truth of operational axiom #2 – as load increases, performance decreases.

Anyone who has tried to do anything on a very loaded system has experienced this truth.

So, now we’ve got modern app architectures that rely primarily on APIs. Whether invoked from a native mobile or web-based client, APIs are the way we exchange data these days. We scale APIs like we scale most HTTP-based resources. We stick a load balancer in front of two or more servers and algorithmically determine how to distribute requests. It works, after all. That can be seen every day across the Internet. Chances are if you’re doing anything with an app, it’s been touched by a load balancer.

Now, I mention that because it’ll be important later. Right now, let’s look a bit closer at API rate limiting.

The way API rate limiting works in general is that each client is allowed X requests per time_interval. The time interval might be minutes, hours, or days. It might even be seconds. The reason for this is to prevent any given client (user) from consuming so many resources (memory, CPU, database) that the system can no longer respond to other users.

It’s an attempt to keep the server from being overwhelmed and falling over.

That’s why we scale.
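
To make that mechanism concrete, here’s a minimal, hypothetical sketch of the fixed-window flavor in node.js. The names (isAllowed, counters, the constants) are mine, not any standard, and a real implementation would keep the counters in a shared store so every instance sees the same counts, rather than in process memory:

  var WINDOW_SECONDS = 60;   // the time_interval
  var MAX_REQUESTS = 100;    // X requests allowed per client per window
  var counters = {};         // in-memory store for illustration: clientId -> { count, windowStart }

  function isAllowed(clientId) {
    var now = Math.floor(Date.now() / 1000);
    var entry = counters[clientId];
    // start a fresh window if this client is new or the previous window has expired
    if (!entry || now - entry.windowStart >= WINDOW_SECONDS) {
      entry = counters[clientId] = { count: 0, windowStart: now };
    }
    entry.count++;
    return entry.count <= MAX_REQUESTS;
  }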

The way API rate limiting is often implemented is that the app, upon receiving a request, checks with a service (or directly with a data source) to figure out whether this request should be fulfilled, based on user-defined quotas and current usage.

This is the part where I let awkward silence fill the room while you consider the implication of the statement.

In an attempt to keep API requests from overwhelming the server, that same server is tasked with determining whether or not each request should be fulfilled.
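
In code, that coupling looks something like this hypothetical, Express-style middleware sketch (lookupQuota and req.user are illustrative stand-ins, not anything defined in this post): the same process that is supposed to be serving API requests burns cycles, and a round trip to a data source, deciding whether it should even do the work.

  // hypothetical middleware: the app meters itself on every request
  app.use(function (req, res, next) {
    // a trip to a quota service or database before any real work happens
    lookupQuota(req.user).then(function (quota) {
      if (quota.remaining <= 0) {
        res.status(429).json({ error: "Rate limit exceeded" });
      } else {
        next();
      }
    });
  });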

Now, I know that many API rate limiting strategies are used solely to keep data sources from being overwhelmed. Servers, after all, scale much more easily than their database counterparts.

Still, you’re consuming resources on a server unnecessarily. You’re also incurring some pretty heavy architectural debt by coupling metering and processing logic together (part of the argument for microservices and decomposition, but that’s another post) and making it very difficult to change how that rate limiting is enforced in the future, because it’s coupled with the app.

If you recall back to the beginning of this post, I mentioned there is almost always (I’d be willing to bet on it) a load balancer upstream from the servers in question. It is upstream logically and often physically, too, and it is, by its nature, capable of managing many, many, many more connections (sockets) than a web server, because that’s what it’s designed to do.

So if you move the rate limiting logic from the server to the load balancer, you get back resources, reduce architectural debt, and gain some agility in case you want to rapidly change rate limiting logic in the future. After all, changing that logic in one or two instances of a load balancer is far less disruptive than making code changes to the app (and all the testing and verification and scheduling that may require).

Now, as noted in this article laying out “Best Practices for a Pragmatic RESTful API,” there are no standards for API rate limiting. There are, however, suggested best practices and conventions that revolve around the use of custom HTTP headers:

At a minimum, include the following headers (using Twitter's naming conventions as headers typically don't have mid-word capitalization):

  • X-Rate-Limit-Limit - The number of allowed requests in the current period
  • X-Rate-Limit-Remaining - The number of remaining requests in the current period
  • X-Rate-Limit-Reset - The number of seconds left in the current period

And of course when a client has reached the limit, be sure to respond with HTTP status code 429 Too Many Requests, which was introduced in RFC 6585.

Now, if you’ve got a smart load balancer, one that is capable of actually interacting with requests and responses (not just URIs or pre-defined headers, but one that can actually reach all the way into the TCP payload, if you want) and is enabled with some sort of scripting language (like TCL or node.js), then you can move API rate limiting logic to a load balancer-hosted service and stop consuming valuable compute.

Inserting custom headers using node.js (as we might if we were using iRules LX on a BIG-IP load balancing service) is pretty simple. The following is not actual code (I mean it is, but it’s not something I’ve tested). This is just an example of how you can grab limits (from a database, a file, another service) and then insert those into custom headers.

  // Look up this user's limits and current usage (from a database, a file, another service)
  var limits = api_user_limit_lookup();

  // Insert the rate limiting state into the request as custom headers
  req.headers["X-Rate-Limit-Limit"] = limits.limit;
  req.headers["X-Rate-Limit-Remaining"] = limits.remaining;
  req.headers["X-Rate-Limit-Reset"] = limits.resettime;

You can also simply refuse to fulfill the request and return the suggested HTTP status code (or any other, if your app is expecting something else). You can also send back a response with a JSON payload that contains the same information. As long as you’ve got an agreed upon method of informing the client, you can pretty much make this API rate limiting service do what you want.
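
For example, a hypothetical sketch of that rejection path (untested, using the same pseudocode conventions and api_user_limit_lookup function as the example above rather than any particular product API) might look like:

  var limits = api_user_limit_lookup();

  if (limits.remaining <= 0) {
    // refuse to forward the request and tell the client when it can try again
    res.statusCode = 429;   // Too Many Requests (RFC 6585)
    res.headers["X-Rate-Limit-Limit"] = limits.limit;
    res.headers["X-Rate-Limit-Remaining"] = 0;
    res.headers["X-Rate-Limit-Reset"] = limits.resettime;
    res.body = JSON.stringify({ error: "Rate limit exceeded", reset_in: limits.resettime });
  }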

Why in the network? 

 

There are three good reasons why you should move API rate limiting logic upstream, into the load balancing proxy:

1. Eliminates technical debt

     If you’ve got rate limiting logic coupled in with app logic, you’ve got technical debt you don’t need. You can lift and shift that debt
     upstream without having to worry about how changes in rate limiting strategy will impact the app.

2. Efficiency gains

     You’re offloading logic upstream, which means all your compute resources are dedicated to compute. You can better predict
     capacity needs and scale without having to compensate for requests that are unequal consumers.

3. Security

     It’s well understood that application layer (request-response) attacks are on the rise, including denial of service. By leveraging an
     upstream proxy with greater capacity for connections you can stop those attacks in their tracks, because they never get anywhere
     near the actual server.

 

Like almost all things app and API today, architecture matters more than algorithms. Where you execute logic matters in the bigger scheme of performance, security, and scale.


More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.
