Rate Limiter - Token Bucket
A rate limiter is a reliability pattern used in high-traffic systems. Its purpose is to control incoming request volume so your servers don’t get overwhelmed (or abused) under load.
There are several common rate-limiting strategies—such as token bucket, leaky bucket, and sliding window. In production, token bucket is one of the most popular choices because it’s simple, efficient, and allows short bursts while still enforcing an average rate.
Here’s the basic idea:
- A “bucket” is filled with tokens at a fixed rate (for example, 100 tokens per second).
- Each incoming request consumes one token.
- If a request arrives when the bucket is empty, the server rejects it with HTTP 429 (Too Many Requests).
Clients can typically retry later (often after waiting briefly), and rate limiters can be deployed in multiple scopes—for example per tenant, per API key, per service, or per endpoint—depending on what you’re trying to protect.
Since this blog is built with Rust, I decided to implement a simple rate limiter for it, just for fun. We'll walk through simple implementations in both Go and Rust.
Conceptual High-Level Pseudocode
- When a request arrives, take the current timestamp
- Compute the time elapsed since the last refill (delta_time)
- Compute the tokens to add: delta_time * refill_rate
- Add the new tokens to the bucket, capping at the bucket's capacity
- If at least one token is available, consume one and allow the request; otherwise reject it
Token Bucket Rate Limiter Implementation in Go:
package main

import (
	"log"
	"sync"
	"time"
)

type TokenBucket struct {
	mx         sync.Mutex
	maxTokens  float64   // bucket capacity
	tokens     float64   // tokens currently available
	rate       float64   // refill rate in tokens per second
	lastRefill time.Time // when the bucket was last refilled
}

func NewTokenBucket(rate, maxTokens float64) *TokenBucket {
	if rate <= 0 || maxTokens <= 0 {
		log.Fatal("invalid rate or max tokens") // log.Fatal exits the process
	}
	return &TokenBucket{
		rate:       rate,
		maxTokens:  maxTokens,
		tokens:     maxTokens, // start with a full bucket
		lastRefill: time.Now(),
	}
}

// Allow reports whether a request may proceed, consuming one token if so.
func (t *TokenBucket) Allow() bool {
	t.mx.Lock()
	defer t.mx.Unlock()
	now := time.Now()
	elapsed := now.Sub(t.lastRefill).Seconds()
	// Refill based on elapsed time, capping at capacity.
	t.tokens += t.rate * elapsed
	t.lastRefill = now
	if t.tokens > t.maxTokens {
		t.tokens = t.maxTokens
	}
	if t.tokens >= 1.0 {
		t.tokens -= 1.0
		return true
	}
	return false
}
Please note that Go has an official rate-limiting package maintained by the Go team, golang.org/x/time/rate, which can be used in production.
If you want to use the official package, implementation becomes much simpler:
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	rateLimiter := rate.NewLimiter(5, 10) // 5 tokens/sec, up to 10 burst
	for i := 1; i <= 15; i++ {
		if rateLimiter.Allow() {
			fmt.Println("request", i, "at", time.Now().Format("15:04:05.000"))
		} else {
			fmt.Println("[rate-limited]: request", i, "at", time.Now().Format("15:04:05.000"))
			time.Sleep(time.Second) // back off before the next attempt
		}
	}
}
Let's switch to the Rust version now. It looks quite similar to the Go one, but you still need to wire it into your middleware and router.
use std::sync::Mutex;
use std::time::Instant;

pub struct TokenBucket {
    inner: Mutex<Inner>,
    max_tokens: f64,
    rate: f64, // tokens per second
}

struct Inner {
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(rate: f64, max_tokens: f64) -> Self {
        assert!(rate > 0.0 && max_tokens > 0.0);
        Self {
            inner: Mutex::new(Inner {
                tokens: max_tokens, // start with a full bucket
                last_refill: Instant::now(),
            }),
            max_tokens,
            rate,
        }
    }

    pub fn allow(&self) -> bool {
        let mut inner = self.inner.lock().unwrap();
        let now = Instant::now();
        let elapsed_secs = now.duration_since(inner.last_refill).as_secs_f64();
        // Refill based on elapsed time, capping at capacity.
        inner.tokens = (inner.tokens + self.rate * elapsed_secs).min(self.max_tokens);
        inner.last_refill = now;
        if inner.tokens >= 1.0 {
            inner.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}