Rate Limiter - Token Bucket

A rate limiter is a reliability pattern used in high-traffic systems. Its purpose is to control incoming request volume so your servers don’t get overwhelmed (or abused) under load.

There are several common rate-limiting strategies—such as token bucket, leaky bucket, and sliding window. In production, token bucket is one of the most popular choices because it’s simple, efficient, and allows short bursts while still enforcing an average rate.

Here’s the basic idea: each client (or key) gets a bucket that can hold up to a fixed number of tokens. Tokens are added back at a constant rate, every incoming request must consume one token to proceed, and a request that finds the bucket empty is rejected.

Clients can typically retry later (often after waiting briefly), and rate limiters can be deployed in multiple scopes—for example per tenant, per API key, per service, or per endpoint—depending on what you’re trying to protect.

Since this blog is built with Rust, I decided to implement a simple rate limiter for it, just for fun. We’ll walk through simple implementations in both Go and Rust.

Conceptual High-Level Pseudocode

  1. When a request arrives, take the current timestamp
  2. Compute the time elapsed since the last refill (delta_time)
  3. Compute the tokens to add: delta_time * refill_rate
  4. Add the new tokens to the bucket, capping at the bucket's capacity
  5. If at least one token is available, consume it and allow the request; otherwise reject it
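
The refill arithmetic in the steps above can be sketched as a small function. This is just an illustration—the function and variable names here are mine, not from the implementations below, and the sample numbers are arbitrary:

```rust
// Sketch of the refill step: tokens earned = elapsed time * refill rate,
// capped at the bucket's capacity.
fn refill(tokens: f64, elapsed_secs: f64, rate: f64, max_tokens: f64) -> f64 {
    let new_tokens = elapsed_secs * rate;
    (tokens + new_tokens).min(max_tokens)
}

fn main() {
    // 2.0 tokens left, 0.5 s since the last refill, 5 tokens/sec -> 4.5 tokens.
    println!("{}", refill(2.0, 0.5, 5.0, 10.0));
    // A long idle period is capped at the bucket capacity of 10.
    println!("{}", refill(2.0, 60.0, 5.0, 10.0));
}
```

The cap in step 4 is what bounds bursts: no matter how long a client stays idle, it can never accumulate more than max_tokens worth of credit.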

Token Bucket Rate Limiter Implementation in Golang:

package main

import (
	"log"
	"sync"
	"time"
)

type TokenBucket struct {
	mx         sync.Mutex // guards the fields below
	maxTokens  float64    // bucket capacity (maximum burst)
	tokens     float64    // tokens currently available
	rate       float64    // refill rate, in tokens per second
	lastRefill time.Time  // when tokens were last added
}

func NewTokenBucket(rate, maxTokens float64) *TokenBucket {
	if rate <= 0 || maxTokens <= 0 {
		log.Fatal("invalid rate or max tokens") // log.Fatal exits, so no return is needed
	}
	return &TokenBucket{
		rate:       rate,
		maxTokens:  maxTokens,
		tokens:     maxTokens, // start full so an initial burst is allowed
		lastRefill: time.Now(),
	}
}

func (t *TokenBucket) Allow() bool {
	t.mx.Lock()
	defer t.mx.Unlock()

	// Refill based on the time elapsed since the last request.
	now := time.Now()
	elapsed := now.Sub(t.lastRefill).Seconds()
	t.tokens = t.tokens + t.rate*elapsed
	t.lastRefill = now

	// Never exceed the bucket's capacity.
	if t.tokens > t.maxTokens {
		t.tokens = t.maxTokens
	}

	// Consume one token if available; otherwise reject.
	if t.tokens >= 1.0 {
		t.tokens = t.tokens - 1.0
		return true
	}
	return false
}

Please note that Go has an official rate-limiting package, golang.org/x/time/rate, which is suitable for production use. With it, the implementation becomes much simpler:

package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	rateLimiter := rate.NewLimiter(5, 10) // 5 tokens/sec, up to 10 burst
	for i := 1; i <= 15; i++ {
		if rateLimiter.Allow() {
			fmt.Println("request", i, "at", time.Now().Format("15:04:05.000"))
		} else {
			fmt.Println("[rate-limited]: request", i, "at", time.Now().Format("15:04:05.000"))
			time.Sleep(time.Second)
		}
	}
}

Let's switch to the Rust version now. The implementation looks very similar to the Go version, though to actually serve traffic you still need to wire it up with your middleware and router.

use std::sync::Mutex;
use std::time::Instant;

pub struct TokenBucket {
    inner: Mutex<Inner>,
    max_tokens: f64, // bucket capacity (maximum burst)
    rate: f64,       // refill rate, in tokens per second
}

// Mutable state protected by the mutex.
struct Inner {
    tokens: f64,          // tokens currently available
    last_refill: Instant, // when tokens were last added
}

impl TokenBucket {
    pub fn new(rate: f64, max_tokens: f64) -> Self {
        assert!(rate > 0.0 && max_tokens > 0.0);
        Self {
            inner: Mutex::new(Inner {
                tokens: max_tokens,
                last_refill: Instant::now(),
            }),
            max_tokens,
            rate,
        }
    }

    pub fn allow(&self) -> bool {
        let mut inner = self.inner.lock().unwrap();

        // Refill based on the time elapsed since the last request.
        let now = Instant::now();
        let elapsed_secs = now.duration_since(inner.last_refill).as_secs_f64();

        // Add the earned tokens, capped at the bucket's capacity.
        inner.tokens = (inner.tokens + self.rate * elapsed_secs).min(self.max_tokens);
        inner.last_refill = now;

        // Consume one token if available; otherwise reject.
        if inner.tokens >= 1.0 {
            inner.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
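
Before wiring the bucket into a router, you can exercise it with a small standalone driver. This is only a sketch: it reproduces the TokenBucket above so the snippet compiles on its own, and the rate and burst values are arbitrary:

```rust
use std::sync::Mutex;
use std::time::Instant;

// Same TokenBucket as above, repeated so this example is self-contained.
pub struct TokenBucket {
    inner: Mutex<Inner>,
    max_tokens: f64,
    rate: f64, // tokens per second
}

struct Inner {
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(rate: f64, max_tokens: f64) -> Self {
        assert!(rate > 0.0 && max_tokens > 0.0);
        Self {
            inner: Mutex::new(Inner {
                tokens: max_tokens,
                last_refill: Instant::now(),
            }),
            max_tokens,
            rate,
        }
    }

    pub fn allow(&self) -> bool {
        let mut inner = self.inner.lock().unwrap();
        let now = Instant::now();
        let elapsed = now.duration_since(inner.last_refill).as_secs_f64();
        inner.tokens = (inner.tokens + self.rate * elapsed).min(self.max_tokens);
        inner.last_refill = now;
        if inner.tokens >= 1.0 {
            inner.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Arbitrary limits for the demo: 2 tokens/sec, burst of 3.
    let bucket = TokenBucket::new(2.0, 3.0);
    for i in 1..=5 {
        // Back-to-back requests drain the initial burst of 3;
        // the remaining ones are rejected until tokens refill.
        println!("request {} allowed: {}", i, bucket.allow());
    }
}
```

Because `allow` takes `&self` and all mutable state lives behind the `Mutex`, the bucket can be shared across request handlers with an `Arc<TokenBucket>` when you do hook it into a web framework.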