Rate Limits

Rate limits control how many requests you can make per minute.

Rate limits in our queue-and-poll API are simple: they apply account-wide, reset every minute, and scale across tiers from Free to Enterprise. This page covers the details, retry strategies, and tips for scaling without interruptions. Upgrade if you need more throughput.

How Rate Limits Work

Rate limits are enforced at the account level, not per API key or per model.

This means:

  • All your API keys share the same rate limit
  • All models count toward the same limit
  • Limits reset every minute

Example:

If you have a Starter tier account (10 requests/minute):

  • You can make 10 requests using API key A
  • Or 5 requests with API key A and 5 with API key B
  • Or 10 requests to gpt-5.1 and 0 to other models
  • Or any combination that totals 10 requests/minute

Rate Limit Tiers

Tier        Requests per Minute   Daily Limit (Free only)   Monthly Limit (Free only)
Free        2 RPM                 10 requests/day           300 requests/month
Starter     10 RPM                Unlimited                 Unlimited
Scale       30 RPM                Unlimited                 Unlimited
Enterprise  100 RPM               Unlimited                 Unlimited

Free tier has additional daily and monthly limits. All paid tiers have no daily or monthly caps—only per-minute limits.
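On the Free tier the per-minute, daily, and monthly caps stack, and the daily cap binds first: at the full 2 RPM you exhaust the 10-request daily allowance in 5 minutes. A quick sanity check of the numbers in the table above (plain arithmetic, no API calls):

```python
RPM, DAILY, MONTHLY = 2, 10, 300  # Free tier limits from the table above

# Minutes of sustained max-rate traffic before the daily cap binds
minutes_to_daily_cap = DAILY / RPM      # 5.0 minutes

# Days of fully used daily allowances before the monthly cap binds
days_to_monthly_cap = MONTHLY / DAILY   # 30.0 days
```

In other words, a Free-tier account that sends at full speed is stopped by the daily cap within five minutes, so pacing matters more than raw RPM at that tier.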


What Happens When You Hit the Limit

When you exceed your rate limit, the API returns a 429 Too Many Requests error with a specific message indicating which limit was exceeded:

RPM Limit Exceeded

{
    "detail": "rpm_limit_exceeded, upgrade your plan for more requests!!!"
}

Daily Limit Exceeded (Free Tier Only)

{
    "detail": "daily_limit_exceeded, upgrade your plan for more requests!!!"
}

Monthly Limit Exceeded (Free Tier Only)

{
    "detail": "monthly_limit_exceeded, upgrade your plan for more requests!!!"
}

Your request is not queued—it's rejected immediately. For RPM limits, wait 60 seconds before retrying. For daily/monthly limits, upgrade your plan or wait until the next day/month.
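Because the three 429 variants call for different responses, it can help to branch on the `detail` field before deciding whether to retry. A minimal sketch (the prefixes match the error examples above; the function name is illustrative):

```python
def classify_429(body):
    """Map the 'detail' field of a 429 response body to a retry action.

    The detail prefixes below are taken from the error examples above.
    """
    detail = body.get("detail", "")
    if detail.startswith("rpm_limit_exceeded"):
        return "retry_after_60s"                     # per-minute window resets
    if detail.startswith("daily_limit_exceeded"):
        return "wait_until_tomorrow_or_upgrade"      # Free tier only
    if detail.startswith("monthly_limit_exceeded"):
        return "wait_until_next_month_or_upgrade"    # Free tier only
    return "unknown"
```

Only the RPM case is worth an automatic in-process retry; the daily and monthly cases won't clear for hours or days, so retrying in a loop just wastes requests.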


Handling Rate Limits in Code

Strategy 1: Wait and Retry

Python
import requests
import time

def make_request_with_retry(api_key, model, prompt, max_retries=3):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://app.beginswithai.com/v1/ai",
                json={"model": model, "prompt": prompt},
                headers=headers
            )

            if response.status_code == 429:
                # RPM limits reset every 60 seconds
                print("Rate limited. Waiting 60s...")
                time.sleep(60)
                continue

            response.raise_for_status()
            return response.json()

        except requests.exceptions.RequestException as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(5)

    raise Exception("Max retries exceeded")
JavaScript
async function makeRequestWithRetry(apiKey, model, prompt, maxRetries = 3) {
    const headers = {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json"
    };

    for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
            const response = await fetch("https://app.beginswithai.com/v1/ai", {
                method: "POST",
                headers: headers,
                body: JSON.stringify({ model, prompt })
            });

            if (response.status === 429) {
                // RPM limits reset every 60 seconds
                console.log("Rate limited. Waiting 60s...");
                await new Promise(r => setTimeout(r, 60000));
                continue;
            }

            if (!response.ok) {
                throw new Error(`HTTP ${response.status}`);
            }

            return await response.json();
        } catch (error) {
            if (attempt === maxRetries - 1) {
                throw error;
            }
            await new Promise(r => setTimeout(r, 5000));
        }
    }

    throw new Error("Max retries exceeded");
}

Strategy 2: Rate Limit Your Own Code

Proactively limit your requests to stay under the threshold:

Python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_requests, time_window=60):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = deque()

    def wait_if_needed(self):
        now = time.time()

        # Remove requests outside the time window
        while self.requests and self.requests[0] < now - self.time_window:
            self.requests.popleft()

        # If at limit, wait until the oldest request leaves the window
        if len(self.requests) >= self.max_requests:
            wait_time = self.time_window - (now - self.requests[0])
            if wait_time > 0:
                print(f"Rate limit reached. Waiting {wait_time:.1f}s...")
                time.sleep(wait_time)
            # Drop only the entries that have now expired; clearing the
            # whole deque would let the next burst exceed the limit
            now = time.time()
            while self.requests and self.requests[0] < now - self.time_window:
                self.requests.popleft()

        self.requests.append(time.time())

# Usage
limiter = RateLimiter(max_requests=10)  # For Starter tier

for i in range(100):
    limiter.wait_if_needed()
    # Make your API request here
    print(f"Request {i+1}")
JavaScript
class RateLimiter {
    constructor(maxRequests, timeWindow = 60000) {
        this.maxRequests = maxRequests;
        this.timeWindow = timeWindow;
        this.requests = [];
    }

    async waitIfNeeded() {
        const now = Date.now();

        // Remove requests outside time window
        this.requests = this.requests.filter(
            time => time > now - this.timeWindow
        );

        // If at limit, wait until the oldest request leaves the window
        if (this.requests.length >= this.maxRequests) {
            const waitTime = this.timeWindow - (now - this.requests[0]);
            if (waitTime > 0) {
                console.log(`Rate limit reached. Waiting ${(waitTime/1000).toFixed(1)}s...`);
                await new Promise(r => setTimeout(r, waitTime));
            }
            // Drop only the entries that have now expired; resetting the
            // whole array would let the next burst exceed the limit
            const after = Date.now();
            this.requests = this.requests.filter(
                time => time > after - this.timeWindow
            );
        }

        this.requests.push(Date.now());
}

// Usage
const limiter = new RateLimiter(10);  // For Starter tier (10 RPM)

for (let i = 0; i < 100; i++) {
    await limiter.waitIfNeeded();
    // Make your API request here
    console.log(`Request ${i+1}`);
}

Strategy 3: Batch Your Requests

Instead of making requests all at once, spread them out:

Python
# Bad: All at once (will hit rate limit quickly)
for prompt in prompts:
    make_request(prompt)

# Good: Controlled batching
for i, prompt in enumerate(prompts):
    if i > 0 and i % 9 == 0:  # Every 9 requests
        time.sleep(60)  # Wait 1 minute
    make_request(prompt)
JavaScript
// Bad: All at once (will hit rate limit quickly)
for (const prompt of prompts) {
    await makeRequest(prompt);
}

// Good: Controlled batching
for (let i = 0; i < prompts.length; i++) {
    if (i > 0 && i % 9 === 0) {  // Every 9 requests
        await new Promise(r => setTimeout(r, 60000));  // Wait 1 minute
    }
    await makeRequest(prompts[i]);
}

Strategy 4: Use Exponential Backoff

When you hit a rate limit, exponentially increase wait time:

Python
import time

class RateLimitError(Exception):
    """Raised by your request wrapper when the API returns 429."""

def exponential_backoff_retry(func, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            wait_time = 2 ** attempt  # 1s, 2s, 4s, 8s
            print(f"Retrying in {wait_time}s...")
            time.sleep(wait_time)
JavaScript
class RateLimitError extends Error {}  // thrown by your request wrapper on 429

async function exponentialBackoffRetry(func, maxAttempts = 5) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
            return await func();
        } catch (error) {
            if (error instanceof RateLimitError) {
                if (attempt === maxAttempts - 1) {
                    throw error;
                }
                const waitTime = Math.pow(2, attempt) * 1000;  // 1s, 2s, 4s, 8s
                console.log(`Retrying in ${waitTime/1000}s...`);
                await new Promise(r => setTimeout(r, waitTime));
            } else {
                throw error;
            }
        }
    }
}
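A common refinement, not specific to this API, is adding random jitter so that many clients hitting the limit together don't all retry at the same instant. A sketch of the "full jitter" variant in Python (sleep a random amount up to the exponential cap; `RateLimitError` is the same illustrative exception as above):

```python
import random
import time

class RateLimitError(Exception):
    """Raised by your request wrapper when the API returns 429."""

def backoff_with_jitter(func, max_attempts=5, base=1.0):
    """Retry func() on RateLimitError with exponentially capped random waits.

    With base=1.0 the sleep is uniform in [0, 1s], [0, 2s], [0, 4s], ...
    which spreads retries from concurrent clients across the window.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            cap = base * (2 ** attempt)    # exponential ceiling
            time.sleep(random.uniform(0, cap))
```

The trade-off versus plain exponential backoff is slightly less predictable latency per request in exchange for far fewer synchronized retry bursts when several workers share one account-level limit.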

Strategy 5: Monitor Your Usage

Track requests in real-time to avoid hitting limits:

Python
import time

class RequestTracker:
    def __init__(self):
        self.request_count = 0
        self.window_start = time.time()

    def log_request(self):
        now = time.time()

        # Reset counter every minute
        if now - self.window_start > 60:
            self.request_count = 0
            self.window_start = now

        self.request_count += 1
        print(f"Request {self.request_count}/10 this minute")

tracker = RequestTracker()

for i in range(100):
    tracker.log_request()
    # Make API request
JavaScript
class RequestTracker {
    constructor() {
        this.requestCount = 0;
        this.windowStart = Date.now();
    }

    logRequest() {
        const now = Date.now();

        // Reset counter every minute
        if (now - this.windowStart > 60000) {
            this.requestCount = 0;
            this.windowStart = now;
        }

        this.requestCount++;
        console.log(`Request ${this.requestCount}/10 this minute`);
    }
}

const tracker = new RequestTracker();

for (let i = 0; i < 100; i++) {
    tracker.logRequest();
    // Make API request
}

Upgrading Your Tier

If you consistently hit rate limits, consider upgrading:

From               To                     Improvement
Free (2 RPM)       Starter (10 RPM)       5x increase
Starter (10 RPM)   Scale (30 RPM)         3x increase
Scale (30 RPM)     Enterprise (100 RPM)   3.3x increase

Upgrade at app.beginswithai.com/subscription