# Rate Limits
Rate limits control how many requests you can make per minute. In our queue-and-poll API they are straightforward: limits apply account-wide, reset every minute, and scale across four tiers from Free to Enterprise. This page covers how the limits work, what happens when you exceed them, and strategies for retrying and scaling without interruptions.
## How Rate Limits Work
Rate limits are enforced at the account level, not per API key or per model.
This means:
- All your API keys share the same rate limit
- All models count toward the same limit
- Limits reset every minute
Example: if you have a Starter tier account (10 requests/minute):
- You can make all 10 requests using API key A
- Or 5 requests with API key A and 5 with API key B
- Or 10 requests to `gpt-5.1` and 0 to other models
- Or any combination that totals 10 requests/minute
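To make this concrete, here is a minimal sketch (the API keys are hypothetical placeholders) showing that alternating between two keys does not buy extra throughput on a Starter account:

```python
import requests

ENDPOINT = "https://app.beginswithai.com/v1/ai"
KEYS = ["sk-key-a", "sk-key-b"]  # hypothetical placeholder keys

# On a Starter account (10 RPM), these 10 requests consume the entire
# per-minute budget even though they alternate between two API keys.
for i in range(10):
    response = requests.post(
        ENDPOINT,
        json={"model": "gpt-5.1", "prompt": f"Request {i + 1}"},
        headers={"Authorization": f"Bearer {KEYS[i % 2]}"},
    )
    print(i + 1, response.status_code)

# An 11th request in the same minute returns 429, no matter which key sends it.
```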
## Rate Limit Tiers
| Tier | Requests per Minute | Daily Limit (Free only) | Monthly Limit (Free only) |
|---|---|---|---|
| Free | 2 RPM | 10 requests/day | 300 requests/month |
| Starter | 10 RPM | Unlimited | Unlimited |
| Scale | 30 RPM | Unlimited | Unlimited |
| Enterprise | 100 RPM | Unlimited | Unlimited |
Free tier has additional daily and monthly limits. All paid tiers have no daily or monthly caps—only per-minute limits.
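If you want these numbers available in code, the table above can be captured as a small lookup (the constant and key names here are just illustrative):

```python
# Per-minute limits for each tier; only Free has daily and monthly caps
TIER_LIMITS = {
    "free": {"rpm": 2, "per_day": 10, "per_month": 300},
    "starter": {"rpm": 10},
    "scale": {"rpm": 30},
    "enterprise": {"rpm": 100},
}
```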
## What Happens When You Hit the Limit
When you exceed your rate limit, the API returns a 429 Too Many Requests error with a specific message indicating which limit was exceeded:
```json
{
  "detail": "daily_limit_exceeded, upgrade your plan for more requests!!!"
}
```

```json
{
  "detail": "monthly_limit_exceeded, upgrade your plan for more requests!!!"
}
```
Your request is not queued; it is rejected immediately. For RPM limits, wait 60 seconds before retrying. For daily or monthly limits, upgrade your plan or wait until the limit resets at the start of the next day or month.
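Because the `detail` string tells you which limit you hit, you can branch on it before deciding whether a retry makes sense. A sketch (the daily and monthly strings are documented above; treating any other 429 as a per-minute limit is an assumption):

```python
def classify_429(response):
    """Map a 429 response to the limit that was exceeded."""
    detail = response.json().get("detail", "")
    if "daily_limit_exceeded" in detail:
        return "daily"    # upgrade, or wait until the next day
    if "monthly_limit_exceeded" in detail:
        return "monthly"  # upgrade, or wait until the next month
    return "rpm"          # wait 60 seconds and retry
```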
## Handling Rate Limits in Code
### Strategy 1: Wait and Retry
```python
import requests
import time

def make_request_with_retry(api_key, model, prompt, max_retries=3):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://app.beginswithai.com/v1/ai",
                json={"model": model, "prompt": prompt},
                headers=headers,
            )
            if response.status_code == 429:
                # RPM limits reset every 60 seconds
                print("Rate limited. Waiting 60s...")
                time.sleep(60)
                continue
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(5)
    raise Exception("Max retries exceeded")
```
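The same retry loop in JavaScript: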
```javascript
async function makeRequestWithRetry(apiKey, model, prompt, maxRetries = 3) {
  const headers = {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json"
  };

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch("https://app.beginswithai.com/v1/ai", {
        method: "POST",
        headers: headers,
        body: JSON.stringify({ model, prompt })
      });

      if (response.status === 429) {
        // RPM limits reset every 60 seconds
        console.log("Rate limited. Waiting 60s...");
        await new Promise(r => setTimeout(r, 60000));
        continue;
      }

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }

      return await response.json();
    } catch (error) {
      if (attempt === maxRetries - 1) {
        throw error;
      }
      await new Promise(r => setTimeout(r, 5000));
    }
  }

  throw new Error("Max retries exceeded");
}
```
### Strategy 2: Rate Limit Your Own Code
Proactively limit your requests to stay under the threshold:
```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_requests, time_window=60):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = deque()

    def wait_if_needed(self):
        now = time.time()
        # Remove requests outside the time window
        while self.requests and self.requests[0] < now - self.time_window:
            self.requests.popleft()
        # If at the limit, wait until the oldest request leaves the window
        if len(self.requests) >= self.max_requests:
            wait_time = self.time_window - (now - self.requests[0])
            if wait_time > 0:
                print(f"Rate limit reached. Waiting {wait_time:.1f}s...")
                time.sleep(wait_time)
            # Re-prune rather than clearing everything: requests made less
            # than a window ago still count toward the limit
            now = time.time()
            while self.requests and self.requests[0] < now - self.time_window:
                self.requests.popleft()
        self.requests.append(time.time())

# Usage
limiter = RateLimiter(max_requests=10)  # For Starter tier

for i in range(100):
    limiter.wait_if_needed()
    # Make your API request here
    print(f"Request {i+1}")
```
```javascript
class RateLimiter {
  constructor(maxRequests, timeWindow = 60000) {
    this.maxRequests = maxRequests;
    this.timeWindow = timeWindow;
    this.requests = [];
  }

  async waitIfNeeded() {
    const now = Date.now();
    // Remove requests outside the time window
    this.requests = this.requests.filter(
      time => time > now - this.timeWindow
    );
    // If at the limit, wait until the oldest request leaves the window
    if (this.requests.length >= this.maxRequests) {
      const waitTime = this.timeWindow - (now - this.requests[0]);
      if (waitTime > 0) {
        console.log(`Rate limit reached. Waiting ${(waitTime/1000).toFixed(1)}s...`);
        await new Promise(r => setTimeout(r, waitTime));
      }
      // Re-prune rather than clearing everything: requests made less
      // than a window ago still count toward the limit
      const after = Date.now();
      this.requests = this.requests.filter(
        time => time > after - this.timeWindow
      );
    }
    this.requests.push(Date.now());
  }
}

// Usage
const limiter = new RateLimiter(10); // For Starter tier (10 RPM)

for (let i = 0; i < 100; i++) {
  await limiter.waitIfNeeded();
  // Make your API request here
  console.log(`Request ${i+1}`);
}
```
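Both limiters enforce a sliding 60-second window. That is slightly stricter than the API's fixed per-minute reset, so a request that passes the limiter will also stay under the server-side limit.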
## Best Practices

### 1. Batch Your Requests
Instead of making requests all at once, spread them out:
```javascript
// Bad: All at once (will hit the rate limit quickly)
for (const prompt of prompts) {
  await makeRequest(prompt);
}

// Good: Controlled batching (stays under Starter's 10 RPM)
for (let i = 0; i < prompts.length; i++) {
  if (i > 0 && i % 9 === 0) { // Every 9 requests
    await new Promise(r => setTimeout(r, 60000)); // Wait 1 minute
  }
  await makeRequest(prompts[i]);
}
```
### 2. Use Exponential Backoff
When you hit a rate limit, increase the wait time exponentially between retries:
```javascript
// RateLimitError is assumed to be thrown by your own request wrapper
// whenever the API responds with HTTP 429
class RateLimitError extends Error {}

async function exponentialBackoffRetry(func, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await func();
    } catch (error) {
      if (error instanceof RateLimitError) {
        if (attempt === maxAttempts - 1) {
          throw error;
        }
        const waitTime = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, 8s, 16s
        console.log(`Retrying in ${waitTime/1000}s...`);
        await new Promise(r => setTimeout(r, waitTime));
      } else {
        throw error;
      }
    }
  }
}
```
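The same pattern in Python (a sketch; as above, `RateLimitError` is a hypothetical exception raised by your own request wrapper on HTTP 429):

```python
import time

class RateLimitError(Exception):
    """Raised by your request wrapper when the API returns HTTP 429."""

def exponential_backoff_retry(func, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            wait_time = 2 ** attempt  # 1s, 2s, 4s, 8s, 16s
            print(f"Retrying in {wait_time}s...")
            time.sleep(wait_time)
```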
### 3. Monitor Your Usage
Track requests in real time to avoid hitting limits:
```python
import time

class RequestTracker:
    def __init__(self):
        self.request_count = 0
        self.window_start = time.time()

    def log_request(self):
        now = time.time()
        # Reset the counter every minute
        if now - self.window_start > 60:
            self.request_count = 0
            self.window_start = now
        self.request_count += 1
        print(f"Request {self.request_count}/10 this minute")  # 10 = Starter tier RPM

tracker = RequestTracker()
for i in range(100):
    tracker.log_request()
    # Make API request
```
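And the same tracker in JavaScript: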
```javascript
class RequestTracker {
  constructor() {
    this.requestCount = 0;
    this.windowStart = Date.now();
  }

  logRequest() {
    const now = Date.now();
    // Reset the counter every minute
    if (now - this.windowStart > 60000) {
      this.requestCount = 0;
      this.windowStart = now;
    }
    this.requestCount++;
    console.log(`Request ${this.requestCount}/10 this minute`); // 10 = Starter tier RPM
  }
}

const tracker = new RequestTracker();
for (let i = 0; i < 100; i++) {
  tracker.logRequest();
  // Make API request
}
```
## Upgrading Your Tier
If you consistently hit rate limits, consider upgrading:
| From | To | Improvement |
|---|---|---|
| Free (2 RPM) | Starter (10 RPM) | 5x increase |
| Starter (10 RPM) | Scale (30 RPM) | 3x increase |
| Scale (30 RPM) | Enterprise (100 RPM) | 3.3x increase |
Upgrade at [app.beginswithai.com/subscription](https://app.beginswithai.com/subscription).