Rate Limits

Understanding API Rate Limits

Rate limits restrict how many API requests you can make within a given time window. These limits prevent abuse, ensure fair resource allocation across all customers, protect API infrastructure from overload, and encourage efficient API usage patterns.

Why Rate Limits Exist

Without rate limits, a misbehaving integration could overwhelm the API with excessive requests, degrading performance for all users. Rate limits create a fair, stable environment where everyone has reliable API access.

Rate Limit Tiers

Different account tiers have different rate limits:

Free Tier

• 100 requests per minute
• 10,000 requests per day

Suitable for development, testing, and small-scale applications

Professional Tier

• 500 requests per minute
• 50,000 requests per day

Suitable for most production applications and integrations

Business Tier

• 1,000 requests per minute
• 100,000 requests per day

Suitable for high-volume integrations and multiple concurrent applications

Enterprise Tier

• Custom limits negotiated based on needs
• Dedicated rate limit pools
• Priority API access

Suitable for large-scale operations with high API demand

Rate limits apply per API key. If you have multiple keys, each has independent rate limits allowing parallel usage without conflicts.
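To see how the per-minute and per-day limits interact, the tier figures above can be placed in a small lookup table. This is only an illustrative sketch: the `TIER_LIMITS` object and both helper functions are names invented for this example and are not part of the Ayra API.

```javascript
// Limits copied from the tier list above. The object shape and key names
// are assumptions for this sketch, not an official Ayra data structure.
const TIER_LIMITS = {
  free: { perMinute: 100, perDay: 10000 },
  professional: { perMinute: 500, perDay: 50000 },
  business: { perMinute: 1000, perDay: 100000 },
};

const MINUTES_PER_DAY = 24 * 60;

// Minutes of continuous full-speed traffic before the daily quota runs out.
function minutesToExhaustDailyQuota(tier) {
  const { perMinute, perDay } = TIER_LIMITS[tier];
  return perDay / perMinute;
}

// Highest steady rate (requests per minute) that can run for a full day
// without exceeding either limit.
function sustainableRequestsPerMinute(tier) {
  const { perMinute, perDay } = TIER_LIMITS[tier];
  return Math.min(perMinute, perDay / MINUTES_PER_DAY);
}
```

On these figures, each listed tier would use up its daily quota after 100 minutes at the full per-minute rate. An integration that runs around the clock therefore needs to average well below the per-minute limit (roughly 6.9 requests per minute on the Free tier).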

Rate Limit Headers

Every API response includes headers showing your current rate limit status:

X-RateLimit-Limit: 500
X-RateLimit-Remaining: 487
X-RateLimit-Reset: 1736937600

Header Meanings:

• X-RateLimit-Limit - Maximum requests allowed in current window (500 per minute for this key)
• X-RateLimit-Remaining - Requests remaining in current window (487 requests left)
• X-RateLimit-Reset - Unix timestamp when the rate limit window resets (January 15, 2025 at 10:40:00 UTC)

Check these headers after each request to track your current rate limit status. When the remaining count is low, slow down your requests or wait until the window resets.

Example Header Checking in JavaScript:

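// Assumes `apiKey` holds your API key and this code runs inside an async function
const response = await fetch('https://api.ayra.ai/v1/agents', {
  headers: { 'Authorization': `Bearer ${apiKey}` }
});

// Header values are strings (or null if absent), so convert them before comparing
const remaining = Number.parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
const reset = Number.parseInt(response.headers.get('X-RateLimit-Reset'), 10);

if (!Number.isNaN(remaining) && remaining < 10) {
  console.warn(`Only ${remaining} requests remaining until ${new Date(reset * 1000)}`);
}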
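One way to "slow down" is to spread the remaining requests evenly over the time left in the window. The helper below is a minimal sketch of that idea. The function name `pacingDelayMs` is an assumption for this example, and it relies only on the two headers described above.

```javascript
// Compute how long to pause between requests so that the remaining quota
// lasts until the window resets. Inputs are the raw header strings.
function pacingDelayMs(remainingHeader, resetHeader, nowMs = Date.now()) {
  const remaining = Number.parseInt(remainingHeader, 10);
  const resetMs = Number.parseInt(resetHeader, 10) * 1000; // header is in seconds
  if (Number.isNaN(remaining) || Number.isNaN(resetMs)) {
    return 0; // headers missing or malformed: no pacing information
  }
  const msUntilReset = Math.max(0, resetMs - nowMs);
  if (remaining <= 0) {
    return msUntilReset; // quota used up: wait for the window to reset
  }
  return Math.ceil(msUntilReset / remaining);
}
```

For example, with 10 requests remaining and 10 seconds until reset, the helper suggests a 1000 ms pause between requests. Pass it `response.headers.get('X-RateLimit-Remaining')` and `response.headers.get('X-RateLimit-Reset')` after each call.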

Rate Limit Exceeded Response

When you exceed a rate limit, the API returns an HTTP 429 (Too Many Requests) status with the following body:

{
  "error": {
    "type": "rate_limit_error",
    "message": "Rate limit exceeded",
    "code": "too_many_requests",
    "limit": 500,
    "reset_at": "2025-01-15T10:40:00Z",
    "retry_after": 15
  }
}

The response includes:

• limit - Your rate limit (500 requests per minute)
• reset_at - When the limit resets (ISO 8601 timestamp)
• retry_after - Seconds to wait before retrying (15 seconds)

The Retry-After header also provides this timing information:

Retry-After: 15
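Because the wait time is available in several places, a client can check them in order of preference. The sketch below assumes the header and body fields shown above. The function name `waitMsFrom429` and the fallback order are choices made for this example, not behavior specified by Ayra.

```javascript
// Decide how long to wait after a 429, given the Retry-After header value
// (string or null) and the parsed JSON error body shown above.
function waitMsFrom429(retryAfterHeader, body, nowMs = Date.now()) {
  if (typeof retryAfterHeader === 'string') {
    // 1. Retry-After as a number of seconds, e.g. "15".
    if (/^\d+$/.test(retryAfterHeader.trim())) {
      return Number(retryAfterHeader) * 1000;
    }
    // 2. Retry-After as an HTTP date, which the HTTP specification also permits.
    const headerDateMs = Date.parse(retryAfterHeader);
    if (!Number.isNaN(headerDateMs)) {
      return Math.max(0, headerDateMs - nowMs);
    }
  }
  const error = (body && body.error) || {};
  // 3. retry_after from the JSON body (seconds).
  if (typeof error.retry_after === 'number') {
    return error.retry_after * 1000;
  }
  // 4. reset_at from the JSON body (ISO 8601 timestamp).
  const resetMs = Date.parse(error.reset_at);
  if (!Number.isNaN(resetMs)) {
    return Math.max(0, resetMs - nowMs);
  }
  return null; // no timing information available
}
```

A `null` result means the response carried no usable timing information, in which case the caller can fall back to its own backoff schedule.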

Handling Rate Limits

Implementing Retry Logic:

When you receive a 429 response, wait before retrying. The example below honors the Retry-After header when it is present and falls back to exponential backoff otherwise:

async function apiRequestWithRetry(url, options, maxRetries = 3) {
  // One initial attempt plus up to maxRetries retries
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status !== 429) {
      return response;
    }

    if (attempt === maxRetries) {
      break; // Out of retries, so don't wait again
    }

    // Prefer the server's Retry-After (seconds); otherwise back off 1s, 2s, 4s, ...
    const retryAfter = Number.parseInt(response.headers.get('Retry-After'), 10);
    const waitTime = Number.isNaN(retryAfter) ? Math.pow(2, attempt) * 1000 : retryAfter * 1000;

    console.log(`Rate limited. Waiting ${waitTime}ms before retry ${attempt + 1}/${maxRetries}`);
    await new Promise(resolve => setTimeout(resolve, waitTime));
  }

  throw new Error('Max retries exceeded');
}
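When many clients are rate limited at the same moment, identical backoff schedules can make them all retry in lockstep. A common refinement, which the example above does not include, is to randomize each delay ("full jitter"). The sketch below shows one way to compute such a delay. The function name, the 1-second base, and the 30-second cap are assumptions for illustration.

```javascript
// Exponential backoff with "full jitter": pick a random delay between 0 and
// min(capMs, baseMs * 2^attempt). `random` is injectable so the function
// can be tested deterministically.
function backoffWithJitterMs(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

This value could replace the `Math.pow(2, ...) * 1000` fallback in a retry loop when no Retry-After header is available. The cap keeps late attempts from waiting unreasonably long.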

Proactive Rate Limit Management:

Rather than hitting limits and retrying, proactively manage request rate:

class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = [];
  }
  
  async throttle() {
    const now = Date.now();
    this.requests = this.requests.filter(time => now - time < this.windowMs);
    
    if (this.requests.length >= this.maxRequests) {
      const oldestRequest = Math.min(...this.requests);
      const waitTime = this.windowMs - (now - oldestRequest);
      await new Promise(resolve => setTimeout(resolve, waitTime));
      return this.throttle();
    }
    
    this.requests.push(now);
  }
}

// Usage
const limiter = new RateLimiter(500, 60000); // 500 requests per minute

async function makeRequest(url, options) {
  await limiter.throttle();
  return fetch(url, options);
}

Optimizing API Usage

Reduce Unnecessary Requests:

• Cache data that doesn't change frequently
• Batch operations when possible
• Use webhooks instead of polling
• Filter and paginate server-side rather than client-side
• Request only needed fields in responses

Example - Caching Agent Configurations:

const cache = new Map();
const CACHE_TTL = 300000; // 5 minutes

async function getAgent(agentId) {
  const cacheKey = `agent:${agentId}`;
  const cached = cache.get(cacheKey);
  
  // Serve from cache while the entry is still fresh
  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.data;
  }
  
  const response = await fetch(`https://api.ayra.ai/v1/agents/${agentId}`, {
    headers: { 'Authorization': `Bearer ${apiKey}` }
  });
  
  // Don't cache error responses such as 429 or 404
  if (!response.ok) {
    throw new Error(`Failed to fetch agent ${agentId}: HTTP ${response.status}`);
  }
  
  const data = await response.json();
  cache.set(cacheKey, { data, timestamp: Date.now() });
  
  return data;
}

Example - Batching Requests:

Instead of creating contacts one-by-one:

// Inefficient - 100 requests
for (const contact of contacts) {
  await createContact(contact); // 100 API calls
}

Use a batch endpoint if one is available:

// Efficient - 1 request
await createContactsBatch(contacts); // 1 API call
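Batch endpoints often cap the number of items per call, so a large list may still need to be split into several batches. The sketch below shows one way to do that. The `chunk` and `createContactsInBatches` helpers are hypothetical, and the default batch size of 100 is an assumed figure rather than a documented Ayra limit.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Send contacts in batches. `sendBatch` is whatever function performs one
// batch call (for example, createContactsBatch from the snippet above).
// The default of 100 items per batch is an assumption, not a documented limit.
async function createContactsInBatches(contacts, sendBatch, batchSize = 100) {
  const results = [];
  for (const batch of chunk(contacts, batchSize)) {
    results.push(await sendBatch(batch)); // one API call per batch
  }
  return results;
}
```

With this approach, 1,000 contacts at 100 per batch cost 10 API calls instead of 1,000.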

Example - Webhooks vs Polling:

Instead of polling for conversation updates:

// Inefficient polling - many unnecessary requests
setInterval(async () => {
  const conversations = await getRecentConversations();
  checkForNewConversations(conversations);
}, 30000); // Every 30 seconds = 120 requests/hour

Use webhooks:

// Efficient webhooks - zero polling requests
app.post('/webhooks/conversations', (req, res) => {
  const conversation = req.body;
  processNewConversation(conversation);
  res.status(200).send('OK');
});

Monitoring Rate Limit Usage

Track rate limit consumption so that you don't hit limits unexpectedly:

Log Rate Limit Headers:

function logRateLimitStatus(response) {
  // Header values are strings, so convert them to numbers before doing arithmetic
  const limit = Number(response.headers.get('X-RateLimit-Limit'));
  const remaining = Number(response.headers.get('X-RateLimit-Remaining'));
  const resetTime = new Date(Number(response.headers.get('X-RateLimit-Reset')) * 1000);
  
  if (!(limit > 0)) {
    return; // Headers missing or invalid, nothing to report
  }
  
  const used = limit - remaining;
  const usagePercent = (used / limit * 100).toFixed(2);
  
  console.log(`Rate Limit: ${used}/${limit} used (${usagePercent}%) - Resets at ${resetTime.toISOString()}`);
  
  if (remaining < limit * 0.1) { // Less than 10% remaining
    console.warn('WARNING: Rate limit nearly exhausted!');
  }
}

Set Up Alerts:

Configure alerts when rate limit usage exceeds thresholds:

• Alert when 70% of rate limit consumed
• Alert when 90% of rate limit consumed
• Alert when rate limit exceeded

Alerts enable proactive response before hitting hard limits.
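The thresholds above can be expressed as a small classification function that your monitoring code calls after each response. This is a sketch under stated assumptions: the function name and the level names ('ok', 'warning', 'critical', 'exceeded') are invented for this example.

```javascript
// Map rate limit usage to an alert level using the thresholds listed above.
// `status` is the HTTP status code of the response the headers came from.
function alertLevel({ limit, remaining, status }) {
  if (status === 429) {
    return 'exceeded'; // the limit was actually hit
  }
  if (!(limit > 0)) {
    return 'unknown'; // headers missing or invalid
  }
  const usedFraction = (limit - remaining) / limit;
  if (usedFraction >= 0.9) {
    return 'critical'; // 90% or more consumed
  }
  if (usedFraction >= 0.7) {
    return 'warning'; // 70% or more consumed
  }
  return 'ok';
}
```

For the header example earlier in this document (limit 500, remaining 487), the function returns 'ok'. With 100 remaining it returns 'warning', and with 20 remaining it returns 'critical'.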

Review Usage Patterns:

Analyze when rate limits are consumed:

• What time of day sees highest usage?
• Which operations consume the most quota?
• Are there usage spikes indicating inefficiency?

This analysis informs optimization efforts and capacity planning.
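If your application records each API call, the first two questions above can be answered with a simple aggregation. The sketch below assumes a hypothetical log format in which every entry has a `timestamp` (milliseconds since the Unix epoch) and an `operation` label. Neither the format nor the function name comes from the Ayra API.

```javascript
// Summarize a request log by UTC hour and by operation.
// Each entry is assumed to look like { timestamp: <ms since epoch>, operation: <string> }.
function summarizeUsage(entries) {
  const byHour = new Array(24).fill(0);
  const byOperation = new Map();
  for (const { timestamp, operation } of entries) {
    byHour[new Date(timestamp).getUTCHours()] += 1;
    byOperation.set(operation, (byOperation.get(operation) || 0) + 1);
  }
  // Hour of day (0-23, UTC) with the most requests; 0 if the log is empty.
  const peakHour = byHour.indexOf(Math.max(...byHour));
  // [operation, count] pairs, busiest first.
  const topOperations = [...byOperation.entries()].sort((a, b) => b[1] - a[1]);
  return { byHour, peakHour, topOperations };
}
```

The `byHour` array can also reveal spikes: an hour whose count is far above its neighbors is a good place to look for retry storms or redundant polling.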

Requesting Higher Rate Limits

If legitimate usage requires higher limits than your tier provides, request increases:

Before Requesting:

1. Optimize current usage by reducing unnecessary requests
2. Document your use case and request volumes
3. Demonstrate current limits are constraining legitimate operations
4. Consider upgrading account tier if appropriate

Request Process:

1. Contact Ayra support through the dashboard
2. Provide account ID and current tier
3. Explain use case requiring higher limits
4. Share current and projected request volumes
5. Describe optimization efforts already taken

Information to Include:

• Current rate limit tier
• Current daily/monthly request volumes
• Projected future volumes
• Use case description
• Business justification
• Timeline when higher limits are needed

Ayra reviews each request based on the legitimacy of the use case, optimization efforts already made, account standing, and infrastructure capacity. Approved requests typically receive increased limits within 1-2 business days.

Rate Limiting Best Practices

Respect Rate Limits

Don't attempt to bypass or circumvent rate limits. Using multiple API keys to evade limits violates the terms of service and risks account suspension.

Implement Backoff

Always implement exponential backoff when retrying failed requests. Immediate retries after 429 responses worsen the situation.

Monitor Usage

Track rate limit consumption proactively rather than reactively after hitting limits.

Cache Appropriately

Cache data that doesn't change frequently, reducing API calls.

Use Webhooks

Replace polling with webhooks for real-time data, eliminating most polling requests.
