Rate Limiting

The rate limiter controls how fast ralph-starter's loop can call AI agents. It prevents you from accidentally burning through API quotas or running up costs by enforcing both per-minute and per-hour call limits using a sliding window algorithm.

Default Limits

The rate limiter ships with these default thresholds:

Setting	Default Value	Description
`maxCallsPerHour`	100	Maximum API calls allowed in any rolling 60-minute window
`maxCallsPerMinute`	10	Maximum API calls allowed in any rolling 60-second window
`warningThreshold`	0.8 (80%)	Fraction of either limit at which a warning is triggered

With defaults, the agent can make at most 10 calls in any given minute and 100 calls in any given hour. These limits apply independently -- hitting the minute limit blocks calls even if the hour limit has room, and vice versa.
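To make the "independent limits" rule concrete, here is a minimal sketch. The three field names and their defaults come from the table above; the `RateLimiterConfig` interface and the `isCallAllowed` helper are hypothetical names used only for illustration and are not claimed to be ralph-starter's actual API.

```typescript
// Hypothetical config shape; only the field names and defaults are from the docs.
interface RateLimiterConfig {
  maxCallsPerHour: number;
  maxCallsPerMinute: number;
  warningThreshold: number; // fraction of a limit, e.g. 0.8 = 80%
}

const DEFAULT_CONFIG: RateLimiterConfig = {
  maxCallsPerHour: 100,
  maxCallsPerMinute: 10,
  warningThreshold: 0.8,
};

// A call is allowed only when BOTH windows still have room.
function isCallAllowed(
  callsLastMinute: number,
  callsLastHour: number,
  config: RateLimiterConfig = DEFAULT_CONFIG,
): boolean {
  return (
    callsLastMinute < config.maxCallsPerMinute &&
    callsLastHour < config.maxCallsPerHour
  );
}
```

For example, `isCallAllowed(10, 50)` is false because the minute window is full, even though the hour window is only half used.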

CLI Configuration

Set a Custom Hourly Limit

Use the `--rate-limit` flag to set the maximum calls per hour:

# Allow only 30 API calls per hour
ralph-starter run "add feature" --rate-limit 30

This overrides the default of 100 calls per hour. The per-minute limit remains at 10 unless changed programmatically.

Combine with Other Flags

Rate limiting works alongside all other loop options:

ralph-starter run "implement auth" \
  --preset feature \
  --rate-limit 50 \
  --max-iterations 30 \
  --validate \
  --commit

Sliding Window Behavior

The rate limiter uses a sliding window algorithm rather than fixed time buckets. This means:

It tracks the exact timestamp of every API call.
At any moment, it counts how many calls occurred in the last 60 seconds (minute window) and the last 3,600 seconds (hour window).
Old timestamps outside the 1-hour window are automatically cleaned up.
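The three points above can be sketched as a small class. This is an illustrative sketch under stated assumptions, not the actual `RateLimiter` source: the class name `SlidingWindowSketch`, its constructor arguments, and the choice to treat a call exactly 60 seconds old as already expired are all assumptions.

```typescript
const MINUTE_MS = 60_000;
const HOUR_MS = 3_600_000;

// Hypothetical sketch of a sliding-window limiter backed by a timestamp array.
class SlidingWindowSketch {
  private timestamps: number[] = []; // call times in ms
  private readonly maxPerMinute: number;
  private readonly maxPerHour: number;

  constructor(maxPerMinute = 10, maxPerHour = 100) {
    this.maxPerMinute = maxPerMinute;
    this.maxPerHour = maxPerHour;
  }

  // Count calls made within the last `windowMs` milliseconds.
  countSince(now: number, windowMs: number): number {
    return this.timestamps.filter((t) => now - t < windowMs).length;
  }

  // Record the call and return true only if both windows have room.
  tryAcquire(now: number = Date.now()): boolean {
    // Prune timestamps that have left the 1-hour window.
    this.timestamps = this.timestamps.filter((t) => now - t < HOUR_MS);
    if (
      this.countSince(now, MINUTE_MS) >= this.maxPerMinute ||
      this.countSince(now, HOUR_MS) >= this.maxPerHour
    ) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }
}
```

Passing `now` explicitly keeps the sketch deterministic for testing; a real limiter would normally read the clock itself.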

Why Sliding Windows Matter

With fixed buckets (e.g., resetting at the top of each hour), you could make 100 calls at 12:59 and 100 more at 1:01 -- 200 calls in 2 minutes. Sliding windows prevent this by always looking backward from the current moment.

Example timeline with 10 calls/minute limit:

Time        Calls in last 60s    Allowed?
12:00:00    0                    Yes (call recorded)
12:00:05    1                    Yes (call recorded)
12:00:10    2                    Yes (call recorded)
...
12:00:50    10                   No (minute limit hit)
12:00:55    10                   No (still 10 in window)
12:01:01    9                    Yes (12:00:00 call expired from window)
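The timeline can be replayed with a few lines of code. The sketch below is an illustration only: `simulateWindow` is a hypothetical helper, times are expressed as seconds from the start of the timeline, and it assumes one attempt every 5 seconds followed by a final attempt 61 seconds after the first.

```typescript
// Replay call attempts against a sliding window and report which are allowed.
function simulateWindow(
  attemptTimes: number[], // seconds since the first attempt
  limit: number,
  windowSec: number,
): boolean[] {
  const accepted: number[] = []; // times of calls that were recorded
  return attemptTimes.map((now) => {
    const inWindow = accepted.filter((t) => now - t < windowSec).length;
    if (inWindow >= limit) return false; // window is saturated
    accepted.push(now);
    return true;
  });
}

// Attempts every 5 s for the first minute, then one more at 61 s.
const attempts = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 61];
const results = simulateWindow(attempts, 10, 60);
```

The first ten attempts are allowed, the attempts at 50 s and 55 s are denied, and the attempt at 61 s is allowed again because the very first call has left the window.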

Warning vs. Blocking

The rate limiter has two escalation levels:

Warning State

When usage reaches 80% (the `warningThreshold`) of either limit, the rate limiter enters a warning state. During the warning state:

The loop continues to execute.
A warning message is displayed: `Warning: Approaching rate limit`.
The current usage percentages are shown.

For example, with default settings:

Minute warning: Triggers at 8 out of 10 calls in the last minute.
Hour warning: Triggers at 80 out of 100 calls in the last hour.
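The threshold arithmetic is simple enough to sketch directly. The `classifyUsage` and `overallState` helpers and the `LimiterState` type below are hypothetical names chosen for illustration; only the 0.8 default and the "either limit" rule come from this page.

```typescript
type LimiterState = "ok" | "warning" | "blocked";

// Classify one window's usage against its limit.
function classifyUsage(
  used: number,
  limit: number,
  warningThreshold = 0.8,
): LimiterState {
  if (used >= limit) return "blocked";
  if (used / limit >= warningThreshold) return "warning";
  return "ok";
}

// The overall state is the worse of the two windows.
function overallState(minute: LimiterState, hour: LimiterState): LimiterState {
  if (minute === "blocked" || hour === "blocked") return "blocked";
  if (minute === "warning" || hour === "warning") return "warning";
  return "ok";
}
```

With defaults, `classifyUsage(8, 10)` and `classifyUsage(80, 100)` both return `"warning"`, while `classifyUsage(7, 10)` returns `"ok"`.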

Blocked State

When either limit is fully reached, the rate limiter blocks further calls:

New calls are denied until capacity frees up.
The display shows `Blocked - retry in Ns`, where N counts down the seconds until the next slot opens.
The loop waits automatically using the `waitAndAcquire` method.

The wait time is derived from the oldest call in the saturated window: the limiter computes how long until that call expires from the window, then adds a 100 ms buffer.
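As a rough sketch of that calculation (the function name `waitTimeMs` and its parameters are hypothetical; only the "oldest call plus 100 ms buffer" rule is taken from the text above):

```typescript
const BUFFER_MS = 100; // safety margin described above

// Returns how long to wait (ms) before a slot opens in one window,
// or 0 if the window is not saturated. Assumes limit >= 1.
function waitTimeMs(
  timestamps: number[], // call times in ms
  now: number,
  windowMs: number,
  limit: number,
): number {
  const inWindow = timestamps.filter((t) => now - t < windowMs);
  if (inWindow.length < limit) return 0;
  const oldest = Math.min(...inWindow);
  // The oldest call leaves the window at `oldest + windowMs`.
  return oldest + windowMs - now + BUFFER_MS;
}
```

For ten calls spaced 5 seconds apart starting at time 0, a check at 50,000 ms against a 60,000 ms window yields 10,100 ms: 10 seconds until the first call expires, plus the buffer.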

Automatic Wait and Retry

When the rate limiter blocks a call, the loop does not immediately fail. Instead, it uses `waitAndAcquire` to pause and retry:

The loop checks if a call is allowed (`tryAcquire`).
If blocked, it calculates the wait time until the next slot opens.
It sleeps for that duration, in polling intervals of at most 5 seconds.
It retries the acquisition.
If still blocked after 5 minutes (default timeout), the acquisition fails and the loop exits with `rate_limit` as the exit reason.

This means the loop gracefully slows down rather than crashing when limits are hit.
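The retry steps above can be sketched as follows. This is a hedged illustration, not the real `waitAndAcquire` implementation: `LimiterLike`, `getWaitTime`, `nextSleepMs`, and `waitAndAcquireSketch` are hypothetical names, and only the 5-second polling cap and 5-minute default timeout come from this page.

```typescript
const POLL_CAP_MS = 5_000; // longest single sleep
const DEFAULT_TIMEOUT_MS = 300_000; // 5 minutes

// Decide how long to sleep next, or null once the timeout is used up.
function nextSleepMs(
  waitMs: number,
  elapsedMs: number,
  timeoutMs = DEFAULT_TIMEOUT_MS,
): number | null {
  const remaining = timeoutMs - elapsedMs;
  if (remaining <= 0) return null;
  return Math.min(waitMs, POLL_CAP_MS, remaining);
}

// Hypothetical limiter surface, reduced to what the loop needs.
interface LimiterLike {
  tryAcquire(): boolean;
  getWaitTime(): number; // ms until the next slot opens (assumed name)
}

async function waitAndAcquireSketch(
  limiter: LimiterLike,
  timeoutMs = DEFAULT_TIMEOUT_MS,
): Promise<boolean> {
  const start = Date.now();
  while (!limiter.tryAcquire()) {
    const sleepMs = nextSleepMs(limiter.getWaitTime(), Date.now() - start, timeoutMs);
    if (sleepMs === null) return false; // caller would exit with 'rate_limit'
    await new Promise<void>((resolve) => setTimeout(resolve, sleepMs));
  }
  return true;
}
```

Keeping the sleep decision in a pure function makes the cap and timeout behavior easy to check without waiting on real timers.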

Reading Rate Limiter Stats

During execution, rate limiter statistics are displayed in a compact format:

Minute: 7/10 (70%) | Hour: 45/100 (45%)

When approaching limits:

Minute: 9/10 (90%) | Hour: 82/100 (82%) | Warning: Approaching rate limit

When blocked:

Minute: 10/10 (100%) | Hour: 85/100 (85%) | Blocked - retry in 12s
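For scripts that want to produce or parse the same compact line, the format can be reproduced with a small helper. The `formatStats` function and `StatsInput` shape below are hypothetical; the output strings are modeled on the three examples above.

```typescript
interface StatsInput {
  minuteUsed: number;
  minuteLimit: number;
  hourUsed: number;
  hourLimit: number;
  warningThreshold?: number; // defaults to 0.8
  retryInSec?: number; // only shown when blocked
}

// Integer percentage, computed as (used * 100) / limit to avoid float drift.
function percent(used: number, limit: number): number {
  return Math.round((used * 100) / limit);
}

function formatStats(s: StatsInput): string {
  const threshold = s.warningThreshold ?? 0.8;
  const parts = [
    `Minute: ${s.minuteUsed}/${s.minuteLimit} (${percent(s.minuteUsed, s.minuteLimit)}%)`,
    `Hour: ${s.hourUsed}/${s.hourLimit} (${percent(s.hourUsed, s.hourLimit)}%)`,
  ];
  const blocked = s.minuteUsed >= s.minuteLimit || s.hourUsed >= s.hourLimit;
  const warning =
    s.minuteUsed / s.minuteLimit >= threshold ||
    s.hourUsed / s.hourLimit >= threshold;
  if (blocked) {
    parts.push(`Blocked - retry in ${s.retryInSec ?? 0}s`);
  } else if (warning) {
    parts.push("Warning: Approaching rate limit");
  }
  return parts.join(" | ");
}
```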

Practical Examples

Conservative Rate for Expensive Models

When using expensive models like GPT-4 or Claude 3 Opus, limit the rate to control costs:

# Only 20 calls per hour with an expensive model
ralph-starter run "implement feature" \
  --preset feature \
  --rate-limit 20 \
  --model claude-3-opus

At Claude 3 Opus pricing, 20 iterations might cost $1-5 depending on context size. See the Cost Tracking guide for details.

High-Throughput Batch Processing

For batch processing with a cheaper model, you might want a higher limit:

ralph-starter run "process all pending issues" \
  --rate-limit 200 \
  --model claude-3-haiku

Combined with Circuit Breaker

Rate limiting and circuit breaking complement each other. Rate limiting controls speed; circuit breaking controls failure tolerance.

ralph-starter run "add feature" \
  --rate-limit 50 \
  --circuit-breaker-failures 3 \
  --circuit-breaker-errors 5

If the agent hits repeated errors, the circuit breaker can stop the loop before the rate limiter becomes relevant. If the agent is succeeding but making many calls, the rate limiter slows it down.

Loop Exit Behavior

When the rate limiter blocks a call and the wait timeout (5 minutes) expires without a slot opening, the loop exits with:

{
  success: false,
  exitReason: 'rate_limit',
  error: 'Rate limit exceeded'
}

This is distinct from other exit reasons like `max_iterations`, `circuit_breaker`, or `completed`. You can check the `exitReason` field to determine why the loop stopped.
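A caller might branch on `exitReason` along these lines. The `LoopResult` type and `describeExit` helper are hypothetical; the field names and the four reason strings are the ones mentioned on this page, and the real result type may include further reasons or fields, which is why the sketch keeps `exitReason` open-ended with a fallback branch.

```typescript
// Hypothetical result shape based on the object shown above.
interface LoopResult {
  success: boolean;
  exitReason: string; // e.g. 'completed', 'max_iterations', 'circuit_breaker', 'rate_limit'
  error?: string;
}

// Turn an exit reason into a short, human-readable explanation.
function describeExit(result: LoopResult): string {
  switch (result.exitReason) {
    case "rate_limit":
      return "Stopped by the rate limiter: no slot opened before the wait timeout.";
    case "circuit_breaker":
      return "Stopped by the circuit breaker.";
    case "max_iterations":
      return "Stopped after reaching the iteration cap.";
    case "completed":
      return "Finished normally.";
    default:
      return `Stopped for another reason: ${result.exitReason}`;
  }
}
```

A wrapper script could use this to decide whether to retry later (for `rate_limit`) or to surface the failure immediately.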

Architecture Notes

The `RateLimiter` class maintains an in-memory array of call timestamps. Key implementation details:

Cleanup: Timestamps older than 1 hour are pruned on every operation to prevent memory growth.
Thread safety: The rate limiter is designed for single-threaded Node.js execution. It does not use locks.
Reset: Calling `reset()` clears all timestamps, immediately restoring full capacity.
Config updates: You can update limits mid-run with `updateConfig()`, though this is not exposed via the CLI.
