Cost Tracking

Running autonomous AI loops consumes tokens, and tokens cost money. Ralph-starter includes a built-in cost tracker that estimates token usage and calculates costs per iteration, giving you visibility into how much each loop run is spending.

Enabling Cost Tracking

Cost tracking is enabled by default. You can explicitly control it with CLI flags:

# Enabled by default -- these are equivalent:
ralph-starter run "add user dashboard" --preset feature
ralph-starter run "add user dashboard" --preset feature --track-cost

# Disable cost tracking:
ralph-starter run "add user dashboard" --preset feature --no-track-cost

Token Estimation

Ralph-starter estimates token counts from the raw text of each agent interaction rather than relying on the provider's token counter. The estimation uses different ratios depending on content type:

| Content Type | Characters per Token | Example |
|--------------|----------------------|---------|
| Prose (English text, docs, explanations) | ~4 chars/token | A 2,000-character paragraph is ~500 tokens |
| Code (source files, config, commands) | ~3.5 chars/token | A 2,000-character code block is ~571 tokens |

The tracker detects code content by checking for common patterns like ```, function, const, import, export, class, def, async, and await. If any of these patterns are found, the lower 3.5 chars/token ratio is applied.

These are approximations. Actual token counts may vary by 10-20% depending on the model's tokenizer, language, and content structure.
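
The heuristic can be sketched in a few lines of TypeScript. This is illustrative only — the function and constant names are assumptions, not ralph-starter's actual internals — but it implements the ratios and pattern check described above:

```ts
// Illustrative sketch of the chars-per-token heuristic described above.
// CODE_PATTERNS and estimateTokens are assumed names, not ralph-starter's API.
const CODE_PATTERNS = ["```", "function", "const", "import", "export", "class", "def", "async", "await"];

function estimateTokens(text: string): number {
  const looksLikeCode = CODE_PATTERNS.some((pattern) => text.includes(pattern));
  const charsPerToken = looksLikeCode ? 3.5 : 4; // code packs more tokens per character
  return Math.round(text.length / charsPerToken); // rounding choice is an assumption
}

// 2,000 characters of prose -> Math.round(2000 / 4)   = 500 tokens
// 2,000 characters of code  -> Math.round(2000 / 3.5) = 571 tokens
```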

Model Pricing

The cost tracker uses the following pricing table to convert token estimates into dollar costs. Prices are per 1 million tokens.

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|-----------------------|------------------------|
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.25 | $1.25 |
| GPT-4 | $30.00 | $60.00 |
| GPT-4 Turbo | $10.00 | $30.00 |
| Default (unknown models) | $3.00 | $15.00 |

If the model you are using is not in the table, the tracker falls back to the Default pricing (Claude 3.5 Sonnet-equivalent rates), a middle-of-the-road estimate for current models.
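
A minimal sketch of the lookup, assuming it is a plain table keyed by model name (MODEL_PRICING, getPricing, the field names, and the exact key strings are hypothetical):

```ts
// Hypothetical shape for the pricing table above; names are illustrative.
interface ModelPricing {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

const MODEL_PRICING: Record<string, ModelPricing> = {
  "claude-3-opus":     { inputPerMTok: 15.0,  outputPerMTok: 75.0 },
  "claude-3.5-sonnet": { inputPerMTok: 3.0,   outputPerMTok: 15.0 },
  "claude-3.5-haiku":  { inputPerMTok: 0.25,  outputPerMTok: 1.25 },
  "gpt-4":             { inputPerMTok: 30.0,  outputPerMTok: 60.0 },
  "gpt-4-turbo":       { inputPerMTok: 10.0,  outputPerMTok: 30.0 },
};

// Unknown models fall back to Sonnet-equivalent rates.
const DEFAULT_PRICING: ModelPricing = { inputPerMTok: 3.0, outputPerMTok: 15.0 };

function getPricing(model: string): ModelPricing {
  return MODEL_PRICING[model] ?? DEFAULT_PRICING;
}
```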

You can specify the model for more accurate pricing:

ralph-starter run "add feature" --preset feature --model claude-3-opus

Per-Iteration Cost Breakdown

When cost tracking is enabled and progress tracking is active, ralph-starter writes a cost summary to the activity.md file in your project directory. This summary is updated after each iteration.

The summary looks like this:

## Cost Summary

| Metric | Value |
|--------|-------|
| Total Iterations | 12 |
| Total Tokens | 45.2K |
| Input Tokens | 38.1K |
| Output Tokens | 7.1K |
| Total Cost | $0.221 |
| Avg Cost/Iteration | $0.018 |
| Projected Max Cost | $0.552 |

Understanding the Metrics

| Metric | Description |
|--------|-------------|
| Total Iterations | Number of agent loop iterations completed so far |
| Total Tokens | Combined input and output tokens across all iterations |
| Input Tokens | Tokens sent to the model (prompts, context, code) |
| Output Tokens | Tokens generated by the model (responses, code output) |
| Total Cost | Estimated cumulative cost in USD |
| Avg Cost/Iteration | Total cost divided by the number of iterations |
| Projected Max Cost | Extrapolated cost if the loop runs to its maxIterations limit |

The Projected Max Cost is only shown after 3 or more iterations, since earlier projections would be unreliable. It multiplies the average cost per iteration by the remaining iteration budget and adds the cost already incurred.
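
The projection can be checked against the example summary above. In sketch form (the function is illustrative, matching the formula just described):

```ts
// Illustrative: the projection formula as described above.
function projectedMaxCost(totalCost: number, iterations: number, maxIterations: number): number | undefined {
  if (iterations < 3) return undefined; // too few iterations for a reliable projection
  const avgPerIteration = totalCost / iterations;
  const remainingIterations = maxIterations - iterations;
  return totalCost + avgPerIteration * remainingIterations;
}

// From the summary above: $0.221 after 12 iterations with a 30-iteration cap
// 0.221 + (0.221 / 12) * 18 ≈ $0.552
```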

CLI Display

During loop execution, cost statistics are displayed in the terminal:

Tokens: 45.2K (38.1K in / 7.1K out)
Cost: $0.221 ($0.018/iteration avg)
Projected max cost: $0.552

Token counts use shorthand notation:

  • Values under 1,000 are shown as-is (e.g., 847)
  • Values from 1,000 to 999,999 use the K suffix (e.g., 45.2K)
  • Values of 1,000,000 and above use the M suffix (e.g., 1.23M)

Cost values are formatted for readability:

  • Costs under $0.01 are shown with a cent symbol (e.g., 0.50¢)
  • Costs from $0.01 to $0.99 show three decimal places (e.g., $0.221)
  • Costs $1.00 and above show two decimal places (e.g., $12.50)
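
Both sets of rules are easy to mirror if you post-process logs; here is a sketch with hypothetical helper names:

```ts
// Illustrative helpers mirroring the display rules above; names are assumptions.
function formatTokens(n: number): string {
  if (n < 1_000) return String(n);                        // 847
  if (n < 1_000_000) return (n / 1_000).toFixed(1) + "K"; // 45.2K
  return (n / 1_000_000).toFixed(2) + "M";                // 1.23M
}

function formatCost(usd: number): string {
  if (usd < 0.01) return (usd * 100).toFixed(2) + "¢"; // 0.50¢
  if (usd < 1) return "$" + usd.toFixed(3);            // $0.221
  return "$" + usd.toFixed(2);                         // $12.50
}
```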

Tips for Controlling Costs

Choose the Right Preset

Presets with lower maxIterations naturally cap spending:

| Preset | Max Iterations | Typical Use |
|--------|----------------|-------------|
| review | 10 | Read-only analysis |
| incident-response | 15 | Quick, focused fix |
| feature-minimal | 20 | Simple implementation |
| debug | 20 | Investigation |
| feature | 30 | Standard development |
| tdd-red-green | 50 | Full TDD cycle |
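
As a rough worked example using the $0.018 average cost per iteration from the summary above: the review preset (10 iterations) caps worst-case spend near $0.18, while tdd-red-green (50 iterations) could reach about $0.90.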

Use --max-iterations to Set Hard Limits

Override any preset's iteration cap:

# Cap at 10 iterations regardless of preset
ralph-starter run "add feature" --preset feature --max-iterations 10

Enable Circuit Breakers

Circuit breakers stop the loop when the agent is stuck, preventing token waste on repeated failures. See the Circuit Breaker documentation for details.

ralph-starter run "add feature" --circuit-breaker-failures 2 --circuit-breaker-errors 3

Use Cheaper Models

If your task does not require the most capable model, a cheaper model reduces cost per iteration significantly. Claude 3.5 Haiku, at $0.25/$1.25 per million tokens, is 60x cheaper than Claude 3 Opus on both input and output.

Use Rate Limiting

Rate limiting caps how fast the loop calls the API, giving you a natural spending brake. See the Rate Limiting documentation.

ralph-starter run "add feature" --rate-limit 30  # Max 30 calls/hour

Monitor the Projected Cost

Watch the Projected Max Cost in the activity.md file. If the projection exceeds your budget, you can stop the loop early (Ctrl+C) and restart with a lower --max-iterations or a different approach.

Cost Tracking Architecture

The cost tracker records each iteration independently:

  1. Input text (the prompt sent to the agent) is measured and converted to estimated input tokens.
  2. Output text (the agent's response) is measured and converted to estimated output tokens.
  3. Both are combined into a TokenEstimate (inputTokens, outputTokens, totalTokens).
  4. The estimate is multiplied by the model's pricing to produce a CostEstimate (inputCost, outputCost, totalCost).
  5. Each iteration's data is stored and used to compute running averages and projections.
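
Put together, the flow looks roughly like the sketch below, reusing the estimateTokens and ModelPricing sketches from earlier. The CostTracker class and recordIteration method are assumed names; only TokenEstimate, CostEstimate, and their fields come from the description above:

```ts
// Illustrative sketch of the per-iteration record flow.
interface TokenEstimate { inputTokens: number; outputTokens: number; totalTokens: number; }
interface CostEstimate { inputCost: number; outputCost: number; totalCost: number; }

class CostTracker {
  private perIteration: CostEstimate[] = []; // step 5: one entry per iteration

  constructor(private pricing: ModelPricing) {}

  recordIteration(inputText: string, outputText: string): CostEstimate {
    // Steps 1-3: estimate tokens from the raw prompt and response text.
    const inputTokens = estimateTokens(inputText);
    const outputTokens = estimateTokens(outputText);
    const tokens: TokenEstimate = { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens };

    // Step 4: convert token estimates to dollars via per-1M-token pricing.
    const inputCost = (tokens.inputTokens / 1_000_000) * this.pricing.inputPerMTok;
    const outputCost = (tokens.outputTokens / 1_000_000) * this.pricing.outputPerMTok;
    const cost: CostEstimate = { inputCost, outputCost, totalCost: inputCost + outputCost };

    // Step 5: store for running averages and the max-cost projection.
    this.perIteration.push(cost);
    return cost;
  }

  get avgCostPerIteration(): number {
    if (this.perIteration.length === 0) return 0;
    return this.perIteration.reduce((sum, c) => sum + c.totalCost, 0) / this.perIteration.length;
  }
}
```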

The tracker instance is created when the loop starts (if trackCost is true) and is available throughout the loop's lifetime. At the end of the loop, the final statistics are included in the LoopResult.stats.costStats object.