Optimizing LLM Costs: A Step-by-Step Guide Using ProxAI’s Budget Controls


TL;DR: Not sure which AI model is the best on the market? No need to worry! ProxAI is here to save you 🫵 from that dilemma. Use all of them 💪 as your needs demand: ProxAI lets you switch between AI providers seamlessly and feed the output of one model into another.

Budget Control with ProxAI Caching

Budget control is one of the most important aspects of any LLM-based job. If you’re not careful, you can easily lose a significant amount of time and money in the process—and nobody wants that.

The least we can do is avoid spending resources on rerunning tasks that are already completed and approved. That’s why caching is an integral part of every developer’s workflow. It saves both time and money.

Good news—ProxAI provides caching out of the box! 🎉 You can enable it simply by setting the cache_path parameter in the px.connect() function:

import proxai as px

px.connect(cache_path='~/proxai-cache')

ProxAI as Time and Energy Saver!

Most developers stick with a single AI provider, where responses can be cached effectively. However, relying on one provider for a large project creates unnecessary dependency. When you start working with multiple AI platforms—such as OpenAI, Google Gemini, or Anthropic Claude—the cache from one agent typically can’t be transferred to another.

This is where ProxAI comes in.

With the open-source ProxAI library, you can seamlessly work with 10+ AI providers and 100+ models—without changing a single line of code in your existing codebase. You can even run multiple AI agents simultaneously. This removes lock-in to any one provider and preserves cache data—even when switching between agents.
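As a quick sketch of what that looks like in practice, the helper below keeps the same prompt and only swaps the provider_model tuple. Apart from the Gemini pair used later in this article, the provider/model identifiers are illustrative assumptions; substitute models available on your account.

```python
# Provider/model pairs to compare. Only the Gemini pair is taken from this
# article; the other identifiers are illustrative assumptions.
PROVIDER_MODELS = [
    ('gemini', 'gemini-2.0-flash-lite'),
    ('openai', 'gpt-4o-mini'),      # assumed identifier
    ('claude', 'claude-3-haiku'),   # assumed identifier
]

def ask(provider_model, prompt):
    # proxai is imported lazily so the rest of this sketch can be read
    # (and its pure parts exercised) without API credentials.
    import proxai as px
    px.connect(cache_path='~/proxai-cache')
    return px.generate_text(prompt=prompt, provider_model=provider_model)

# for pm in PROVIDER_MODELS:
#     print(pm, '->', ask(pm, 'Summarize the bisection method in one sentence.'))
```

Looping over PROVIDER_MODELS runs the same task on several backends without any other code changes.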

Example

Let’s work on a project to solve the equation x^2 + 5x + c = 0 with the bisection numerical method, where c takes integer values from 5 to 19. In total, 15 calls need to be made, one for each value of c.
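For reference (and to sanity-check the model’s answers later), here is a plain bisection implementation, independent of ProxAI. Note that x^2 + 5x + c = 0 has real roots only when the discriminant 25 − 4c ≥ 0, i.e. for c ≤ 6 in this range; the bracketing intervals below are chosen by hand for the c = 6 case.

```python
def bisect(f, lo, hi, tol=1e-9):
    """Bisection root finding; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f_lo * f(mid) <= 0:
            hi = mid                 # the sign change, hence the root, is in [lo, mid]
        else:
            lo, f_lo = mid, f(mid)   # otherwise it is in [mid, hi]
    return (lo + hi) / 2.0

c = 6
f = lambda x: x * x + 5 * x + c
# The two roots of x^2 + 5x + 6 = 0 are -3 and -2; bracket each separately.
roots = sorted([bisect(f, -5.0, -2.5), bisect(f, -2.5, 0.0)])
print(roots)  # approximately [-3.0, -2.0]
```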

Let’s break it down: first, calculate for c from 5 to 9.

First, make a cache directory:

mkdir proxai-cache

Then copy its absolute path with pwd, or, if you are using Google Colab, copy the path from the file browser:

[Screenshot: copying the cache folder path in Google Colab]

Then pass this path to px.connect(cache_path='/content/proxai-cache') and your cache directory is all set! 🤷🏻‍♂️

Now, let’s write the code with the Gemini model:

import proxai as px
import datetime

px.connect(cache_path='/content/proxai-cache')

for i in range(5, 10):
    start_time = datetime.datetime.now()
    answer = px.generate_text(
        prompt=(
            'Solve the function $x^2 + 5x + c = 0$ with the bisection method for '
            f'c = {i} and write the final answers only.'),
        provider_model=('gemini', 'gemini-2.0-flash-lite'))
    end_time = datetime.datetime.now()
    print(f'{(end_time - start_time).total_seconds()} seconds: {answer}')

The output also displays the time taken for each run:

0.424167 seconds: -1.20, -3.80
0.394279 seconds: -2, -3
0.351854 seconds: -1.70, -3.30
0.295458 seconds: -2, -3
0.353156 seconds: -2, -3

Now Let’s Rerun the Code for c from 5 to 19

import proxai as px
import datetime

px.connect(cache_path='/content/proxai-cache')

for i in range(5, 20):
    start_time = datetime.datetime.now()
    answer = px.generate_text(
        prompt=(
            'Solve the function $x^2 + 5x + c = 0$ with the bisection method for '
            f'c = {i} and write the final answers only.'),
        provider_model=('gemini', 'gemini-2.0-flash-lite'))
    end_time = datetime.datetime.now()
    print(f'{(end_time - start_time).total_seconds()} seconds: {answer}')
0.005089 seconds: -1.20, -3.80
0.005334 seconds: -2, -3
0.004879 seconds: -1.70, -3.30
0.004824 seconds: -2, -3
0.004995 seconds: -2, -3
0.317696 seconds: -2.618, -2.382
0.321099 seconds: No real roots.
0.361351 seconds: -3, -2
0.29528 seconds: No real roots.
0.290966 seconds: -7, -2
0.34152 seconds: -5, -0
0.325062 seconds: -4, -1
0.346213 seconds: No real roots.
0.316079 seconds: -9, -2
0.305384 seconds: No real roots.

Notice something?

The first five results (c = 5 to 9) were served straight from the cache in a few milliseconds. Why spend time and money on a task that’s already been done? Simply store it in the cache. To explore more advanced cache-system options, visit https://www.proxai.co/proxai-docs/advanced/cache-system

Budget Savings and Caching Details with Multiple AI Agents

Usage Metrics:

[Screenshot: ProxAI dashboard with query count metrics]
[Screenshot: ProxAI dashboard with usage price metrics]

A 60% cache-hit rate is a huge number: without caching, you would spend more than double the time and money on the same project. Redundant jobs should be avoided as much as possible in a time- and budget-efficient project.
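The “more than double” claim follows from simple arithmetic: with a cache-hit rate f, only a fraction 1 − f of calls is actually billed, so removing the cache multiplies cost by 1 / (1 − f). A quick sketch:

```python
def cost_multiplier_without_cache(hit_rate: float) -> float:
    """How many times more you'd pay if every call went to the provider."""
    return 1.0 / (1.0 - hit_rate)

# At the 60% hit rate from the dashboard above, dropping the cache
# would cost 2.5x as much in paid calls (and roughly as much in time).
print(cost_multiplier_without_cache(0.60))  # 2.5
```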

Key Takeaways

  • Uncontrolled LLM usage can quietly burn both time and money.
  • Re-running approved tasks is the biggest (and most fixable) leak.
  • Caching is your first line of defense.
  • ProxAI makes this simple: set a cache path and go.

Why ProxAI?

  • No lock-in: Swap providers without refactoring your app.
  • Portable caching: Keep and reuse cache across agents.
  • Multi-agent ready: Run multiple providers in parallel.
  • Minimal changes: Integrate once, scale across providers.
  • Time saver: Reduce repetitive work and speed up iterations.