Works with OpenAI, Gemini, Grok, and more

Cut your LLM API costs by up to 80%

SemaCache is an intelligent caching proxy for LLM APIs. It returns cached responses when a semantically similar query has been seen before — saving you money on every repeated question.

Drop-in replacement — change one line
# Before — calling OpenAI directly
from openai import OpenAI

client = OpenAI(api_key="sk-...")

# After — just change the base URL
client = OpenAI(
    api_key="sc-your-key",
    base_url="https://api.semacache.io/v1"
)
  • Free tier included
  • No code changes required
  • Supports OpenAI, Gemini & Grok
~5ms
Exact match latency
~20ms
Semantic match latency
80%
Avg cost reduction
99.9%
Uptime SLA
Stop overpaying for LLM API calls

You're burning $81/mo on repeat queries

Up to 40% of your API calls return the same or similar answers. SemaCache intercepts them before they hit your LLM — so you only pay once.

Configure your usage

~$0.00/request at average token usage

Monthly requests: 25K (range: 1K–500K)
Cache hit rate: 40% (most teams: 30–60%)

Your monthly savings

$33

That's $390/year back in your pocket

40%

cost reduction

10K

free cache hits

8d

payback period

$81/mo → $58/mo

Stop leaving $390/year on the table.

Pro pays for itself in 8 days. Cancel anytime. No risk.
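The numbers above fall out of simple arithmetic. A quick sketch using the calculator's defaults ($81/mo spend, 40% hit rate, $9/mo Pro plan):

```python
monthly_spend = 81.0  # current LLM spend, $/month (calculator default)
hit_rate = 0.40       # fraction of calls served from cache
pro_price = 9.0       # Pro plan, $/month

monthly_savings = monthly_spend * hit_rate         # 32.4 (shown as ~$33)
annual_savings = monthly_savings * 12              # ~389 (shown as ~$390)
payback_days = pro_price / (monthly_savings / 30)  # ~8.3 ("pays for itself in 8 days")
```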

Start Saving Now

Trusted by developers building with OpenAI, Gemini, and custom models. One line of code. Instant savings.

Features

Three tiers of intelligent caching

Every request flows through a fast pipeline: exact hash → semantic similarity → LLM passthrough. Each tier is cheaper and faster than calling the LLM directly.

Exact Match Cache

MD5 hash lookup in Redis. Identical queries return cached responses in under 5ms.

Semantic Match Cache

Gemini-powered embeddings with pgvector similarity search. Catches paraphrased queries automatically.

Multi-Provider Routing

One endpoint for OpenAI, Gemini, and xAI Grok. SemaCache auto-detects the provider from the model name and routes accordingly.
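The routing rule can be pictured as a prefix check on the model name. This is an illustrative sketch, not SemaCache's actual routing table:

```python
def detect_provider(model: str) -> str:
    """Guess the upstream provider from the model name prefix.

    Illustrative only: the real routing logic is not published.
    """
    if model.startswith(("gpt-", "o3", "o4")):
        return "openai"
    if model.startswith(("gemini-", "imagen-", "veo-")):
        return "google"
    if model.startswith("grok-"):
        return "xai"
    return "custom"  # registered OpenAI-compatible endpoints

print(detect_provider("gpt-4o-mini"))       # openai
print(detect_provider("gemini-2.5-flash"))  # google
print(detect_provider("grok-4"))            # xai
```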

Encrypted Key Storage

Store your LLM API keys securely in the dashboard. AES-256 encrypted at rest — keys never leave our servers.

Real-Time Analytics

Dashboard with cache hit rates, latency metrics, cost savings, and daily request volume per API key.

OpenAI-Compatible API

Drop-in replacement for any OpenAI SDK client. Works with Python, JavaScript, Go, and any other language with an OpenAI-compatible SDK.

How it works

From request to response in milliseconds

01

Your app sends a request

Point your OpenAI client at SemaCache. Your app sends requests as usual — no code changes needed beyond changing the base URL.

02

Exact match check

We hash the query and check Redis. If the identical query was asked before, the cached response is returned in ~5ms.
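In outline, the exact-match tier is a hash-and-lookup. Here is a minimal sketch with an in-memory dict standing in for Redis (the key scheme is an assumption; SemaCache's actual key layout isn't documented here):

```python
import hashlib
import json

cache = {}  # in-memory stand-in for Redis GET/SET

def cache_key(model: str, messages: list) -> str:
    # Hash a canonical serialization of the request so
    # byte-identical queries collide on the same key.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return "exact:" + hashlib.md5(payload.encode()).hexdigest()

key = cache_key("gpt-4o-mini", [{"role": "user", "content": "Capital of France?"}])
cache[key] = "Paris is the capital of France."

# An identical repeat query hits the cache with a single key lookup.
hit = cache.get(cache_key("gpt-4o-mini", [{"role": "user", "content": "Capital of France?"}]))
print(hit)  # → Paris is the capital of France.
```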

03

Semantic similarity search

If no exact match, we embed the query with Gemini and search our pgvector index. Paraphrased queries like "What's France's capital?" match "Capital of France?" with high confidence.
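Conceptually, the semantic tier is a nearest-neighbour search over embeddings. The toy sketch below uses 3-d vectors and pure-Python cosine similarity in place of Gemini embeddings and pgvector; the vectors and the 0.85 threshold are invented for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d embeddings; in production these come from an embedding model
# and the nearest-neighbour search runs inside pgvector.
index = {
    "Capital of France?": [0.9, 0.1, 0.0],
    "Best pizza topping?": [0.0, 0.2, 0.9],
}

query_vec = [0.88, 0.15, 0.02]  # embedding of "What's France's capital?"
best, score = max(
    ((text, cosine(query_vec, vec)) for text, vec in index.items()),
    key=lambda pair: pair[1],
)

THRESHOLD = 0.85  # assumed similarity cutoff; the real value is not published
if score >= THRESHOLD:
    print("semantic hit:", best)  # → semantic hit: Capital of France?
```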

04

LLM passthrough & cache

On full miss, we route to the correct provider (OpenAI, Gemini, or Grok based on model name), return the response, and cache it for future hits.

Live production benchmarks

Text, images, and video — all cached.

Every API call goes through the same three-tier pipeline. The first request generates and caches. Every repeat returns instantly — whether it’s a chat reply, a 4K image, or a generated video.

133×
Chat speedup
16×
Image speedup
76×
Video speedup
<1s
All cache hits

Measured end-to-end on production (Google Cloud Run), including full network round-trip. Chat: OpenAI GPT-4o Mini & Gemini 2.0 Flash. Image: OpenAI GPT Image 1 & Google Imagen 4.0. Video: Google Veo 2 & Veo 3. Same caching applies to xAI Grok and all other supported models.

Supported Models

Works with every major LLM provider

Built-in support for OpenAI, Gemini, xAI Grok, Imagen, and Veo. Plus register any OpenAI-compatible endpoint as a custom model.


OpenAI

Chat Completions

gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o3-mini, o4-mini

Google Gemini

Chat Completions

gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite

xAI Grok

Chat Completions

grok-4.20, grok-4, grok-4-fast, grok-3, grok-3-mini, grok-3-fast

Image Generation

OpenAI, Google, xAI

gpt-image-1.5, gpt-image-1, gpt-image-1-mini, imagen-4.0-generate-001, imagen-4.0-ultra-generate-001, imagen-4.0-fast-generate-001, grok-imagine-image, grok-imagine-image-pro

Video Generation

Google Veo, xAI

veo-3.1-generate-preview, veo-3.1-fast-generate-preview, veo-3.1-lite-generate-preview, veo-3.0-generate-001, veo-3.0-fast-generate-001, veo-2.0-generate-001, grok-imagine-video
Pro & Enterprise

Bring your own model

Register any OpenAI-compatible endpoint — vLLM, Ollama, Together AI, Groq, Fireworks, or your own self-hosted model. SemaCache caches responses from custom models the same way it caches OpenAI and Gemini.

  • Register via dashboard or API — set base URL, model name, and auth
  • Full three-tier caching: exact → semantic → passthrough
  • Works with any provider that speaks OpenAI-compatible format
# Register "my-llama" in Dashboard → Custom Models
# Then use it like any built-in model

from openai import OpenAI

client = OpenAI(
  api_key="sc-your-key",
  base_url="https://api.semacache.io/v1"
)

response = client.chat.completions.create(
  model="my-llama",
  messages=[
    {"role": "user", "content": "Hello!"}
  ]
)

Pricing

Start free, scale with confidence

Every plan includes multi-provider support and encrypted key storage.

Free

For experimentation and side projects

$0 forever
  • 1,000 requests / month
  • 1 API key
  • Text + image caching
  • 7-day audit logs
  • Community support
Get Started
Most Popular

Pro

For developers shipping to production

$9/mo
  • 50,000 requests / month
  • 5 API keys
  • Text + image + video caching
  • Custom model registry
  • 30-day audit logs
  • Email support
Get Started

Enterprise

For teams at scale

$39/mo
  • 500,000 requests / month
  • Unlimited API keys
  • Text + image + video caching
  • Custom model registry
  • 90-day audit logs
  • Priority support
Contact Sales

Ready to cut your LLM costs?

Get started in under a minute. No credit card required. Change one line of code and start saving.