Gemini API Charges Growing Despite Disabled Billing

Overview

This guide explains why Google Cloud Gemini API charges can keep growing after you disable billing, how token-based pricing works, and the steps to diagnose, stop, and dispute unexpected image-generation fees. It combines official pricing docs, community reports, and practical remediation steps so you can act quickly.

Why this happens: core findings

Charges are driven by tokens consumed, not by the number of requests.
Image generation has token costs: a 1024x1024 image consumes 1290 tokens (~$0.039 per image at $30/1M tokens).
Gemini CLI lacks built-in cost visibility, so high-volume or large-context calls can run up charges unnoticed.
The free tier "1,000 requests/day" limit doesn't prevent token-based charges because tokens per request vary widely.
Gemini 2.5 Pro's large context window (roughly 1M tokens) can balloon spending if large contexts are sent unintentionally.
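
To see why a request cap alone does not bound usage, here is a minimal sketch that holds the request count at 1,000 and varies only the tokens per request. The per-request token figures are illustrative assumptions, not measurements from any real workload.

```python
# Illustrative only: the per-request token counts below are assumed values.
DAILY_REQUEST_LIMIT = 1_000  # request cap quoted in this guide


def tokens_per_day(requests: int, tokens_per_request: int) -> int:
    """Total tokens consumed by a day's worth of identical requests."""
    if requests > DAILY_REQUEST_LIMIT:
        raise ValueError("request count exceeds the daily request limit")
    return requests * tokens_per_request


# Same request count, very different token volume.
small_prompts = tokens_per_day(1_000, 500)      # short chat-style prompts (assumed size)
large_context = tokens_per_day(1_000, 200_000)  # large files or long histories (assumed size)

print(f"small prompts: {small_prompts:,} tokens/day")
print(f"large context: {large_context:,} tokens/day")
print(f"ratio: {large_context // small_prompts}x")
```

Both workloads stay within the request limit, yet the second consumes 400 times as many tokens.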

Quick facts: pricing & billing mechanism

Image pricing example: $30 per 1,000,000 tokens; 1290 tokens per 1024x1024 image = ~$0.039 per image.
Billing is handled through Google Cloud Billing (Cloud Billing account links and project-level billing).
Two pricing tiers: free-of-charge tier and pay-as-you-go; actual rate limits and pricing vary by model.
You are charged only for successful requests returning HTTP 200; 4xx/5xx responses are not billed.

Real user reports: how charges appear unexpectedly

Developers in community discussions reported sudden, large bills tied to the Gemini CLI or programmatic image generation.

Common patterns:

Automated scripts or CI jobs calling the CLI repeatedly or with large prompts/context sizes.
Misunderstanding of the free tier: high-token calls far exceed the free budget despite staying within request count limits.
Background processes or shared keys used by multiple projects or teammates.

Takeaway

Unexpected billing usually comes down to token consumption plus low cost visibility.

How token pricing translates to real costs

Think of tokens as billing units. One request can consume anywhere from a few tokens to millions, depending on:

Model chosen (Gemini 2.5 Pro vs smaller models).
Context window size (long conversations or large images).
Image resolution and model-specific token rates.

Example math:

Price: $30 per 1,000,000 tokens => $0.00003 per token.
One 1024x1024 image = 1290 tokens => 1290 * $0.00003 = $0.0387 (~$0.039).
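
The same arithmetic can be wrapped in a small helper to estimate a whole job before running it. This is a sketch that uses only the two figures quoted in this guide ($30 per 1M tokens, 1290 tokens per 1024x1024 image); the function name is made up for illustration, and current rates should be checked against the pricing page.

```python
from decimal import Decimal

PRICE_PER_MILLION_TOKENS = Decimal("30")  # USD, rate quoted in this guide
TOKENS_PER_IMAGE = 1290                   # one 1024x1024 image, per this guide


def image_cost(images: int,
               tokens_per_image: int = TOKENS_PER_IMAGE,
               price_per_million: Decimal = PRICE_PER_MILLION_TOKENS) -> Decimal:
    """Estimated USD cost of generating `images` images."""
    total_tokens = images * tokens_per_image
    return price_per_million * total_tokens / Decimal(1_000_000)


print(f"1 image:      ${image_cost(1):.4f}")
print(f"1,000 images: ${image_cost(1_000):.2f}")
```

At these rates a loop that generates 1,000 images costs about $38.70, which is how an unattended script turns a few cents per call into a noticeable bill.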

Step-by-step diagnosis: find the root cause

Check Cloud Billing console: In Google Cloud Console, open Billing > Transactions and filter by project and date range. Look for line items referencing the AI or Gemini product.
Inspect API logs: Go to Cloud Logging > Logs Explorer. Filter for the Gemini API service name and examine request payload sizes, response codes, and timestamps.
Filter for 200 responses: Because only 200 responses are billed, filter logs for successful calls and check token counts in request/response metadata if available.
Audit service accounts and API keys: List keys and service accounts with access to Gemini APIs. Check last-used timestamps to spot unexpected usage.
Review CLI usage: If you use the Gemini CLI, inspect shell history, CI pipelines, or cron jobs that might trigger repeated calls. The CLI currently offers limited cost visibility.
Check quotas: In IAM & Admin > Quotas, ensure your API quotas and limits align with expectations; note the 1,000 requests/day free tier vs token-based usage.
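
Once log entries are exported, a short script can show which credential is responsible for the billable volume. The sketch below assumes a hypothetical export format (a list of dicts with `status`, `api_key_id`, and `total_tokens` fields) and uses made-up sample rows; real log entries use different field names and may not include token counts at all.

```python
from collections import defaultdict

# Hypothetical export format with made-up sample rows. Real log entries will
# have different field names, and token counts may not be present.
sample_entries = [
    {"timestamp": "2025-01-10T02:00:00Z", "status": 200, "api_key_id": "ci-key", "total_tokens": 1290},
    {"timestamp": "2025-01-10T02:00:05Z", "status": 429, "api_key_id": "ci-key", "total_tokens": 0},
    {"timestamp": "2025-01-10T02:00:09Z", "status": 200, "api_key_id": "ci-key", "total_tokens": 1290},
    {"timestamp": "2025-01-10T09:30:00Z", "status": 200, "api_key_id": "dev-key", "total_tokens": 800},
]


def billable_tokens_by_key(entries):
    """Sum tokens for successful (HTTP 200) calls, grouped by credential."""
    totals = defaultdict(int)
    for entry in entries:
        if entry["status"] == 200:  # non-200 responses are skipped as non-billable
            totals[entry["api_key_id"]] += entry["total_tokens"]
    return dict(totals)


summary = billable_tokens_by_key(sample_entries)
# Print the heaviest consumers first.
for key_id, tokens in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{key_id}: {tokens} tokens")
```

Grouping by credential rather than by time makes a forgotten CI key or shared key stand out immediately.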

How to stop charges immediately

Unlink the billing account from the affected project: This stops new charges at the project level, but other paid services in that project will stop working as well.
Revoke API keys/service account credentials: Delete or rotate any keys tied to unexpected usage to prevent further calls.
Disable the Gemini API: In the Cloud Console, go to APIs & Services and disable the Gemini API for the project (it may be listed as the Generative Language API, service name generativelanguage.googleapis.com).
Pause CI/automation: Temporarily halt pipelines or background jobs that call the API until you audit them.
Set budget alerts and quotas: Create an immediate budget with alert thresholds and set hard quotas where possible to prevent runaway spending.
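
For teams that prefer the command line, the first three steps can be scripted. The sketch below only builds and prints `gcloud` commands so they can be reviewed before anyone runs them. It assumes the Gemini API is enabled under the service name `generativelanguage.googleapis.com`; the project ID and key ID are placeholders, and each command should be confirmed with `--help` for your gcloud version.

```python
import shlex

# Assumption: the Gemini API is enabled on the project under this service name.
GEMINI_SERVICE = "generativelanguage.googleapis.com"


def containment_commands(project_id: str, api_key_ids: list[str]) -> list[list[str]]:
    """Build (but do not run) gcloud commands, in the order of the steps above."""
    # Step 1: unlink billing. This affects every paid service in the project.
    commands = [["gcloud", "billing", "projects", "unlink", project_id]]
    # Step 2: delete each API key tied to the unexpected usage.
    for key_id in api_key_ids:
        commands.append(
            ["gcloud", "services", "api-keys", "delete", key_id, f"--project={project_id}"]
        )
    # Step 3: disable the API itself.
    commands.append(
        ["gcloud", "services", "disable", GEMINI_SERVICE, f"--project={project_id}"]
    )
    return commands


# Placeholder identifiers; substitute your own before use.
for cmd in containment_commands("my-project", ["KEY_ID_1"]):
    print(shlex.join(cmd))
```

Printing instead of executing is deliberate: unlinking billing is disruptive, so a reviewable dry run is safer than a one-shot script.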

How to monitor, prevent, and control future costs

Enable billing alerts and budgets: Create budgets with multiple notification thresholds (50%, 75%, 90%, 100%).
Set IAM restrictions: Limit which identities can call the Gemini API and enforce least privilege.
Use quotas and rate limits: Apply quotas at the project or API key level. While request-based quotas exist, combine them with token-aware controls in your app.
Instrument token counting: Log token usage per request within your application so you can monitor costs before the bill arrives.
Prefer smaller models or lower resolutions: For image generation, lower resolution or cheaper models drastically reduce token consumption.
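
Token counting and app-side budgets can be as simple as a small guard object that every call goes through. The following is a minimal sketch: the `TokenBudget` class and its method names are hypothetical, and it assumes your application can estimate tokens before a call and read the actual count afterward.

```python
class TokenBudget:
    """App-side guard that tracks token usage against a fixed budget."""

    def __init__(self, max_tokens: int, alert_fractions=(0.5, 0.75, 0.9, 1.0)):
        self.max_tokens = max_tokens
        self.used = 0
        self.alert_fractions = sorted(alert_fractions)
        self.alerts_fired = []

    def check(self, estimated_tokens: int) -> None:
        """Raise before sending a request that would exceed the budget."""
        if self.used + estimated_tokens > self.max_tokens:
            raise RuntimeError(
                f"token budget exceeded: {self.used} used, "
                f"{estimated_tokens} requested, {self.max_tokens} allowed"
            )

    def record(self, actual_tokens: int) -> None:
        """Log actual usage and print each alert threshold once when crossed."""
        self.used += actual_tokens
        for fraction in self.alert_fractions:
            crossed = self.used >= fraction * self.max_tokens
            if crossed and fraction not in self.alerts_fired:
                self.alerts_fired.append(fraction)
                print(f"ALERT: {fraction:.0%} of token budget used "
                      f"({self.used}/{self.max_tokens})")


budget = TokenBudget(max_tokens=10_000)
for _ in range(7):
    budget.check(1290)   # would raise before an over-budget call is sent
    budget.record(1290)  # 1290 tokens per image, as quoted in this guide
```

After seven images the budget has used 9,030 of 10,000 tokens and the 50%, 75%, and 90% alerts have fired; an eighth `check(1290)` raises instead of letting the call go out.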

Batch Mode: a practical cost-saving option

Batch Mode processes requests asynchronously and reportedly cuts costs by about 50% for bulk operations. If your workload can tolerate delayed results (for example, generating many images or running many similar prompts), group the requests and submit them through Batch Mode to lower the per-item cost.
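
A quick comparison shows what the reported discount would mean for a bulk image job. This sketch assumes a flat 50% discount and reuses the image figures quoted earlier in this guide; whether the discount applies to your model, and at what rate, should be confirmed on the pricing page.

```python
STANDARD_PRICE_PER_MILLION = 30.0  # USD, image rate quoted in this guide
TOKENS_PER_IMAGE = 1290            # one 1024x1024 image, per this guide
BATCH_DISCOUNT = 0.5               # assumed flat discount; verify before relying on it


def job_cost(images: int, batch: bool = False) -> float:
    """Estimated USD cost of an image job, with or without the batch discount."""
    cost = images * TOKENS_PER_IMAGE * STANDARD_PRICE_PER_MILLION / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


images = 10_000
standard = job_cost(images)
batched = job_cost(images, batch=True)
print(f"standard: ${standard:.2f}")
print(f"batch:    ${batched:.2f}")
print(f"saved:    ${standard - batched:.2f}")
```

Under these assumptions a 10,000-image job drops from about $387.00 to about $193.50.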

Preparing to dispute or escalate a charge

If you believe charges are erroneous, prepare the following before contacting Google support:

Billing account ID and project IDs involved.
Dates and amounts of the unexpected charges.
Examples of log entries showing calls (timestamps, request IDs) and evidence you did not authorize or expect the calls.
Actions you already took (disabled APIs, revoked keys, set budgets).

Then contact Google Cloud Billing support with a clear timeline and attachments. Community reports indicate that timely escalation helps. Include request IDs from logs and note that only 200 responses are billable.

Sample support message (copy-paste friendly)

Subject: Unexpected Gemini API charges on project <PROJECT_ID> between <DATES>

We observed unexpected charges for Gemini image generation on project <PROJECT_ID> totaling <AMOUNT>. We have disabled the Gemini API and revoked keys. Attached are logs showing the request IDs and timestamps for successful (200) responses. Please investigate billing attribution and reverse charges if caused by a bug or misattribution. Thank you.

Best practices checklist to avoid surprises

Rotate and scope API keys; avoid embedding long-lived keys in public or CI logs.
Log token usage per request and enforce app-side token budgets.
Prefer lower-resolution images and non-Pro models for testing.
Use Batch Mode for bulk processing.
Set automated budgets and alerting in Cloud Billing.

FAQ

Q: Can I be charged after disabling billing?

A: Disabling billing unlinks the billing account from a project and typically stops new charges, but review active credits, pending invoices, and whether requests were processed before the change. Also ensure API keys were revoked so no new calls are accepted.

Q: Why does the free tier not stop my charges?

A: Free tiers often limit request counts, not tokens. A single request consuming many tokens can exceed free allowances even when request count is low.

Q: Are failed requests billed?

A: No. Google bills only successful 200 responses. 4xx and 5xx responses are not charged.

Closing thoughts

Unexpected Gemini API charges usually come down to token-driven billing plus limited visibility in tooling like the CLI. The fastest mitigation is to revoke keys, disable the API, and open a billing support case with precise logs. Longer term, add token-level monitoring, budgets, and quotas, and consider Batch Mode for bulk jobs. If you need a checklist or sample log filters tailored to your setup, start by sharing the project and CLI usage pattern with support or your internal ops team.

Author: Morgan — DevOps Engineer & Problem Solver. Practical, hands-on advice for keeping cloud systems reliable and predictable.