Question 1

Should new projects use Responses API or Chat Completions?

Accepted Answer

Start with the Responses API for new projects. Chat Completions is still common for older apps, compatibility layers and high-migration-cost systems. Migration needs more than endpoint replacement: check input shape, tools, streaming and response parsing.

Question 2

What is the difference between OpenAI 429 rate limit and quota?

Accepted Answer

Rate limit usually means too many requests or tokens in a short window. Quota usually means credits, billing, monthly usage or project budget limits. Many “I still have balance” cases are actually TPM/RPM or shared model-family limits.

Question 3

Why can I get 429 even with balance left?

Accepted Answer

RPM, TPM, RPD, TPD, IPM or shared model-family limits may trigger before balance is exhausted. Inspect the response body and x-ratelimit-* headers before deciding whether to back off, lower concurrency or request higher limits.

Question 4

How do I fix insufficient_quota?

Accepted Answer

Check billing, usage limits, project budgets, organization limits and payment status. Plain retries usually do not help; recently changed billing or limits may also have propagation delay.

Question 5

What is x-request-id for?

Accepted Answer

It is the troubleshooting ID OpenAI generates for each request. Log it in error reports and user-copyable diagnostics; for 500/503, streaming interruptions or support escalation, it is safer than sharing full prompts.

Question 6

Are OpenAI-Organization and OpenAI-Project required?

Accepted Answer

Not always. They are useful when you have multiple organizations, multiple projects or legacy credentials and need deterministic routing, billing and quota attribution.

Question 7

Why do I get model_not_found or 404?

Accepted Answer

Common causes are stale model names, missing project permission, wrong endpoint, or mixing Responses API and Chat Completions parameters. Verify the official model page and project access first.

Question 8

Why do structured outputs return 400?

Accepted Answer

Common causes include unsupported JSON Schema features, incomplete required fields, incompatible additionalProperties settings, or mixing tool schemas with response_format. Start with a minimal schema and add fields gradually.

Question 9

Why did a streaming response cut off midway?

Accepted Answer

Separate provider errors from proxy timeouts, browser/edge runtime limits and clients that do not consume the SSE stream correctly. Log request IDs and consider shorter outputs or backend streaming relay.

Question 10

What should I do if an API key leaked or was committed to GitHub?

Accepted Answer

Revoke the old key immediately, create a new one, inspect usage and billing, rotate deployment environment variables, and check logs, frontend bundles and mobile apps for remaining copies. Do not only delete it from Git history.

Item	Value	Debug check
Base URL	`https://api.openai.com/v1`	Use this prefix for direct OpenAI API calls.
Responses API	`POST /v1/responses`	Start new projects here; supports text, tools, streaming and multimodal inputs.
Chat Completions	`POST /v1/chat/completions`	Common in older apps and compatibility layers; check Responses API for newer features.
Authentication	`Authorization: Bearer ...`	For 401, check credential, organization, project and forwarding proxy.
Organization/project	`OpenAI-Organization / OpenAI-Project`	Specify them when you use multiple organizations, projects or legacy credentials.
Request tracking	`x-request-id / X-Client-Request-Id`	Log request IDs in production; supply your own client request id for timeout/network cases.
Rate-limit headers	`x-ratelimit-*`	Record remaining requests/tokens and reset times to drive backoff.
Batch	`/v1/batches`	For non-real-time batch workloads; evaluate separately from realtime requests.

HTTP	Type / scenario	Meaning	Check first
`400`	`invalid_request_error`	Invalid parameter, message, tool or JSON shape.	Check endpoint, model, input/messages, tool schema and max tokens.
`401`	`invalid_authentication`	Authentication failed or credential is incorrect.	Check Bearer credential, organization/project headers, env vars and proxy forwarding.
`403`	`unsupported_country_region`	Country, region or access condition is unsupported.	Check account, region, network egress and organization permissions.
`404`	`not_found`	Endpoint, model or resource was not found.	Check URL, model name, file/batch/response id.
`409`	`conflict`	Resource state or concurrent operation conflict.	Re-read resource state and retry.
`429`	`rate_limit_exceeded`	Request or token rate limit hit.	Use x-ratelimit-* headers, exponential backoff and lower concurrency.
`429`	`insufficient_quota`	Credits, balance or monthly spend limit exhausted.	Check billing, usage limit, project budget and org limits.
`500`	`server_error`	Server-side error.	Log request id and retry with backoff.
`503`	`service_unavailable`	Temporary unavailability or high load.	Retry with backoff; optionally switch model or delay work.

Dimension	Form	Notes
RPM	`requests / minute`	Request-count dimension; bursts and concurrency often hit it first.
TPM	`tokens / minute`	Input, output and estimated tokens can all matter.
RPD / TPD	`daily limits`	For large usage, inspect project and organization limits.
Usage limits	`monthly spend`	429 quota errors are usually billing/spend-limit problems, not simple throttling.
Shared limits	`model family`	Some model families share limits, so switching within the family may not help.

OpenAI API Setup and Error Reference

FAQ