Rate limits

Why a per-account rate limit?

Rate limits keep the API responsive for every customer. Without them, a single misbehaving client (say, a tight loop accidentally hammering an endpoint) would degrade latency for everyone. The limit you’ll encounter is per account: every key on the same account shares the same budget. If you need higher throughput, get in touch via business@soundpiece.co.uk.

The only 429 you should plan to handle is this one: the request rate limit. We manage processing capacity for you separately, so you won’t see 429s from us when our backend is busy. Your operation just stays in processing longer until capacity frees up. See Async operations for that lifecycle.

The 429 response

When you exceed your rate limit you get:

HTTP/1.1 429 Too Many Requests
Retry-After: 17
Content-Type: application/json

{
  "detail": "Rate limit exceeded; retry in 17 seconds"
}

The Retry-After header tells you how many seconds to wait before retrying. Honour it: retrying sooner will just bounce off the limit again.

We don’t currently return X-RateLimit-* headers on successful responses. If your integration needs proactive throttling on your side (rather than reacting to a 429), let us know via business@soundpiece.co.uk.

Handling 429 in code

Always read Retry-After rather than using a fixed delay. Combine with exponential backoff as a fallback if the header is absent or set to 0:

import time
import requests

def request_with_retry(method, url, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        response = requests.request(method, url, **kwargs)

        if response.status_code != 429:
            return response

        # Prefer the server's Retry-After value. Fall back to exponential
        # backoff if the header is absent, not an integer, or 0.
        try:
            delay = int(response.headers.get("Retry-After", 0))
        except ValueError:
            delay = 0
        if delay <= 0:
            delay = 2 ** attempt
        time.sleep(delay)

    raise RuntimeError("Exceeded max retries due to rate limiting")

async function requestWithRetry(url, options, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url, options);

    if (res.status !== 429) {
      return res;
    }

    // Prefer the server's Retry-After value. Fall back to exponential
    // backoff if the header is absent, not an integer, or 0.
    const header = parseInt(res.headers.get("Retry-After") ?? "", 10);
    const retryAfter = header > 0 ? header : 2 ** attempt;
    await new Promise((r) => setTimeout(r, retryAfter * 1000));
  }
  throw new Error("Exceeded max retries due to rate limiting");
}

Because PUT endpoints are idempotent on idempotency_key, retrying after a 429 is safe — you’ll either get the same operation you tried to create the first time, or a fresh one. You won’t double-create work. See Async operations for the idempotency contract.
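Retrying is only safe if every attempt of one logical submit carries the same idempotency_key. The Python sketch below illustrates that pattern under stated assumptions: the `send` callable, the `submit_once` name, and placing the key in the request body are all hypothetical stand-ins, not the documented request shape. Check Async operations for the actual contract.

```python
import time
import uuid
from typing import Callable, Mapping, Tuple

# Hypothetical transport: takes the request body and returns
# (status_code, headers, parsed_json). Real code would wrap an HTTP PUT.
Send = Callable[[dict], Tuple[int, Mapping[str, str], dict]]


def submit_once(
    send: Send,
    payload: dict,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    # Generate the key once, outside the retry loop, so that every
    # attempt refers to the same logical operation.
    # Assumption: the key travels in the body as "idempotency_key".
    body = {**payload, "idempotency_key": str(uuid.uuid4())}

    for attempt in range(max_retries):
        status, headers, result = send(body)
        if status != 429:
            return result

        # Honour Retry-After; fall back to exponential backoff if the
        # header is absent, non-numeric, or 0.
        retry_after = headers.get("Retry-After", "")
        delay = int(retry_after) if retry_after.isdigit() else 0
        if delay <= 0:
            delay = 2 ** attempt
        sleep(delay)

    raise RuntimeError("Exceeded max retries due to rate limiting")
```

The important design choice is where the key is created: generating it inside the loop would turn each retry into a new operation and defeat the idempotency guarantee.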

Why no `429` for capacity?

Audio generation has a long tail of processing time: seconds to a minute for typical jobs. If we returned 429 every time our processing tier was busy, you’d have to retry repeatedly until capacity opened up, which is wasteful and fragile.

Instead we queue work internally. When you PUT, we accept the operation immediately and return status: "processing". Behind the scenes we dispatch when capacity is available. Your polling continues to show processing until the job lands: no 429, no manual retry of the submit.

This means the only place you should plan for a 429 is at submit time (request rate exceeded), never during long-running operations.
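The submit-then-poll flow described above can be sketched in Python roughly as follows. This is an illustration only: `get_status` is a hypothetical callable standing in for whatever request fetches the operation, and treating any status other than "processing" as terminal is an assumption, since the other status values are defined in Async operations rather than here.

```python
import time
from typing import Callable


def wait_for_operation(
    get_status: Callable[[], dict],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the operation leaves "processing" or the timeout passes."""
    deadline = clock() + timeout
    while True:
        op = get_status()
        # Assumption: anything other than "processing" is a final state.
        if op.get("status") != "processing":
            return op
        if clock() >= deadline:
            raise TimeoutError("Operation still processing after timeout")
        # A busy backend only lengthens this wait; the submit itself is
        # never repeated.
        sleep(poll_interval)
```

Every poll counts against the account's request rate budget, so a generous `poll_interval` is usually the better default.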

Designing for sustained throughput

If you’re driving high volume:

Use webhooks instead of polling. Polling a hot operation every second consumes the same per-account rate budget your submits are using. See webhooks.
Implement a queue on your side when submitting in bursts. A token-bucket or leaky-bucket sized to your rate limit keeps the submit channel smooth.
Use one key per service so you can revoke independently, but remember they share the account’s rate budget.
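As an illustration of the client-side queue suggested above, here is a minimal token-bucket sketch in Python. The `TokenBucket` class and the `rate` and `capacity` values are hypothetical; size them to the limit agreed for your account, which this page does not state.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` calls, refilled at `rate` per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._tokens = capacity
        self._clock = clock
        self._sleep = sleep
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self) -> None:
        # Add the tokens earned since the last check, capped at capacity.
        now = self._clock()
        earned = (now - self._last) * self.rate
        self._tokens = min(self.capacity, self._tokens + earned)
        self._last = now

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            with self._lock:
                self._refill()
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                # Time until the missing fraction of a token has refilled.
                wait = (1 - self._tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked.
            self._sleep(wait)
```

Calling `bucket.acquire()` before each submit, and before each poll if you poll, keeps one process under the configured rate. Because all keys share the account budget, several services would need to split that rate between their buckets.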

For batch jobs at meaningful scale (thousands of operations a day, or sustained per-minute rates beyond your current limit), reach out to business@soundpiece.co.uk before you start — we can pre-arrange a higher limit and make sure the backend is sized appropriately.
