# StableVoice API

Base URL: `https://stablevoice.dev`

Text-to-speech as an x402/MPP API. StableVoice runs open Chatterbox-family TTS models on Modal and writes generated audio to a StableUpload output slot.

## Models

- `chatterbox-turbo` — default, fastest, English, supports paralinguistic tags like `[laugh]`, `[chuckle]`, `[sigh]`, `[gasp]`, `[cough]`.
- `chatterbox` — English, more expressive controls: `exaggeration`, `cfgWeight`, `temperature`, `topP`, `minP`, `repetitionPenalty`.
- `chatterbox-multilingual` — 23 languages: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh.
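
A minimal sketch of choosing a model id from the list above. The helper name `pick_model` and its `expressive_controls` flag are hypothetical and not part of the API; only the model ids and language codes come from this document.

```python
# Language codes copied from the chatterbox-multilingual entry above.
MULTILINGUAL_LANGUAGES = {
    "ar", "da", "de", "el", "en", "es", "fi", "fr", "he", "hi", "it", "ja",
    "ko", "ms", "nl", "no", "pl", "pt", "ru", "sv", "sw", "tr", "zh",
}


def pick_model(language: str = "en", expressive_controls: bool = False) -> str:
    """Hypothetical helper: map a language and a preference to a model id."""
    lang = language.lower()
    if lang not in MULTILINGUAL_LANGUAGES:
        raise ValueError(f"language {language!r} is not in the documented list")
    if lang != "en":
        # Only the multilingual model is documented for non-English text.
        return "chatterbox-multilingual"
    # English: turbo is the documented default; chatterbox exposes more controls.
    return "chatterbox" if expressive_controls else "chatterbox-turbo"
```

For example, `pick_model("fr")` returns `"chatterbox-multilingual"`, while `pick_model()` falls back to the default `"chatterbox-turbo"`.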

Voice selection guide:

- Aaron: grounded for product narration
- Abigail: bright for onboarding
- Anaya: crisp for announcements
- Andy: casual for demos
- Archer: confident for trailers
- Brian: steady for tutorials
- Chloe: light for tips
- Dylan: relaxed for podcasts
- Emmanuel: polished for education
- Ethan: upbeat for walkthroughs
- Evelyn: expressive for storytelling
- Gavin: bold for ads
- Gordon: measured for training
- Ivan: precise for technical explainers
- Laura: clear for support
- Lucy: balanced for default assistant
- Madison: polished for promos
- Marisol: warm for travel
- Meera: thoughtful for long-form narration
- Walter: classic for announcements

Call `GET /api/voices` with SIWX for full `voiceGuide` descriptions, traits, and use cases. Use `referenceAudioUrl` instead of `voice` only when you have a rights-cleared custom reference clip.

## Workflow

```
1. GET  stablevoice.dev/api/voices           # SIWX: model catalog + voice guide
2. GET  stablevoice.dev/api/voice-samples    # SIWX: MP3 previews (optional)
3. POST stableupload.dev/api/upload          # reserve wav/mp3 output slot
4. POST stablevoice.dev/api/speech           # paid TTS compute
5. GET  stablevoice.dev/api/jobs/{jobId}     # SIWX: poll every 2-5s
```

Reserve the StableUpload filename with the same extension as `format`. From the reservation response, keep `publicUrl` together with either `uploadUrl` or the `postUrl`/`postFields` pair, and pass those values as `output` in the speech request.
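
The hand-off from StableUpload to the speech request might look like the sketch below. It assumes the reservation response is a flat JSON object carrying the field names mentioned above and that `output` accepts those same keys; the helper name `build_output` is hypothetical.

```python
def build_output(upload: dict, audio_format: str, filename: str) -> dict:
    """Hypothetical helper: turn a StableUpload reservation into `output`."""
    # The reserved filename must carry the same extension as `format`.
    if not filename.lower().endswith("." + audio_format.lower()):
        raise ValueError("filename extension must match format")
    if "publicUrl" not in upload:
        raise ValueError("reservation is missing publicUrl")

    output = {"publicUrl": upload["publicUrl"]}
    if "uploadUrl" in upload:
        # Direct upload URL variant.
        output["uploadUrl"] = upload["uploadUrl"]
    elif "postUrl" in upload and "postFields" in upload:
        # Form-style upload variant: both values travel together.
        output["postUrl"] = upload["postUrl"]
        output["postFields"] = upload["postFields"]
    else:
        raise ValueError("need uploadUrl or postUrl plus postFields")
    return output
```

Checking the extension before paying for the speech job avoids spending on a request whose output slot does not match the requested format.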

## Endpoints

- `GET /api/voices` — SIWX model catalog, bundled voices, `voiceGuide`, formats, tags, pricing notes.
- `GET /api/voice-samples` — SIWX sample catalog with descriptions, traits, sample text, and absolute MP3 URLs.
- `POST /api/speech` — paid TTS job. Body: `text` (1-2500), `model`, `voice`, `language`, `format`, `output`, optional `referenceAudioUrl`, `options`, `clientRequestId`.
- `GET /api/jobs/{jobId}` — SIWX status. When complete, read `result.outputs.audio.publicUrl`.
- `GET /api/jobs?cursor=...&limit=50` — SIWX job list.
- `DELETE /api/jobs/{jobId}` — SIWX soft-delete from the job list; StableUpload object expiration is separate.
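
Polling `GET /api/jobs/{jobId}` could be sketched as follows. Because SIWX authentication and the job's status values are not spelled out here, the sketch takes a caller-supplied `fetch_job` function (hypothetical) that returns the decoded JSON, and it treats the presence of `result.outputs.audio.publicUrl` as the completion signal. A real client would presumably also stop on whatever failure state the API reports.

```python
import time
from typing import Callable


def wait_for_audio(
    fetch_job: Callable[[], dict],
    interval_s: float = 3.0,   # the workflow suggests polling every 2-5s
    timeout_s: float = 300.0,  # assumed upper bound, not from the API
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until result.outputs.audio.publicUrl appears, then return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        # Walk the nested path defensively; any level may be absent or null.
        result = job.get("result") or {}
        audio = (result.get("outputs") or {}).get("audio") or {}
        url = audio.get("publicUrl")
        if url:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not produce audio before the timeout")
        sleep(interval_s)
```

Passing `fetch_job` in keeps the loop independent of the HTTP client and signing scheme, so the same function works with any SIWX-capable wrapper.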

For custom voice cloning, upload a clear 5-15 second reference clip to StableUpload and pass its `publicUrl` as `referenceAudioUrl`.
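
A sketch of assembling the `POST /api/speech` body, covering both a bundled voice and a custom reference clip. The field names come from the endpoint description; the helper name `build_speech_body`, the reading of `1-2500` as a character count, and the rule that `voice` and `referenceAudioUrl` are not sent together are assumptions.

```python
from typing import Optional


def build_speech_body(
    text: str,
    output: dict,
    *,
    model: str = "chatterbox-turbo",
    voice: Optional[str] = None,
    reference_audio_url: Optional[str] = None,
    language: Optional[str] = None,
    audio_format: str = "mp3",
    options: Optional[dict] = None,
    client_request_id: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build the JSON body for POST /api/speech."""
    # Assumption: the documented 1-2500 limit counts characters.
    if not 1 <= len(text) <= 2500:
        raise ValueError("text must be 1-2500 characters")
    # Assumption: a reference clip replaces the bundled voice.
    if voice is not None and reference_audio_url is not None:
        raise ValueError("pass voice or referenceAudioUrl, not both")

    body = {"text": text, "model": model, "format": audio_format, "output": output}
    optional = {
        "voice": voice,
        "referenceAudioUrl": reference_audio_url,
        "language": language,
        "options": options,
        "clientRequestId": client_request_id,
    }
    # Send only the optional fields the caller actually set.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

For a cloned voice, call it with `reference_audio_url` set to the clip's `publicUrl` and leave `voice` unset.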

Pricing starts at $0.02. The price is the larger of $0.02 and `estimatedGenerateSeconds` * the per-second compute rate * 3.5, where the compute rate covers an A10 GPU ($0.000306), 4 CPU cores, and 16 GiB of memory.
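
The pricing formula can be sketched as a small estimator. It assumes `$0.000306` is the per-second A10 GPU rate on its own; the per-second rates for the 4 CPU cores and 16 GiB of memory are not given in this document, so they default to `0.0` here and the result should be read as a lower bound. The function and parameter names are hypothetical.

```python
def estimate_price_usd(
    estimated_generate_seconds: float,
    gpu_rate: float = 0.000306,      # A10 GPU, USD per second (from the formula)
    cpu_rate_total: float = 0.0,     # 4 CPU cores, USD per second (not documented)
    memory_rate_total: float = 0.0,  # 16 GiB memory, USD per second (not documented)
    multiplier: float = 3.5,
    minimum: float = 0.02,
) -> float:
    """Lower-bound price estimate: max(minimum, seconds * rate * multiplier)."""
    per_second = gpu_rate + cpu_rate_total + memory_rate_total
    return max(minimum, estimated_generate_seconds * per_second * multiplier)


# A 10-second estimate costs 10 * 0.000306 * 3.5 = $0.01071 before the floor,
# so the $0.02 minimum applies.
print(estimate_price_usd(10))  # 0.02
```

With only the GPU rate counted, the minimum stops applying at roughly 18.7 estimated seconds; the real crossover is earlier once CPU and memory rates are included.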