# StableVoice API

Base URL: `https://stablevoice.dev`

Text-to-speech as an x402/MPP API. StableVoice runs open Chatterbox-family TTS models on Modal and writes generated audio to a StableUpload output slot.

## Models

- `chatterbox-turbo`: default, fastest, English, supports paralinguistic tags like `[laugh]`, `[chuckle]`, `[sigh]`, `[gasp]`, `[cough]`.
- `chatterbox`: English, more expressive controls: `exaggeration`, `cfgWeight`, `temperature`, `topP`, `minP`, `repetitionPenalty`.
- `chatterbox-multilingual`: 23 languages: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh.

Voice selection guide:

- Aaron: grounded for product narration
- Abigail: bright for onboarding
- Anaya: crisp for announcements
- Andy: casual for demos
- Archer: confident for trailers
- Brian: steady for tutorials
- Chloe: light for tips
- Dylan: relaxed for podcasts
- Emmanuel: polished for education
- Ethan: upbeat for walkthroughs
- Evelyn: expressive for storytelling
- Gavin: bold for ads
- Gordon: measured for training
- Ivan: precise for technical explainers
- Laura: clear for support
- Lucy: balanced for default assistant
- Madison: polished for promos
- Marisol: warm for travel
- Meera: thoughtful for long-form narration
- Walter: classic for announcements

Call `GET /api/voices` with SIWX for full `voiceGuide` descriptions, traits, and use cases. Use `referenceAudioUrl` instead of `voice` only when you have a rights-cleared custom reference clip.

## Workflow

```
1. GET  stablevoice.dev  /api/voices         # SIWX model + voice guide
2. Optional: GET /api/voice-samples          # SIWX MP3 previews
3. POST stableupload.dev /api/upload         # reserve wav/mp3 output slot
4. POST stablevoice.dev  /api/speech         # paid TTS compute
5. GET  stablevoice.dev  /api/jobs/{jobId}   # SIWX poll every 2-5s
```

Reserve the StableUpload filename with the same extension as `format`. Keep `uploadUrl` or `postUrl/postFields` plus `publicUrl`; pass those as `output`.
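Steps 3 and 4 of the workflow can be sketched as two pure helpers that turn a StableUpload reservation into the `output` object and then into a `POST /api/speech` body. This is a minimal sketch, not the service's client: the helper names `build_output` and `build_speech_body` are hypothetical, the exact shape of `output` (a flat object holding the kept upload fields) is an assumption, and the `text` limit of 1-2500 is assumed to count characters. Payment (x402/MPP) and SIWX signing are left out.

```python
from pathlib import PurePosixPath

MAX_TEXT_CHARS = 2500             # upper bound of `text`; unit assumed to be characters
ALLOWED_FORMATS = {"wav", "mp3"}  # formats named in workflow step 3


def build_output(upload: dict, filename: str, fmt: str) -> dict:
    """Build the `output` object from a parsed StableUpload reservation.

    Assumption: `upload` is the JSON returned by POST /api/upload and
    carries `publicUrl` plus either `uploadUrl` or `postUrl`/`postFields`.
    """
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # The reserved filename must use the same extension as `format`.
    ext = PurePosixPath(filename).suffix.lstrip(".").lower()
    if ext != fmt:
        raise ValueError(f"filename extension {ext!r} does not match format {fmt!r}")
    if "publicUrl" not in upload:
        raise ValueError("upload response is missing publicUrl")
    if "uploadUrl" in upload:
        return {"uploadUrl": upload["uploadUrl"], "publicUrl": upload["publicUrl"]}
    if "postUrl" in upload and "postFields" in upload:
        return {
            "postUrl": upload["postUrl"],
            "postFields": upload["postFields"],
            "publicUrl": upload["publicUrl"],
        }
    raise ValueError("upload response needs uploadUrl or postUrl plus postFields")


def build_speech_body(
    text: str,
    output: dict,
    fmt: str,
    model: str = "chatterbox-turbo",
    voice: str = "Lucy",
) -> dict:
    """Assemble a request body for POST /api/speech (field layout assumed flat)."""
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters")
    return {
        "text": text,
        "model": model,
        "voice": voice,
        "format": fmt,
        "output": output,
    }
```

Checking the extension before the paid call means a mismatched reservation fails locally rather than after compute has been charged.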
## Endpoints

- `GET /api/voices`: SIWX model catalog, bundled voices, `voiceGuide`, formats, tags, pricing notes.
- `GET /api/voice-samples`: SIWX sample catalog with descriptions, traits, sample text, and absolute MP3 URLs.
- `POST /api/speech`: paid TTS job. Body: `text` (1-2500), `model`, `voice`, `language`, `format`, `output`, optional `referenceAudioUrl`, `options`, `clientRequestId`.
- `GET /api/jobs/{jobId}`: SIWX status. When complete, read `result.outputs.audio.publicUrl`.
- `GET /api/jobs?cursor=...&limit=50`: SIWX job list.
- `DELETE /api/jobs/{jobId}`: SIWX soft-delete from the job list; StableUpload object expiration is separate.

For custom voice cloning, upload a clear 5-15 second reference clip to StableUpload and pass its `publicUrl` as `referenceAudioUrl`.

Pricing starts at $0.02. Formula: `max($0.02, estimatedGenerateSeconds * ($0.000306 A10 GPU + 4 CPU cores + 16 GiB memory) * 3.5)`.
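The pricing formula can be worked through in a few lines. This is a rough sketch under stated assumptions: `$0.000306` is read as the per-second A10 GPU rate, and because the formula does not give rates for the 4 CPU cores and 16 GiB of memory, they are exposed as a hypothetical `cpu_memory_rate_per_second` parameter that defaults to zero. With that default the result is a lower bound, not the billed price.

```python
GPU_RATE_PER_SECOND = 0.000306  # A10 GPU figure from the formula; assumed USD per second
MARKUP = 3.5                    # multiplier from the formula
MINIMUM_PRICE = 0.02            # price floor in USD


def estimate_price(
    estimated_generate_seconds: float,
    cpu_memory_rate_per_second: float = 0.0,  # not stated in the docs; caller-supplied
) -> float:
    """Estimate the job price in USD: the marked-up compute cost, floored at $0.02."""
    if estimated_generate_seconds < 0:
        raise ValueError("estimated_generate_seconds must be non-negative")
    compute_rate = GPU_RATE_PER_SECOND + cpu_memory_rate_per_second
    return max(MINIMUM_PRICE, estimated_generate_seconds * compute_rate * MARKUP)
```

With the GPU rate alone, the marked-up cost is $0.001071 per second, so the $0.02 floor applies to jobs estimated at under roughly 18.7 seconds of generation; any CPU and memory charge lowers that threshold.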