Generate speech with bundled voices, or zero-shot clone any voice from a 3-15 second StableUpload reference clip. The open-source models (Chatterbox, F5-TTS, VoxCPM2, Qwen3-TTS) output WAV or MP3, with SIWX job history and audio delivered to your StableUpload slot.
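The 3-15 second window for reference clips can be checked locally before uploading. The sketch below is one possible pre-flight check, not part of StableVoice itself. It assumes the clip is a WAV file (the page does not say which formats are accepted), and the function names are hypothetical.

```python
import io
import wave

MIN_SECONDS = 3.0   # lower bound stated above
MAX_SECONDS = 15.0  # upper bound stated above


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file (path or file-like object) in seconds."""
    with wave.open(source, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_valid_reference_clip(source) -> bool:
    """True when the clip length falls inside the 3-15 second window."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    for seconds in (1, 5, 20):
        ok = is_valid_reference_clip(make_silent_wav(seconds))
        print(f"{seconds:>2}s clip accepted: {ok}")
```

Clips in other formats would need a different decoder to measure duration.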
Models
Chatterbox Turbo
Default English TTS for the bundled voice catalog. A fast 350M-parameter model. Long-form text is automatically chunked on the worker, so you can submit up to 2,500 characters per call. Use voxcpm2 for higher-quality custom cloning.
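The worker handles chunking within a call, but text longer than the 2500-character per-call limit still has to be split across calls by the client. The following is a minimal sketch of one way to do that, packing whole sentences greedily. The function name and the sentence-splitting rule are assumptions for illustration, not the service's own chunking logic.

```python
import re

MAX_CHARS = 2500  # per-call limit stated above


def split_for_calls(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack sentences into pieces of at most `limit` characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces


if __name__ == "__main__":
    long_text = "Hello world. " * 400
    for index, piece in enumerate(split_for_calls(long_text)):
        print(index, len(piece))
```

Each returned piece could then be submitted as its own call and the resulting audio files concatenated in order.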
Chatterbox
More expressive English variant of Chatterbox with CFG, exaggeration, and min-p controls. Same auto-chunking as Turbo. Use voxcpm2 for higher-quality custom cloning.
Chatterbox Multilingual
Multilingual TTS across 23 languages, with auto-chunking for long-form text. Use voxcpm2 if you want both multilingual coverage and high-quality custom cloning.
F5-TTS
Cheap, fast cloning fallback. About half the per-second cost of voxcpm2, with a 5-10× faster cold start. Clones are noticeably less faithful, so use it when latency or budget matters more than clone fidelity, or when you specifically need an MIT-licensed model. English only.
VoxCPM2
Recommended for voice cloning. OpenBMB's 2B diffusion-AR model: highest fidelity in the catalog, multilingual, 48 kHz output, and long-form text handled without truncation. Supports voice design, controllable cloning, and ultimate cloning with a transcript. Use short style prompts only; verbose style text can leak into the generated speech. Cold start is slow (~130 s), but the quality is worth it.
Qwen3-TTS 1.7B
Experimental eval backend. Alibaba Qwen 1.7B Base voice-clone model with 3-second rapid cloning and 10-language support. Added for side-by-side evals against voxcpm2; do not treat as the default until benchmark results justify it.
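The guidance in the model descriptions above can be condensed into a small decision function. This is only a sketch of that guidance: `voxcpm2` is the identifier used in the text, while the other model slugs, the `Request` fields, and the function name are hypothetical and may not match the real API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    cloning: bool = False            # custom voice clone instead of a bundled voice
    english_only: bool = True        # text is English
    expressive: bool = False         # wants CFG, exaggeration, and min-p controls
    prefer_cheap_fast: bool = False  # latency or budget matters more than fidelity


def choose_model(req: Request) -> str:
    """Pick a model slug following the catalog descriptions (slugs are assumed)."""
    if req.cloning:
        # F5-TTS is the cheap, fast fallback, but it is English only.
        if req.prefer_cheap_fast and req.english_only:
            return "f5-tts"
        return "voxcpm2"  # recommended for cloning, multilingual
    if not req.english_only:
        return "chatterbox-multilingual"
    if req.expressive:
        return "chatterbox"
    return "chatterbox-turbo"  # default English TTS for bundled voices


if __name__ == "__main__":
    print(choose_model(Request()))
    print(choose_model(Request(cloning=True)))
    print(choose_model(Request(cloning=True, prefer_cheap_fast=True)))
```

Qwen3-TTS is deliberately never returned here, since the catalog describes it as an experimental eval backend rather than a default.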
Starting price
$0.02
Bundled voices
20
Voice samples
Short static MP3 auditions generated with Chatterbox Turbo, ready to play without paying or starting a job.
Beep beep. The deploy passed, and my coffee has entered production.
Standard American male, grounded and balanced — fits a generic white-collar Western character, product narration, and support flows.
I opened one tab to test audio. It became a lifestyle.
Australian female, bright and approachable — picture an Aussie or general English-speaking character. Friendly without getting silly.
Tiny update: the button works. Huge update: I said tiny update.
Indian-accented female, crisp and energetic — natural fit for South Asian characters or any role where the agent should picture an Indian speaker.
This sample is legally a vibe, technically a waveform.
Casual American male with a dry edge — picture a laid-back white guy in his 20s or 30s. Works for informal narration and creator content.
Ship it, then whisper ship it again for cache warmth.
Confident American male with cinematic gravitas — picture a movie-trailer voice or a composed lead character. Suits high-drama launches.
I asked Modal for a snack and it returned a GPU.
Steady American male, technical and matter-of-fact — picture an engineer or operator. Low-friction for ops, tutorials, and engineering reads.
Psst. Your browser just learned twenty voices. Casual.
Light, playful Australian female — picture a young Aussie. Best when small interface moments should feel more alive.
If this loads fast, pretend I planned it that way.
Relaxed American male with a natural, understated cadence — picture a low-key indie/folk vibe. Good for narration that should not feel overproduced.
I put the syllables in a trench coat and called it speech.
Polished Black male, articulate and warm — picture an African American or African character. A dependable voice for structured explanation and presentations.
Audio sample number nine is feeling extremely compiled.
Upbeat North American male, clear and brisk — picture a friendly American or Canadian guy. Useful for task-oriented reads with forward motion.
I am not buffering. I am building dramatic suspense.
Smooth and expressive American female — picture a reassuring white-collar professional woman. Good for warmer flows where reassurance matters.
The waveform said squiggle squiggle and invoices got paid.
Bold, animated American male — picture a high-energy host or hype guy. Suits ads, intros, and energetic explainers that need presence.
This voice has been toasted to a perfect golden latency.
Measured, authoritative older American male — picture a seasoned narrator or documentary host. Good when the read should feel stable and serious.
Behold: one sentence, lightly seasoned with computation.
Slavic-accented male (Russian/Eastern European), precise with a deadpan edge — picture a Russian or Eastern European character. Works for analytical, dry technical reads.
Click once for sound. Click twice for confidence.
Clear and friendly North American female — picture a competent American or Canadian woman. A practical default for help content and product education.
Hello from StableVoice. I brought receipts and a tiny reverb.
Balanced and lively North American female — picture a warm, modern American/Canadian woman. The safest default for general assistant and product narration.
The landing page asked for personality, so I arrived with tags.
Polished, upbeat American female — picture a young media-trained woman. Useful for media-ready content and confident product copy.
Today's forecast: ninety percent chance of nice audio.
Warm Latina female with lightly Spanish-inflected English — picture a Hispanic or Latin American character. Fits hospitality, travel, and conversational flows.
I tried to be serious, then the waveform did a little wiggle.
Indian-accented female, calm and thoughtful — picture a South Asian character with a measured cadence. Suited to longer explanation and reflective narration.
Back in my day, samples were shipped after they loaded.
Older American male with a weathered, country-tinged delivery — picture a grandfatherly classic-Western character. Fits deliberate reads with a little ceremony.