Voicebox
Voicebox is an open-source voice cloning desktop app that generates natural speech from text entirely offline.
About Voicebox
Voicebox is a privacy-first voice synthesis platform that runs locally on your machine, with no cloud services or subscriptions required. It is powered by Qwen3-TTS and provides voice cloning and text-to-speech while keeping all your data under your control. You can supply multiple samples for a voice to improve the quality and naturalness of the clone.
Voicebox runs on macOS, Windows, and Linux. It uses hardware acceleration for fast local inference: Metal on Mac, and CUDA on Windows and Linux. Inference can run on your local GPU or on a remote machine that you connect to.
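The platform-to-accelerator mapping described above can be sketched in a few lines of Python. This is only an illustration, not Voicebox's actual code: the function name `pick_backend`, the `cuda_available` flag, and the CPU fallback are all assumptions made for the example.

```python
import platform


def pick_backend(system: str, cuda_available: bool) -> str:
    """Map an OS name (as returned by platform.system()) to an accelerator.

    Hypothetical helper: Metal on macOS, CUDA on Windows/Linux when a
    CUDA device is present, otherwise an assumed CPU fallback.
    """
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return "metal"
    if system in ("Windows", "Linux") and cuda_available:
        return "cuda"
    return "cpu"  # assumption: fall back to CPU when no accelerator fits


if __name__ == "__main__":
    # cuda_available would come from the inference library in a real app.
    print(pick_backend(platform.system(), cuda_available=False))
```

Keeping the decision in a pure function like this makes it easy to test without the target hardware.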
Beyond basic voice synthesis, Voicebox includes tools for building longer projects. The integrated stories editor lets you build multi-voice narratives in a timeline-based interface where you can arrange tracks, trim clips, and mix conversations. The built-in Whisper-powered transcription system automatically extracts reference text from voice samples, which removes a manual step from the voice cloning workflow.
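To make the timeline idea concrete, here is a minimal sketch of how clips on tracks might be modeled. It is a hypothetical data model for illustration only: the `Clip` class, its fields, and the helper functions are assumptions and do not describe Voicebox's internal format.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """One generated utterance placed on a timeline track (hypothetical)."""
    voice: str
    track: int
    start: float           # position on the timeline, in seconds
    source_length: float   # length of the generated audio, in seconds
    trim_start: float = 0.0  # seconds cut from the beginning
    trim_end: float = 0.0    # seconds cut from the end

    @property
    def duration(self) -> float:
        # Playable length after trimming, never negative.
        return max(0.0, self.source_length - self.trim_start - self.trim_end)

    @property
    def end(self) -> float:
        return self.start + self.duration


def timeline_length(clips: list[Clip]) -> float:
    """Total length of the story: the latest clip end, or 0 if empty."""
    return max((c.end for c in clips), default=0.0)


def overlapping(clips: list[Clip]) -> list[tuple[str, str]]:
    """Pairs of voices on different tracks that play at the same time."""
    ordered = sorted(clips, key=lambda c: c.start)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # later clips start even later, so none can overlap a
            if a.track != b.track and b.duration > 0:
                pairs.append((a.voice, b.voice))
    return pairs


story = [
    Clip("alice", track=0, start=0.0, source_length=4.0),
    Clip("bob", track=1, start=3.0, source_length=5.0, trim_start=1.0),
]
```

In this example, `bob` is trimmed to 4 seconds and starts while `alice` is still speaking, so the two clips would be mixed for one second of the 7-second story.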