Integrates fully offline text-to-speech and speech-to-text capabilities into coding assistants via the Model Context Protocol (MCP).
VoiceSmith gives your local AI coding assistants a natural voice and the ability to listen, all without relying on cloud services. By leveraging Kokoro ONNX for text-to-speech and faster-whisper for speech-to-text, it offers 54 distinct voices and robust voice activity detection, ensuring a private and responsive conversational experience. It integrates with coding assistants such as Claude Code, Cursor, and Codex, supporting multi-session voice interactions and custom voice selection.
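MCP servers like this one are typically registered in the assistant's configuration file under an `mcpServers` key. The sketch below shows the general shape of such an entry; the command and package name (`voicesmith-mcp`) are illustrative assumptions, not taken from VoiceSmith's documentation.

```json
{
  "mcpServers": {
    "voicesmith": {
      "command": "uvx",
      "args": ["voicesmith-mcp"]
    }
  }
}
```

Once registered, the assistant can call the server's speech tools over stdio without any network access.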
Key Features
- Local speech-to-text using faster-whisper
- 54 distinct voices via Kokoro ONNX (fully offline TTS)
- Seamless integration with Claude Code, Cursor, and Codex
- Voice activity detection with Silero VAD
- Multi-session voice support for Claude Code
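Voice activity detection decides which stretches of audio contain speech so that only those are sent to the transcriber. VoiceSmith uses the Silero VAD neural model for this; the sketch below instead uses a simple RMS energy threshold (a deliberately simpler stand-in, not Silero's approach) to illustrate the idea of segmenting audio into speech regions.

```python
import math

def rms(frame):
    """Root-mean-square energy of a frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_speech(samples, frame_len=160, threshold=0.02):
    """Return (start, end) sample indices of contiguous runs of frames
    whose RMS energy exceeds the threshold -- a toy stand-in for VAD."""
    regions = []
    active_start = None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        if rms(frame) >= threshold:
            if active_start is None:
                active_start = i  # speech run begins here
        elif active_start is not None:
            regions.append((active_start, i))  # speech run just ended
            active_start = None
    if active_start is not None:
        regions.append((active_start, len(samples)))
    return regions

# Synthetic 16 kHz signal: 0.1 s silence, 0.1 s 440 Hz tone, 0.1 s silence.
silence = [0.0] * 1600
tone = [0.1 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
regions = detect_speech(silence + tone + silence)
```

A real pipeline would hand each detected region to the speech-to-text model instead of transcribing the whole stream.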
Use Cases
- Providing spoken summaries and questions from AI assistants during coding
- Using voice commands and spoken responses to interact with AI development tools
- Enabling conversational interfaces for AI coding assistants