Frequently Asked Questions
Technical details about Tambourine's architecture, customization, and capabilities.
What is Tambourine?
Tambourine is an open-source voice dictation platform built on Pipecat. It provides a modular pipeline
for speech-to-text with LLM-based formatting, supporting 10+ STT/LLM providers or fully local inference with
Whisper and Ollama.
Can I run it completely offline/locally?
Yes. Use Whisper for local STT and Ollama for local LLM formatting. No API keys or internet connection are
required. Set WHISPER_ENABLED=true and OLLAMA_BASE_URL in your .env file.
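As a rough illustration of how a server could interpret these two settings, the sketch below reads them from an environment mapping. Only the variable names WHISPER_ENABLED and OLLAMA_BASE_URL come from the answer above. The LocalConfig type and the load_local_config helper are hypothetical and are not Tambourine's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LocalConfig:
    """Hypothetical container for the two local-inference settings."""
    whisper_enabled: bool
    ollama_base_url: Optional[str]

    @property
    def fully_local(self) -> bool:
        # Offline operation needs both local STT and a local LLM endpoint.
        return self.whisper_enabled and self.ollama_base_url is not None


def load_local_config(env: Mapping[str, str] = os.environ) -> LocalConfig:
    """Read WHISPER_ENABLED and OLLAMA_BASE_URL from an env-like mapping."""
    whisper = env.get("WHISPER_ENABLED", "").strip().lower() == "true"
    # Treat a missing or empty URL as "not configured".
    url = env.get("OLLAMA_BASE_URL", "").strip() or None
    return LocalConfig(whisper_enabled=whisper, ollama_base_url=url)
```

Passing a plain dict instead of os.environ, for example {"WHISPER_ENABLED": "true", "OLLAMA_BASE_URL": "http://localhost:11434"} (the port Ollama listens on by default), makes the helper easy to check without touching the real environment.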
What's the architecture?
The Tauri desktop app (Rust + React) connects via WebRTC to a Python server running Pipecat pipelines. The
server runs STT, passes the transcript through LLM formatting, and returns the cleaned text. Runtime
configuration goes through the RTVI protocol.
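The server-side flow (audio in, STT, LLM formatting, cleaned text out) can be sketched with plain Python callables. This is a conceptual illustration only: it does not use Pipecat's API, and run_dictation, fake_stt, and fake_format are hypothetical stand-ins for real services.

```python
from typing import Callable

# Hypothetical stage types: audio bytes -> raw transcript -> cleaned text.
Transcriber = Callable[[bytes], str]
Formatter = Callable[[str], str]


def run_dictation(audio: bytes, stt: Transcriber, llm_format: Formatter) -> str:
    """Chain the two stages in the order the server applies them."""
    raw_transcript = stt(audio)
    return llm_format(raw_transcript)


def fake_stt(audio: bytes) -> str:
    # Stand-in for a real STT provider; ignores the audio entirely.
    return "um so send the report uh tomorrow"


def fake_format(raw: str) -> str:
    # Stand-in for the LLM step: drop fillers, capitalize, add a period.
    fillers = {"um", "uh", "so"}
    text = " ".join(w for w in raw.split() if w not in fillers)
    return text[:1].upper() + text[1:] + "."


print(run_dictation(b"", fake_stt, fake_format))  # Send the report tomorrow.
```

Keeping each stage behind a simple callable boundary is what lets a provider be swapped without touching the rest of the flow.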
How do I add a new STT/LLM provider?
Tambourine uses Pipecat's service abstraction. Add any Pipecat-supported provider by implementing the
service interface. See server/ for examples with Deepgram, Cartesia, Groq, etc.
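To show the general idea of a service abstraction, here is a minimal sketch of a provider registry. It is not Pipecat's interface and not the code in server/. The STTService protocol, register_stt, create_stt, and the EchoSTT provider are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Protocol


class STTService(Protocol):
    """Hypothetical interface every STT provider would implement."""

    def transcribe(self, audio: bytes) -> str: ...


# Maps a provider name to a zero-argument factory that builds the service.
STT_REGISTRY: Dict[str, Callable[[], STTService]] = {}


def register_stt(name: str):
    """Class decorator that records a provider under the given name."""
    def decorator(factory):
        STT_REGISTRY[name] = factory
        return factory
    return decorator


@register_stt("echo")
class EchoSTT:
    # Toy provider: pretends the audio bytes are already UTF-8 text.
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8", errors="replace")


def create_stt(name: str) -> STTService:
    """Instantiate a registered provider, or fail with a clear error."""
    try:
        return STT_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown STT provider: {name!r}") from None
```

With this shape, adding a provider means writing one class that satisfies the interface and registering it. Callers only ever see the shared transcribe method.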
How customizable is the formatting?
Fully customizable. The LLM formatting uses editable prompts stored in settings. Modify filler-word removal,
punctuation rules, backtracking behavior, or add your own custom logic. The personal dictionary supports
technical terms and proper nouns.
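One way to picture settings-driven prompts is a small builder that assembles the formatting instructions from toggles and a dictionary. The sketch below is an assumption about how this could work, not Tambourine's stored prompts: FormattingSettings, build_prompt, and the rule wording (including the reading of "backtracking" as spoken self-correction) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FormattingSettings:
    """Hypothetical user settings that shape the formatting prompt."""
    remove_fillers: bool = True
    fix_punctuation: bool = True
    apply_backtracking: bool = True
    dictionary: List[str] = field(default_factory=list)
    extra_rules: List[str] = field(default_factory=list)


def build_prompt(settings: FormattingSettings) -> str:
    """Assemble the LLM instructions from the enabled rules."""
    rules: List[str] = []
    if settings.remove_fillers:
        rules.append("Remove filler words such as 'um' and 'uh'.")
    if settings.fix_punctuation:
        rules.append("Add punctuation and capitalization.")
    if settings.apply_backtracking:
        rules.append("When the speaker corrects themselves, keep only the correction.")
    rules.extend(settings.extra_rules)  # user-defined custom logic

    lines = ["Clean up the dictated text using these rules:"]
    lines += [f"- {rule}" for rule in rules]
    if settings.dictionary:
        # Personal dictionary: technical terms and proper nouns.
        lines.append("Spell these terms exactly: " + ", ".join(settings.dictionary))
    return "\n".join(lines)
```

For example, build_prompt(FormattingSettings(remove_fillers=False, dictionary=["Pipecat", "Zustand"])) yields a prompt without the filler rule and with both terms listed for exact spelling.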
What's the tech stack?
Desktop: Tauri (Rust) + React + TypeScript
Server: Python + FastAPI + Pipecat
Communication: WebRTC (SmallWebRTC)
State: Zustand + XState
Validation: Zod + Pydantic