Getting Started
VoiceUse is a local desktop voice agent that controls your computer hands-free. Speak a command, and VoiceUse transcribes it, plans the steps, executes them, and responds. The agent runs natively on your machine and calls external APIs (a Groq key is required) for the steps that need them.
What VoiceUse Can Do
- Control windows — open, focus, minimize, resize, and move applications
- Type text — dictate into any text field
- Click UI elements — describe what you want to click in natural language
- Take screenshots — capture full screens or specific windows
- Execute system commands — run shell commands through an allow-list
- Browse the web — open URLs and navigate pages
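To illustrate what running shell commands "through an allow-list" can mean, here is a minimal sketch. It is not VoiceUse's implementation: the `ALLOWED_COMMANDS` set, the `is_allowed` function, and the rule of rejecting shell metacharacters are all assumptions made for the example.

```python
import shlex

# Hypothetical allow-list; VoiceUse's real list and its format are not shown on this page.
ALLOWED_COMMANDS = {"echo", "ls", "dir", "whoami"}

# Characters that let one command chain, redirect, or substitute another.
SHELL_METACHARACTERS = ";&|`$><"


def is_allowed(command_line: str, allowed: set[str] = ALLOWED_COMMANDS) -> bool:
    """Return True only if the command's executable name is on the allow-list."""
    # Reject metacharacters so an allowed command cannot smuggle in a disallowed one.
    if any(ch in command_line for ch in SHELL_METACHARACTERS):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    if not tokens:
        return False
    return tokens[0] in allowed
```

With this sketch, `is_allowed("echo hello")` is accepted, while `is_allowed("rm -rf /")` and `is_allowed("echo hi; rm -rf /")` are both rejected. A real executor would typically pass the parsed tokens to `subprocess.run` with `shell=False` rather than handing the raw string to a shell.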
Quick Start
1. Install
pipx install "voice-computer-use-agent[all]"
Or with uv:
uv tool install "voice-computer-use-agent[all]"
2. Set API Keys
export GROQ_API_KEY="gsk_..."
export OPENAI_API_KEY="sk-..." # optional fallback
export ANTHROPIC_API_KEY="sk-ant-..." # optional vision
3. Run
voiceuse
Hold Right Ctrl and speak, then release to submit. Alternatively, say "Computer" if the wake word is enabled.
System Requirements
| Requirement | Details |
|---|---|
| Python | 3.10 or higher |
| OS | Windows (primary), Linux, macOS (best-effort) |
| Microphone | Required for voice input |
| API Keys | Groq required; OpenAI/Anthropic optional |
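The requirements in the table can be checked before the first run with a small preflight script. This is a sketch and not part of VoiceUse: the function names `preflight` and `missing_optional` are invented for illustration, and the script does not test the microphone.

```python
import os
import sys

MIN_PYTHON = (3, 10)                     # from the requirements table
REQUIRED_KEYS = ("GROQ_API_KEY",)        # Groq is required
OPTIONAL_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def preflight(env=None, version=None):
    """Return a list of problems; an empty list means the basics look fine."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    for key in REQUIRED_KEYS:
        # An empty value counts as missing.
        if not env.get(key):
            problems.append(f"missing required key: {key}")
    return problems


def missing_optional(env=None):
    """List optional keys that are unset, for an informational message."""
    env = os.environ if env is None else env
    return [key for key in OPTIONAL_KEYS if not env.get(key)]


if __name__ == "__main__":
    for problem in preflight():
        print("ERROR:", problem)
    for key in missing_optional():
        print("note: optional key not set:", key)
```

Both functions accept an explicit `env` mapping and `version` tuple so they can be exercised without changing the real environment.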
Architecture Overview
VoiceUse follows a modular pipeline architecture:
flowchart LR
User[User speaks] --> STT[STT<br/>Groq Whisper]
STT --> Brain[Brain<br/>LLM Orchestrator]
Brain --> Tools[Tool Registry]
Tools --> OS[OS Controller]
Tools --> Vision[Vision Bridge]
Brain --> TTS[TTS<br/>edge-tts/pyttsx3]
TTS --> User
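The flow in the diagram can be sketched in a few lines of Python. Every name here (`ToolRegistry`, `fake_stt`, `fake_brain`, `fake_tts`, `handle_utterance`) is a hypothetical stand-in, not VoiceUse's actual API, and the stubs replace the real speech and LLM providers with trivial string handling. The Vision Bridge is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolRegistry:
    """Maps tool names to callables (illustrative; the real interface is not shown here)."""
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def call(self, name: str, arg: str) -> str:
        if name not in self.tools:
            return f"unknown tool: {name}"
        return self.tools[name](arg)


def fake_stt(audio: bytes) -> str:
    # Stand-in for speech-to-text: the "audio" is just UTF-8 text.
    return audio.decode("utf-8")


def fake_brain(text: str) -> tuple[str, str]:
    # Stand-in for the LLM planner: first word is the tool, the rest is its argument.
    name, _, arg = text.partition(" ")
    return name.lower(), arg


def fake_tts(reply: str) -> str:
    # Stand-in for text-to-speech: tag the reply instead of playing audio.
    return f"[spoken] {reply}"


def handle_utterance(audio: bytes, registry: ToolRegistry) -> str:
    """One pass through the pipeline: STT -> Brain -> Tools -> TTS."""
    text = fake_stt(audio)
    tool, arg = fake_brain(text)
    result = registry.call(tool, arg)
    return fake_tts(result)
```

For example, after `registry = ToolRegistry()` and `registry.register("open", lambda target: f"opened {target}")`, calling `handle_utterance(b"Open notepad", registry)` returns `"[spoken] opened notepad"`. Keeping each stage behind a plain function boundary is what makes it possible to swap providers, as the diagram's separate boxes suggest.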
Tip
Start with --dry-run to validate your setup without making API calls: voiceuse --dry-run
Next Steps
- Installation Guide — Detailed platform-specific setup
- Configuration — Customize behavior with config.yaml
- Usage — Learn how to use VoiceUse effectively
- Plugins — Extend with custom providers