v0.1.3 — Now Available

Control Your Computer With Your Voice

A local desktop voice agent that runs natively on your machine. Speak commands, get results. No clicking required.

3 Platforms
2 Vision Providers
Possibilities

Everything You Need

VoiceUse combines speech recognition, AI reasoning, and system control into a single seamless experience.

Wake Word & Hotkey

Activate with the wake word "Computer" or by holding Right Ctrl. Voice Activity Detection (VAD) detects when you stop speaking.
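A minimal sketch of end-of-speech detection, for illustration only. It assumes 20 ms frames of 16-bit mono PCM and uses a simple energy (RMS) threshold with a trailing-silence window. That is a stand-in for whatever VAD VoiceUse actually uses. The threshold, the 25-frame hangover, and the function names are hypothetical.

```python
import math
import struct

# Hypothetical tuning values for 20 ms frames of 16-bit mono PCM.
SPEECH_RMS_THRESHOLD = 500.0  # a frame counts as speech at or above this level
SILENCE_FRAMES_TO_STOP = 25   # trailing quiet frames that end the utterance


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def end_of_utterance(frames, threshold=SPEECH_RMS_THRESHOLD, hangover=SILENCE_FRAMES_TO_STOP):
    """Index of the frame at which speech is judged finished, or None if not yet.

    Leading silence is ignored; once speech has been heard, `hangover`
    consecutive quiet frames mark the end of the utterance.
    """
    heard_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            heard_speech = True
            quiet_run = 0  # any speech resets the silence counter
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= hangover:
                return i
    return None
```

With 20 ms frames, a 25-frame hangover means listening stops after about half a second of silence.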

Fast STT

Groq Whisper transcribes your speech in milliseconds. Transcription runs off the main thread, so the UI never freezes.

Cross-Platform OS Control

Window management, typing, screenshots, and multi-monitor setups on Windows, macOS, and Linux.

Computer Vision

Click UI elements by describing them in natural language. Powered by Codex CLI or the Anthropic Computer Use API.

Speaks Back

Text-to-speech through edge-tts or pyttsx3, with multiple playback backends. Confirms actions, reports errors, and keeps you informed.

Safety Guard

Destructive actions trigger a spoken confirmation. Keyword detection and allow-lists keep your system safe.

How It Works

A streamlined pipeline from voice to action.

01

Speak

Hold the hotkey or say the wake word. Speak naturally — VAD detects when you finish.

02

Transcribe

Groq Whisper converts speech to text with high accuracy, running asynchronously.

03

Reason

An LLM orchestrator plans the actions. Groq is the primary provider, with OpenAI or Cerebras as a fallback for reliability.

04

Act

Execute window commands, type text, take screenshots, or click UI elements via vision.

05

Respond

TTS speaks the result. You hear confirmations, errors, and status updates in real time.
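The five steps can be sketched as plain function composition. This is a hypothetical illustration: the stage callables, ProviderError, and the provider tuples are placeholders, and step 03 shows one way to get the Groq-first fallback described above by trying each provider in priority order.

```python
from __future__ import annotations

from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a (hypothetical) LLM client when a request fails."""


def plan_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Step 03: try each provider in order; return (name, plan) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why, then try the next one
    raise ProviderError("all providers failed: " + "; ".join(errors))


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],                      # step 02
    providers: Sequence[tuple[str, Callable[[str], str]]],   # step 03, priority order
    act: Callable[[str], str],                               # step 04
    speak: Callable[[str], None],                            # step 05
) -> str:
    """Run steps 02-05 on audio captured in step 01 and return the spoken result."""
    text = transcribe(audio)
    _provider, plan = plan_with_fallback(text, providers)
    result = act(plan)
    speak(result)
    return result
```

Passing providers as [("groq", ...), ("openai", ...), ("cerebras", ...)] makes Groq the first choice; OpenAI and Cerebras are only reached when an earlier provider raises ProviderError.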

Download VoiceUse

Choose your platform. No Python install required — these are standalone executables built with PyInstaller.

Prefer the command line? Install via pipx or uv:

pipx install "voice-computer-use-agent[all]"

Plugin Architecture

Replace or extend entire subsystems. Mix and match providers to suit your workflow.

Built-in

Grok Voice Plugin

Replace the entire STT→LLM→TTS pipeline with a single xAI Realtime API WebSocket connection. Audio streams end-to-end as 24 kHz PCM for ultra-low-latency voice interaction.

  • Server-side VAD with interruption support
  • Multiple voice personalities (Eve, Ara, Leo, Rex, Sal)
  • Same OS control tools as the default pipeline
  • grok-voice-think-fast-1.0 model
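For a sense of the data rates involved: the 24 kHz rate comes from the plugin description, while 16-bit mono samples and 20 ms chunks are assumptions for this sketch, not documented parameters of the xAI Realtime API.

```python
SAMPLE_RATE_HZ = 24_000  # from the plugin description
BYTES_PER_SAMPLE = 2     # assumption: 16-bit signed samples
CHANNELS = 1             # assumption: mono


def chunk_size_bytes(duration_ms: int) -> int:
    """Bytes of PCM audio needed to cover `duration_ms` milliseconds."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def iter_chunks(pcm: bytes, duration_ms: int = 20):
    """Yield fixed-duration chunks for streaming; the final chunk may be shorter."""
    size = chunk_size_bytes(duration_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Under these assumptions, one second of audio is 24,000 × 2 = 48,000 bytes, or fifty 960-byte chunks at 20 ms each.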

STT Providers

Groq Whisper (default)

LLM Providers

Groq, OpenAI, Cerebras

TTS Providers

edge-tts, pyttsx3

Vision Providers

Codex CLI, Anthropic
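One common way to make subsystems swappable is a registry keyed by subsystem and provider name, so a config value picks the implementation. The sketch below is hypothetical: register, create, and the placeholder speaker classes are illustrative names, not VoiceUse's plugin API.

```python
from typing import Callable, Dict

# Hypothetical registry: subsystem name -> provider name -> factory callable.
_REGISTRY: Dict[str, Dict[str, Callable[..., object]]] = {
    "stt": {}, "llm": {}, "tts": {}, "vision": {},
}


def register(subsystem: str, name: str):
    """Class/function decorator that records a provider factory."""
    def decorator(factory):
        _REGISTRY[subsystem][name] = factory
        return factory
    return decorator


def create(subsystem: str, name: str, **options):
    """Instantiate a registered provider, or fail listing the available choices."""
    providers = _REGISTRY.get(subsystem, {})
    if name not in providers:
        available = ", ".join(sorted(providers)) or "none"
        raise ValueError(f"unknown {subsystem} provider {name!r} (available: {available})")
    return providers[name](**options)


@register("tts", "pyttsx3")
class Pyttsx3Speaker:
    """Placeholder; a real backend would wrap pyttsx3.init() and engine.say()."""

    def __init__(self, rate: int = 200):
        self.rate = rate  # speaking rate handed to the engine


@register("tts", "edge-tts")
class EdgeTtsSpeaker:
    """Placeholder for an edge-tts backend."""

    def __init__(self, voice: str = "en-US-AriaNeural"):
        self.voice = voice
```

Swapping TTS backends then becomes a matter of calling create("tts", "edge-tts") instead of create("tts", "pyttsx3").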

Safety First

Your system is protected by multiple layers of safeguards.

Spoken Confirmation

Before any destructive action — close, quit, delete, shutdown — the agent speaks a confirmation prompt and waits for your verbal response.
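Interpreting the spoken reply is the delicate part, and failing closed is the safe default: anything other than a clear yes cancels the action. A hypothetical sketch; is_confirmed and its word lists are illustrative, not VoiceUse's.

```python
import re

# Illustrative word lists; a real deployment would tune these.
YES_WORDS = {"yes", "yeah", "yep", "sure", "ok", "okay", "confirm", "confirmed"}
NO_WORDS = {"no", "nope", "cancel", "stop", "abort", "wait", "don't", "dont"}


def is_confirmed(reply: str) -> bool:
    """True only for a clear yes; silence, ambiguity, or any 'no' cancels."""
    text = reply.lower().replace("\u2019", "'")  # normalize curly apostrophes from STT
    words = re.findall(r"[a-z']+", text)
    said_yes = any(w in YES_WORDS for w in words) or "do it" in " ".join(words)
    said_no = any(w in NO_WORDS for w in words)
    return said_yes and not said_no
```

A reply like "don't do it" contains both a yes phrase and a no word, so it cancels rather than guessing.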

Keyword Detection

Configurable list of destructive keywords: close, quit, delete, remove, kill, terminate, shutdown, reboot, format, and rm -rf. Password entry is flagged too.
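Whole-word matching avoids false positives such as "format" inside "information". A hypothetical sketch using Python's re module; the function names are illustrative, and treating the word "password" as the trigger for password entry is an assumption.

```python
import re

# Keywords from the list above; "password" standing in for password entry
# is an assumption about how that case is detected.
DEFAULT_KEYWORDS = [
    "close", "quit", "delete", "remove", "kill", "terminate",
    "shutdown", "reboot", "format", "rm -rf", "password",
]


def compile_keywords(keywords):
    """Build one case-insensitive, whole-word pattern from a configurable list."""
    alternation = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


DESTRUCTIVE = compile_keywords(DEFAULT_KEYWORDS)


def destructive_keywords(action: str) -> list:
    """Return the distinct destructive keywords found in a planned action."""
    return sorted({m.group(0).lower() for m in DESTRUCTIVE.finditer(action)})
```

An empty result lets the action proceed; anything else would route it through the spoken confirmation step.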

Shell Allow-List

System commands are checked against an allow-list by default. Commands not on the list are blocked with an error message instead of being executed silently.
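A sketch of an allow-list check that uses shlex to split the command and compares only the program name. ALLOWED_COMMANDS, CommandBlocked, and check_command are hypothetical; the real list is whatever VoiceUse is configured with.

```python
import shlex

# Hypothetical default allow-list; in practice this would come from configuration.
ALLOWED_COMMANDS = frozenset({"ls", "pwd", "echo", "whoami", "date"})


class CommandBlocked(Exception):
    """Raised with a user-facing message when a command is refused."""


def check_command(command_line: str, allowed=ALLOWED_COMMANDS) -> list:
    """Return argv if the program is allow-listed; otherwise raise CommandBlocked."""
    try:
        argv = shlex.split(command_line)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise CommandBlocked(f"could not parse command: {exc}") from None
    if not argv:
        raise CommandBlocked("empty command")
    if argv[0] not in allowed:
        raise CommandBlocked(f"'{argv[0]}' is not on the allow-list")
    return argv
```

Passing the returned argv to subprocess.run without shell=True matters: with no shell involved, operators such as && or ; reach the program as plain arguments instead of chaining extra commands.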

Ready to Go Hands-Free?

Join the growing community of developers controlling their desktops with voice.