community
This repository acts as the central, community-supported configuration and voice command set for the Talon Voice speech recognition system. By defining standard phonetic layouts, modifiers, formatting pipelines, and language-agnostic capability tags, the repository provides hands-free desktop management, web browsing, application integration, and high-productivity voice coding.
Dependencies and Usage
The repository integrates directly with several core runtimes, system APIs, and development libraries:
- Talon Voice Runtime (Conformer / wav2letter engine): The underlying speech engine that loads the repository's `.talon` files, `.talon-list` vocabularies, and Python contexts.
- Python (3.10 / 3.11): Powers all programmatic state management, window title tracking, math-to-coordinate calculations, and system-level scripting.
- Pytest: Runs the suite in test against isolated mock stubs, headlessly validating formatters, dictation engines, and spoken-form generators without the closed-source Talon application.
- Pre-commit (ruff, oxfmt, talon-tools): Orchestrates local and automated continuous integration code-quality checks. It formats Python with `ruff` and parses, lints, and formats Talon grammars and snippets using `talon-fmt` and `snippet-fmt`.
- External Integration APIs & RPCs:
  - VS Code Talon Extension: Runs a command server within the IDE that executes file-based Remote Procedure Calls (RPC) managed by core/command_client.
  - Voice Code Idea (JetBrains Plugin): Exposes a local HTTP REST server to execute API-driven IDE actions.
  - Operating System APIs: Interfaces with Unix CLI pipelines, macOS Apple event scripting (`appscript`), Windows Registry keys, and standard Windows API bindings (`ctypes`) to capture screenshots, query active windows, or control virtual desktops.
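The file-based RPC exchange with the editor can be pictured with a minimal sketch. Everything in it is hypothetical: the `request.json` and `response.json` file names, the JSON fields, and the `send_request` and `make_fake_editor` helpers are illustrative assumptions, not the protocol that core/command_client actually implements.

```python
import json
import tempfile
import time
from pathlib import Path


def send_request(directory: Path, command_id: str, trigger, timeout: float = 1.0) -> dict:
    """Write a request file, signal the editor, then poll for the response file."""
    request_path = directory / "request.json"    # hypothetical file name
    response_path = directory / "response.json"  # hypothetical file name

    # Exclusive creation ("x") doubles as a simple lock: it raises
    # FileExistsError if an earlier request file is still present.
    with open(request_path, "x", encoding="utf-8") as handle:
        json.dump({"commandId": command_id}, handle)

    try:
        trigger()  # stands in for the hotkey that wakes the editor
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if response_path.exists():
                response = json.loads(response_path.read_text(encoding="utf-8"))
                response_path.unlink()
                return response
            time.sleep(0.01)
        raise TimeoutError(f"no response for {command_id!r}")
    finally:
        request_path.unlink(missing_ok=True)  # always release the lock


def make_fake_editor(directory: Path):
    """Return a callable that plays the editor's role for this demo."""
    def fake_editor() -> None:
        request = json.loads((directory / "request.json").read_text(encoding="utf-8"))
        reply = {"commandId": request["commandId"], "ok": True}
        (directory / "response.json").write_text(json.dumps(reply), encoding="utf-8")
    return fake_editor


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    result = send_request(workdir, "workbench.action.files.save", make_fake_editor(workdir))
    leftover = sorted(path.name for path in workdir.iterdir())
```

Because the request file is removed in a `finally` block, a timed-out request does not block the next one. A real client would also need to guard against reading a response that is only partially written.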
Directory Architecture
The repository is structured around a decoupled, context-aware command translation pipeline. The core directories handle standard speech inputs, abstract capability declarations, concrete application overrides, and language-specific coding behaviors.
Core Systems
- core: Establishes the foundational logic for the entire configuration. It manages keyboard abstractions, modifiers, and arrow mappings in core/keys; tracks active application processes in core/app_switcher; drives prose dictation, auto-spacing, and sentence boundary detection in core/text; and houses mathematical parsers in core/numbers. It also provides standard window-snapping algorithms in core/windows_and_tabs and interactive help overlays in core/help.
- tags: Declares the abstract "Tag-Action Pattern" interfaces. By mapping natural voice prompts to empty placeholder actions (e.g., standardizing browser tabs, find-and-replace dialogs, or debugger steps), this directory isolates the user's spoken syntax from application-specific implementations.
- apps: Maps the abstract definitions declared in tags or core to concrete application-specific keyboard shortcuts, terminal pipelines, or RPC endpoints. This includes deep integrations for developer environments like apps/vscode and apps/jetbrains, productivity suites like apps/thunderbird, and system file managers like apps/windows_explorer and apps/finder.
- lang: Organizes coding environments for more than two dozen programming and markup languages. It utilizes abstract syntax tags in lang/tags (managing math operations, variable assignments, and block structures) to route voice commands to syntax-correct outputs for languages such as lang/python, lang/rust, lang/typescript, lang/go, and lang/c.
- plugin: Houses rich user interface layers, alternative input managers, and non-verbal speech tools. Notable modules include plugin/subtitles for custom Skia-drawn subtitle displays, plugin/mode_indicator for on-screen status overlays, and plugin/repeater for executing mouth-noise-triggered command repetitions.
- test: Contains the suite of automated tests. By utilizing local test/stubs, it bypasses closed-source runtime boundaries to assert the correctness of formatters, text translation layers, and snippet processing systems.
- migration_helpers: Provides a startup utility script that translates legacy user configuration CSV files into high-performance, natively parsed `.talon-list` files.
- .github: Houses GitHub Actions workflows inside .github/workflows that run the tests on pull requests and merges, alongside automated dependency updates through Dependabot.
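To make the CSV migration concrete, here is a minimal sketch of the kind of conversion migration_helpers performs. It assumes a two-column `value, spoken form` CSV layout and a hypothetical list name `user.file_extension`. The `csv_to_talon_list` function is illustrative and is not the repository's actual script.

```python
import csv
import io


def csv_to_talon_list(csv_text: str, list_name: str) -> str:
    """Convert 'value, spoken form' CSV rows into .talon-list text."""
    lines = [f"list: {list_name}", "-"]
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0] or cells[0].startswith("#"):
            continue  # skip blank lines and comment lines
        value = cells[0]
        # A missing spoken form means the value is spoken as written.
        spoken = cells[1] if len(cells) > 1 and cells[1] else value
        lines.append(spoken if spoken == value else f"{spoken}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical legacy CSV content, for illustration only.
legacy_csv = "# comment line\n.py, dot pie\n.md, dot mark down\nreadme\n"
converted = csv_to_talon_list(legacy_csv, "user.file_extension")
print(converted)
```

The sketch does not quote values, so entries containing characters that are special in a `.talon-list` file would need extra handling.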
Configuration and Documentation Files
The root directory contains several core files that govern the setup, contribution standards, behavior modifications, and troubleshooting parameters:
- README.md: Serves as the primary documentation hub. It details multi-platform installation steps, teaches the phonetic alphabet, maps basic global modifiers, and acts as the entry point for configuring the voice environment.
- settings.talon: The central user customization file. It configures on-screen UI scaling factors (`imgui.scale`), adjusts mouse scroll multipliers, controls mouse-grid layouts, sets double-pop timing bounds, and enables toggleable features such as unprefixed numbers or experimental window layouts.
- PRACTICES.md: Establishes best practices for codebase maintenance. It outlines contribution guidelines, rules for unused code, recommended OS-specific file suffixes, when to leverage the unified snippet engine, and command serialization patterns.
- CONTRIBUTING.md: Documents core design principles. Key rules include P01 (preferring `[object][verb]` spoken structures to prevent command-name pollution), P05 (preferring flat `.talon-list` files for user custom vocabularies), and P08 (persisting data safely within standard repository directories).
- BREAKING_CHANGES.txt: A reverse-chronological log of backward-incompatible modifications in the repository. It alerts users when outdated symbol commands, edit verbs, or configuration parameters have been modified, preventing custom user files from breaking on update.
- Infrastructure Configuration:
  - `pyproject.toml` configures testing dependencies and paths for Pytest and specifies formatting rules for Ruff.
  - `.pre-commit-config.yaml` runs multi-language formatters (including `oxfmt` and `talon-fmt`) to keep the repository syntactically uniform.
  - `requirements-dev.txt` pins testing library versions (`pytest`).
  - `.git-blame-ignore-revs` ensures large auto-formatting commits are ignored during git blame lookups.
  - `.editorconfig` and `.gitignore` enforce standard editor and tracking rules.
How the Components Work Together
The Talon Community codebase acts as a multi-stage request-routing pipeline, executing abstract actions through physical OS operations, terminal simulations, or external editor plugins.
            [ Spoken Command ]  (e.g., "select line", "state if", "op plus")
                     │
                     ▼
       [ Speech Recognition Engine ]
                     │
                     ▼  (Resolves phonetics, letters, and modifiers)
         [ core/keys | core/text ]
                     │
                     ▼
     [ Context-Aware Window Monitor ] ──► (Identifies focused window title / process)
                     │
            ┌────────┴────────┬────────────────────────┐
            ▼                 ▼                        ▼
     [ apps/vscode ]   [ apps/chrome ]          [ lang/python ]
    (Active Context)  (Active Context)         (Active Context)
            │                 │                        │
            ▼                 ▼                        ▼
        Invokes:          Invokes:                 Invokes:
     [ rpc_client ]    [ keypresses ]       [ snippets/formatters ]
     (Triggers RPC)    (Sends Ctrl-L)        (Formats Snake Case)
1. The Input and Translation Phase
When you speak a command, Talon's speech engine uses mappings in core/keys (e.g., core/keys/mac or core/keys/win) to parse standard letters, numbers, and symbols.
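As a rough illustration of this mapping step, the sketch below turns a spoken phrase into a key chord. The `LETTERS` and `MODIFIERS` tables and the `spoken_to_key` helper are hypothetical stand-ins for the lists in core/keys, and only a three-word subset of the phonetic alphabet is included.

```python
# Hypothetical lookup tables; the real lists live in core/keys.
LETTERS = {"air": "a", "bat": "b", "cap": "c"}
MODIFIERS = {"control": "ctrl", "shift": "shift", "alt": "alt"}


def spoken_to_key(phrase: str) -> str:
    """Translate e.g. 'control shift air' into the chord 'ctrl-shift-a'."""
    # The last word names the key; every word before it is a modifier.
    *modifier_words, key_word = phrase.split()
    modifiers = [MODIFIERS[word] for word in modifier_words]
    return "-".join(modifiers + [LETTERS[key_word]])


print(spoken_to_key("control shift air"))
print(spoken_to_key("bat"))
```

An unknown word raises `KeyError` in this sketch. A real grammar would reject the phrase during recognition instead.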
If free-form dictation is spoken, core/text processes the string. It consults personal dictionaries in core/vocabulary and filters abbreviations in core/abbreviate to apply proper capitalizations and spacing on the fly.
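The auto-spacing and capitalization behavior can be approximated in a few lines. This is a simplified sketch under stated assumptions: the `format_prose` function and its `vocabulary` argument are hypothetical, punctuation arrives as separate tokens, and the real logic in core/text handles many more cases.

```python
SENTENCE_END = {".", "?", "!"}
PUNCTUATION = SENTENCE_END | {","}


def format_prose(chunks: list[str], vocabulary: dict[str, str]) -> str:
    """Join dictated chunks, applying vocabulary fixes, spacing, and capitals."""
    text = ""
    capitalize_next = True  # the first word of the text starts a sentence
    for chunk in chunks:
        for word in chunk.split():
            word = vocabulary.get(word, word)  # personal dictionary override
            is_punctuation = word in PUNCTUATION
            if capitalize_next and not is_punctuation:
                word = word[0].upper() + word[1:]
                capitalize_next = False
            if text and not is_punctuation:
                text += " "  # auto-space between words, never before punctuation
            text += word
            if word in SENTENCE_END:
                capitalize_next = True
    return text


sample = format_prose(["hello world .", "talon is fun"], {"talon": "Talon"})
print(sample)
```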
2. Context Matching and Tag Binding
A background thread in Talon monitors active desktop process and window title events.
* If you focus Visual Studio Code, apps/vscode matches the process name and window title. It dynamically enables tags like user.tabs, user.splits, and user.multiple_cursors.
* If you open a .py file inside VS Code, the title tracker detects the .py extension in the window title and activates lang/python, which sets the variable naming formatter to SNAKE_CASE.
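The matching logic described above can be sketched in plain Python. The `Focus` class, the two lookup tables, and the app name `"Code"` are assumptions made for illustration. In the repository, the equivalent rules are declared in Talon context files rather than in code like this.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Hypothetical tables standing in for per-app and per-language contexts.
APP_TAGS = {"Code": {"user.tabs", "user.splits", "user.multiple_cursors"}}
EXTENSION_LANGUAGES = {".py": "python", ".rs": "rust", ".ts": "typescript"}


@dataclass(frozen=True)
class Focus:
    app_name: str
    window_title: str


def active_tags(focus: Focus) -> set:
    """Return the tags enabled for the focused application."""
    return set(APP_TAGS.get(focus.app_name, set()))


def detect_language(focus: Focus) -> Optional[str]:
    """Guess the language from the first title token with a known extension."""
    for token in focus.window_title.split():
        _, extension = os.path.splitext(token)
        if extension in EXTENSION_LANGUAGES:
            return EXTENSION_LANGUAGES[extension]
    return None


editor = Focus("Code", "formatters.py - community - Visual Studio Code")
print(sorted(active_tags(editor)), detect_language(editor))
```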
3. The Polymorphic Action Execution
When you speak an action command like "tab open", Talon executes the abstract action app.tab_open(). The active application context intercepts and translates this command:
* In Google Chrome, apps/chrome overrides the action to simulate pressing ctrl-t.
* In a terminal emulator like GNOME Terminal, apps/gnome_terminal overrides the action to send ctrl-shift-t to avoid interrupting command execution.
* In VS Code, apps/vscode intercepts the action and writes an RPC command request to the temporary directory. The core/command_client/rpc_client module handles write locks, triggers VS Code with a designated hotkey, and polls for a JSON confirmation file, cleanly renaming or deleting lock files when finished.
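The override mechanism resembles a dispatch table keyed by the active application. The sketch below models that idea only. `ActionRegistry`, the app keys, and the returned strings are hypothetical, and Talon's own action and context API is deliberately not used here.

```python
from typing import Callable, Dict


class ActionRegistry:
    """Maps (active app, abstract action) pairs to concrete handlers."""

    def __init__(self) -> None:
        self._overrides: Dict[str, Dict[str, Callable[[], str]]] = {}

    def override(self, app: str, action: str):
        """Decorator that registers a handler for one app and one action."""
        def register(func: Callable[[], str]) -> Callable[[], str]:
            self._overrides.setdefault(app, {})[action] = func
            return func
        return register

    def run(self, action: str, active_app: str) -> str:
        handler = self._overrides.get(active_app, {}).get(action)
        if handler is None:
            raise NotImplementedError(f"{action} is not implemented for {active_app}")
        return handler()


registry = ActionRegistry()


@registry.override("chrome", "app.tab_open")
def chrome_tab_open() -> str:
    return "key(ctrl-t)"


@registry.override("gnome_terminal", "app.tab_open")
def terminal_tab_open() -> str:
    return "key(ctrl-shift-t)"


@registry.override("vscode", "app.tab_open")
def vscode_tab_open() -> str:
    return "rpc request"  # placeholder for the file-based RPC round trip


print(registry.run("app.tab_open", "chrome"))
```

The spoken command stays the same in all three cases. Only the handler selected by the active context differs.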
4. Language-Agnostic Snippet and Formatting Compilation
If you are coding and say "snip each", Talon looks up forEachStatement.snippet within the core/snippets/snippets directory.
* If you are editing a JavaScript file, the snippet engine compiles the templates using the javascript context:
```javascript
array.forEach(element => { ... });
```
* If you are editing a Python file, the exact same voice trigger "snip each" compiles to:
```python
for element in array:
```
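One way to picture the per-language lookup is a table keyed by snippet name and then by language. The sketch below is a simplified model: the `SPOKEN_FORMS` and `TEMPLATES` tables, the `{item}` and `{collection}` placeholders, and the `expand` helper are assumptions, and the real `.snippet` file format is not reproduced here.

```python
# Hypothetical tables; the real definitions live in .snippet files.
SPOKEN_FORMS = {"each": "forEachStatement"}
TEMPLATES = {
    "forEachStatement": {
        "javascript": "{collection}.forEach({item} => {{ ... }});",
        "python": "for {item} in {collection}:",
    }
}


def expand(spoken: str, language: str, **fields: str) -> str:
    """Resolve a spoken snippet name for the active language and fill it in."""
    name = SPOKEN_FORMS[spoken]
    try:
        template = TEMPLATES[name][language]
    except KeyError:
        raise LookupError(f"no {name!r} snippet for {language!r}") from None
    return template.format(**fields)


print(expand("each", "javascript", item="element", collection="array"))
print(expand("each", "python", item="element", collection="array"))
```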
This multi-layered architecture ensures that whether you are scrolling a webpage with your eyes and mouth noises, rearranging application split windows, or authoring code across multiple languages, you only need to memorize a single, unified set of voice commands.