vocabulary
This directory provides the infrastructure for managing personal vocabulary and dictation phrase replacements in Talon. It bridges the gap between how words are spoken and how they are written, enabling you to dynamically add new words, correct common misrecognitions, and format complex phrases on the fly.
Architectural Overview
The vocabulary system is divided into two main components:
- Vocabulary Expansion: Teaching Talon new words or specific spoken-to-written mappings. These are stored in vocabulary.talon-list and exposed to Talon's speech engine via the user.vocabulary list.
- Phrase Replacement: Post-processing recognized words to swap them with preferred alternatives (e.g., mapping a spoken phrase to a technical term). This is driven by vocabulary.py using a custom multi-word replacement engine.
Component Breakdown
vocabulary.py
This is the core engine of the vocabulary system. It implements:
- PhraseReplacer Class: A greedy, prefix-tree-like pattern-matching utility that replaces multi-word sequences in dictation. Unlike Talon's default dictate.word_map, which only supports single-word substitutions, PhraseReplacer indexes replacements by their first word and matches the longest possible spoken sequence first.
- Dictation Override: The module overrides Talon's default dictate.replace_words action to route dictated text through the custom PhraseReplacer.
- Dynamic Persistence: Functions like _add_selection_to_file and append_to_vocabulary write new entries back to files on disk (vocabulary.talon-list or words_to_replace.csv). They also handle formatting, quoting, and noun/name inflections (automatically generating plural s or possessive 's variants).
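The greedy, first-word-indexed matching described above can be sketched in plain Python. This is a minimal illustration and not the code in vocabulary.py: the class name PhraseReplacerSketch, its method names, and the exact-case matching are all assumptions made for the example.

```python
from collections import defaultdict


class PhraseReplacerSketch:
    """Hypothetical stand-in for PhraseReplacer: greedy, longest-match-first."""

    def __init__(self, phrase_dict):
        # Map each phrase's first word to its candidate continuations.
        self.index = defaultdict(list)
        for spoken, written in phrase_dict.items():
            words = spoken.split()
            if words:
                self.index[words[0]].append((words[1:], written))
        # Try longer phrases before shorter ones that share a first word.
        for candidates in self.index.values():
            candidates.sort(key=lambda c: len(c[0]), reverse=True)

    def replace(self, words):
        out = []
        i = 0
        while i < len(words):
            for rest, written in self.index.get(words[i], ()):
                end = i + 1 + len(rest)
                if list(words[i + 1:end]) == rest:
                    out.append(written)  # consume the whole matched phrase
                    i = end
                    break
            else:
                out.append(words[i])  # no phrase starts here; keep the word
                i += 1
        return out


replacer = PhraseReplacerSketch({"my sequel": "MySQL", "my": "mine"})
print(replacer.replace("I use my sequel daily".split()))
# ['I', 'use', 'MySQL', 'daily']
```

Because candidates are sorted by length, "my sequel" wins over the single-word entry "my" whenever both could match at the same position.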
edit_vocabulary.talon
This file defines the voice interface for modifying your vocabulary on the fly. It functions in both command and dictation modes, allowing you to select text on your screen and issue commands to register it.
Key commands include:
- copy to vocab [as <phrase>]: Adds the selected text to your vocabulary. If you provide a phrase, that phrase becomes the spoken form; otherwise, it uses the text itself (or spells it out if it is an acronym).
- copy noun to vocab [as <phrase>] / copy name to vocab [as <phrase>]: Automatically generates and saves both the base form and its plural or possessive counterpart.
- copy to replacements as <phrase>: Registers a post-processing replacement mapping the spoken phrase to the selected text.
- check vocab: Verifies if the selected text already exists in your vocabulary and notifies you of its spoken form.
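As a rough sketch of what these commands produce, the snippet below derives the spoken form and the extra noun/name entry. The function names, the acronym test (all-uppercase letters), and the naive "append s" pluralization are assumptions for illustration; the real logic in vocabulary.py may differ.

```python
def default_spoken_form(written):
    # Assumption: an all-uppercase word of two or more letters is an acronym
    # and is spelled out letter by letter.
    if len(written) > 1 and written.isalpha() and written.isupper():
        return " ".join(written)
    return written


def vocabulary_entries(written, spoken=None, kind=None):
    """Return {spoken: written} entries; kind is None, "noun", or "name"."""
    spoken = spoken or default_spoken_form(written)
    entries = {spoken: written}
    if kind == "noun":
        entries[spoken + "s"] = written + "s"      # naive plural
    elif kind == "name":
        entries[spoken + "'s"] = written + "'s"    # possessive
    return entries


print(vocabulary_entries("nmap", "N map", kind="noun"))
# {'N map': 'nmap', 'N maps': 'nmaps'}
print(default_spoken_form("API"))
# A P I
```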
vocabulary.talon-list
This is a flat-file repository where user-defined vocabulary words are stored. It populates the user.vocabulary list.
- Simple terms (e.g., nmap, admin) are listed directly.
- Explicit spoken-to-written mappings are defined using key-value pairs (e.g., N map: nmap).
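To make the two entry shapes concrete, here is a small parser sketch that turns the file body into a spoken-to-written dictionary. It assumes the file opens with a header such as "list: user.vocabulary" terminated by a line containing only "-", and it ignores quoting; parse_vocabulary_list is a hypothetical helper, since Talon itself loads these files.

```python
def parse_vocabulary_list(text):
    """Return {spoken form: written form} for a .talon-list body."""
    entries = {}
    in_body = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_body:
            # Skip the header until the "-" separator line (assumed format).
            in_body = line == "-"
            continue
        if not line or line.startswith("#"):
            continue  # blank line or comment
        spoken, sep, written = line.partition(":")
        spoken = spoken.strip()
        # A bare term is spoken exactly as it is written.
        entries[spoken] = written.strip() if sep else spoken
    return entries


sample = """list: user.vocabulary
-
nmap
admin
N map: nmap
"""
print(parse_vocabulary_list(sample))
# {'nmap': 'nmap', 'admin': 'admin', 'N map': 'nmap'}
```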
How the Components Work Together
[Selected Text on Screen]
│
▼ (Voice Command in edit_vocabulary.talon)
[vocabulary.py Action]
│
├─► (Appends vocabulary mapping) ──► [vocabulary.talon-list]
│
└─► (Appends replacement mapping) ─► [words_to_replace.csv] ──► [PhraseReplacer] ──► (Post-processes Dictation)
- Selection & Command: You select a word on your screen (e.g., nmap) and say "copy to vocab as N map".
- Action Execution: The command in edit_vocabulary.talon catches the phrase and invokes user.add_selection_to_vocabulary("N map") defined in vocabulary.py.
- File Appending: The Python script formats the key-value pair and appends N map: nmap to vocabulary.talon-list.
- Engine Reload: Talon detects the file change, updates the active user.vocabulary list, and immediately recognizes the new word during future dictation.
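The file-appending step can be approximated as follows. This is a simplified sketch: format_entry and append_entry are hypothetical names, quoting of special characters is omitted, and the demonstration writes to a temporary file instead of the real vocabulary.talon-list.

```python
import tempfile
from pathlib import Path


def format_entry(spoken, written):
    # A term spoken exactly as written needs no explicit mapping.
    return written if spoken == written else f"{spoken}: {written}"


def append_entry(path, spoken, written):
    path = Path(path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Make sure the new entry starts on its own line.
    prefix = "" if not existing or existing.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{prefix}{format_entry(spoken, written)}\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "vocabulary.talon-list"
    target.write_text("list: user.vocabulary\n-\nadmin", encoding="utf-8")
    append_entry(target, "N map", "nmap")
    print(target.read_text(encoding="utf-8"))
```

In a live setup no explicit reload call is needed after the write, since Talon picks up the changed file on its own as described in the last step.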