src
The src directory is the root of the source code for the repo-guide application. It contains the primary Python codebase responsible for analyzing repositories, communicating with Large Language Models (LLMs), managing token budgets, and programmatically building static documentation sites.
Subdirectories
- repo_guide: The core Python package for the application. It houses the CLI definitions, execution entry points, and the orchestration engine that coordinates git inspection, prompt construction, LLM interaction, and documentation generation.
Core Architecture and Execution Flow
The codebase inside repo_guide is designed as a structured pipeline that runs from command-line invocation to local documentation hosting or remote deployment.
1. Command-Line Entry
When a user executes repo-guide, Python utilizes standard package entry points:
- __main__.py executes the command-line interface defined in cli.py.
- The CLI layer is built on the click library, offering extensive configuration options. Users can define model parameters (e.g., defaulting to Gemini models), apply strict token limits to control costs, resume partially generated runs, build static sites, or deploy directly to GitHub Pages.
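As a rough illustration of how a click-based entry point of this kind can be wired up, the sketch below defines a single command with a model option, a token limit, and a resume flag. The option names (--model, --token-budget, --resume), the placeholder default model name, and the function name main are assumptions made for this example; they are not taken from the actual cli.py.

```python
import click


@click.command()
@click.option("--model", default="example-gemini-model", show_default=True,
              help="LLM to query (placeholder name, not the real default).")
@click.option("--token-budget", type=int, default=None,
              help="Hypothetical cap on the number of tokens to spend.")
@click.option("--resume", is_flag=True,
              help="Hypothetical flag: continue a partially generated run.")
@click.argument("repo_path", type=click.Path(exists=True, file_okay=False))
def main(model, token_budget, resume, repo_path):
    """Generate a guide for the repository at REPO_PATH."""
    # A real implementation would hand these settings to the generator;
    # this sketch only echoes them back.
    click.echo(
        f"repo={repo_path} model={model} budget={token_budget} resume={resume}"
    )

# A __main__.py module would typically just import main and call main().
```

Under these assumptions, an invocation such as `python -m repo_guide . --token-budget 1000 --resume` would reach main with the parsed values, since click handles argument parsing, type conversion, and help text.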
2. The DocGenerator Pipeline
The underlying execution engine is the DocGenerator class defined within the CLI module. It orchestrates the complex task of repository analysis and documentation synthesis through several distinct steps:
- Repository Inspection: It uses GitPython to read the repository's status, track versioned files, and map local paths to remote hosting URLs (such as GitHub blob links).
- File Filtering: To avoid feeding junk data into the LLM, it filters out binaries and unwanted paths. This is done by analyzing Git attributes or utilizing Google's magika library for deep file-type detection.
- Hierarchical Prompt Construction: To summarize directories accurately, the generator operates bottom-up or top-down contextually. When analyzing a directory, it packages the contents of local files alongside the previously generated README summaries of its subdirectories. This hierarchical bundling ensures the LLM understands both local file contents and the broader subsystem context.
- LLM Interaction & Token Budgeting: The engine interacts with the llm library to prompt the configured model. It safely counts tokens using tiktoken to respect user-defined budgets and employs exponential backoff logic to gracefully handle rate limits or API errors.
- MkDocs Generation: Once the markdown guides and READMEs are generated, the engine programmatically writes a mkdocs.yml configuration file and spawns local background threads (using subprocess and threading) to host a live-reloading preview server. It also registers custom Python hooks utilizing bleach to ensure any HTML generated by the LLM is safely sanitized.
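The hierarchical prompt construction and retry behaviour described above can be sketched with the standard library alone. This is a minimal illustration and not the DocGenerator implementation: the names summarize_tree and call_with_backoff are hypothetical, the summarize callable stands in for a call to the configured model, RuntimeError stands in for whatever rate-limit or API error the real client raises, and file filtering and token counting are left out.

```python
import os
import time
from pathlib import Path
from typing import Callable, Dict


def call_with_backoff(fn: Callable[[], str], retries: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fn, doubling the wait after each failed attempt."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for a rate-limit or API error
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


def summarize_tree(root: Path,
                   summarize: Callable[[str], str]) -> Dict[Path, str]:
    """Summarize every directory under root, children before parents."""
    summaries: Dict[Path, str] = {}
    # topdown=False visits leaf directories first, so a child's summary
    # already exists by the time its parent directory is processed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        directory = Path(dirpath)
        parts = []
        for name in sorted(filenames):
            text = (directory / name).read_text(encoding="utf-8",
                                                errors="replace")
            parts.append(f"## File: {name}\n{text}")
        for name in sorted(dirnames):
            child = directory / name
            if child in summaries:  # symlinked directories are not walked
                parts.append(
                    f"## Subdirectory summary: {name}\n{summaries[child]}"
                )
        prompt = "\n\n".join(parts)
        summaries[directory] = call_with_backoff(lambda: summarize(prompt))
    return summaries
```

Passing a plain function as summarize makes the traversal easy to exercise without network access. In a real run the callable would wrap the model call, and the prompt would be checked against the token budget before it is sent.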