Desktop · Systems · 2026 · Shipped

Lexo

Local-first Python desktop app and CLI for extracting text from digital PDFs, OCRing scanned documents, and preparing clean exports for research, NLP, and LLM workflows.

Turns PDFs and images into clean editable text with strong Burmese OCR support.

Technology Stack

Python · PySide6 · Typer · PyMuPDF · Google Drive API · Pydantic · uv · GitHub Actions

What I Built

  • Built a shared engine powering both a PySide6 desktop GUI and a scriptable Typer CLI.
  • Implemented smart OCR routing so digital PDFs use embedded text instantly while scanned pages route through Google Docs OCR.
  • Designed visual PDF tooling for cropping, splitting two-up spreads, rotating, merging, and extracting page ranges.
  • Added Burmese-aware text handling with Unicode NFC normalization, zero-width-space-safe cleaning, and bundled Myanmar font support.
  • Supported plain text, Markdown with YAML frontmatter, and JSONL exports for downstream NLP and LLM workflows.
  • Packaged and published the project with uv, Hatchling, GitHub Actions, and PyPI Trusted Publishing.
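
A minimal sketch of how the OCR routing, Burmese-aware cleaning, and JSONL export described above could fit together. It is illustrative only and is not Lexo's actual code: `needs_ocr`, `clean_text`, `PageRecord`, `to_jsonl`, and the 20-character threshold are hypothetical names and values, and the sketch uses only the standard library, so the PyMuPDF extraction and Google Docs OCR steps are left out.

```python
import json
import re
import unicodedata
from dataclasses import asdict, dataclass

# Hypothetical threshold: pages with fewer visible embedded characters
# than this are treated as scanned and would be routed to OCR.
MIN_EMBEDDED_CHARS = 20

ZWSP = "\u200b"  # zero-width space, often used as a word-break hint in Burmese


def needs_ocr(embedded_text: str, min_chars: int = MIN_EMBEDDED_CHARS) -> bool:
    """Return True when a page has too little embedded text to trust."""
    visible = [ch for ch in embedded_text if not ch.isspace() and ch != ZWSP]
    return len(visible) < min_chars


def clean_text(text: str) -> str:
    """Normalize to NFC and tidy whitespace while leaving U+200B in place."""
    text = unicodedata.normalize("NFC", text)
    # Collapse only ordinary spaces and tabs; zero-width spaces are untouched.
    text = re.sub(r"[ \t]+", " ", text)
    lines = [line.strip(" \t") for line in text.splitlines()]
    return "\n".join(lines).strip(" \t\n")


@dataclass
class PageRecord:
    source: str
    page: int
    method: str  # "embedded" or "ocr"
    text: str


def to_jsonl(records: list[PageRecord]) -> str:
    """Serialize one JSON object per line, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


if __name__ == "__main__":
    raw = "\u1019\u200b\u1005  "  # two Burmese letters separated by a ZWSP
    method = "ocr" if needs_ocr(raw) else "embedded"
    record = PageRecord("sample.pdf", 1, method, clean_text(raw))
    print(to_jsonl([record]))
```

Passing `ensure_ascii=False` keeps Burmese text as readable characters in the JSONL output rather than `\uXXXX` escapes, which tends to be easier to inspect in downstream NLP tooling.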

Demo

Demo assets are not published for this project yet.