all2md Documentation
Welcome to all2md, a Python document conversion library for rapid, lightweight transformation of many document formats into Markdown. It is designed specifically for LLM workflows and document-processing pipelines.
Key Features
- Rapid Conversion Pipelines: optimised parsers and renderers for fast, reliable Markdown output
- Smart Detection: multi-stage format detection (extension, MIME type, magic bytes) with graceful fallbacks
- Wide Format Coverage: 20+ document, markup, and archive formats, plus 200+ source-code and plain-text file types
- Dynamic Configuration: dataclass-driven options, presets, and CLI/environment-variable overrides for every converter
- Attachment Management: a unified system for downloading, embedding, or annotating images and other binary attachments
- AST Transforms: a hookable transformation pipeline with built-in TOC generation, boilerplate removal, and plugin support
- Document Linter: 47 built-in rules covering structure, headings, links, lists, tables, images, and typography, with safe auto-fixes (lint --fix), JSON/text output, and CI-friendly exit codes
- Custom Templates: Jinja2-based rendering for custom output formats (DocBook, YAML, ANSI terminal, etc.) without writing Python
- Rich CLI Toolkit: batch processing, watch mode, parallel workers, collated output, and themed terminal output via Rich
- Integration-Ready: MCP server, plugin entry points, static-site templating, and bidirectional conversion APIs
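Smart Detection can be pictured as a chain of checks that stops at the first confident answer. The sketch below is not all2md's detector; it is a stdlib-only illustration that assumes the three stages run in the order listed above (extension, then MIME type, then magic bytes). The `detect_format` function and its lookup tables are hypothetical and deliberately tiny.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Hypothetical sketch, not all2md's implementation.
# Stage 1: file extension -> format name (a small illustrative subset).
EXTENSION_MAP = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".html": "html",
    ".htm": "html",
    ".md": "markdown",
    ".csv": "csv",
}

# Stage 2: MIME type -> format name.
MIME_MAP = {
    "application/pdf": "pdf",
    "text/html": "html",
    "text/markdown": "markdown",
    "text/csv": "csv",
}

# Stage 3: leading "magic bytes" -> format name.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "zip"),  # generic ZIP container (DOCX, XLSX, EPUB also start this way)
]


def detect_format(filename: str, mime_type: Optional[str] = None, head: bytes = b"") -> str:
    """Return a format name, trying extension, then MIME type, then magic bytes."""
    # 1. Extension, compared case-insensitively.
    ext = Path(filename).suffix.lower()
    if ext in EXTENSION_MAP:
        return EXTENSION_MAP[ext]

    # 2. MIME type: use the one supplied (e.g. from an HTTP header) or guess it from the name.
    if mime_type is None:
        mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type:
        base = mime_type.split(";")[0].strip().lower()  # drop parameters such as charset
        if base in MIME_MAP:
            return MIME_MAP[base]

    # 3. Magic bytes at the start of the content.
    for signature, fmt in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return fmt

    # Graceful fallback: treat anything unrecognised as plain text.
    return "txt"


print(detect_format("report.PDF"))                                  # pdf
print(detect_format("upload.bin", head=b"%PDF-1.7\n"))              # pdf
print(detect_format("page", mime_type="text/html; charset=utf-8"))  # html
```

Ordering the cheap, name-based checks first means the file contents only need to be read when the name is uninformative.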
Quick Example
from all2md import to_markdown
# Convert any document to Markdown
markdown = to_markdown('document.pdf')
print(markdown)
# With custom options
from all2md.options import PdfOptions
options = PdfOptions(
    pages=[1, 2, 3],  # First 3 pages only
    attachment_mode='save',
    attachment_output_dir='./images'
)
markdown = to_markdown('document.pdf', parser_options=options)
# With AST transforms
from all2md.transforms import RemoveImagesTransform, HeadingOffsetTransform
markdown = to_markdown(
    'document.pdf',
    transforms=[
        RemoveImagesTransform(),
        HeadingOffsetTransform(offset=1)
    ]
)
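To show conceptually what transforms such as `HeadingOffsetTransform(offset=1)` do, here is a toy pipeline over a hand-rolled AST. all2md's real node classes and transform base class are not shown on this page, so the `Heading`, `Paragraph`, and `Image` classes and the `heading_offset`, `remove_images`, `apply_transforms`, and `render` helpers below are hypothetical stand-ins. Clamping heading levels to 1..6 is an assumption about how an offset should behave at the edges.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical toy AST node types (not all2md's classes).
@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


@dataclass
class Image:
    url: str
    alt: str = ""


Node = Union[Heading, Paragraph, Image]
Transform = Callable[[List[Node]], List[Node]]


def heading_offset(offset: int) -> Transform:
    """Build a transform that shifts every heading level by `offset`, clamped to 1..6."""
    def apply(nodes: List[Node]) -> List[Node]:
        return [
            Heading(min(6, max(1, n.level + offset)), n.text) if isinstance(n, Heading) else n
            for n in nodes
        ]
    return apply


def remove_images(nodes: List[Node]) -> List[Node]:
    """Drop every image node."""
    return [n for n in nodes if not isinstance(n, Image)]


def apply_transforms(nodes: List[Node], transforms: List[Transform]) -> List[Node]:
    # Transforms run in list order; each one receives the previous one's output.
    for transform in transforms:
        nodes = transform(nodes)
    return nodes


def render(nodes: List[Node]) -> str:
    """Render the toy AST to Markdown, separating blocks with blank lines."""
    parts = []
    for n in nodes:
        if isinstance(n, Heading):
            parts.append("#" * n.level + " " + n.text)
        elif isinstance(n, Paragraph):
            parts.append(n.text)
        elif isinstance(n, Image):
            parts.append(f"![{n.alt}]({n.url})")
    return "\n\n".join(parts)


doc = [Heading(1, "Report"), Image("fig1.png", "Figure 1"), Paragraph("Body text.")]
print(render(apply_transforms(doc, [remove_images, heading_offset(1)])))
# ## Report
#
# Body text.
```

Running the list in order mirrors the `transforms=[...]` argument in the Quick Example: images are dropped first, then every heading is demoted one level. Working on nodes rather than raw Markdown text is what lets each transform stay small and composable.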
Command Line Usage
# Convert any document to markdown
all2md document.pdf
# Save output to file
all2md document.docx --out output.md
# Download images to a directory
all2md document.html --attachment-mode save --attachment-output-dir ./images
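The `--attachment-mode` option decides what happens to images found in the source document. As a rough illustration of the three behaviours named in the feature list (saving, embedding, annotating), the sketch below renders one image three ways. The `render_attachment` function and the mode names `embed` and `annotate` are hypothetical; only `save` appears in the CLI example above, so check the Attachment Handling guide for the actual mode values.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def render_attachment(data: bytes, filename: str, alt: str, mode: str, output_dir: str = ".") -> str:
    """Return a Markdown snippet for one image (hypothetical helper, not all2md's API)."""
    if mode == "save":
        # Write the bytes into output_dir and link to the saved file.
        out_dir = Path(output_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        # Keep only the base name so a crafted name like "../../x.png" cannot escape output_dir.
        target = out_dir / Path(filename).name
        target.write_bytes(data)
        return f"![{alt}]({target.as_posix()})"
    if mode == "embed":
        # Inline the bytes as a base64 data URI; no file is written.
        mime, _encoding = mimetypes.guess_type(filename)
        encoded = base64.b64encode(data).decode("ascii")
        return f"![{alt}](data:{mime or 'application/octet-stream'};base64,{encoded})"
    if mode == "annotate":
        # Keep only a textual placeholder; the bytes are discarded.
        return f"[Image: {alt}]"
    raise ValueError(f"unknown attachment mode: {mode!r}")


# Usage: one image, three modes.
with tempfile.TemporaryDirectory() as tmp:
    print(render_attachment(b"abc", "img/logo.png", "logo", "save", tmp))
print(render_attachment(b"abc", "logo.png", "logo", "embed"))     # ![logo](data:image/png;base64,YWJj)
print(render_attachment(b"abc", "logo.png", "logo", "annotate"))  # [Image: logo]
```

Saving keeps the Markdown small but leaves files to manage; embedding produces a single self-contained file at the cost of size; annotating is the lightest option when the images are not needed downstream (for example, text-only LLM input).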
Supported Formats
- Documents
PDF, Word (DOCX), PowerPoint (PPTX), HTML/MHTML, Email (EML), EPUB, RTF, OpenDocument (ODT/ODP with bidirectional support)
- Data & Other
Excel (XLSX), CSV/TSV, Jupyter Notebooks (IPYNB), Archives (TAR/7Z/RAR/ZIP), Images (PNG/JPEG/GIF), 200+ text formats
- Markup
Markdown, reStructuredText, AsciiDoc, Org-Mode, MediaWiki, LaTeX, OpenAPI/Swagger, Textile
Getting Started
New to all2md? Start here:
- Installation Guide: install all2md with the formats you need
- Quick Start Guide: get up and running in 5 minutes
- Library Overview: understand the library's architecture and capabilities
Guides & References
Getting Started
- Installation Guide
  - Requirements
  - Quick Install
  - Installing with uv
  - System-Level CLI Installation
  - Optional Dependencies
  - Combined Installations
  - Dependency Management
  - Development Installation
  - Virtual Environment Setup
  - Verification
  - Common Installation Issues
  - Upgrade Installation
  - Uninstall
  - System-Specific Notes
  - Docker Installation
  - Next Steps
- Quick Start Guide
- Library Overview
- Supported Formats
Core Workflows
- Python API Workflows
  - Core Workflow
  - Quick Start
  - Convenience Helpers and AST Access
  - Markdown to Word (DOCX)
  - Markdown to HTML
  - Markdown to PDF
  - Advanced Workflows
  - Supported Features by Format
  - Best Practices
  - Markdown to EPUB
  - Markdown to PowerPoint (PPTX)
  - Markdown to reStructuredText (RST)
  - Custom Formats with Jinja2 Templates
  - Document Linting
  - See Also
- API Reference
- Command Line Interface
  - Basic Usage
  - Discovery Commands
  - Search Command
  - Grep Command
  - Diff Command
  - Lint Command
  - Static Site Generation
  - Document Viewing
  - Document Outline
  - Extracting by Line Range
  - Document Serving
  - Document Editing
  - Global Options
  - Format-Specific Options
  - Batch Processing
  - Dependency Management
  - Format Detection and Planning
  - Practical Examples
  - Error Handling
  - Shell Integration
  - Shell Completion
  - Install Skills Command
  - LLM Help Command
  - LLM Minify Command
  - Context Menu Command (Windows)
  - Command Aliases
- Configuration Options
- Attachment Handling
- AST Transforms and Hooks
- Working with the AST
Advanced Topics
- Architecture Overview
- Custom Templates with Jinja2
- Static Site Generation
  - Overview
  - Quick Start
  - The generate-site Command
  - Hugo Static Sites
  - Jekyll Static Sites
  - MkDocs Static Sites
  - Zola Static Sites
  - Eleventy Static Sites
  - Frontmatter Generation
  - Asset Management
  - Output File Naming
  - Batch Processing
  - Scaffolding
  - Complete Examples
  - API Reference
  - Comparison with HTML Templating
  - Best Practices
  - See Also
- Security
  - Threat Model and Security Architecture
- Performance Tuning
  - PDF Parsing Optimizations
- Recipes and Cookbook
  - Processing Mixed Document Collections
  - LLM Training Data Pipeline
  - Secure Document Processing
  - Advanced Batch Processing
  - Complex Format Combinations
  - Security-Focused Workflows
  - LLM Data Preparation
  - Dependency Management
  - AST-Based Analysis and Transformation
  - Developer Workflows
  - Creating Custom Output Formats with Jinja2 Templates
Integrations & Operations
API Reference
Comprehensive documentation for all public modules and classes.
About
all2md is developed by Tom Villani, Ph.D. and released under the MIT License.
Source Code: https://github.com/thomas-villani/all2md
License: MIT License