all2md Documentation

Welcome to all2md, a Python document conversion library for rapid, lightweight transformation of various document formats to Markdown. It is designed specifically for LLMs and document-processing pipelines.


Key Features

🚀 Rapid Conversion Pipelines – Optimised parsers and renderers for fast, reliable Markdown

🔍 Smart Detection – Multi-stage format detection (extension, MIME, magic bytes) with graceful fallbacks

📄 Wide Format Coverage – 20+ document, markup, and archive formats plus 200+ source-code/flat text types

⚙️ Dynamic Configuration – Dataclass-driven options, presets, and CLI/env overrides for every converter

🖼️ Attachment Management – Unified system for downloading, embedding, or annotating images and binaries

🧠 AST Transforms – Hookable transformation pipeline with built-in TOC generation, boilerplate removal, and plugins

✅ Document Linter – 47 built-in rules across structure, headings, links, lists, tables, images, and typography, with safe auto-fixes (lint --fix), JSON/text output, and CI-friendly exit codes

🎨 Custom Templates – Jinja2-based rendering for custom output formats (DocBook, YAML, ANSI terminal, etc.) without writing Python

🧭 Rich CLI Toolkit – Batch processing, watch mode, parallel workers, collated output, and themed Rich terminals

🤖 Integrations Ready – MCP server, plugin entry points, static-site templating, and bidirectional conversion APIs
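
To make the multi-stage detection idea above concrete, here is a minimal standard-library sketch of an extension, then MIME, then magic-bytes lookup with a plain-text fallback. It is not all2md's implementation: the function name `detect_format`, the lookup tables, the stage order, and the `"txt"` fallback are all assumptions made for illustration.

```python
import mimetypes
from pathlib import Path

# Hypothetical lookup tables for this sketch; not all2md's internal registry.
EXTENSION_FORMATS = {".pdf": "pdf", ".docx": "docx", ".html": "html", ".rtf": "rtf"}
MIME_FORMATS = {"application/pdf": "pdf", "text/html": "html", "text/csv": "csv"}
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # ZIP container, also used by DOCX, PPTX, XLSX, EPUB
    (b"{\\rtf", "rtf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def detect_format(filename: str, head: bytes = b"") -> str:
    """Guess a format name from the file name and the first bytes of the file."""
    # Stage 1: file extension.
    ext = Path(filename).suffix.lower()
    if ext in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[ext]

    # Stage 2: MIME type guessed from the file name.
    mime, _encoding = mimetypes.guess_type(filename)
    if mime in MIME_FORMATS:
        return MIME_FORMATS[mime]

    # Stage 3: magic bytes at the start of the content.
    for signature, fmt in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return fmt

    # Fallback (assumed): treat unknown input as plain text.
    return "txt"
```

Each stage only runs when the previous one found nothing, so a file with a missing or misleading name can still be identified from its content.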

Quick Example

from all2md import to_markdown

# Convert any document to Markdown
markdown = to_markdown('document.pdf')
print(markdown)

# With custom options
from all2md.options import PdfOptions

options = PdfOptions(
    pages=[1, 2, 3],  # First 3 pages only
    attachment_mode='save',
    attachment_output_dir='./images'
)
markdown = to_markdown('document.pdf', parser_options=options)

# With AST transforms
from all2md.transforms import RemoveImagesTransform, HeadingOffsetTransform

markdown = to_markdown(
    'document.pdf',
    transforms=[
        RemoveImagesTransform(),
        HeadingOffsetTransform(offset=1)
    ]
)
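
In a document processing pipeline, the same call is typically applied to many files. The sketch below shows one way to do that; `convert_tree` and its parameters are hypothetical names for this example, not part of all2md. The converter is passed in as a callable so the helper makes no assumptions about the library beyond "path in, Markdown string out".

```python
from pathlib import Path
from typing import Callable


def convert_tree(
    src_dir: str,
    out_dir: str,
    convert: Callable[[str], str],
    pattern: str = "*.pdf",
) -> list[Path]:
    """Convert each file in src_dir matching pattern and write a .md file to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    # Sort for a deterministic processing order.
    for src in sorted(Path(src_dir).glob(pattern)):
        target = out / (src.stem + ".md")
        target.write_text(convert(str(src)), encoding="utf-8")
        written.append(target)
    return written
```

With all2md installed, `convert_tree('docs', 'markdown', to_markdown)` would use the library's `to_markdown` function from the example above as the converter.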

Command Line Usage

# Convert any document to markdown
all2md document.pdf

# Save output to file
all2md document.docx --out output.md

# Download images to a directory
all2md document.html --attachment-mode save --attachment-output-dir ./images
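
When the CLI needs to be driven from a script, the commands above can be assembled programmatically. This is a small sketch using only the flags shown in this section; the helper names `build_all2md_command` and `run_all2md` are hypothetical, and it assumes the `all2md` executable is on `PATH` and prints Markdown to standard output when `--out` is not given.

```python
import subprocess
from typing import Optional


def build_all2md_command(
    source: str,
    out: Optional[str] = None,
    attachment_dir: Optional[str] = None,
) -> list[str]:
    """Build an argument list for the all2md CLI from the flags shown above."""
    cmd = ["all2md", source]
    if out is not None:
        cmd += ["--out", out]
    if attachment_dir is not None:
        cmd += ["--attachment-mode", "save", "--attachment-output-dir", attachment_dir]
    return cmd


def run_all2md(source: str, **kwargs) -> str:
    """Run the CLI and return whatever it wrote to standard output."""
    result = subprocess.run(
        build_all2md_command(source, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.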

Supported Formats

Documents

PDF, Word (DOCX), PowerPoint (PPTX), HTML/MHTML, Email (EML), EPUB, RTF, OpenDocument (ODT/ODP with bidirectional support)

Data & Other

Excel (XLSX), CSV/TSV, Jupyter Notebooks (IPYNB), Archives (TAR/7Z/RAR/ZIP), Images (PNG/JPEG/GIF), 200+ text formats

Markup

Markdown, reStructuredText, AsciiDoc, Org-Mode, MediaWiki, LaTeX, OpenAPI/Swagger, Textile

Getting Started

New to all2md? Start here:

  1. Installation Guide - Install all2md with the formats you need

  2. Quick Start Guide - Get up and running in 5 minutes

  3. Library Overview - Understand the library architecture and capabilities

Guides & References

Core Workflows

Advanced Topics

API Reference

Comprehensive documentation for all public modules and classes.

About

all2md is developed by Tom Villani, Ph.D. and released under the MIT License.
