all2md.utils

Utility modules for the all2md package.

This package contains utility functions and classes for input validation, attachment handling, metadata extraction, security, and other common operations.

all2md.utils.decode_base64_image(data_uri: str) → tuple[bytes | None, str | None]

Decode a base64-encoded data URI to image bytes.

Extracts and decodes base64 image data from a data URI. Supports common image formats (png, jpeg, jpg, gif, webp, svg).

Parameters: data_uri (str) – Data URI string in format: data:image/{format};base64,{data}
Returns: Tuple of (image_data, image_format), or (None, None) if decoding fails. image_format is the file extension without the dot (e.g., “png”, “jpeg”).
Return type: tuple[bytes or None, str or None]

Examples

>>> data = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
>>> image_bytes, fmt = decode_base64_image(data)
>>> print(fmt)
png
>>> print(len(image_bytes))
85

Notes

This function validates the data URI format and safely handles malformed or invalid base64 data.
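The validate-then-decode behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the all2md implementation: `decode_data_uri_sketch` and `_DATA_URI_RE` are hypothetical names, and the accepted format list and the mapping of `svg+xml` to `svg` are assumptions.

```python
import base64
import binascii
import re

# Hypothetical pattern for illustration; the formats accepted here are assumed.
_DATA_URI_RE = re.compile(
    r"^data:image/(?P<fmt>png|jpeg|jpg|gif|webp|svg\+xml);base64,(?P<payload>.*)$",
    re.DOTALL,
)


def decode_data_uri_sketch(data_uri):
    """Return (image_bytes, fmt), or (None, None) if the URI cannot be decoded."""
    match = _DATA_URI_RE.match(data_uri.strip())
    if match is None:
        # Not an image data URI with a base64 marker
        return None, None
    try:
        # validate=True rejects characters outside the base64 alphabet
        image_bytes = base64.b64decode(match.group("payload"), validate=True)
    except (binascii.Error, ValueError):
        return None, None
    fmt = match.group("fmt")
    # Assumption: report "svg" rather than the MIME subtype "svg+xml"
    return image_bytes, ("svg" if fmt == "svg+xml" else fmt)


payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
good = decode_data_uri_sketch(f"data:image/png;base64,{payload}")
bad = decode_data_uri_sketch("data:image/png;base64,not base64!")
print(good[1], bad)
```

Returning a `(None, None)` pair instead of raising lets callers unpack the result unconditionally and test `image_bytes is None` afterwards.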

all2md.utils.decode_base64_image_to_file(data_uri: str, output_dir: str | Path | None = None, delete_on_exit: bool = True) → str | None

Decode a base64 data URI and write to a temporary file.

Convenience function that decodes base64 image data and writes it to a temporary file. Useful for renderers that require file paths rather than in-memory bytes.

Parameters:

data_uri (str) – Data URI string in format: data:image/{format};base64,{data}
output_dir (str, Path, or None, default = None) – Directory for temporary file. If None, uses system temp directory.
delete_on_exit (bool, default = True) – If True, file will be automatically deleted when Python exits. If False, caller is responsible for cleanup.

Returns:

Path to temporary file, or None if decoding failed

Return type:

str or None

Examples

>>> data = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
>>> temp_path = decode_base64_image_to_file(data)
>>> print(temp_path)
/tmp/tmpxyz123.png

Notes

If delete_on_exit is False, the caller MUST manually delete the temporary file to avoid disk space issues. Track the file path and use Path(temp_path).unlink() when done.
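One way to meet the cleanup requirement is a try/finally block around the code that uses the file. The sketch below is self-contained, so it creates a stand-in file with `tempfile` instead of calling `decode_base64_image_to_file`; the stand-in file and the `size` and `still_exists` variables are illustrative assumptions.

```python
import tempfile
from pathlib import Path

# Stand-in for a path returned with delete_on_exit=False; a real call would be
# decode_base64_image_to_file(data_uri, delete_on_exit=False).
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as handle:
    handle.write(b"placeholder image bytes")
    temp_path = handle.name

try:
    # Use the file here, e.g. hand the path to a renderer
    size = Path(temp_path).stat().st_size
finally:
    # missing_ok=True (Python 3.8+) keeps cleanup safe if the file is already gone
    Path(temp_path).unlink(missing_ok=True)

still_exists = Path(temp_path).exists()
print(size, still_exists)
```

Because the function returns None when decoding fails, real code should check the returned path before entering the try block.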

all2md.utils.get_image_format_from_path(path: str | Path) → str | None

Extract image format from file path.

Parameters: path (str or Path) – File path
Returns: Image format (lowercase extension without dot) or None if not an image
Return type: str or None

Examples

>>> get_image_format_from_path("photo.jpg")
'jpg'
>>> print(get_image_format_from_path("document.pdf"))
None
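The "lowercase extension without dot" rule can be sketched with `pathlib`. The helper name `image_format_sketch` and the `IMAGE_EXTENSIONS` set are assumptions made for illustration; the extensions all2md actually recognises may differ.

```python
from pathlib import Path

# Assumption: the recognised extensions; the real list in all2md may differ.
IMAGE_EXTENSIONS = {"png", "jpeg", "jpg", "gif", "webp", "svg"}


def image_format_sketch(path):
    """Return the lowercase extension without its dot, or None for non-images."""
    ext = Path(path).suffix.lower().lstrip(".")
    return ext if ext in IMAGE_EXTENSIONS else None


print(image_format_sketch("photo.JPG"))
print(image_format_sketch(Path("docs") / "report.pdf"))
```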

all2md.utils.is_data_uri(uri: str) → bool

Check if a string is a data URI.

Parameters: uri (str) – String to check
Returns: True if string is a data URI
Return type: bool

Examples

>>> is_data_uri("data:image/png;base64,...")
True
>>> is_data_uri("https://example.com/image.png")
False

all2md.utils.parse_image_data_uri(data_uri: str) → dict[str, Any] | None

Parse a data URI and extract metadata.

Extracts format, encoding, and data from a data URI without decoding. Supports data URIs with parameters like charset, base64 encoding marker, etc.

Parameters: data_uri (str) – Data URI string (any format, not just base64)
Returns: Dictionary with keys: ‘mime_type’, ‘format’, ‘encoding’, ‘data’, ‘params’, ‘charset’. Returns None if the URI is malformed.
Return type: dict or None

Examples

>>> uri = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
>>> info = parse_image_data_uri(uri)
>>> print(info['format'])
png
>>> print(info['encoding'])
base64

>>> uri = "data:text/plain;charset=utf-8;base64,SGVsbG8="
>>> info = parse_image_data_uri(uri)
>>> print(info['charset'])
utf-8
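To show how a data URI splits into the documented keys, here is a rough sketch that parses the header without decoding the payload. `parse_data_uri_sketch` is a hypothetical helper, not the library function; using None for a missing encoding and falling back to `text/plain` when the media type is omitted (the RFC 2397 default) are assumptions about behaviour the documentation does not specify.

```python
def parse_data_uri_sketch(data_uri):
    """Split a data URI into its parts without decoding the payload."""
    if not data_uri.startswith("data:") or "," not in data_uri:
        return None
    # Everything before the first comma is the header, the rest is the payload
    header, data = data_uri[len("data:"):].split(",", 1)
    mime_type, *raw_params = header.split(";")
    params = {}
    encoding = None  # assumption: None when no base64 marker is present
    for item in raw_params:
        if item.lower() == "base64":
            encoding = "base64"
        elif "=" in item:
            key, value = item.split("=", 1)
            params[key.strip().lower()] = value.strip()
    mime_type = mime_type or "text/plain"  # RFC 2397 default media type
    return {
        "mime_type": mime_type,
        "format": mime_type.split("/", 1)[-1],
        "encoding": encoding,
        "data": data,
        "params": params,
        "charset": params.get("charset"),
    }


info = parse_data_uri_sketch("data:text/plain;charset=utf-8;base64,SGVsbG8=")
print(info["mime_type"], info["charset"], info["encoding"])
```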

all2md.utils.slugify(text: str, *, seen_slugs: Set[str] | None = None, max_length: int = 100, separator: str = '-') → str

Create a URL-safe slug from text with collision avoidance.

This function generates GitHub-flavored Markdown compatible slugs by:

- Normalizing Unicode characters (NFD decomposition)
- Converting to lowercase
- Replacing spaces and underscores with hyphens
- Removing non-alphanumeric characters (except hyphens)
- Collapsing multiple consecutive hyphens
- Stripping leading/trailing hyphens
- Handling collisions by appending -2, -3, etc.
- Limiting length to max_length characters

Parameters:

text (str) – Text to slugify (e.g., heading text, filename)
seen_slugs (Set[str] or None, default = None) – Set of previously generated slugs for collision detection. If provided and the generated slug already exists, a numeric suffix will be appended (-2, -3, etc.). The function will automatically add the new slug to this set.
max_length (int, default = 100) – Maximum length of the slug. Slugs longer than this will be truncated before adding collision suffixes.
separator (str, default = "-") – The separator between words in the slug

Returns:

URL-safe slug, unique if seen_slugs is provided

Return type:

str

Examples

Basic slugification:

>>> slugify("Hello World!")
'hello-world'

Handle special characters:

>>> slugify("API Reference (v2.0)")
'api-reference-v20'

Collision detection:

>>> seen = set()
>>> slug1 = slugify("Introduction", seen_slugs=seen)
>>> slug1
'introduction'
>>> slug2 = slugify("Introduction", seen_slugs=seen)
>>> slug2
'introduction-2'

Length limiting:

>>> slugify("A" * 150, max_length=50)
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'

Unicode normalization:

>>> slugify("Café résumé")
'cafe-resume'
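The steps listed in the description can be sketched with `unicodedata` and `re`. This is an approximation for illustration, not the all2md source: `slugify_sketch` is a hypothetical name, and the ASCII-only character class, the way `separator` is applied, and the collision loop are all assumptions.

```python
import re
import unicodedata


def slugify_sketch(text, *, seen_slugs=None, max_length=100, separator="-"):
    """Approximate the documented slugify steps (illustrative only)."""
    # NFD splits accented letters into base letter + combining mark; drop the marks
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    slug = stripped.lower()
    slug = re.sub(r"[\s_]+", "-", slug)      # spaces and underscores -> hyphens
    slug = re.sub(r"[^a-z0-9-]", "", slug)   # drop remaining punctuation
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    slug = slug[:max_length].strip("-")      # truncate before any suffix is added
    if separator != "-":
        # Assumption: a custom separator simply replaces the hyphens
        slug = slug.replace("-", separator)
    if seen_slugs is None:
        return slug
    candidate, n = slug, 2
    while candidate in seen_slugs:
        candidate = f"{slug}{separator}{n}"
        n += 1
    seen_slugs.add(candidate)                # record the slug for later calls
    return candidate


seen = set()
print(slugify_sketch("Café résumé"))                     # cafe-resume
print(slugify_sketch("Introduction", seen_slugs=seen))   # introduction
print(slugify_sketch("Introduction", seen_slugs=seen))   # introduction-2
```

Stripping combining marks after NFD decomposition is what turns "é" into "e"; without that step the later alphanumeric filter would delete the accented letter entirely.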

all2md.utils.make_unique_slug(slug: str, seen_slugs: dict[str, int], separator: str = '-') → str

Generate unique slug with duplicate handling.

This function ensures slug uniqueness by appending a numeric suffix when duplicates are encountered. The seen_slugs dictionary tracks occurrence counts and is mutated in-place.

Parameters:

slug (str) – Base slug to make unique
seen_slugs (dict[str, int]) – Dictionary tracking occurrence counts (mutated in-place). Maps base slug to count of occurrences.
separator (str, default = "-") – Separator to use before numeric suffix

Returns:

Unique slug (with numeric suffix if needed)

Return type:

str

Examples

Basic usage:

>>> seen = {}
>>> make_unique_slug("my-heading", seen)
'my-heading'
>>> make_unique_slug("my-heading", seen)
'my-heading-2'
>>> make_unique_slug("my-heading", seen)
'my-heading-3'

Custom separator:

>>> seen = {}
>>> make_unique_slug("heading", seen, separator="_")
'heading'
>>> make_unique_slug("heading", seen, separator="_")
'heading_2'

Notes

The seen_slugs dictionary is modified in-place to track counts. This allows callers to maintain state across multiple calls.

The first occurrence of a slug does not get a numeric suffix. Subsequent occurrences get suffixes starting at 2.
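The counting behaviour described in these notes fits in a few lines. The following is a sketch under the assumption that the dictionary stores one running count per base slug; `make_unique_slug_sketch` is a hypothetical stand-in, not the library function.

```python
def make_unique_slug_sketch(slug, seen_slugs, separator="-"):
    """Return slug unchanged on first use, then slug-2, slug-3, and so on."""
    count = seen_slugs.get(slug, 0) + 1
    seen_slugs[slug] = count  # mutate the caller's dict in place
    return slug if count == 1 else f"{slug}{separator}{count}"


seen = {}
headings = ["setup", "usage", "setup", "setup"]
anchors = [make_unique_slug_sketch(h, seen) for h in headings]
print(anchors)  # ['setup', 'usage', 'setup-2', 'setup-3']
print(seen)     # {'setup': 3, 'usage': 1}
```

Passing the same dictionary for every heading in a document keeps anchors unique across that document, while a fresh dictionary restarts the numbering. Note that this simple sketch does not guard against a literal heading such as "setup-2" colliding with a generated suffix.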

For utilities documentation organized by functionality, see Utilities.