all2md.utils
Utility modules for the all2md package.
This package contains utility functions and classes for input validation, attachment handling, metadata extraction, security, and other common operations.
- all2md.utils.decode_base64_image(data_uri: str) tuple[bytes | None, str | None]
Decode a base64-encoded data URI to image bytes.
Extracts and decodes base64 image data from a data URI. Supports common image formats (png, jpeg, jpg, gif, webp, svg).
- Parameters:
data_uri (str) – Data URI string in format: data:image/{format};base64,{data}
- Returns:
Tuple of (image_data, image_format), or (None, None) if decoding fails. image_format is the file extension without the leading dot (e.g., “png”, “jpeg”).
- Return type:
tuple[bytes or None, str or None]
Examples
>>> data = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
>>> image_bytes, fmt = decode_base64_image(data)
>>> print(fmt)
png
>>> print(len(image_bytes))
85
Notes
This function validates the data URI format and safely handles malformed or invalid base64 data.
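The validate-then-decode behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the all2md implementation: the function name `decode_image_data_uri_sketch` and the regular expression are assumptions made for this example, and the library's own checks may be stricter or looser.

```python
import base64
import binascii
import re

# Hypothetical pattern for this sketch; the library's actual validation may differ.
_DATA_URI_RE = re.compile(
    r"^data:image/(?P<fmt>[A-Za-z0-9.+-]+);base64,(?P<data>.*)$",
    re.DOTALL,
)


def decode_image_data_uri_sketch(data_uri):
    """Return (image_bytes, image_format), or (None, None) on any failure."""
    if not isinstance(data_uri, str):
        return None, None
    match = _DATA_URI_RE.match(data_uri.strip())
    if match is None:
        # Not of the form data:image/{format};base64,{data}
        return None, None
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        image_bytes = base64.b64decode(match.group("data"), validate=True)
    except (binascii.Error, ValueError):
        return None, None
    return image_bytes, match.group("fmt").lower()


# Build a small, well-formed data URI and decode it again.
payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
image_bytes, fmt = decode_image_data_uri_sketch(f"data:image/png;base64,{payload}")
```

Returning a `(None, None)` pair instead of raising keeps the caller's control flow simple: a single `if image_bytes is None` check covers both a malformed URI and invalid base64 data.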
- all2md.utils.decode_base64_image_to_file(data_uri: str, output_dir: str | Path | None = None, delete_on_exit: bool = True) str | None
Decode a base64 data URI and write to a temporary file.
Convenience function that decodes base64 image data and writes it to a temporary file. Useful for renderers that require file paths rather than in-memory bytes.
- Parameters:
data_uri (str) – Data URI string in format: data:image/{format};base64,{data}
output_dir (str, Path, or None, default = None) – Directory for temporary file. If None, uses system temp directory.
delete_on_exit (bool, default = True) – If True, file will be automatically deleted when Python exits. If False, caller is responsible for cleanup.
- Returns:
Path to temporary file, or None if decoding failed
- Return type:
str or None
Examples
>>> data = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
>>> temp_path = decode_base64_image_to_file(data)
>>> print(temp_path)
/tmp/tmpxyz123.png
Notes
If delete_on_exit is False, the caller MUST manually delete the temporary file to avoid disk space issues. Track the file path and use Path(temp_path).unlink() when done.
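The manual cleanup pattern from the note can be sketched as follows. To keep the example runnable without all2md installed, `write_temp_image_sketch` is a hypothetical stand-in that, like a call with delete_on_exit=False, returns the path of a file the caller owns; only the `try`/`finally` portion is the point of the example.

```python
import tempfile
from pathlib import Path


def write_temp_image_sketch(image_bytes, suffix=".png"):
    """Hypothetical stand-in: write bytes to a temp file the caller must delete."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(image_bytes)
        return handle.name


temp_path = write_temp_image_sketch(b"not a real image")
try:
    # Use the file while it exists (here we only read its size).
    size = Path(temp_path).stat().st_size
finally:
    # missing_ok=True keeps cleanup safe if the file was already removed.
    Path(temp_path).unlink(missing_ok=True)
```

Putting the `unlink()` call in a `finally` block means the file is removed even if the code that uses it raises an exception.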
- all2md.utils.get_image_format_from_path(path: str | Path) str | None
Extract image format from file path.
- Parameters:
path (str or Path) – File path
- Returns:
Image format (lowercase extension without dot) or None if not an image
- Return type:
str or None
Examples
>>> get_image_format_from_path("photo.jpg")
'jpg'
>>> get_image_format_from_path("document.pdf")
None
- all2md.utils.is_data_uri(uri: str) bool
Check if a string is a data URI.
- Parameters:
uri (str) – String to check
- Returns:
True if string is a data URI
- Return type:
bool
Examples
>>> is_data_uri("data:image/png;base64,...")
True
>>> is_data_uri("https://example.com/image.png")
False
- all2md.utils.parse_image_data_uri(data_uri: str) dict[str, Any] | None
Parse a data URI and extract metadata.
Extracts format, encoding, and data from a data URI without decoding. Supports data URIs with parameters like charset, base64 encoding marker, etc.
- Parameters:
data_uri (str) – Data URI string (any format, not just base64)
- Returns:
Dictionary with the keys ‘mime_type’, ‘format’, ‘encoding’, ‘data’, ‘params’, and ‘charset’. Returns None if the URI is malformed.
- Return type:
dict or None
Examples
>>> uri = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
>>> info = parse_image_data_uri(uri)
>>> print(info['format'])
png
>>> print(info['encoding'])
base64

>>> uri = "data:text/plain;charset=utf-8;base64,SGVsbG8="
>>> info = parse_image_data_uri(uri)
>>> print(info['charset'])
utf-8
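To show how a data URI header can be split into the documented keys without decoding the payload, here is a rough standard-library sketch. It is not the library's parser: the name `parse_data_uri_sketch` is hypothetical, and two behaviours are assumptions of this example, namely that ‘encoding’ is None when no base64 marker is present and that an empty media type falls back to text/plain.

```python
def parse_data_uri_sketch(data_uri):
    """Split a data URI into header fields and raw (still encoded) data."""
    if not isinstance(data_uri, str) or not data_uri.startswith("data:"):
        return None
    header, comma, data = data_uri[len("data:"):].partition(",")
    if not comma:
        return None  # no comma separating the header from the data

    parts = header.split(";")
    mime_type = parts[0] or "text/plain"  # assumed fallback for this sketch
    params = {}
    encoding = None  # assumption: None means "not base64"
    for part in parts[1:]:
        if part == "base64":
            encoding = "base64"
        elif "=" in part:
            key, _, value = part.partition("=")
            params[key.lower()] = value

    return {
        "mime_type": mime_type,
        "format": mime_type.split("/")[-1],
        "encoding": encoding,
        "data": data,  # left exactly as it appears in the URI
        "params": params,
        "charset": params.get("charset"),
    }


info = parse_data_uri_sketch("data:text/plain;charset=utf-8;base64,SGVsbG8=")
```

Because the data portion is returned untouched, a caller can inspect the MIME type or charset first and decide whether decoding is worth the cost.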
- all2md.utils.slugify(text: str, *, seen_slugs: Set[str] | None = None, max_length: int = 100, separator: str = '-') str
Create a URL-safe slug from text with collision avoidance.
This function generates GitHub-flavored Markdown compatible slugs by:
- Normalizing Unicode characters (NFD decomposition)
- Converting to lowercase
- Replacing spaces and underscores with hyphens
- Removing non-alphanumeric characters (except hyphens)
- Collapsing multiple consecutive hyphens
- Stripping leading/trailing hyphens
- Handling collisions by appending -2, -3, etc.
- Limiting length to max_length characters
- Parameters:
text (str) – Text to slugify (e.g., heading text, filename)
seen_slugs (Set[str] or None, default = None) – Set of previously generated slugs for collision detection. If provided and the generated slug already exists, a numeric suffix will be appended (-2, -3, etc.). The function will automatically add the new slug to this set.
max_length (int, default = 100) – Maximum length of the slug. Slugs longer than this will be truncated before adding collision suffixes.
separator (str, default = "-") – The separator between words in the slug
- Returns:
URL-safe slug, unique if seen_slugs is provided
- Return type:
str
Examples
- Basic slugification:
>>> slugify("Hello World!")
'hello-world'
- Handle special characters:
>>> slugify("API Reference (v2.0)")
'api-reference-v20'
- Collision detection:
>>> seen = set()
>>> slug1 = slugify("Introduction", seen_slugs=seen)
>>> slug1
'introduction'
>>> slug2 = slugify("Introduction", seen_slugs=seen)
>>> slug2
'introduction-2'
- Length limiting:
>>> slugify("A" * 150, max_length=50)
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
- Unicode normalization:
>>> slugify("Café résumé")
'cafe-resume'
- all2md.utils.make_unique_slug(slug: str, seen_slugs: dict[str, int], separator: str = '-') str
Generate unique slug with duplicate handling.
This function ensures slug uniqueness by appending a numeric suffix when duplicates are encountered. The seen_slugs dictionary tracks occurrence counts and is mutated in-place.
- Parameters:
slug (str) – Base slug to make unique
seen_slugs (dict[str, int]) – Dictionary tracking occurrence counts (mutated in-place). Maps base slug to count of occurrences.
separator (str, default = "-") – Separator to use before numeric suffix
- Returns:
Unique slug (with numeric suffix if needed)
- Return type:
str
Examples
Basic usage:
>>> seen = {}
>>> make_unique_slug("my-heading", seen)
'my-heading'
>>> make_unique_slug("my-heading", seen)
'my-heading-2'
>>> make_unique_slug("my-heading", seen)
'my-heading-3'
Custom separator:
>>> seen = {}
>>> make_unique_slug("heading", seen, separator="_")
'heading'
>>> make_unique_slug("heading", seen, separator="_")
'heading_2'
Notes
The seen_slugs dictionary is modified in-place to track counts. This allows callers to maintain state across multiple calls.
The first occurrence of a slug does not get a numeric suffix. Subsequent occurrences get suffixes starting at 2.
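The count-based behaviour described in these notes can be sketched in a few lines. The name `make_unique_slug_sketch` is hypothetical and the code is only an approximation of the documented behaviour; for instance, whether the library also records the suffixed slugs in the dictionary is not stated, and this sketch does not.

```python
def make_unique_slug_sketch(slug, seen_slugs, separator="-"):
    """Return slug unchanged on first use, then slug-2, slug-3, and so on."""
    # Mutate the caller's dictionary so state persists across calls.
    count = seen_slugs.get(slug, 0) + 1
    seen_slugs[slug] = count
    return slug if count == 1 else f"{slug}{separator}{count}"


seen = {}
results = [make_unique_slug_sketch("my-heading", seen) for _ in range(3)]
```

Keeping the counter in a dictionary owned by the caller means one dictionary per document is enough to keep heading anchors unique within that document.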
For utilities documentation organized by functionality, see Utilities.