all2md.search

Search subsystem exposed to the public API.

class all2md.search.SearchOptions

Bases: CloneFrozenMixin

Search configuration toggles used by the CLI and API.

chunk_size_tokens: int = 320
chunk_overlap_tokens: int = 40
min_chunk_tokens: int = 60
include_preamble: bool = True
heading_merge: bool = True
max_heading_level: int | None = None
bm25_k1: float = 1.5
bm25_b: float = 0.75
vector_model_name: str = 'sentence-transformers/all-MiniLM-L6-v2'
vector_batch_size: int = 32
vector_device: str | None = None
vector_normalize_embeddings: bool = True
hybrid_keyword_weight: float = 0.5
hybrid_vector_weight: float = 0.5
default_mode: str = 'keyword'
grep_context_before: int = 0
grep_context_after: int = 0
grep_regex: bool = False
grep_ignore_case: bool = False
grep_show_line_numbers: bool = False
grep_max_columns: int = 150
__init__(
    chunk_size_tokens: int = 320,
    chunk_overlap_tokens: int = 40,
    min_chunk_tokens: int = 60,
    include_preamble: bool = True,
    heading_merge: bool = True,
    max_heading_level: int | None = None,
    bm25_k1: float = 1.5,
    bm25_b: float = 0.75,
    vector_model_name: str = 'sentence-transformers/all-MiniLM-L6-v2',
    vector_batch_size: int = 32,
    vector_device: str | None = None,
    vector_normalize_embeddings: bool = True,
    hybrid_keyword_weight: float = 0.5,
    hybrid_vector_weight: float = 0.5,
    default_mode: str = 'keyword',
    grep_context_before: int = 0,
    grep_context_after: int = 0,
    grep_regex: bool = False,
    grep_ignore_case: bool = False,
    grep_show_line_numbers: bool = False,
    grep_max_columns: int = 150
) → None
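
The bm25_k1 and bm25_b defaults above match the values conventionally used with Okapi BM25. As a rough illustration of what the two parameters control, the sketch below scores a single query term against a single chunk with the textbook formula. This page does not say which BM25 variant all2md implements, so the function name bm25_term_score, its argument names, and the IDF form are assumptions made for illustration, not the library's code.

```python
import math


def bm25_term_score(term_freq, chunk_len, avg_chunk_len, n_chunks,
                    chunks_with_term, k1=1.5, b=0.75):
    """Textbook Okapi BM25 contribution of one query term to one chunk.

    Illustrative only: the BM25 variant used inside all2md is not
    documented on this page.
    """
    # Rarer terms get a larger inverse document frequency.
    idf = math.log(
        1.0 + (n_chunks - chunks_with_term + 0.5) / (chunks_with_term + 0.5)
    )
    # b scales how strongly long chunks are penalised (b=0 disables it).
    length_norm = 1.0 - b + b * (chunk_len / avg_chunk_len)
    # k1 limits how much repeated occurrences of the term can add.
    return idf * term_freq * (k1 + 1.0) / (term_freq + k1 * length_norm)
```

With b=0 the chunk length has no effect on the score. Raising k1 lets repeated occurrences keep adding to the score for longer before it saturates.
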

class all2md.search.SearchDocumentInput

Bases: object

Represents a document scheduled for indexing.

source: str | Path | bytes
document_id: str | None = None
source_format: Literal['auto', 'archive', 'asciidoc', 'ast', 'bbcode', 'chm', 'csv', 'docx', 'dokuwiki', 'eml', 'enex', 'epub', 'fb2', 'html', 'ini', 'ipynb', 'jinja', 'json', 'latex', 'markdown', 'mbox', 'mediawiki', 'mhtml', 'odp', 'ods', 'odt', 'openapi', 'org', 'outlook', 'pdf', 'plaintext', 'pptx', 'rst', 'rtf', 'sourcecode', 'textile', 'toml', 'webarchive', 'xlsx', 'yaml', 'zip'] | str | None = 'auto'
metadata: Mapping[str, object] | None = None
__init__(
    source: str | Path | bytes,
    document_id: str | None = None,
    source_format: Literal['auto', 'archive', 'asciidoc', 'ast', 'bbcode', 'chm', 'csv', 'docx', 'dokuwiki', 'eml', 'enex', 'epub', 'fb2', 'html', 'ini', 'ipynb', 'jinja', 'json', 'latex', 'markdown', 'mbox', 'mediawiki', 'mhtml', 'odp', 'ods', 'odt', 'openapi', 'org', 'outlook', 'pdf', 'plaintext', 'pptx', 'rst', 'rtf', 'sourcecode', 'textile', 'toml', 'webarchive', 'xlsx', 'yaml', 'zip'] | str | None = 'auto',
    metadata: Mapping[str, object] | None = None
) → None
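
A common way to produce these inputs is to walk a directory and create one SearchDocumentInput per file. The sketch below does that using only the four documented fields. The helper name inputs_from_directory, the "*.md" default pattern, and the "suffix" metadata key are hypothetical choices for illustration. The fallback class exists only so the snippet runs where all2md is not installed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional, Union

try:
    from all2md.search import SearchDocumentInput
except ImportError:
    # Stand-in mirroring the documented fields so the sketch runs without all2md.
    @dataclass
    class SearchDocumentInput:
        source: Union[str, Path, bytes]
        document_id: Optional[str] = None
        source_format: Optional[str] = "auto"
        metadata: Optional[Mapping[str, object]] = None


def inputs_from_directory(root, pattern="*.md"):
    """Build one SearchDocumentInput per matching file under root."""
    root = Path(root)
    return [
        SearchDocumentInput(
            source=path,
            # Relative POSIX path as a stable, human-readable identifier.
            document_id=path.relative_to(root).as_posix(),
            # "auto" is the documented default; shown here for clarity.
            source_format="auto",
            # Hypothetical metadata key, chosen only for this example.
            metadata={"suffix": path.suffix},
        )
        for path in sorted(root.rglob(pattern))
    ]
```

The resulting list can be passed wherever a Sequence[SearchDocumentInput] is expected.
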

class all2md.search.SearchService

Bases: object

Service object coordinating indexing and search execution.

__init__(options: SearchOptions | None = None) → None

Initialise the service with optional search configuration overrides.

property state: SearchIndexState

Return the current in-memory search index state.

build_indexes(documents: Sequence[SearchDocumentInput], *, modes: Iterable[SearchMode] | None = None, progress_callback: Callable[[ProgressEvent], None] | None = None) → SearchIndexState

Convert sources into chunks and materialise requested indexes.
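
The progress_callback parameter receives ProgressEvent objects, whose fields are not described on this page. A minimal sketch that avoids relying on those fields is to collect the events as they arrive. The helper name build_and_collect_events is hypothetical, and the service argument is assumed to be any object exposing the build_indexes signature shown above.

```python
def build_and_collect_events(service, documents, modes=None):
    """Build indexes and record every progress event that is emitted."""
    events = []
    state = service.build_indexes(
        documents,
        modes=modes,                      # None is the documented default
        progress_callback=events.append,  # invoked with each ProgressEvent
    )
    return state, events
```

The returned events list can then be inspected or logged without assuming anything about the structure of ProgressEvent.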

save(directory: Path) → None

Persist all active indexes to disk.

classmethod load(directory: Path, options: SearchOptions | None = None) → SearchService

Rehydrate indexes from disk.
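
Taken together, save and load allow an index to be built once and reused across runs. The sketch below shows one possible load-or-build pattern. The helper name load_or_build is hypothetical, and treating any non-empty directory as a saved index is an assumption, since the on-disk layout is not described here. The service class is passed in as a parameter, so in real use it would be all2md.search.SearchService.

```python
from pathlib import Path


def load_or_build(service_cls, index_dir, documents, options=None):
    """Reuse a saved index if one appears to exist, otherwise build and save."""
    index_dir = Path(index_dir)
    # Assumption: a non-empty directory holds a previously saved index.
    if index_dir.is_dir() and any(index_dir.iterdir()):
        return service_cls.load(index_dir, options=options)

    service = service_cls(options=options)
    service.build_indexes(documents)
    # Create the directory up front; whether save() does so itself is not stated.
    index_dir.mkdir(parents=True, exist_ok=True)
    service.save(index_dir)
    return service
```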

search(query: str, *, mode: SearchMode | str | None = None, top_k: int = 10, progress_callback: Callable[[ProgressEvent], None] | None = None) → list[SearchResult]

Execute a query using the configured index backends.
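
Because mode can be set per call, the same query can be run against several backends for comparison. The sketch below collects the top scores for each mode. The helper name search_each_mode is hypothetical, and it presumes that every requested mode was already indexed by build_indexes, which this page does not spell out.

```python
def search_each_mode(service, query, modes, top_k=3):
    """Run the same query once per mode and collect the top scores."""
    summary = {}
    for mode in modes:
        results = service.search(query, mode=mode, top_k=top_k)
        # Enum members expose .name; plain strings are used as-is.
        label = getattr(mode, "name", str(mode))
        summary[label] = [round(result.score, 4) for result in results]
    return summary
```

Passing SearchMode members rather than strings avoids depending on how string mode names are parsed.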

class all2md.search.SearchMode

Bases: Enum

Enumerate the supported search strategies.

GREP = 1
KEYWORD = 2
VECTOR = 3
HYBRID = 4
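
When a mode arrives as user input, for example from a command-line flag, one option is to resolve it to an enum member by name before calling the search functions. The sketch below uses standard Enum name lookup. The helper name parse_mode is hypothetical, and this is not necessarily how all2md itself interprets string modes. The fallback enum mirrors the documented members only so the snippet runs without all2md installed.

```python
from enum import Enum

try:
    from all2md.search import SearchMode
except ImportError:
    # Stand-in mirroring the documented members so the sketch runs without all2md.
    class SearchMode(Enum):
        GREP = 1
        KEYWORD = 2
        VECTOR = 3
        HYBRID = 4


def parse_mode(name):
    """Map a case-insensitive mode name to a SearchMode member."""
    try:
        return SearchMode[name.strip().upper()]
    except KeyError:
        valid = ", ".join(member.name.lower() for member in SearchMode)
        raise ValueError(
            f"unknown search mode {name!r}; expected one of: {valid}"
        ) from None
```
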

class all2md.search.SearchResult

Bases: object

Search result referencing a chunk and its relevance score.

chunk: Chunk
score: float
metadata: Mapping[str, Any]
__init__(chunk: Chunk, score: float, metadata: Mapping[str, Any] = <factory>) → None
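
Since every result carries a numeric score, a list of results can be post-filtered with ordinary Python. The sketch below keeps results at or above a threshold and orders them from highest to lowest score. The helper name strongest_results is hypothetical. This page gives no score range, so a threshold is probably only meaningful when comparing results from a single mode.

```python
def strongest_results(results, min_score):
    """Keep results scoring at least min_score, highest score first."""
    kept = [result for result in results if result.score >= min_score]
    return sorted(kept, key=lambda result: result.score, reverse=True)
```
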

all2md.search.build_search_service(documents: Sequence[SearchDocumentInput], *, options: SearchOptions | None = None, modes: Iterable[SearchMode] | None = None, progress_callback: Callable[[ProgressEvent], None] | None = None) → SearchService

Create a search service pre-indexed with the supplied documents.

all2md.search.search_with_service(service: SearchService, query: str, *, mode: SearchMode | str | None = None, top_k: int = 10, progress_callback: Callable[[ProgressEvent], None] | None = None) → list[SearchResult]

Execute a search query using an existing SearchService.

all2md.search.search_documents(documents: Sequence[SearchDocumentInput], query: str, *, options: SearchOptions | None = None, modes: Iterable[SearchMode] | None = None, mode: SearchMode | str | None = None, top_k: int = 10, progress_callback: Callable[[ProgressEvent], None] | None = None) → list[SearchResult]

Index documents and run a query in a single convenience call.
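
A minimal sketch of the one-call workflow, using only the names and parameters documented on this page. The wrapper name quick_keyword_search is hypothetical. Reading modes as "which indexes to build" and mode as "which index to query" is an assumption based on the parameter names and on build_indexes. KEYWORD is used so the example does not depend on the embedding model named in SearchOptions.

```python
from pathlib import Path


def quick_keyword_search(paths, query, top_k=5):
    """Index the given files and return (score, metadata) pairs for a query."""
    # Imported lazily so this module can be imported without all2md installed.
    from all2md.search import SearchDocumentInput, SearchMode, search_documents

    documents = [SearchDocumentInput(source=Path(path)) for path in paths]
    results = search_documents(
        documents,
        query,
        modes=[SearchMode.KEYWORD],  # assumed: indexes to build
        mode=SearchMode.KEYWORD,     # assumed: index to query
        top_k=top_k,
    )
    return [(result.score, dict(result.metadata)) for result in results]
```

Because the index is rebuilt on every call, this form suits one-off queries. For repeated queries over the same documents, build_search_service followed by search_with_service avoids re-indexing.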

For search module documentation, see Advanced Modules.

© Copyright 2025, Tom Villani, Ph.D.