all2md.search

Search subsystem exposed to the public API.

class all2md.search.SearchOptions

Bases: CloneFrozenMixin

Search configuration toggles used by the CLI and API.

chunk_size_tokens: int = 320

chunk_overlap_tokens: int = 40

min_chunk_tokens: int = 60

include_preamble: bool = True

heading_merge: bool = True

max_heading_level: int | None = None

bm25_k1: float = 1.5

bm25_b: float = 0.75

vector_model_name: str = 'sentence-transformers/all-MiniLM-L6-v2'

vector_batch_size: int = 32

vector_device: str | None = None

vector_normalize_embeddings: bool = True

hybrid_keyword_weight: float = 0.5

hybrid_vector_weight: float = 0.5

default_mode: str = 'keyword'

grep_context_before: int = 0

grep_context_after: int = 0

grep_regex: bool = False

grep_ignore_case: bool = False

grep_show_line_numbers: bool = False

grep_max_columns: int = 150

__init__(chunk_size_tokens: int = 320, chunk_overlap_tokens: int = 40, min_chunk_tokens: int = 60, include_preamble: bool = True, heading_merge: bool = True, max_heading_level: int | None = None, bm25_k1: float = 1.5, bm25_b: float = 0.75, vector_model_name: str = 'sentence-transformers/all-MiniLM-L6-v2', vector_batch_size: int = 32, vector_device: str | None = None, vector_normalize_embeddings: bool = True, hybrid_keyword_weight: float = 0.5, hybrid_vector_weight: float = 0.5, default_mode: str = 'keyword', grep_context_before: int = 0, grep_context_after: int = 0, grep_regex: bool = False, grep_ignore_case: bool = False, grep_show_line_numbers: bool = False, grep_max_columns: int = 150) → None
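The chunking fields (`chunk_size_tokens`, `chunk_overlap_tokens`, `min_chunk_tokens`) read like the parameters of a sliding-window chunker. The sketch below illustrates that general technique under stated assumptions. Whitespace splitting stands in for real tokenisation, and a too-short trailing window is folded into the previous chunk. `sliding_window_chunks` is a hypothetical helper, not part of all2md, and the library's heading-aware behaviour (`heading_merge`, `max_heading_level`, `include_preamble`) is not modelled.

```python
def sliding_window_chunks(
    text: str,
    chunk_size_tokens: int = 320,
    chunk_overlap_tokens: int = 40,
    min_chunk_tokens: int = 60,
) -> list[list[str]]:
    """Split text into overlapping token windows (hypothetical helper).

    Assumption: whitespace splitting stands in for the real tokeniser.
    """
    if chunk_overlap_tokens >= chunk_size_tokens:
        raise ValueError("overlap must be smaller than chunk size")
    tokens = text.split()
    # Each new window starts this many tokens after the previous one.
    step = chunk_size_tokens - chunk_overlap_tokens
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + chunk_size_tokens]
        if chunks and len(window) < min_chunk_tokens:
            # Assumption: fold a too-short tail into the previous chunk,
            # skipping the tokens the two windows already share.
            chunks[-1].extend(window[chunk_overlap_tokens:])
        else:
            chunks.append(window)
        if start + chunk_size_tokens >= len(tokens):
            break  # the last window reached the end of the text
    return chunks
```

With the default values, a 700-token text yields windows of 320, 320 and 140 tokens, each sharing 40 tokens with its predecessor.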

class all2md.search.SearchDocumentInput

Bases: object

Represents a document scheduled for indexing.

source: str | Path | bytes

document_id: str | None = None

source_format: Literal['auto', 'archive', 'asciidoc', 'ast', 'bbcode', 'chm', 'csv', 'docx', 'dokuwiki', 'eml', 'enex', 'epub', 'fb2', 'html', 'ini', 'ipynb', 'jinja', 'json', 'latex', 'markdown', 'mbox', 'mediawiki', 'mhtml', 'odp', 'ods', 'odt', 'openapi', 'org', 'outlook', 'pdf', 'plaintext', 'pptx', 'rst', 'rtf', 'sourcecode', 'textile', 'toml', 'webarchive', 'xlsx', 'yaml', 'zip'] | str | None = 'auto'

metadata: Mapping[str, object] | None = None

__init__(source: str | Path | bytes, document_id: str | None = None, source_format: Literal['auto', 'archive', 'asciidoc', 'ast', 'bbcode', 'chm', 'csv', 'docx', 'dokuwiki', 'eml', 'enex', 'epub', 'fb2', 'html', 'ini', 'ipynb', 'jinja', 'json', 'latex', 'markdown', 'mbox', 'mediawiki', 'mhtml', 'odp', 'ods', 'odt', 'openapi', 'org', 'outlook', 'pdf', 'plaintext', 'pptx', 'rst', 'rtf', 'sourcecode', 'textile', 'toml', 'webarchive', 'xlsx', 'yaml', 'zip'] | str | None = 'auto', metadata: Mapping[str, object] | None = None) → None

class all2md.search.SearchService

Bases: object

Service object coordinating indexing and search execution.


__init__(options: SearchOptions | None = None) → None: Initialise the service with optional search configuration overrides.

property state: SearchIndexState: Return the current in-memory search index state.

build_indexes(documents: Sequence[SearchDocumentInput], *, modes: Iterable[SearchMode] | None = None, progress_callback: Callable[[ProgressEvent], None] | None = None) → SearchIndexState: Convert sources into chunks and materialise requested indexes.

save(directory: Path) → None: Persist all active indexes to disk.

static persisted_index_matches(directory: Path, documents: Sequence[SearchDocumentInput], options: SearchOptions) → bool

Return True if a persisted index in directory is still current.

Compares the fingerprint recorded at save time (corpus files plus index-relevant options) against the fingerprint of the documents and options supplied now. A missing or unreadable manifest, or any mismatch, returns False; the caller should then rebuild rather than serve a stale index.
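The fingerprint check described above can be sketched as hashing the corpus files together with the index-relevant options, storing the digest when the index is saved, and comparing it on the next run. Everything here is an assumption for illustration: the manifest file name (`manifest.json`), its `fingerprint` key, the use of SHA-256, and the helper names are hypothetical and not taken from all2md.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical file name


def corpus_fingerprint(files: list[Path], index_options: dict[str, object]) -> str:
    """Hash file paths, file contents and index-relevant options (hypothetical)."""
    digest = hashlib.sha256()
    for path in sorted(files):
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        # One "path<NUL>content-hash" record per file keeps entries unambiguous.
        digest.update(f"{path}\0{file_hash}\n".encode("utf-8"))
    # sort_keys makes the digest independent of dict insertion order.
    digest.update(json.dumps(index_options, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def save_manifest(directory: Path, fingerprint: str) -> None:
    """Record the fingerprint next to the persisted index."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / MANIFEST_NAME).write_text(json.dumps({"fingerprint": fingerprint}))


def manifest_matches(directory: Path, fingerprint: str) -> bool:
    """Return False for a missing or unreadable manifest, or any mismatch."""
    try:
        manifest = json.loads((directory / MANIFEST_NAME).read_text())
    except (OSError, ValueError):  # ValueError covers malformed JSON
        return False
    return isinstance(manifest, dict) and manifest.get("fingerprint") == fingerprint
```

Failing closed (returning False on any doubt) matches the documented contract: a spurious rebuild costs time, while serving a stale index returns wrong results.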

classmethod load(directory: Path, options: SearchOptions | None = None) → SearchService: Rehydrate indexes from disk.

search(query: str, *, mode: SearchMode | str | None = None, top_k: int = 10, progress_callback: Callable[[ProgressEvent], None] | None = None) → list[SearchResult]: Execute a query using the configured index backends.
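For the grep strategy (`SearchMode.GREP`), the `grep_*` fields of `SearchOptions` suggest grep-like line matching with optional context, case folding, line numbers and a column limit. The sketch below shows that general idea in plain Python. `grep_lines`, its output format and its truncation marker are hypothetical; how all2md actually formats and truncates grep results may differ.

```python
import re


def grep_lines(
    text: str,
    pattern: str,
    *,
    regex: bool = False,
    ignore_case: bool = False,
    context_before: int = 0,
    context_after: int = 0,
    show_line_numbers: bool = False,
    max_columns: int = 150,
) -> list[str]:
    """Return matching lines plus surrounding context (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    # Plain-text patterns are escaped so characters like "." match literally.
    compiled = re.compile(pattern if regex else re.escape(pattern), flags)
    lines = text.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if compiled.search(line):
            first = max(0, i - context_before)
            last = min(len(lines), i + context_after + 1)
            keep.update(range(first, last))  # overlapping context is merged
    out = []
    for i in sorted(keep):
        line = lines[i]
        if len(line) > max_columns:
            # Assumption: overlong lines are cut and marked with "...".
            line = line[:max_columns] + "..."
        out.append(f"{i + 1}:{line}" if show_line_numbers else line)
    return out
```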

class all2md.search.SearchMode

Bases: Enum

Enumerate the supported search strategies.

GREP = 1

KEYWORD = 2

VECTOR = 3

HYBRID = 4
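Given the `bm25_k1` and `bm25_b` options, `SearchMode.KEYWORD` is presumably BM25-based. For orientation, here is the standard Okapi BM25 scoring function: `k1` controls how quickly repeated occurrences of a term stop adding to the score, and `b` controls how strongly longer documents are penalised. This is a textbook sketch with lower-cased whitespace tokenisation and the common non-negative IDF variant; all2md's tokeniser and exact IDF formula are not documented here and may differ.

```python
import math
from collections import Counter


def bm25_scores(
    query: str, docs: list[str], k1: float = 1.5, b: float = 0.75
) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = query.lower().split()
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # "+1" inside the log keeps IDF non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # b scales the length penalty; k1 caps the gain from repeats.
            norm = k1 * (1 - b + b * dl / avgdl)
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores.append(score)
    return scores
```

Setting `b = 0` disables length normalisation entirely, so a single match scores the same in a one-word document as in a long one.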

class all2md.search.SearchResult

Bases: object

Search result referencing a chunk and its relevance score.

chunk: Chunk

score: float

metadata: Mapping[str, Any]

__init__(chunk: Chunk, score: float, metadata: Mapping[str, Any] = <factory>) → None
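In `SearchMode.HYBRID`, a result's `score` presumably combines a keyword score and a vector score, weighted by `hybrid_keyword_weight` and `hybrid_vector_weight`. A common way to do this is a weighted sum of min-max-normalised scores, sketched below. Whether all2md normalises scores this way is an assumption, and `min_max` and `fuse_scores` are hypothetical helpers keyed by chunk ID.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a set of identical scores maps to all 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (s - low) / (high - low) for key, s in scores.items()}


def fuse_scores(
    keyword: dict[str, float],
    vector: dict[str, float],
    keyword_weight: float = 0.5,
    vector_weight: float = 0.5,
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Weighted sum of normalised scores; a chunk absent from one side gets 0 there."""
    kw, vec = min_max(keyword), min_max(vector)
    fused = {
        chunk_id: keyword_weight * kw.get(chunk_id, 0.0)
        + vector_weight * vec.get(chunk_id, 0.0)
        for chunk_id in kw.keys() | vec.keys()
    }
    # Highest fused score first; ties broken by chunk ID for stable output.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

Normalising first matters because raw BM25 scores are unbounded while cosine similarities lie in a fixed range; without it, the keyword side would dominate regardless of the configured weights.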

all2md.search.build_search_service(documents: Sequence[SearchDocumentInput], *, options: SearchOptions | None = None, modes: Iterable[SearchMode] | None = None, progress_callback: Callable[[ProgressEvent], None] | None = None) → SearchService: Create a search service pre-indexed with the supplied documents.

all2md.search.search_with_service(service: SearchService, query: str, *, mode: SearchMode | str | None = None, top_k: int = 10, progress_callback: Callable[[ProgressEvent], None] | None = None) → list[SearchResult]: Execute a search query using an existing SearchService.

all2md.search.search_documents(documents: Sequence[SearchDocumentInput], query: str, *, options: SearchOptions | None = None, modes: Iterable[SearchMode] | None = None, mode: SearchMode | str | None = None, top_k: int = 10, progress_callback: Callable[[ProgressEvent], None] | None = None) → list[SearchResult]: Index documents and run a query in a single convenience call.

For search module documentation, see Advanced Modules.