all2md.search
Search subsystem exposed to the public API.
- class all2md.search.SearchOptions
Bases: CloneFrozenMixin
Search configuration toggles used by the CLI and API.
- chunk_size_tokens: int = 320
- chunk_overlap_tokens: int = 40
- min_chunk_tokens: int = 60
- include_preamble: bool = True
- heading_merge: bool = True
- max_heading_level: int | None = None
- bm25_k1: float = 1.5
- bm25_b: float = 0.75
- vector_model_name: str = 'sentence-transformers/all-MiniLM-L6-v2'
- vector_batch_size: int = 32
- vector_device: str | None = None
- vector_normalize_embeddings: bool = True
- hybrid_keyword_weight: float = 0.5
- hybrid_vector_weight: float = 0.5
- default_mode: str = 'keyword'
- grep_context_before: int = 0
- grep_context_after: int = 0
- grep_regex: bool = False
- grep_ignore_case: bool = False
- grep_show_line_numbers: bool = False
- grep_max_columns: int = 150
- __init__(chunk_size_tokens: int = 320, chunk_overlap_tokens: int = 40, min_chunk_tokens: int = 60, include_preamble: bool = True, heading_merge: bool = True, max_heading_level: int | None = None, bm25_k1: float = 1.5, bm25_b: float = 0.75, vector_model_name: str = 'sentence-transformers/all-MiniLM-L6-v2', vector_batch_size: int = 32, vector_device: str | None = None, vector_normalize_embeddings: bool = True, hybrid_keyword_weight: float = 0.5, hybrid_vector_weight: float = 0.5, default_mode: str = 'keyword', grep_context_before: int = 0, grep_context_after: int = 0, grep_regex: bool = False, grep_ignore_case: bool = False, grep_show_line_numbers: bool = False, grep_max_columns: int = 150) None
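The three chunking options interact: `chunk_size_tokens` bounds each window, `chunk_overlap_tokens` controls how much adjacent windows share, and `min_chunk_tokens` sets a floor for the final window. The sketch below illustrates one plausible reading of those defaults. It is not all2md's chunker: the function name `chunk_tokens`, the pre-tokenised input, and the choice to fold a short tail into the previous chunk are all assumptions made for illustration.

```python
def chunk_tokens(tokens, chunk_size=320, overlap=40, min_tokens=60):
    """Split a token list into overlapping windows (illustrative only)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        # Assumption: a trailing window below the minimum is folded into
        # the previous chunk, skipping the tokens that chunk already holds.
        if chunks and len(window) < min_tokens:
            chunks[-1].extend(window[overlap:])
            break
        chunks.append(window)
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


tokens = [str(i) for i in range(700)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

With the defaults, each window starts 280 tokens after the previous one, so consecutive chunks share 40 tokens of context.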
- class all2md.search.SearchDocumentInput
Bases: object
Represents a document scheduled for indexing.
- source: str | Path | bytes
- document_id: str | None = None
- source_format: Literal['auto', 'archive', 'asciidoc', 'ast', 'bbcode', 'chm', 'csv', 'docx', 'dokuwiki', 'eml', 'enex', 'epub', 'fb2', 'html', 'ini', 'ipynb', 'jinja', 'json', 'latex', 'markdown', 'mbox', 'mediawiki', 'mhtml', 'odp', 'ods', 'odt', 'openapi', 'org', 'outlook', 'pdf', 'plaintext', 'pptx', 'rst', 'rtf', 'sourcecode', 'textile', 'toml', 'webarchive', 'xlsx', 'yaml', 'zip'] | str | None = 'auto'
- metadata: Mapping[str, object] | None = None
- __init__(source: str | Path | bytes, document_id: str | None = None, source_format: Literal['auto', 'archive', 'asciidoc', 'ast', 'bbcode', 'chm', 'csv', 'docx', 'dokuwiki', 'eml', 'enex', 'epub', 'fb2', 'html', 'ini', 'ipynb', 'jinja', 'json', 'latex', 'markdown', 'mbox', 'mediawiki', 'mhtml', 'odp', 'ods', 'odt', 'openapi', 'org', 'outlook', 'pdf', 'plaintext', 'pptx', 'rst', 'rtf', 'sourcecode', 'textile', 'toml', 'webarchive', 'xlsx', 'yaml', 'zip'] | str | None = 'auto', metadata: Mapping[str, object] | None = None) None
- class all2md.search.SearchService
Bases: object
Service object coordinating indexing and search execution.
- __init__(options: SearchOptions | None = None) None
Initialise the service with optional search configuration overrides.
- property state: SearchIndexState
Return the current in-memory search index state.
- build_indexes(documents: Sequence[SearchDocumentInput], *, modes: Iterable[SearchMode] | None = None, progress_callback: Callable[[ProgressEvent], None] | None = None) SearchIndexState
Convert sources into chunks and materialise requested indexes.
- save(directory: Path) None
Persist all active indexes to disk.
- classmethod load(directory: Path, options: SearchOptions | None = None) SearchService
Rehydrate indexes from disk.
- search(query: str, *, mode: SearchMode | str | None = None, top_k: int = 10, progress_callback: Callable[[ProgressEvent], None] | None = None) list[SearchResult]
Execute a query using the configured index backends.
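Taken together, the methods above suggest a build, save, load, search lifecycle. The following is a minimal sketch of that flow using only the signatures listed in this reference. The helper name `build_and_reload`, the option values, and the use of file names as `document_id` are illustrative assumptions, and the import is deferred so the snippet can be loaded even where all2md or its optional search dependencies are not installed.

```python
from pathlib import Path


def build_and_reload(paths, index_dir, query):
    """Index files, persist the indexes, reload them, and run one query."""
    # Deferred import: all2md (and any search extras) must be installed
    # before this function is actually called.
    from all2md.search import (
        SearchDocumentInput,
        SearchMode,
        SearchOptions,
        SearchService,
    )

    # Example overrides only; every field has a default.
    options = SearchOptions(chunk_size_tokens=256, chunk_overlap_tokens=32)

    documents = [
        SearchDocumentInput(source=Path(p), document_id=Path(p).name)
        for p in paths
    ]

    service = SearchService(options)
    service.build_indexes(documents, modes=[SearchMode.KEYWORD])
    service.save(Path(index_dir))

    # Later, or in another process: rehydrate and query without re-indexing.
    restored = SearchService.load(Path(index_dir), options=options)
    return restored.search(query, mode=SearchMode.KEYWORD, top_k=5)
```

Each returned `SearchResult` carries a `chunk`, a `score`, and a `metadata` mapping, as described further down.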
- class all2md.search.SearchMode
Bases: Enum
Enumerate the supported search strategies.
- GREP = 1
- KEYWORD = 2
- VECTOR = 3
- HYBRID = 4
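`HYBRID` mode is paired with the `hybrid_keyword_weight` and `hybrid_vector_weight` options on `SearchOptions`. This reference does not state how the two score sets are combined, so the sketch below shows one common approach, a weighted sum of min-max normalised scores, purely as an assumption. The helper names `min_max` and `fuse` and the chunk identifiers are hypothetical.

```python
def min_max(scores):
    """Rescale a {chunk_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse(keyword_scores, vector_scores, keyword_weight=0.5, vector_weight=0.5):
    """Combine two score mappings with a weighted sum (assumed strategy)."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    fused = {
        chunk_id: keyword_weight * kw.get(chunk_id, 0.0)
        + vector_weight * vec.get(chunk_id, 0.0)
        for chunk_id in set(kw) | set(vec)
    }
    # Highest combined score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


keyword = {"a": 10.0, "b": 5.0, "c": 0.0}
vector = {"b": 0.9, "c": 0.8, "d": 0.1}
for chunk_id, score in fuse(keyword, vector):
    print(chunk_id, round(score, 4))
```

Normalising first matters because keyword scores are unbounded while cosine similarities are not, so raw values are not directly comparable.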
- class all2md.search.SearchResult
Bases: object
Search result referencing a chunk and its relevance score.
- chunk: Chunk
- score: float
- metadata: Mapping[str, Any]
- __init__(chunk: Chunk, score: float, metadata: Mapping[str, Any] = <factory>) None
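The `bm25_k1` and `bm25_b` options suggest that keyword-mode scores follow Okapi BM25, although this reference does not say so explicitly. As a hedged illustration of how those two parameters shape a score, here is a small standalone BM25 implementation. The function name `bm25_scores`, the pre-tokenised documents, and the specific IDF variant are assumptions and may differ from what all2md does internally.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 caps term-frequency saturation; b scales length normalisation.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    ["search", "index", "search"],
    ["vector", "index"],
    ["keyword", "grep"],
]
print(bm25_scores(["search"], docs))
```

Raising `k1` lets repeated terms keep adding to the score for longer, while `b = 0` disables length normalisation entirely.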
- all2md.search.build_search_service(documents: Sequence[SearchDocumentInput], *, options: SearchOptions | None = None, modes: Iterable[SearchMode] | None = None, progress_callback: Callable[[ProgressEvent], None] | None = None) SearchService
Create a search service pre-indexed with the supplied documents.
- all2md.search.search_with_service(service: SearchService, query: str, *, mode: SearchMode | str | None = None, top_k: int = 10, progress_callback: Callable[[ProgressEvent], None] | None = None) list[SearchResult]
Execute a search query using an existing SearchService.
- all2md.search.search_documents(documents: Sequence[SearchDocumentInput], query: str, *, options: SearchOptions | None = None, modes: Iterable[SearchMode] | None = None, mode: SearchMode | str | None = None, top_k: int = 10, progress_callback: Callable[[ProgressEvent], None] | None = None) list[SearchResult]
Index documents and run a query in a single convenience call.
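For one-off queries, `search_documents` avoids managing a `SearchService` by hand. The sketch below relies only on the signatures listed here. The wrapper name `quick_search` is hypothetical, sources are passed as `Path` objects to sidestep any ambiguity about how plain strings are interpreted, and passing `mode="keyword"` assumes the string form matches the documented `default_mode` value.

```python
from pathlib import Path


def quick_search(paths, query, top_k=3):
    """Index the given files and return (score, metadata) pairs for a query."""
    # Deferred import: requires all2md to be installed when called.
    from all2md.search import SearchDocumentInput, search_documents

    documents = [SearchDocumentInput(source=Path(p)) for p in paths]
    results = search_documents(documents, query, mode="keyword", top_k=top_k)
    return [(result.score, dict(result.metadata)) for result in results]
```

Because the indexes are rebuilt on every call, this form suits scripts and experiments. For repeated queries over the same corpus, `build_search_service` followed by `search_with_service` reuses the index.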
For search module documentation, see Advanced Modules.