haive.core.engine.document.loaders.sources.registry
Source registry with decorator-based registration.
This module provides a registry for document sources that maps:
- File extensions to source classes
- URL patterns to source classes
- Schemes to source classes
- Source classes to their associated loaders
The registry enables automatic source detection and loader selection.
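The sketch below shows the kind of lookup tables such a registry keeps. `ToySourceRegistry`, `add`, and `detect` are hypothetical names, not this module's API. The lookup order (URL pattern, then scheme, then file extension) is an assumption chosen for illustration. The real `register` method also accepts MIME types, path patterns, and a priority.

```python
import re
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


class ToySourceRegistry:
    """Hypothetical, simplified index: extensions/schemes/URL patterns -> source name."""

    def __init__(self) -> None:
        self.by_extension: dict[str, str] = {}
        self.by_scheme: dict[str, str] = {}
        self.url_patterns: list[tuple[re.Pattern, str]] = []

    def add(self, name: str, file_extensions=(), schemes=(), url_patterns=()) -> None:
        for ext in file_extensions:
            self.by_extension[ext.lower()] = name
        for scheme in schemes:
            self.by_scheme[scheme.lower()] = name
        for pattern in url_patterns:
            self.url_patterns.append((re.compile(pattern), name))

    def detect(self, path: str) -> Optional[str]:
        # Assumed order: URL patterns are most specific, so try them first.
        for pattern, name in self.url_patterns:
            if pattern.search(path):
                return name
        parsed = urlparse(path)
        if parsed.scheme.lower() in self.by_scheme:
            return self.by_scheme[parsed.scheme.lower()]
        # Fall back to the file extension (case-insensitive).
        suffix = PurePosixPath(parsed.path or path).suffix.lower()
        return self.by_extension.get(suffix)


reg = ToySourceRegistry()
reg.add("pdf", file_extensions=[".pdf"])
reg.add("web", schemes=["http", "https"])
reg.add("github", url_patterns=[r"^https://github\.com/"])
print(reg.detect("https://github.com/org/repo"))  # github
```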
Classes
| LoaderMapping | Mapping of a loader to a source. |
| SourceRegistration | Complete registration info for a source. |
| SourceRegistry | Registry for document sources and their loaders. |
Functions
| register_source | Decorator to register a source class. |
Module Contents
- class haive.core.engine.document.loaders.sources.registry.LoaderMapping
Mapping of a loader to a source.
- class haive.core.engine.document.loaders.sources.registry.SourceRegistration
Complete registration info for a source.
- class haive.core.engine.document.loaders.sources.registry.SourceRegistry
Registry for document sources and their loaders.
Initialize the registry.
- create_source(path, source_type=None, **kwargs)
Create a source instance for a path.
- Parameters:
path
source_type
**kwargs
- Return type:
haive.core.engine.document.loaders.sources.source_base.BaseSource | None
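The sketch below shows one plausible behavior for `create_source`. It uses `source_type` when that argument is given. Otherwise it detects a source from the path. It then instantiates the matching class with the path and any extra keyword arguments. `ToySource`, `SOURCES`, and `create_source_sketch` are hypothetical, and detection here is only a bare extension check.

```python
from pathlib import PurePosixPath
from typing import Any, Optional


class ToySource:
    # Hypothetical stand-in for BaseSource: it just remembers its inputs.
    def __init__(self, path: str, **kwargs: Any) -> None:
        self.path = path
        self.options = kwargs


class PDFToySource(ToySource):
    pass


class TextToySource(ToySource):
    pass


# Hypothetical table: source name -> (source class, handled extensions).
SOURCES = {
    "pdf": (PDFToySource, {".pdf"}),
    "text": (TextToySource, {".txt", ".md"}),
}


def create_source_sketch(
    path: str, source_type: Optional[str] = None, **kwargs: Any
) -> Optional[ToySource]:
    """Instantiate a source, honoring an explicit source_type before auto-detection."""
    if source_type is not None:
        entry = SOURCES.get(source_type)
    else:
        suffix = PurePosixPath(path).suffix.lower()
        entry = next((e for e in SOURCES.values() if suffix in e[1]), None)
    if entry is None:
        return None  # mirrors the documented `BaseSource | None` return type
    source_class, _ = entry
    return source_class(path, **kwargs)
```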
- find_source_for_path(path, analysis=None)
Find the best source for a given path.
- Parameters:
path (str)
analysis (PathAnalysisResult | None)
- Return type:
SourceRegistration | None
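Several registrations can match the same path, and `register` accepts a `priority` and an optional `custom_matcher`. The sketch below assumes two rules. First, the highest-priority match wins. Second, a custom matcher can claim a path that its extensions would not match. `Registration` and `find_best` are hypothetical names. For brevity, the matcher receives a plain string rather than a `PathAnalysisResult`.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Callable, Optional


@dataclass
class Registration:
    # Hypothetical stand-in for SourceRegistration; field names are assumptions.
    name: str
    file_extensions: list = field(default_factory=list)
    priority: int = 0
    custom_matcher: Optional[Callable[[str], bool]] = None

    def matches(self, path: str) -> bool:
        # A custom matcher may claim paths its extensions would not.
        if self.custom_matcher is not None and self.custom_matcher(path):
            return True
        return PurePosixPath(path).suffix.lower() in self.file_extensions


def find_best(registrations: list, path: str) -> Optional[Registration]:
    """Return the matching registration with the highest priority, or None."""
    candidates = [r for r in registrations if r.matches(path)]
    # max() keeps the first registration on ties, so earlier entries win.
    return max(candidates, key=lambda r: r.priority, default=None)


regs = [
    Registration("text", [".txt", ".md"], priority=0),
    Registration(
        "readme",
        priority=5,
        custom_matcher=lambda p: PurePosixPath(p).name.lower().startswith("readme"),
    ),
    Registration("markdown", [".md"], priority=1),
]
```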
- get_loader_for_source(source, loader_name=None, preference=LoaderPreference.BALANCED)
Get the best loader for a source.
- Parameters:
source (haive.core.engine.document.loaders.sources.source_base.BaseSource)
loader_name (str | None)
preference (haive.core.engine.document.config.LoaderPreference)
- Return type:
LoaderMapping | None
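One plausible selection order for `get_loader_for_source` is an explicit `loader_name` first, then the `preference`, then the registered default. In the sketch below, `Preference` is a hypothetical stand-in for `LoaderPreference`. The signature above confirms only `BALANCED`, so `SPEED` and `QUALITY` are assumptions. They are scored from the `"speed"`/`"quality"` metadata strings used in the `register_source` example. `pick_loader` and `_SCORE` are also hypothetical.

```python
from enum import Enum
from typing import Optional


class Preference(Enum):
    # Hypothetical stand-in for LoaderPreference; SPEED and QUALITY are assumed.
    SPEED = "speed"
    QUALITY = "quality"
    BALANCED = "balanced"


# Assumed ordinal scores for the metadata labels; unknown labels score as "medium".
_SCORE = {"slow": 0, "low": 0, "medium": 1, "fast": 2, "high": 2}


def pick_loader(
    loaders: dict,
    default: Optional[str],
    loader_name: Optional[str] = None,
    preference: Preference = Preference.BALANCED,
) -> Optional[str]:
    """Return the chosen loader key, or None if nothing fits."""
    if loader_name is not None:
        # An explicit request wins, but only if that loader is registered.
        return loader_name if loader_name in loaders else None
    if preference is Preference.BALANCED:
        # Balanced: use the registered default, else the first loader.
        return default if default in loaders else next(iter(loaders), None)
    attr = preference.value  # "speed" or "quality"
    return max(
        loaders,
        key=lambda key: _SCORE.get(loaders[key].get(attr, "medium"), 1),
        default=None,
    )


loaders = {
    "fast": {"class": "PyPDFLoader", "speed": "fast", "quality": "low"},
    "quality": {"class": "UnstructuredPDFLoader", "quality": "high"},
    "ocr": {"class": "PDFPlumberLoader", "speed": "slow", "quality": "high"},
}
```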
- get_source_info(name)
Get registration info for a source.
- Parameters:
name (str)
- Return type:
SourceRegistration | None
- register(name, source_class, file_extensions=None, mime_types=None, url_patterns=None, schemes=None, path_patterns=None, loaders=None, default_loader=None, priority=0, custom_matcher=None)
Register a source with the registry.
- Parameters:
name (str)
source_class (type[haive.core.engine.document.loaders.sources.source_base.BaseSource])
default_loader (str | None)
priority (int)
custom_matcher (collections.abc.Callable[[haive.core.engine.document.loaders.path_analyzer.PathAnalysisResult], bool] | None)
- Return type:
- haive.core.engine.document.loaders.sources.registry.register_source(name=None, file_extensions=None, mime_types=None, url_patterns=None, schemes=None, path_patterns=None, loaders=None, default_loader=None, priority=0, custom_matcher=None)
Decorator to register a source class.
Examples
    from haive.core.engine.document.loaders.sources.registry import register_source

    # LocalSource is assumed to be a BaseSource subclass defined elsewhere in the sources package.
    @register_source(
        name="pdf",
        file_extensions=[".pdf"],
        mime_types=["application/pdf"],
        loaders={
            "fast": "PyPDFLoader",
            "quality": {
                "class": "UnstructuredPDFLoader",
                "quality": "high",
                "requires_packages": ["unstructured", "pdf2image"],
            },
            "ocr": {
                "class": "PDFPlumberLoader",
                "speed": "slow",
                "quality": "high",
                "best_for": ["tables", "complex_layouts"],
            },
        },
        default_loader="fast",
        priority=10,
    )
    class PDFSource(LocalSource):
        """Source for PDF documents."""
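The `loaders` argument in the example uses two shapes. One is a bare class-name string (`"fast"`). The other is a dict with a `"class"` key plus metadata. The sketch below normalizes both shapes into a single record type. `LoaderSpec`, its `"medium"` defaults, and `normalize_loaders` are hypothetical. The real `LoaderMapping` may store different fields. Any unrecognized metadata key raises a `TypeError` in this sketch.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class LoaderSpec:
    # Hypothetical normalized form; the real LoaderMapping fields may differ.
    name: str
    loader_class: str
    speed: str = "medium"
    quality: str = "medium"
    requires_packages: list = field(default_factory=list)
    best_for: list = field(default_factory=list)


def normalize_loaders(loaders: dict) -> dict:
    """Turn the mixed str/dict `loaders` argument into uniform LoaderSpec records."""
    specs = {}
    for name, spec in loaders.items():
        if isinstance(spec, str):
            # Shorthand: a bare string is just the loader class name.
            specs[name] = LoaderSpec(name=name, loader_class=spec)
        else:
            # Dict form: "class" names the loader, other keys are metadata.
            extra = {k: v for k, v in spec.items() if k != "class"}
            specs[name] = LoaderSpec(name=name, loader_class=spec["class"], **extra)
    return specs


specs = normalize_loaders({
    "fast": "PyPDFLoader",
    "quality": {
        "class": "UnstructuredPDFLoader",
        "quality": "high",
        "requires_packages": ["unstructured", "pdf2image"],
    },
    "ocr": {
        "class": "PDFPlumberLoader",
        "speed": "slow",
        "quality": "high",
        "best_for": ["tables", "complex_layouts"],
    },
})
```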
- Parameters:
name (str | None)
default_loader (str | None)
priority (int)
custom_matcher (collections.abc.Callable[[haive.core.engine.document.loaders.path_analyzer.PathAnalysisResult], bool] | None)
- Return type:
collections.abc.Callable[[type[haive.core.engine.document.loaders.sources.source_base.BaseSource]], type[haive.core.engine.document.loaders.sources.source_base.BaseSource]]