haive.core.engine.document.loaders.sources.registry

Source registry with decorator-based registration.

This module provides a registry for document sources that maps:
  • File extensions to source classes

  • URL patterns to source classes

  • Schemes to source classes

  • Source classes to their associated loaders

The registry enables automatic source detection and loader selection.
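
As a rough illustration of decorator-based registration, the sketch below keeps a plain dictionary from file extension to class and fills it through a class decorator. It is not the haive implementation: `register_by_extension`, `detect_source`, and the `Toy*` classes are hypothetical names invented for this example, and only extension matching is shown.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical stand-in for a registry: file extension -> source class.
_EXTENSION_TO_SOURCE: dict = {}


def register_by_extension(*extensions: str):
    """Class decorator that records the class under each given extension."""

    def decorator(cls: type) -> type:
        for ext in extensions:
            _EXTENSION_TO_SOURCE[ext.lower()] = cls
        # Return the class unchanged so it can still be used normally.
        return cls

    return decorator


@register_by_extension(".pdf")
class ToyPDFSource:
    """Illustrative source class for PDF files."""


@register_by_extension(".md", ".markdown")
class ToyMarkdownSource:
    """Illustrative source class for Markdown files."""


def detect_source(path: str) -> Optional[type]:
    """Return the class registered for the path's extension, or None."""
    return _EXTENSION_TO_SOURCE.get(PurePath(path).suffix.lower())
```

Registration happens as a side effect of defining the class, so importing the module that defines a source is enough to make it discoverable.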

Classes

LoaderMapping

Mapping of a loader to a source.

SourceRegistration

Complete registration info for a source.

SourceRegistry

Registry for document sources and their loaders.

Functions

register_source([name, file_extensions, mime_types, ...])

Decorator to register a source class.

Module Contents

class haive.core.engine.document.loaders.sources.registry.LoaderMapping

Mapping of a loader to a source.

class haive.core.engine.document.loaders.sources.registry.SourceRegistration

Complete registration info for a source.

class haive.core.engine.document.loaders.sources.registry.SourceRegistry

Registry for document sources and their loaders.

create_source(path, source_type=None, **kwargs)

Create a source instance for a path.

Parameters:
  • path (str)

  • source_type (str | None)

Return type:

haive.core.engine.document.loaders.sources.source_base.BaseSource | None

find_source_for_path(path, analysis=None)

Find the best source for a given path.

Return type:

SourceRegistration | None
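
One way to think about "best source" selection is sketched below. It assumes that a registration matches a path by URL scheme or file extension, and that the matching registration with the highest `priority` wins. The source does not confirm either rule, and `ToyRegistration` and `find_best` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


@dataclass
class ToyRegistration:
    """Hypothetical, simplified registration record."""

    name: str
    file_extensions: list = field(default_factory=list)
    schemes: list = field(default_factory=list)
    priority: int = 0

    def matches(self, path: str) -> bool:
        parsed = urlparse(path)
        # A registered scheme (e.g. "https") is enough for a match.
        if parsed.scheme and parsed.scheme.lower() in self.schemes:
            return True
        # Otherwise compare the extension of the path component.
        suffix = PurePosixPath(parsed.path).suffix.lower()
        return suffix in self.file_extensions


def find_best(registrations, path: str) -> Optional[ToyRegistration]:
    """Assumption: among all matches, the highest priority wins."""
    candidates = [reg for reg in registrations if reg.matches(path)]
    return max(candidates, key=lambda reg: reg.priority, default=None)


toy_registrations = [
    ToyRegistration("pdf", file_extensions=[".pdf"], priority=10),
    ToyRegistration("web", schemes=["http", "https"], priority=5),
    ToyRegistration("generic_file", file_extensions=[".pdf", ".txt"], priority=1),
]
```

Under these assumptions a URL ending in `.pdf` matches both the `web` and `pdf` entries, and the higher priority of `pdf` decides the result.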

get_loader_for_source(source, loader_name=None, preference=LoaderPreference.BALANCED)

Get the best loader for a source.

Return type:

LoaderMapping | None
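
The sketch below shows how a preference-driven choice between loaders could work. Only `LoaderPreference.BALANCED` appears in the signature above, so the `SPEED` and `QUALITY` members, the scoring tables, and the `pick_loader` helper are all assumptions made for illustration. The `"speed"` and `"quality"` metadata keys follow the `register_source` example in this module.

```python
from enum import Enum
from typing import Optional


class ToyPreference(Enum):
    """Hypothetical preference enum; only BALANCED is named in the docs."""

    SPEED = "speed"
    QUALITY = "quality"
    BALANCED = "balanced"


# Assumed rankings for the string labels used in loader metadata.
_SPEED_SCORE = {"slow": 1, "medium": 2, "fast": 3}
_QUALITY_SCORE = {"low": 1, "medium": 2, "high": 3}


def pick_loader(
    loaders: dict,
    loader_name: Optional[str] = None,
    preference: ToyPreference = ToyPreference.BALANCED,
) -> Optional[str]:
    """Return the chosen loader name, or None if nothing fits."""
    # An explicit name bypasses scoring entirely.
    if loader_name is not None:
        return loader_name if loader_name in loaders else None
    if not loaders:
        return None

    def score(name: str) -> int:
        meta = loaders[name]
        speed = _SPEED_SCORE[meta.get("speed", "medium")]
        quality = _QUALITY_SCORE[meta.get("quality", "medium")]
        if preference is ToyPreference.SPEED:
            return speed
        if preference is ToyPreference.QUALITY:
            return quality
        return speed + quality  # BALANCED weighs both equally

    return max(loaders, key=score)


toy_loaders = {
    "quick": {"speed": "fast", "quality": "medium"},
    "thorough": {"speed": "slow", "quality": "high"},
}
```

Returning `None` for an unknown explicit name mirrors the `LoaderMapping | None` return type, though the real method may handle that case differently.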

get_source_info(name)

Get registration info for a source.

Parameters:

name (str)

Return type:

SourceRegistration | None

list_sources()

List all registered source names.

Return type:

list[str]

register(name, source_class, file_extensions=None, mime_types=None, url_patterns=None, schemes=None, path_patterns=None, loaders=None, default_loader=None, priority=0, custom_matcher=None)

Register a source with the registry.

Return type:

SourceRegistration
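
The semantics of `url_patterns` and `custom_matcher` are not documented here. The sketch below assumes that URL patterns are regular expressions and that a custom matcher is a callable taking the path and returning a boolean. `make_matcher` is a hypothetical helper, not part of the registry.

```python
import re
from typing import Callable, Optional, Sequence


def make_matcher(
    url_patterns: Optional[Sequence[str]] = None,
    custom_matcher: Optional[Callable[[str], bool]] = None,
) -> Callable[[str], bool]:
    """Combine regex URL patterns and an optional callable into one predicate."""
    # Assumption: each pattern is a regular expression searched in the path.
    compiled = [re.compile(pattern) for pattern in (url_patterns or [])]

    def matches(path: str) -> bool:
        # Assumption: the custom matcher is consulted first.
        if custom_matcher is not None and custom_matcher(path):
            return True
        return any(regex.search(path) for regex in compiled)

    return matches


video_matcher = make_matcher(
    url_patterns=[r"^https?://(www\.)?youtube\.com/watch"],
)
directory_matcher = make_matcher(custom_matcher=lambda p: p.endswith("/"))
```

Compiling the patterns once when the matcher is built keeps repeated lookups cheap.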

haive.core.engine.document.loaders.sources.registry.register_source(name=None, file_extensions=None, mime_types=None, url_patterns=None, schemes=None, path_patterns=None, loaders=None, default_loader=None, priority=0, custom_matcher=None)

Decorator to register a source class.

Examples

    @register_source(
        name="pdf",
        file_extensions=[".pdf"],
        mime_types=["application/pdf"],
        loaders={
            "fast": "PyPDFLoader",
            "quality": {
                "class": "UnstructuredPDFLoader",
                "quality": "high",
                "requires_packages": ["unstructured", "pdf2image"],
            },
            "ocr": {
                "class": "PDFPlumberLoader",
                "speed": "slow",
                "quality": "high",
                "best_for": ["tables", "complex_layouts"],
            },
        },
        default_loader="fast",
        priority=10,
    )
    class PDFSource(LocalSource):
        '''Source for PDF documents.'''
        pass

Return type:

collections.abc.Callable[[type[haive.core.engine.document.loaders.sources.source_base.BaseSource]], type[haive.core.engine.document.loaders.sources.source_base.BaseSource]]
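
In the example above, a `loaders` entry is either a bare class name (`"PyPDFLoader"`) or a dict with a `"class"` key plus metadata. The sketch below shows one way such mixed specs could be brought into a uniform shape. `normalize_loaders` is a hypothetical helper, and whether the registry normalizes its input this way is an assumption.

```python
from typing import Any, Dict, Union

LoaderSpec = Union[str, Dict[str, Any]]


def normalize_loaders(loaders: Dict[str, LoaderSpec]) -> Dict[str, Dict[str, Any]]:
    """Turn every loader spec into a dict that has at least a "class" key."""
    normalized: Dict[str, Dict[str, Any]] = {}
    for name, spec in loaders.items():
        if isinstance(spec, str):
            # Shorthand form: the string is the loader class name.
            normalized[name] = {"class": spec}
        elif isinstance(spec, dict) and "class" in spec:
            # Full form: copy so the caller's dict is not mutated later.
            normalized[name] = dict(spec)
        else:
            raise ValueError(f"Loader {name!r} needs a class name or a 'class' key")
    return normalized


pdf_loaders = normalize_loaders(
    {
        "fast": "PyPDFLoader",
        "quality": {
            "class": "UnstructuredPDFLoader",
            "quality": "high",
            "requires_packages": ["unstructured", "pdf2image"],
        },
    }
)
```

After normalization, code that selects a loader can read `spec["class"]` without checking which form the author used.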