haive.core.engine.document.loaders.auto_registry

Auto-Registry System for Document Loaders.

This module provides automatic registration and discovery of document loader sources and their loaders. It scans the sources directory, then imports and registers every available source type without manual intervention.

The auto-registry makes all 230+ implemented loaders available when the system starts, so no loader has to be registered by hand.

Examples

Auto-register all sources:

from haive.core.engine.document.loaders import auto_register_all

# Automatically discover and register all sources
auto_register_all()

Check registration status:

from haive.core.engine.document.loaders import get_registration_status

status = get_registration_status()
print(f"Registered {status['total_sources']} sources")

Author: Claude (Haive Document Loader System)

Version: 1.0.0

Classes

AutoRegistry

Automatic registry for document loader sources.

RegistrationInfo

Information about a registered source.

RegistrationStats

Statistics about the registration process.

Functions

auto_register_all()

Convenience function to auto-register all sources.

get_registration_status()

Get current registration status.

get_sources_by_category(category)

Get sources for a specific category.

list_available_sources()

List all available source types.

Module Contents

class haive.core.engine.document.loaders.auto_registry.AutoRegistry(registry=None)

Automatic registry for document loader sources.

The AutoRegistry scans the sources directory and automatically discovers, imports, and registers all available source types. This eliminates the need for manual registration and ensures all implemented loaders are available.

Features:
  • Automatic module discovery and import

  • Source class detection and validation

  • Duplicate registration prevention

  • Error handling and reporting

  • Registration statistics and monitoring

  • Dependency tracking
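Two of the features listed above, duplicate prevention and error reporting, can be pictured with a toy registry. This is only a sketch under stated assumptions: `SimpleRegistry`, its `register` method, and the placeholder `PDFSource` class are hypothetical names for illustration and do not reflect how AutoRegistry is actually implemented.

```python
from typing import Any


class SimpleRegistry:
    """Toy registry (hypothetical; not haive's AutoRegistry)."""

    def __init__(self) -> None:
        self._sources: dict[str, type[Any]] = {}
        self.errors: list[str] = []

    def register(self, name: str, source_class: Any) -> bool:
        """Register a class under a name; refuse non-classes and duplicates."""
        if not isinstance(source_class, type):
            # Error reporting: record the problem instead of raising.
            self.errors.append(f"{name}: not a class")
            return False
        if name in self._sources:
            # Duplicate prevention: keep the first registration, skip the rest.
            return False
        self._sources[name] = source_class
        return True

    def names(self) -> list[str]:
        """Return the registered source names in sorted order."""
        return sorted(self._sources)


class PDFSource:
    """Placeholder class used only for this demo."""


registry = SimpleRegistry()
first = registry.register("pdf", PDFSource)
second = registry.register("pdf", PDFSource)       # duplicate, skipped
bad = registry.register("broken", "not-a-class")   # rejected and recorded
print(first, second, bad, registry.names(), registry.errors)
```

Recording failures in a list rather than raising lets a bulk registration pass continue past one bad source, which matches the "error handling and reporting" feature in spirit.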

Examples

Basic auto-registration:

registry = AutoRegistry()
stats = registry.register_all_sources()
print(f"Registered {stats.total_sources_registered} sources")

Restricted to a single category:

from haive.core.engine.document.loaders.sources.source_types import SourceCategory

registry = AutoRegistry()
count = registry.register_sources_by_category(SourceCategory.LOCAL_FILE)

Initialize the AutoRegistry.

Parameters:

registry – Optional custom registry instance

discover_source_modules()

Discover all source modules in the sources directory.

Returns:

List of module names to import

Return type:

list[str]

Examples

Find all source modules:

registry = AutoRegistry()
modules = registry.discover_source_modules()
print(f"Found {len(modules)} source modules")
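
This page does not show how the scan itself works. One common way to list the modules inside a package is `pkgutil.iter_modules` from the standard library. The sketch below assumes that approach and uses the standard-library `json` package as a stand-in for the sources directory; `discover_modules` is a hypothetical helper, not the AutoRegistry method.

```python
import importlib
import pkgutil


def discover_modules(package_name: str) -> list[str]:
    """Return the fully qualified names of the modules directly inside a package."""
    package = importlib.import_module(package_name)
    prefix = package.__name__ + "."
    # iter_modules only inspects the package's search path; it does not
    # import the submodules it finds.
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__, prefix))


# "json" stands in for the sources package in this demo.
modules = discover_modules("json")
print(f"Found {len(modules)} modules: {modules}")
```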

find_source_classes(module)

Find all source classes in a module.

Parameters:

module (Any) – Imported module to scan

Returns:

List of (class_name, class_type) tuples

Return type:

list[tuple[str, type[haive.core.engine.document.loaders.sources.source_types.BaseSource]]]

Examples

Find sources in module:

registry = AutoRegistry()
module = registry.import_source_module("...")
classes = registry.find_source_classes(module)
print(f"Found {len(classes)} source classes")
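
As an illustration of class detection, `inspect.getmembers` can list the classes in a module, which are then filtered by base class. The sketch assumes detection is based on subclassing; the `BaseSource` defined here is a stand-in rather than the real haive class, and the demo module is built in memory so the example runs on its own.

```python
import inspect
import types


class BaseSource:
    """Stand-in base class (hypothetical; not haive's BaseSource)."""


def find_source_classes(module: types.ModuleType) -> list[tuple[str, type]]:
    """Return (class_name, class_type) pairs for concrete BaseSource subclasses."""
    found = []
    for name, obj in inspect.getmembers(module, inspect.isclass):
        # Skip the base class itself and anything that is not a source.
        if issubclass(obj, BaseSource) and obj is not BaseSource:
            found.append((name, obj))
    return found


class PDFSource(BaseSource):
    """Demo source class."""


class Helper:
    """Unrelated class that should be ignored."""


# Build an in-memory module that mimics a source module's namespace.
demo = types.ModuleType("demo_sources")
demo.BaseSource = BaseSource
demo.PDFSource = PDFSource
demo.Helper = Helper

classes = find_source_classes(demo)
print(f"Found {len(classes)} source classes: {[name for name, _ in classes]}")
```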

get_registration_status()

Get current registration status and statistics.

Returns:

Dictionary with registration information

Return type:

dict[str, Any]

Examples

Check registration status:

registry = AutoRegistry()
status = registry.get_registration_status()

print(f"Total sources: {status['total_sources']}")
print(f"Categories: {status['categories_count']}")
print(f"Recent registrations: {status['recent_registrations']}")

get_source_info(source_name)

Get detailed information about a registered source.

Parameters:

source_name (str) – Name of the source to get info for

Returns:

RegistrationInfo or None if not found

Return type:

RegistrationInfo | None

Examples

Get source details:

registry = AutoRegistry()
info = registry.get_source_info("pdf")
if info:
    print(f"Module: {info.module_name}")
    print(f"Loaders: {info.loaders}")

import_source_module(module_name)

Import a source module safely.

Parameters:

module_name (str) – Full module name to import

Returns:

Imported module or None if import failed

Return type:

Any | None

Examples

Import specific module:

registry = AutoRegistry()
module = registry.import_source_module(
    "haive.core.engine.document.loaders.sources.file_sources"
)
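
"Safely" here means that a failed import yields None instead of an exception. A minimal sketch of that contract, assuming it is built on `importlib.import_module` (the helper name `safe_import` is hypothetical):

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

logger = logging.getLogger(__name__)


def safe_import(module_name: str) -> Optional[ModuleType]:
    """Import a module by name; return None if the import fails for any reason."""
    try:
        return importlib.import_module(module_name)
    except Exception as exc:
        # Catches ImportError as well as errors raised while the module's
        # top-level code runs (for example a missing optional dependency).
        logger.warning("Could not import %s: %s", module_name, exc)
        return None


ok = safe_import("json")
missing = safe_import("no_such_module_for_this_demo")
print(ok is not None, missing is None)
```

Returning None lets a bulk scan skip one broken module and continue with the rest.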

list_sources_by_category()

List all registered sources grouped by category.

Returns:

Dictionary mapping categories to source lists

Return type:

dict[haive.core.engine.document.loaders.sources.source_types.SourceCategory, list[str]]

Examples

List sources by category:

registry = AutoRegistry()
by_category = registry.list_sources_by_category()

for category, sources in by_category.items():
    print(f"{category.value}: {', '.join(sources)}")
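
The shape of the returned mapping can be sketched with a stand-in enum. Only the member name `LOCAL_FILE` appears on this page; its value, the `WEB` member, and the helper `group_by_category` are assumptions made for the demo.

```python
from collections import defaultdict
from enum import Enum


class SourceCategory(Enum):
    """Stand-in enum; the values and the WEB member are invented for this demo."""

    LOCAL_FILE = "local_file"
    WEB = "web"


def group_by_category(
    categories: dict[str, SourceCategory],
) -> dict[SourceCategory, list[str]]:
    """Invert a source-name -> category mapping into category -> sorted names."""
    grouped: defaultdict[SourceCategory, list[str]] = defaultdict(list)
    for name, category in categories.items():
        grouped[category].append(name)
    return {category: sorted(names) for category, names in grouped.items()}


registered = {
    "pdf": SourceCategory.LOCAL_FILE,
    "csv": SourceCategory.LOCAL_FILE,
    "url": SourceCategory.WEB,
}
by_category = group_by_category(registered)

for category, sources in by_category.items():
    print(f"{category.value}: {', '.join(sources)}")
```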

register_all_sources()

Register all discovered sources automatically.

Returns:

RegistrationStats with detailed information about the process

Return type:

RegistrationStats

Examples

Auto-register everything:

registry = AutoRegistry()
stats = registry.register_all_sources()

print(f"Scanned: {stats.total_modules_scanned} modules")
print(f"Found: {stats.total_sources_found} sources")
print(f"Registered: {stats.total_sources_registered} sources")
print(f"Errors: {len(stats.registration_errors)}")

register_module_sources(module_name)

Register all sources from a specific module.

Parameters:

module_name (str) – Module name to process

Returns:

Number of sources registered from this module

Return type:

int

Examples

Register all sources from file_sources module:

registry = AutoRegistry()
count = registry.register_module_sources(
    "haive.core.engine.document.loaders.sources.file_sources"
)
print(f"Registered {count} sources")

register_source_class(source_name, source_class, module_name)

Register a single source class.

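Parameters:

source_name (str) – Name of the source type

source_class (type[haive.core.engine.document.loaders.sources.source_types.BaseSource]) – The source class to register

module_name (str) – Module where the source is defined

Returns: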

True if registration was successful

Return type:

bool

Examples

Register single source:

registry = AutoRegistry()
success = registry.register_source_class(
    "pdf", PDFSource, "file_sources"
)

register_sources_by_category(category)

Register sources from a specific category only.

Parameters:

category (haive.core.engine.document.loaders.sources.source_types.SourceCategory) – SourceCategory to register

Returns:

Number of sources registered

Return type:

int

Examples

Register only file sources:

from haive.core.engine.document.loaders.sources.source_types import SourceCategory

registry = AutoRegistry()
count = registry.register_sources_by_category(SourceCategory.LOCAL_FILE)
print(f"Registered {count} file sources")

validate_all_registrations()

Validate all registered sources.

Returns:

Validation report

Return type:

dict[str, Any]

Examples

Validate registrations:

registry = AutoRegistry()
report = registry.validate_all_registrations()
print(f"Valid: {report['valid_count']}")
print(f"Invalid: {report['invalid_count']}")

validate_source_class(source_class)

Validate that a source class is properly configured.

Parameters:

source_class (type[haive.core.engine.document.loaders.sources.source_types.BaseSource]) – Source class to validate

Returns:

True if source class is valid

Return type:

bool

Examples

Validate source class:

registry = AutoRegistry()
valid = registry.validate_source_class(PDFSource)
print(f"Source valid: {valid}")
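
This page does not say what "properly configured" means. Purely as an illustration, the sketch below assumes three checks: the class subclasses the base, it is not abstract, and it declares a non-empty `source_type` attribute. All three criteria, the `source_type` attribute, and the stand-in classes are hypothetical. The report keys mirror the `valid_count` and `invalid_count` keys used in the validate_all_registrations() example.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Any


class BaseSource(ABC):
    """Stand-in base class (hypothetical; not haive's BaseSource)."""

    source_type: str = ""

    @abstractmethod
    def load(self) -> list[str]:
        """Return the loaded documents."""


class PDFSource(BaseSource):
    source_type = "pdf"

    def load(self) -> list[str]:
        return []


class UnnamedSource(BaseSource):
    # Forgets to set source_type, so it should fail validation.
    def load(self) -> list[str]:
        return []


def validate_source_class(source_class: Any) -> bool:
    """Apply the three assumed checks described in the lead-in."""
    if not inspect.isclass(source_class) or not issubclass(source_class, BaseSource):
        return False
    if inspect.isabstract(source_class):
        return False
    return bool(getattr(source_class, "source_type", ""))


candidates = [PDFSource, UnnamedSource, BaseSource, dict]
results = [validate_source_class(c) for c in candidates]
report = {
    "valid_count": results.count(True),
    "invalid_count": results.count(False),
}
print(report)
```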

class haive.core.engine.document.loaders.auto_registry.RegistrationInfo

Information about a registered source.

source_name

Name of the source type

source_class

The source class

module_name

Module where source is defined

category

Source category

loaders

Available loaders for this source

registration_time

When the source was registered

class haive.core.engine.document.loaders.auto_registry.RegistrationStats

Statistics about the registration process.

total_modules_scanned

Number of modules scanned

total_sources_found

Number of source classes found

total_sources_registered

Number of sources successfully registered

registration_errors

List of errors encountered

registration_time

Total time taken for registration

categories_covered

Number of categories with registered sources
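
To show how these counters could relate to each other, here is a sketch that fills a dataclass with the same field names during a simulated registration pass. The field types, the defaults, the use of seconds for `registration_time`, and the `register_all` helper are all assumptions; the real class may differ.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RegistrationStats:
    """Field names follow the attribute list above; types and defaults are guesses."""

    total_modules_scanned: int = 0
    total_sources_found: int = 0
    total_sources_registered: int = 0
    registration_errors: list[str] = field(default_factory=list)
    registration_time: float = 0.0  # assumed to be seconds
    categories_covered: int = 0


def register_all(found: dict[str, list[tuple[str, str]]]) -> RegistrationStats:
    """Simulate a pass over {module name: [(source name, category), ...]}."""
    stats = RegistrationStats()
    seen: set[str] = set()
    categories: set[str] = set()
    start = time.perf_counter()
    for module_name, sources in found.items():
        stats.total_modules_scanned += 1
        for source_name, category in sources:
            stats.total_sources_found += 1
            if source_name in seen:
                # Duplicates count as found but not as registered.
                stats.registration_errors.append(
                    f"{module_name}: duplicate source '{source_name}'"
                )
                continue
            seen.add(source_name)
            categories.add(category)
            stats.total_sources_registered += 1
    stats.registration_time = time.perf_counter() - start
    stats.categories_covered = len(categories)
    return stats


stats = register_all(
    {
        "file_sources": [("pdf", "local_file"), ("csv", "local_file")],
        "web_sources": [("url", "web"), ("pdf", "web")],
    }
)
print(stats)
```

In this simulation `total_sources_found` can exceed `total_sources_registered`, with the difference explained by the entries in `registration_errors`.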

haive.core.engine.document.loaders.auto_registry.auto_register_all()

Convenience function to auto-register all sources.

Returns:

RegistrationStats with detailed information

Return type:

RegistrationStats

Examples

Auto-register everything:

from haive.core.engine.document.loaders import auto_register_all

stats = auto_register_all()
print(f"Registered {stats.total_sources_registered} sources")

haive.core.engine.document.loaders.auto_registry.get_registration_status()

Get current registration status.

Returns:

Dictionary with registration information

Return type:

dict[str, Any]

Examples

Check status:

from haive.core.engine.document.loaders import get_registration_status

status = get_registration_status()
print(f"Total sources: {status['total_sources']}")

haive.core.engine.document.loaders.auto_registry.get_sources_by_category(category)

Get sources for a specific category.

Parameters:

category (haive.core.engine.document.loaders.sources.source_types.SourceCategory) – SourceCategory to filter by

Returns:

List of source names in the category

Return type:

list[str]

Examples

Get file sources:

from haive.core.engine.document.loaders import get_sources_by_category
from haive.core.engine.document.loaders.sources.source_types import SourceCategory

file_sources = get_sources_by_category(SourceCategory.LOCAL_FILE)
print(f"File sources: {file_sources}")

haive.core.engine.document.loaders.auto_registry.list_available_sources()

List all available source types.

Returns:

List of source type names

Return type:

list[str]

Examples

List sources:

from haive.core.engine.document.loaders import list_available_sources

sources = list_available_sources()
print(f"Available: {', '.join(sources)}")