haive.core.engine.document.processors
Document Processing Components.
This module provides document processing capabilities including chunking and content transformation that integrate with the DocumentEngine.
The processors handle:
- Content normalization
- Document chunking strategies
- Metadata extraction
- Format conversion
Classes
ChunkingProcessor | Processor for chunking documents into smaller pieces.
ContentNormalizer | Processor for normalizing document content.
DocumentProcessor | Base class for document processing operations.
FormatDetector | Processor for detecting document formats.
MetadataExtractor | Processor for extracting metadata from documents.
Module Contents
- class haive.core.engine.document.processors.ChunkingProcessor(chunking_strategy=ChunkingStrategy.RECURSIVE, chunk_size=1000, chunk_overlap=200, **kwargs)
Bases: DocumentProcessor
Processor for chunking documents into smaller pieces.
Initialize the chunking processor.
- Parameters:
chunking_strategy (haive.core.engine.document.config.ChunkingStrategy) – Strategy for chunking
chunk_size (int) – Size of chunks in characters
chunk_overlap (int) – Overlap between chunks
**kwargs – Additional configuration
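To make the relationship between chunk_size and chunk_overlap concrete, here is a minimal, self-contained sketch of fixed-size character chunking with overlap. It is an illustration only, not ChunkingProcessor's implementation: the function name `chunk_text` is hypothetical, and the default ChunkingStrategy.RECURSIVE presumably splits differently from this plain sliding window.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    Hypothetical helper for illustration; it does not call the haive library.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))
    pieces = chunk_text(sample)
    print([len(p) for p in pieces])
```

With the defaults, consecutive chunks share 200 characters, so text that straddles a chunk boundary appears whole in at least one chunk as long as it is no longer than the overlap.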
- class haive.core.engine.document.processors.ContentNormalizer(normalize_whitespace=True, remove_extra_newlines=True, strip_content=True, **kwargs)
Bases: DocumentProcessor
Processor for normalizing document content.
Initialize the content normalizer.
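- Parameters:
normalize_whitespace (bool) – Whether to normalize whitespace
remove_extra_newlines (bool) – Whether to remove extra newlines
strip_content (bool) – Whether to strip the content
**kwargs – Additional configuration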
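The three flags in the constructor signature suggest a pipeline of simple text clean-ups. The sketch below shows one plausible reading of each flag using only the standard library. The function name `normalize_content` and the exact rules (collapsing spaces and tabs, reducing three or more newlines to one blank line, trimming the ends) are assumptions made for illustration, not the documented behavior of ContentNormalizer.

```python
import re


def normalize_content(
    text: str,
    normalize_whitespace: bool = True,
    remove_extra_newlines: bool = True,
    strip_content: bool = True,
) -> str:
    """Apply optional clean-up steps to text.

    Hypothetical helper for illustration; it does not call the haive library.
    """
    if normalize_whitespace:
        # Collapse runs of spaces and tabs into one space; newlines are left alone.
        text = re.sub(r"[ \t]+", " ", text)
    if remove_extra_newlines:
        # Reduce three or more consecutive newlines to a single blank line.
        text = re.sub(r"\n{3,}", "\n\n", text)
    if strip_content:
        # Trim leading and trailing whitespace from the whole document.
        text = text.strip()
    return text


if __name__ == "__main__":
    raw = "  Hello \t  world\n\n\n\nNext  "
    print(repr(normalize_content(raw)))
```

Keeping each step behind its own flag lets a caller disable one clean-up, for example preserving blank lines in source code, without giving up the others.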
- class haive.core.engine.document.processors.DocumentProcessor(**kwargs)
Base class for document processing operations.
Initialize the processor.
- class haive.core.engine.document.processors.FormatDetector(**kwargs)
Bases: DocumentProcessor
Processor for detecting document formats.
Initialize the processor.
- class haive.core.engine.document.processors.MetadataExtractor(**kwargs)
Bases: DocumentProcessor
Processor for extracting metadata from documents.
Initialize the processor.