haive.agents.document_processing.agent
Comprehensive Document Processing Agent.
This agent provides end-to-end document processing capabilities including:
- Document fetching with ReactAgent + search tools
- Auto-loading with bulk processing
- Transform/split/annotate/embed pipeline
- Advanced RAG features (refined queries, self-query, etc.)
- State management and persistence
The agent integrates all existing Haive document processing components into a unified, powerful system for document-based AI workflows.
Examples
Basic document processing:
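import asyncio
agent = DocumentProcessingAgent()
# process_query is declared async, so run it in an event loop (or await it inside a coroutine)
result = asyncio.run(agent.process_query("Load and analyze reports from https://company.com/reports"))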
Advanced RAG with custom retrieval:
import asyncio
config = DocumentProcessingConfig(
    retrieval_strategy="self_query",
    query_refinement=True,
    annotation_enabled=True,
    embedding_model="text-embedding-3-large",
)
agent = DocumentProcessingAgent(config=config)
# process_query is declared async, so run it in an event loop (or await it inside a coroutine)
result = asyncio.run(agent.process_query("Find all financial projections from Q4 2024"))
Multi-source document processing:
sources = [
    "/path/to/local/docs/",
    "https://wiki.company.com/procedures",
    "s3://bucket/documents/",
    {"url": "https://api.service.com/docs", "headers": {"Authorization": "Bearer token"}},
]
agent = DocumentProcessingAgent()
result = agent.process_sources(sources, query="Extract key insights")
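The source list above mixes local paths, web URLs, S3 URIs, and dicts that carry request options. The sketch below shows one way such entries could be normalized before loading. It is an illustration only: `classify_source` is a hypothetical helper, not part of the Haive API, and the three kinds it returns are assumed labels.

```python
from urllib.parse import urlparse


def classify_source(source):
    """Return a (kind, location, options) tuple for one source entry.

    Hypothetical helper for illustration; not part of the Haive API.
    """
    if isinstance(source, dict):
        # Dict entries keep the location under "url"; everything else is options.
        options = {key: value for key, value in source.items() if key != "url"}
        kind, location, _ = classify_source(source["url"])
        return kind, location, options
    scheme = urlparse(source).scheme
    if scheme in ("http", "https"):
        return "web", source, {}
    if scheme == "s3":
        return "s3", source, {}
    # Anything without a recognized scheme is treated as a local path.
    return "local", source, {}


sources = [
    "/path/to/local/docs/",
    "https://wiki.company.com/procedures",
    "s3://bucket/documents/",
    {"url": "https://api.service.com/docs", "headers": {"Authorization": "Bearer token"}},
]
kinds = [classify_source(entry)[0] for entry in sources]
```

Here `kinds` holds one label per entry, in input order, and the dict entry keeps its `headers` in the returned options.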
Author: Claude (Haive AI Agent Framework)
Version: 1.0.0
Classes
DocumentProcessingAgent | Comprehensive document processing agent with full pipeline capabilities.
DocumentProcessingConfig | Configuration for comprehensive document processing.
DocumentProcessingResult | Result from document processing operation.
DocumentProcessingState | State for document processing operations.
Module Contents
- class haive.agents.document_processing.agent.DocumentProcessingAgent(config=None, engine=None, name='document_processor')
Comprehensive document processing agent with full pipeline capabilities.
This agent provides a complete document processing pipeline including:
1. Document Discovery & Fetching (ReactAgent + search tools)
2. Auto-loading with bulk processing
3. Transform/split/annotate/embed pipeline
4. Advanced RAG features
5. State management and persistence
The agent integrates all existing Haive document processing components into a unified, powerful system for document-based AI workflows.
Initialize the document processing agent.
- Parameters:
config (DocumentProcessingConfig | None) – Configuration for document processing
engine (haive.core.engine.aug_llm.AugLLMConfig | None) – LLM engine configuration
name (str) – Agent name for identification
- async process_query(query, sources=None)
Process a query with comprehensive document processing pipeline.
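Because `process_query` is declared `async`, calling it returns a coroutine that has to be awaited. The sketch below shows both calling styles with a stand-in class, so it runs without Haive installed. `StubAgent` and its return value are assumptions made for illustration; only the method name and the `(query, sources=None)` signature come from this page.

```python
import asyncio


class StubAgent:
    """Stand-in with the same call shape as the documented method (hypothetical)."""

    async def process_query(self, query, sources=None):
        await asyncio.sleep(0)  # placeholder for real I/O
        return {"query": query, "sources": list(sources or [])}


async def main():
    agent = StubAgent()
    # Inside a coroutine, await the call directly.
    return await agent.process_query(
        "Summarize the onboarding guide",
        sources=["/path/to/local/docs/"],
    )


# From synchronous code, asyncio.run drives the coroutine to completion.
result = asyncio.run(main())
```

In a notebook or another environment that already runs an event loop, `await agent.process_query(...)` would be used instead of `asyncio.run`.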
- class haive.agents.document_processing.agent.DocumentProcessingConfig(/, **data)
Bases: pydantic.BaseModel
Configuration for comprehensive document processing.
- Parameters:
data (Any)
- # Core Processing
- auto_loader_config
Configuration for document auto-loading
- enable_bulk_processing
Enable concurrent bulk document processing
- max_concurrent_loads
Maximum concurrent document loads
- # Search & Retrieval
- search_enabled
Enable web search for document discovery
- search_depth
Search depth for web queries ("basic" or "advanced")
- retrieval_strategy
Strategy for document retrieval
- retrieval_config
Configuration for retrieval components
- # Query Processing
- query_refinement
Enable query refinement for better results
- multi_query_enabled
Enable multiple query variations
- query_expansion
Enable query expansion techniques
- # Document Processing
- annotation_enabled
Enable document annotation
- summarization_enabled
Enable document summarization
- kg_extraction_enabled
Enable knowledge graph extraction
- # RAG Configuration
- rag_strategy
RAG strategy to use
- context_window_size
Context window size for RAG
- chunk_size
Chunk size for document splitting
- chunk_overlap
Overlap between chunks
- # Embedding & Vectorization
- embedding_model
Embedding model to use
- vector_store_config
Vector store configuration
- # Performance
- enable_caching
Enable document caching
- cache_ttl
Cache time-to-live in seconds
- enable_streaming
Enable streaming responses
- # Output
- structured_output
Enable structured output generation
- response_format
Format for agent responses
- include_sources
Include source information in responses
- include_metadata
Include processing metadata
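The `chunk_size` and `chunk_overlap` fields describe a sliding-window split: each chunk starts `chunk_size - chunk_overlap` units after the previous one. The sketch below illustrates that relationship on plain strings. It assumes character-based units (this page does not say whether Haive counts characters or tokens), and `split_with_overlap` is a hypothetical function, not the library's splitter.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

Each chunk repeats the last character of the previous one, which is the context that overlap is meant to preserve across chunk boundaries.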
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
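A cap such as `max_concurrent_loads` is commonly enforced with a semaphore around each load. The sketch below shows that pattern using `asyncio.Semaphore`. It is an assumption about how a limit like this can be applied, not a description of Haive's loader, and `load_all` and `fake_load` are hypothetical names.

```python
import asyncio


async def load_all(sources, load_one, max_concurrent_loads):
    """Run load_one over all sources with at most max_concurrent_loads in flight."""
    semaphore = asyncio.Semaphore(max_concurrent_loads)

    async def guarded(source):
        async with semaphore:  # waits while the limit is reached
            return await load_one(source)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(source) for source in sources))


async def _demo():
    active = 0
    peak = 0

    async def fake_load(source):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)  # record the highest observed concurrency
        await asyncio.sleep(0.01)  # placeholder for real I/O
        active -= 1
        return source.upper()

    results = await load_all(["a", "b", "c", "d", "e"], fake_load, max_concurrent_loads=2)
    return results, peak


results, peak = asyncio.run(_demo())
```

The demo records the peak number of simultaneous loads, which stays within the configured limit while the results keep their input order.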
- class haive.agents.document_processing.agent.DocumentProcessingResult(/, **data)
Bases: pydantic.BaseModel
Result from document processing operation.
- Parameters:
data (Any)
- response
Main response content
- sources
List of source documents used
- metadata
Processing metadata
- documents
Processed documents
- query_info
Information about query processing
- timing
Timing information
- statistics
Processing statistics
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
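A result object with these fields can be rendered for display by joining the response with its sources. The sketch below uses a standard-library dataclass as a stand-in for the real model. The field types (a string response, a list of source strings) are assumptions, since this page lists the field names only, and `ResultSketch` and `render` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSketch:
    """Stand-in mirroring two documented field names; types are assumed."""

    response: str
    sources: list = field(default_factory=list)


def render(result):
    """Return the response followed by a numbered source list, if any."""
    lines = [result.response]
    if result.sources:
        lines.append("Sources:")
        lines.extend(
            f"  [{index}] {source}"
            for index, source in enumerate(result.sources, start=1)
        )
    return "\n".join(lines)


text = render(ResultSketch(response="Example answer.", sources=["docs/guide.md"]))
```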
- class haive.agents.document_processing.agent.DocumentProcessingState(/, **data)
Bases: haive.core.schema.prebuilt.messages_state.MessagesState
State for document processing operations.
Extends MessagesState with document-specific fields for tracking document processing workflows.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)