haive.agents.document_processing.agent
======================================

.. py:module:: haive.agents.document_processing.agent

.. autoapi-nested-parse::

   Comprehensive Document Processing Agent.

   This agent provides end-to-end document processing capabilities including:

   - Document fetching with ReactAgent + search tools
   - Auto-loading with bulk processing
   - Transform/split/annotate/embed pipeline
   - Advanced RAG features (refined queries, self-query, etc.)
   - State management and persistence

   The agent integrates the existing Haive document processing components into
   a single system for document-based AI workflows.

   .. rubric:: Examples

   ``process_query`` and ``process_sources`` are coroutines, so the examples
   below assume an async context (for instance, the body of an ``async def``
   function).

   Basic document processing::

       agent = DocumentProcessingAgent()
       result = await agent.process_query("Load and analyze reports from https://company.com/reports")

   Advanced RAG with custom retrieval::

       config = DocumentProcessingConfig(
           retrieval_strategy="self_query",
           query_refinement=True,
           annotation_enabled=True,
           embedding_model="text-embedding-3-large"
       )
       agent = DocumentProcessingAgent(config=config)
       result = await agent.process_query("Find all financial projections from Q4 2024")

   Multi-source document processing::

       sources = [
           "/path/to/local/docs/",
           "https://wiki.company.com/procedures",
           "s3://bucket/documents/",
           {"url": "https://api.service.com/docs", "headers": {"Authorization": "Bearer token"}}
       ]
       agent = DocumentProcessingAgent()
       result = await agent.process_sources(sources, query="Extract key insights")

   Author: Claude (Haive AI Agent Framework)

   Version: 1.0.0


Classes
-------

.. autoapisummary::

   haive.agents.document_processing.agent.DocumentProcessingAgent
   haive.agents.document_processing.agent.DocumentProcessingConfig
   haive.agents.document_processing.agent.DocumentProcessingResult
   haive.agents.document_processing.agent.DocumentProcessingState


Module Contents
---------------

.. py:class:: DocumentProcessingAgent(config = None, engine = None, name = 'document_processor')

   Comprehensive document processing agent with full pipeline capabilities.

   This agent provides a complete document processing pipeline including:

   1. Document Discovery & Fetching (ReactAgent + search tools)
   2. Auto-loading with bulk processing
   3. Transform/split/annotate/embed pipeline
   4. Advanced RAG features
   5. State management and persistence

   The agent integrates the existing Haive document processing components into
   a single system for document-based AI workflows.

   Initialize the document processing agent.

   :param config: Configuration for document processing
   :param engine: LLM engine configuration
   :param name: Agent name for identification


   .. py:method:: get_capabilities()

      Get agent capabilities and configuration.


   .. py:method:: process_query(query, sources = None)
      :async:

      Process a query through the full document processing pipeline.

      :param query: The user query to process
      :param sources: Optional list of specific sources to use

      :returns: DocumentProcessingResult with the response, sources, and processing details


   .. py:method:: process_sources(sources, query)
      :async:

      Process specific sources with a query.

      :param sources: List of sources to process
      :param query: Query to process against the sources

      :returns: DocumentProcessingResult with results


.. py:class:: DocumentProcessingConfig(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`

   Configuration for comprehensive document processing.

   .. rubric:: Core Processing

   .. attribute:: auto_loader_config

      Configuration for document auto-loading

   .. attribute:: enable_bulk_processing

      Enable concurrent bulk document processing

   .. attribute:: max_concurrent_loads

      Maximum number of concurrent document loads

   .. rubric:: Search & Retrieval

   .. attribute:: search_enabled

      Enable web search for document discovery

   .. attribute:: search_depth

      Search depth for web queries ("basic" or "advanced")

   .. attribute:: retrieval_strategy

      Strategy for document retrieval

   .. attribute:: retrieval_config

      Configuration for retrieval components

   .. rubric:: Query Processing

   .. attribute:: query_refinement

      Enable query refinement for better results

   .. attribute:: multi_query_enabled

      Enable multiple query variations

   .. attribute:: query_expansion

      Enable query expansion techniques

   .. rubric:: Document Processing

   .. attribute:: annotation_enabled

      Enable document annotation

   .. attribute:: summarization_enabled

      Enable document summarization

   .. attribute:: kg_extraction_enabled

      Enable knowledge graph extraction

   .. rubric:: RAG Configuration

   .. attribute:: rag_strategy

      RAG strategy to use

   .. attribute:: context_window_size

      Context window size for RAG

   .. attribute:: chunk_size

      Chunk size for document splitting

   .. attribute:: chunk_overlap

      Overlap between consecutive chunks

   .. rubric:: Embedding & Vectorization

   .. attribute:: embedding_model

      Embedding model to use

   .. attribute:: vector_store_config

      Vector store configuration

   .. rubric:: Performance

   .. attribute:: enable_caching

      Enable document caching

   .. attribute:: cache_ttl

      Cache time-to-live in seconds

   .. attribute:: enable_streaming

      Enable streaming responses

   .. rubric:: Output

   .. attribute:: structured_output

      Enable structured output generation

   .. attribute:: response_format

      Format for agent responses

   .. attribute:: include_sources

      Include source information in responses

   .. attribute:: include_metadata

      Include processing metadata

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


.. py:class:: DocumentProcessingResult(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`

   Result from document processing operation.

   .. attribute:: response

      Main response content

   .. attribute:: sources

      List of source documents used

   .. attribute:: metadata

      Processing metadata

   .. attribute:: documents

      Processed documents

   .. attribute:: query_info

      Information about query processing

   .. attribute:: timing

      Timing information

   .. attribute:: statistics

      Processing statistics

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


.. py:class:: DocumentProcessingState(/, **data)

   Bases: :py:obj:`haive.core.schema.prebuilt.messages_state.MessagesState`

   State for document processing operations.

   Extends MessagesState with document-specific fields for tracking
   document processing workflows.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.
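
The ``chunk_size`` and ``chunk_overlap`` settings of ``DocumentProcessingConfig`` describe a sliding-window split: each chunk is at most ``chunk_size`` long, and consecutive chunks share ``chunk_overlap`` units. This reference does not say whether Haive measures length in characters or tokens, or which splitter it uses. The sketch below therefore assumes character-based windows and uses a hypothetical helper, ``split_text``, that is not part of the Haive API. It is meant only to illustrate how the two values interact.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows (hypothetical helper, not Haive API).

    Each window holds at most `chunk_size` characters, and consecutive
    windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

A larger overlap keeps more shared context between neighbouring chunks but produces more chunks to embed and store.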