haive.agents.document_processing.agent
======================================

.. py:module:: haive.agents.document_processing.agent

.. autoapi-nested-parse::

   Comprehensive Document Processing Agent.

   This agent provides end-to-end document processing capabilities, including:

   - Document fetching with ReactAgent + search tools
   - Auto-loading with bulk processing
   - Transform/split/annotate/embed pipeline
   - Advanced RAG features (refined queries, self-query, etc.)
   - State management and persistence

   The agent combines the existing Haive document processing components into
   a single pipeline for document-based AI workflows.

   .. rubric:: Examples

   Basic document processing::

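       from haive.agents.document_processing.agent import DocumentProcessingAgent

       agent = DocumentProcessingAgent()
       # process_query is a coroutine: await it from async code, or wrap it in asyncio.run()
       result = await agent.process_query("Load and analyze reports from https://company.com/reports")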

   Advanced RAG with custom retrieval::

       config = DocumentProcessingConfig(
           retrieval_strategy="self_query",
           query_refinement=True,
           annotation_enabled=True,
           embedding_model="text-embedding-3-large",
       )
       agent = DocumentProcessingAgent(config=config)
       # process_query is async, so it must be awaited
       result = await agent.process_query("Find all financial projections from Q4 2024")

   Multi-source document processing::

       sources = [
           "/path/to/local/docs/",
           "https://wiki.company.com/procedures",
           "s3://bucket/documents/",
           # "token" is a placeholder; supply a real credential at runtime
           {"url": "https://api.service.com/docs", "headers": {"Authorization": "Bearer token"}},
       ]
       agent = DocumentProcessingAgent()
       # process_sources is async, so it must be awaited
       result = await agent.process_sources(sources, query="Extract key insights")

   | Author: Claude (Haive AI Agent Framework)
   | Version: 1.0.0


Classes
-------

.. autoapisummary::

   haive.agents.document_processing.agent.DocumentProcessingAgent
   haive.agents.document_processing.agent.DocumentProcessingConfig
   haive.agents.document_processing.agent.DocumentProcessingResult
   haive.agents.document_processing.agent.DocumentProcessingState


Module Contents
---------------

.. py:class:: DocumentProcessingAgent(config = None, engine = None, name = 'document_processor')

   Comprehensive document processing agent with full pipeline capabilities.

   This agent provides a complete document processing pipeline, including:

   1. Document Discovery & Fetching (ReactAgent + search tools)
   2. Auto-loading with bulk processing
   3. Transform/split/annotate/embed pipeline
   4. Advanced RAG features
   5. State management and persistence

   The agent combines the existing Haive document processing components
   into a single pipeline for document-based AI workflows.

   Initialize the document processing agent.

   :param config: Configuration for document processing
   :param engine: LLM engine configuration
   :param name: Agent name for identification


   .. py:method:: get_capabilities()

      Get agent capabilities and configuration.


   .. py:method:: process_query(query, sources = None)
      :async:


      Process a query with comprehensive document processing pipeline.

      :param query: The user query to process
      :param sources: Optional list of specific sources to use

      :returns: DocumentProcessingResult with comprehensive results
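
      Because ``process_query`` is a coroutine, synchronous callers need an event loop to drive it. A minimal sketch of that calling pattern, using a hypothetical ``StubAgent`` that only mirrors the documented ``process_query(query, sources=None)`` signature (it is not the Haive implementation, and the dict it returns is invented for illustration)::

          import asyncio


          class StubAgent:
              """Hypothetical stand-in with the same async method shape."""

              async def process_query(self, query, sources=None):
                  # A real agent would fetch, load, and retrieve documents here.
                  await asyncio.sleep(0)
                  return {"response": f"handled: {query}", "sources": sources or []}


          def run_query(agent, query, sources=None):
              # asyncio.run creates an event loop, runs the coroutine, then closes the loop.
              return asyncio.run(agent.process_query(query, sources))


          result = run_query(StubAgent(), "Summarize the Q4 report")
          print(result["response"])

      Inside an already running event loop (for example a notebook or an async web handler), use ``await agent.process_query(...)`` directly; ``asyncio.run`` raises ``RuntimeError`` there.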


   .. py:method:: process_sources(sources, query)
      :async:


      Process specific sources with a query.

      :param sources: List of sources to process
      :param query: Query to process against the sources

      :returns: DocumentProcessingResult with results
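
      The module-level example passes local paths, HTTP(S) URLs, ``s3://`` URIs, and dicts with a ``url`` key in a single list. A rough sketch of how such a mixed list could be grouped by kind before loading; ``classify_source`` is a hypothetical helper, and Haive's own loader selection may work differently::

          from urllib.parse import urlparse


          def classify_source(source):
              """Return a coarse kind label for one source entry (illustrative only)."""
              # Dict entries carry the location under "url" plus extra options.
              location = source["url"] if isinstance(source, dict) else source
              scheme = urlparse(location).scheme.lower()
              if scheme in ("http", "https"):
                  return "web"
              if scheme == "s3":
                  return "s3"
              # No recognised scheme: treat the entry as a local filesystem path.
              return "local"


          sources = [
              "/path/to/local/docs/",
              "https://wiki.company.com/procedures",
              "s3://bucket/documents/",
              {"url": "https://api.service.com/docs", "headers": {"Authorization": "Bearer token"}},
          ]
          kinds = [classify_source(s) for s in sources]
          print(kinds)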


.. py:class:: DocumentProcessingConfig(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`


   Configuration for comprehensive document processing.

   .. rubric:: Core Processing

   .. attribute:: auto_loader_config

      Configuration for document auto-loading

   .. attribute:: enable_bulk_processing

      Enable concurrent bulk document processing

   .. attribute:: max_concurrent_loads

      Maximum concurrent document loads
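
      A cap of this kind is commonly enforced with a semaphore. The sketch below shows that general pattern only; ``load_all`` and ``fake_load`` are hypothetical names, and nothing here is taken from Haive's loader::

          import asyncio


          async def load_all(sources, load_one, max_concurrent_loads):
              """Run load_one over sources with at most max_concurrent_loads in flight."""
              semaphore = asyncio.Semaphore(max_concurrent_loads)

              async def guarded(source):
                  async with semaphore:
                      return await load_one(source)

              # gather returns results in the same order as the input sources.
              return await asyncio.gather(*(guarded(s) for s in sources))


          # Counters used to observe how many loads overlap.
          stats = {"in_flight": 0, "peak": 0}


          async def fake_load(source):
              stats["in_flight"] += 1
              stats["peak"] = max(stats["peak"], stats["in_flight"])
              await asyncio.sleep(0.01)  # simulate I/O
              stats["in_flight"] -= 1
              return source.upper()


          docs = asyncio.run(load_all(["a", "b", "c", "d", "e"], fake_load, max_concurrent_loads=2))
          print(docs, stats["peak"])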

   .. rubric:: Search & Retrieval

   .. attribute:: search_enabled

      Enable web search for document discovery

   .. attribute:: search_depth

      Search depth for web queries ("basic" or "advanced")

   .. attribute:: retrieval_strategy

      Strategy for document retrieval

   .. attribute:: retrieval_config

      Configuration for retrieval components

   .. rubric:: Query Processing

   .. attribute:: query_refinement

      Enable query refinement for better results

   .. attribute:: multi_query_enabled

      Enable multiple query variations

   .. attribute:: query_expansion

      Enable query expansion techniques

   .. rubric:: Document Processing

   .. attribute:: annotation_enabled

      Enable document annotation

   .. attribute:: summarization_enabled

      Enable document summarization

   .. attribute:: kg_extraction_enabled

      Enable knowledge graph extraction

   .. rubric:: RAG Configuration

   .. attribute:: rag_strategy

      RAG strategy to use

   .. attribute:: context_window_size

      Context window size for RAG

   .. attribute:: chunk_size

      Chunk size for document splitting

   .. attribute:: chunk_overlap

      Overlap between chunks
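
      In the usual sliding-window reading of ``chunk_size`` and ``chunk_overlap``, each chunk starts ``chunk_size - chunk_overlap`` units after the previous one. A character-based sketch of that window, assuming lengths are counted in characters (Haive's splitter may count tokens or split on separators instead; ``split_with_overlap`` is a hypothetical helper)::

          def split_with_overlap(text, chunk_size, chunk_overlap):
              """Split text into fixed-size chunks that share chunk_overlap characters."""
              if chunk_size <= 0:
                  raise ValueError("chunk_size must be positive")
              if not 0 <= chunk_overlap < chunk_size:
                  raise ValueError("chunk_overlap must be in [0, chunk_size)")
              step = chunk_size - chunk_overlap
              chunks = []
              for start in range(0, len(text), step):
                  chunks.append(text[start:start + chunk_size])
                  if start + chunk_size >= len(text):
                      break  # this chunk already reaches the end of the text
              return chunks


          chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
          print(chunks)

      An overlap equal to or larger than the chunk size would leave the window with no forward step, which is why the sketch rejects it.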

   .. rubric:: Embedding & Vectorization

   .. attribute:: embedding_model

      Embedding model to use

   .. attribute:: vector_store_config

      Vector store configuration

   .. rubric:: Performance

   .. attribute:: enable_caching

      Enable document caching

   .. attribute:: cache_ttl

      Cache time-to-live in seconds

   .. attribute:: enable_streaming

      Enable streaming responses

   .. rubric:: Output

   .. attribute:: structured_output

      Enable structured output generation

   .. attribute:: response_format

      Format for agent responses

   .. attribute:: include_sources

      Include source information in responses

   .. attribute:: include_metadata

      Include processing metadata

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


.. py:class:: DocumentProcessingResult(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`


   Result from document processing operation.

   .. attribute:: response

      Main response content

   .. attribute:: sources

      List of source documents used

   .. attribute:: metadata

      Processing metadata

   .. attribute:: documents

      Processed documents

   .. attribute:: query_info

      Information about query processing

   .. attribute:: timing

      Timing information

   .. attribute:: statistics

      Processing statistics

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


.. py:class:: DocumentProcessingState(/, **data)

   Bases: :py:obj:`haive.core.schema.prebuilt.messages_state.MessagesState`


   State for document processing operations.

   Extends MessagesState with document-specific fields for tracking
   document processing workflows.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.