haive.agents.document_modifiers.complex_extraction.agent
========================================================

.. py:module:: haive.agents.document_modifiers.complex_extraction.agent

.. autoapi-nested-parse::

   Complex Extraction Agent for structured data extraction from text.

   This module provides the ComplexExtractionAgent class which implements sophisticated
   structured data extraction using validation with retries and optional JSONPatch-based
   error correction to reliably extract data according to specified schemas.

   The agent supports multiple retry strategies and can handle complex validation
   scenarios where initial extraction attempts may fail.

   Classes:
       ComplexExtractionAgent: Main agent for complex structured data extraction

   .. rubric:: Examples

   Basic usage::

       from haive.agents.document_modifiers.complex_extraction import ComplexExtractionAgent
       from haive.agents.document_modifiers.complex_extraction.config import ComplexExtractionAgentConfig
       from pydantic import BaseModel

       class PersonInfo(BaseModel):
           name: str
           age: int
           occupation: str

       config = ComplexExtractionAgentConfig(
           extraction_model=PersonInfo,
           max_retries=3
       )
       agent = ComplexExtractionAgent(config)

       text = "John Smith is a 35-year-old software engineer."
       result = agent.run(text)
       person_data = result["extracted_data"]

   With JSONPatch error correction::

       config = ComplexExtractionAgentConfig(
           extraction_model=PersonInfo,
           use_jsonpatch=True,
           max_retries=5
       )
       agent = ComplexExtractionAgent(config)
       result = agent.run(complex_text)

   .. seealso::

      - :class:`~haive.agents.document_modifiers.complex_extraction.config.ComplexExtractionAgentConfig`: Configuration class
      - :class:`~haive.agents.document_modifiers.complex_extraction.models.RetryStrategy`: Retry strategy configuration


Classes
-------

.. autoapisummary::

   haive.agents.document_modifiers.complex_extraction.agent.ComplexExtractionAgent


Module Contents
---------------

.. py:class:: ComplexExtractionAgent(config = ComplexExtractionAgentConfig())

   Bases: :py:obj:`haive.core.engine.agent.agent.Agent`\ [\ :py:obj:`haive.agents.document_modifiers.complex_extraction.config.ComplexExtractionAgentConfig`\ ]


   Agent that extracts complex structured information from text.

   This agent implements sophisticated structured data extraction using validation
   with retries and optional JSONPatch-based error correction to reliably extract
   data according to specified Pydantic schemas.

   The agent creates a validation workflow that can handle complex extraction
   scenarios where initial attempts may fail due to parsing errors, validation
   issues, or incomplete data. It supports multiple retry strategies and can
   automatically correct errors using JSONPatch operations.

   :param config: Configuration object containing extraction settings, model schema,
                  retry parameters, and LLM configuration.

   .. attribute:: extraction_model

      Pydantic model class defining the extraction schema

   .. attribute:: max_retries

      Maximum number of retry attempts for failed extractions

   .. attribute:: force_tool_choice

      Whether to force the LLM to use the extraction tool

   .. attribute:: use_jsonpatch

      Whether to enable JSONPatch-based error correction

   .. attribute:: extraction_tool

      Tool instance created from the extraction model

   .. attribute:: llm

      Language model instance for performing extractions

   .. rubric:: Examples

   Basic structured extraction::

       from pydantic import BaseModel

       class ProductInfo(BaseModel):
           name: str
           price: float
           category: str

       config = ComplexExtractionAgentConfig(
           extraction_model=ProductInfo,
           max_retries=3
       )
       agent = ComplexExtractionAgent(config)

       text = "The MacBook Pro costs $2499 and is a laptop computer."
       result = agent.run(text)
       product = result["extracted_data"]
       # product = {"name": "MacBook Pro", "price": 2499.0, "category": "laptop"}

   With advanced error correction::

       config = ComplexExtractionAgentConfig(
           extraction_model=ProductInfo,
           use_jsonpatch=True,
           max_retries=5,
           force_tool_choice=True
       )
       agent = ComplexExtractionAgent(config)

   Processing multiple documents::

       documents = ["Product A costs $100", "Product B is $200 software"]
       results = [agent.run(doc) for doc in documents]

   .. note::

      The agent requires a Pydantic model class to define the extraction schema.
      JSONPatch functionality requires the 'jsonpatch' library to be installed.

   :raises ImportError: If JSONPatch is enabled but the jsonpatch library is not installed
   :raises ValueError: If extraction fails after maximum retry attempts

   .. seealso::

      - :class:`ComplexExtractionAgentConfig`: Configuration options
      - :class:`RetryStrategy`: Retry strategy configuration
      - :class:`PatchFunctionParameters`: JSONPatch parameter schema

   Initialize the complex extraction agent.

   Sets up the extraction model, validation tools, and retry mechanisms
   based on the provided configuration.

   :param config: Configuration object containing extraction model, retry settings,
                  and LLM configuration. Defaults to a new instance with default values.

   :raises ImportError: If JSONPatch is enabled in config but jsonpatch library
       is not installed.


   .. py:method:: bind_validator_with_jsonpatch_retries(llm, *, tools, tool_choice = None, max_attempts = 3)

      Bind a validator with JSONPatch-based retries.

      Creates an advanced validation workflow that uses JSONPatch operations
      to automatically correct validation errors. When a tool call fails
      validation, the system generates patch instructions to fix the errors.

      :param llm: The base language model to use for extraction and error correction.
      :param tools: List of tools available for extraction. The validation will
                    ensure tool calls conform to these tool schemas.
      :param tool_choice: Optional specific tool name to force the LLM to use.
                          If specified, the LLM must use this tool.
      :param max_attempts: Maximum number of retry attempts before giving up.
                           Defaults to 3.

      :returns: StateGraph builder instance (not compiled). Must be compiled before use.

      :raises ImportError: If the jsonpatch library is not installed but JSONPatch
          functionality is requested.

      .. note::

         This method creates a sophisticated retry mechanism where:
         1. Initial extraction attempts use the primary LLM
         2. Validation errors trigger JSONPatch correction attempts
         3. Patch operations are applied to fix specific validation issues
         4. Multiple correction iterations are supported up to max_attempts


   .. py:method:: bind_validator_with_retries(llm, *, tools, tool_choice = None, max_attempts = 3)

      Bind a validator with standard retries (no JSONPatch).

      Creates a basic validation workflow with simple retry logic. When
      validation fails, the system will retry the extraction up to the
      maximum number of attempts without advanced error correction.

      :param llm: The base language model to use for extraction attempts.
      :param tools: List of tools available for extraction. Tool calls will be
                    validated against these tool schemas.
      :param tool_choice: Optional specific tool name to force the LLM to use.
                          If specified, the LLM must call this tool.
      :param max_attempts: Maximum number of retry attempts before failing.
                           Defaults to 3.

      :returns: StateGraph builder instance (not compiled). Must be compiled before use.

      .. note::

         This is the simpler alternative to JSONPatch-based retries. It will
         simply retry failed extractions without attempting to automatically
         correct validation errors.


   .. py:method:: extract_node(state)

      Main extraction node function.

      Processes the current state through the extraction pipeline, invoking
      the configured extraction tool and handling the results.

      :param state: Current workflow state containing messages and other context.
                    Can be either a dictionary with 'messages' key or an object
                    with messages attribute.

      :returns:

                - extracted_data: The structured data extracted by the tool
                - messages: Updated message list including extraction results
                - error: Error message if extraction failed
      :rtype: Updated state dictionary containing

      .. note::

         This method handles various state formats and gracefully manages
         errors during extraction. If no extraction runnable is available,
         the state is passed through unchanged.


   .. py:method:: run(input_data, **kwargs)

      Run the extraction agent on input data.

      Processes the input through the extraction pipeline, handling various
      input formats and returning structured extraction results.

      :param input_data: Input text or data to extract information from. Supports:
                         - str: Single text document
                         - List[str]: Multiple text documents to process together
                         - Dict[str, Any]: Dictionary with 'text', 'content', or 'messages' keys
                         - BaseModel: Pydantic model with text content
      :param \*\*kwargs: Additional runtime configuration options passed to the
                         underlying workflow execution.

      :returns:

                - extracted_data: Structured data conforming to the extraction model
                - messages: Full conversation history during extraction
                - Additional metadata from the extraction process
      :rtype: Dictionary containing extraction results

      .. rubric:: Examples

      Basic text extraction::

          agent = ComplexExtractionAgent(config)
          result = agent.run("John Smith is 30 years old.")
          person_data = result["extracted_data"]

      Multiple documents::

          docs = ["Person A info", "Person B info"]
          result = agent.run(docs)

      .. note::

         If no extraction workflow has been set up, this method will
         automatically call setup_workflow() before processing.


   .. py:method:: setup_workflow()

      Set up the agent workflow.

      Initializes the extraction workflow graph based on the agent configuration.
      This method creates the appropriate validation and retry mechanism (either
      JSONPatch-based or standard retries) and configures the processing pipeline.

      The workflow includes encoding/decoding steps, validation nodes, and
      state management for tracking extraction progress.

      .. note::

         This method is called automatically when needed and does not need
         to be invoked manually. The workflow graph is not compiled here -
         compilation happens in the parent class.