haive.core.engine.retriever.providers.WebResearchRetrieverConfig

Web Research Retriever implementation for the Haive framework.

This module provides a configuration class for the Web Research retriever, which performs advanced web research by combining web search with document processing and retrieval. It searches the web, retrieves content from URLs, processes the content, and provides comprehensive research results.

The WebResearchRetriever works by:

1. Using a web search API to find relevant URLs
2. Retrieving and processing content from those URLs
3. Chunking and embedding the retrieved content
4. Providing retrieval over the processed web content
5. Combining search results with retrieved document chunks
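The steps above can be sketched end to end with stand-in components. This is a minimal illustration, not the Haive or LangChain implementation: `fake_search`, `fake_fetch`, `Chunk`, `research`, and the `example.com` URLs are all hypothetical, and a simple word-overlap score stands in for embedding similarity so that the example runs without network access or API keys.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory "web": URL -> page text.
PAGES = {
    "https://example.com/a": "Retrievers fetch documents for a query.",
    "https://example.com/b": "Bananas are yellow and sweet.",
}


@dataclass
class Chunk:
    url: str
    text: str


def fake_search(query: str) -> list[str]:
    """Stand-in for a web search API: returns every known URL."""
    return list(PAGES)


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP fetch plus HTML-to-text processing."""
    return PAGES[url]


def research(
    query: str,
    search: Callable[[str], list[str]],
    fetch: Callable[[str], str],
    num_web_pages: int = 2,
    chunk_size: int = 80,
    top_k: int = 1,
) -> list[Chunk]:
    # Step 1: find candidate URLs, keeping at most num_web_pages of them.
    urls = search(query)[:num_web_pages]

    # Steps 2-3: fetch each page and cut it into fixed-size chunks.
    chunks = [
        Chunk(url, page[i : i + chunk_size])
        for url in urls
        for page in [fetch(url)]
        for i in range(0, len(page), chunk_size)
    ]

    # Step 4: rank chunks by word overlap with the query
    # (an assumption standing in for vector similarity search).
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(query_words & set(c.text.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


results = research("retrievers fetch documents", fake_search, fake_fetch)
```

Passing the search and fetch functions in as arguments keeps the sketch testable offline; a real retriever would plug in a search API client and a vector store at those points.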

This retriever is particularly useful for:

- Getting up-to-date information from the web
- Building research applications that require current data
- Combining web search with document retrieval
- Creating systems that need comprehensive web coverage
- Building fact-checking or research assistant applications

The implementation integrates with LangChain’s WebResearchRetriever while providing a consistent Haive configuration interface with secure API key management.

Classes

WebResearchRetrieverConfig

Configuration for Web Research retriever in the Haive framework.

Module Contents

class haive.core.engine.retriever.providers.WebResearchRetrieverConfig.WebResearchRetrieverConfig

Bases: haive.core.common.mixins.secure_config.SecureConfigMixin, haive.core.engine.retriever.retriever.BaseRetrieverConfig

Configuration for Web Research retriever in the Haive framework.

This retriever performs comprehensive web research by searching the web, retrieving content, and providing retrieval capabilities over the collected data.

retriever_type

The type of retriever (always WEB_RESEARCH).

Type: RetrieverType

vectorstore_config

Vector store for indexing web content.

Type: VectorStoreConfig

llm_config

LLM for processing and summarization.

Type: AugLLMConfig

api_key

API key for web search (auto-resolved).

Type: Optional[SecretStr]

num_search_results

Number of web search results to process.

Type: int

num_web_pages

Number of web pages to retrieve content from.

Type: int

chunk_size

Size of text chunks for processing.

Type: int

chunk_overlap

Overlap between text chunks.

Type: int
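To illustrate how chunk_size and chunk_overlap interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters and that chunks are cut at fixed offsets; the splitter actually used by the retriever is not described here and may split on separators instead. The helper name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

A larger overlap reduces the chance that a sentence relevant to a query is split across two chunks, at the cost of storing and embedding more text.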

Examples

>>> from haive.core.engine.retriever import WebResearchRetrieverConfig
>>> from haive.core.engine.aug_llm import AugLLMConfig
>>> from haive.core.engine.vectorstore.providers.ChromaVectorStoreConfig import ChromaVectorStoreConfig
>>>
>>> # Configure components
>>> llm_config = AugLLMConfig(model_name="gpt-4", provider="openai")
>>> vectorstore_config = ChromaVectorStoreConfig(
...     name="web_research_store",
...     collection_name="web_content"
... )
>>>
>>> # Create the web research retriever config
>>> config = WebResearchRetrieverConfig(
...     name="web_research_retriever",
...     vectorstore_config=vectorstore_config,
...     llm_config=llm_config,
...     num_search_results=10,
...     num_web_pages=5
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> docs = retriever.get_relevant_documents("latest AI research developments 2024")

get_input_fields()

Return input field definitions for Web Research retriever.

Return type: dict[str, tuple[type, Any]]

get_output_fields()

Return output field definitions for Web Research retriever.

Return type: dict[str, tuple[type, Any]]

instantiate()

Create a Web Research retriever from this configuration.

Returns:

Instantiated retriever ready for web research.

Return type:

WebResearchRetriever

Raises:

ImportError – If required packages are not available.
ValueError – If API key or configuration is invalid.
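Because instantiate() is documented to raise ImportError and ValueError, callers may want to handle both explicitly. The sketch below shows one possible pattern; `try_instantiate`, `FakeBrokenConfig`, and `FakeWorkingConfig` are hypothetical names used so that the example runs without Haive installed. With the real class, a WebResearchRetrieverConfig instance would be passed in place of the fakes.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def try_instantiate(config: Any) -> Optional[Any]:
    """Call config.instantiate(); return None if a documented error is raised."""
    try:
        return config.instantiate()
    except ImportError as exc:
        # A required package is not installed.
        logger.error("Missing dependency for retriever: %s", exc)
    except ValueError as exc:
        # The API key or another configuration value was rejected.
        logger.error("Invalid API key or configuration: %s", exc)
    return None


class FakeBrokenConfig:
    """Hypothetical stand-in that fails the way a config without an API key might."""

    def instantiate(self) -> Any:
        raise ValueError("no API key found")


class FakeWorkingConfig:
    """Hypothetical stand-in whose instantiate() succeeds."""

    def instantiate(self) -> Any:
        return "retriever"


broken_result = try_instantiate(FakeBrokenConfig())
working_result = try_instantiate(FakeWorkingConfig())
```

Returning None keeps the caller in control of the fallback, for example using a retriever that does not need web access.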