haive.core.engine.retriever.providers.ArxivRetrieverConfig

Arxiv Retriever implementation for the Haive framework.

This module provides a configuration class for the Arxiv retriever, which retrieves academic papers from the arXiv preprint repository.

The ArxivRetriever works by:

1. Taking a search query for academic papers.
2. Searching the arXiv API for matching papers.
3. Returning paper abstracts and metadata as documents.
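The three steps above can be sketched in plain Python. This is an offline illustration, not the Haive or LangChain code: `PaperRecord`, `Document`, `retrieve`, and `fake_search` are hypothetical names, the stub search stands in for the real arXiv API call, and the metadata keys are chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PaperRecord:
    """Hypothetical stand-in for one arXiv search hit."""
    entry_id: str
    title: str
    authors: list[str]
    summary: str


@dataclass
class Document:
    """Minimal stand-in for a LangChain-style Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def retrieve(query: str,
             search: Callable[[str, int], list[PaperRecord]],
             top_k_results: int = 3) -> list[Document]:
    # Step 1: take the search query (surrounding whitespace removed).
    query = query.strip()
    if not query:
        return []
    # Step 2: ask the search backend for at most top_k_results papers.
    hits = search(query, top_k_results)[:top_k_results]
    # Step 3: wrap each abstract and its metadata in a Document.
    return [
        Document(
            page_content=hit.summary,
            metadata={"entry_id": hit.entry_id, "title": hit.title,
                      "authors": ", ".join(hit.authors)},
        )
        for hit in hits
    ]


def fake_search(query: str, max_results: int) -> list[PaperRecord]:
    """Offline stub standing in for the real arXiv API call (hypothetical data)."""
    corpus = [
        PaperRecord("2401.00001", "Transformers for X", ["A. Author"],
                    "We study transformers."),
        PaperRecord("2401.00002", "Graph Learning", ["B. Author", "C. Author"],
                    "We study graphs."),
    ]
    words = query.lower().split()
    # A paper matches if any query word appears in its title or abstract.
    matches = [p for p in corpus
               if any(w in (p.title + " " + p.summary).lower() for w in words)]
    return matches[:max_results]


docs = retrieve("transformers", fake_search)
print(docs[0].metadata["title"])  # Transformers for X
```

Injecting the search function keeps the query-to-documents logic testable without network access; a real implementation would call the arXiv API in its place.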

This retriever is particularly useful when:

- working with academic or research content;
- you need access to the latest preprint papers;
- building research-focused applications;
- combining it with other retrievers in academic contexts.

The implementation integrates with LangChain’s ArxivRetriever while providing a consistent Haive configuration interface.

Classes

ArxivRetrieverConfig

Configuration for Arxiv retriever in the Haive framework.

Module Contents

class haive.core.engine.retriever.providers.ArxivRetrieverConfig.ArxivRetrieverConfig

Bases: haive.core.engine.retriever.retriever.BaseRetrieverConfig

Configuration for Arxiv retriever in the Haive framework.

This retriever searches the arXiv preprint repository for academic papers matching the query and returns their abstracts and metadata as documents.

retriever_type

The type of retriever (always ARXIV).

Type:

RetrieverType

top_k_results

Maximum number of papers to retrieve (default: 3).

Type:

int

load_max_docs

Maximum number of documents to load (default: 100).

Type:

int

load_all_available_meta

Whether to load all available metadata (default: False).

Type:

bool
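To make the three documented fields and their defaults concrete, here is a minimal sketch of an equivalent configuration object. `ArxivConfigSketch` and `retriever_kwargs()` are hypothetical names, and the range checks are assumptions added for illustration; the real class derives from BaseRetrieverConfig and may validate differently.

```python
from dataclasses import dataclass, asdict


@dataclass
class ArxivConfigSketch:
    """Illustrative mirror of the documented fields; not the Haive class itself."""
    name: str
    top_k_results: int = 3            # documented default: 3
    load_max_docs: int = 100          # documented default: 100
    load_all_available_meta: bool = False  # documented default: False

    def __post_init__(self) -> None:
        # Assumed validation: reject values that cannot produce a sensible search.
        if self.top_k_results < 1:
            raise ValueError("top_k_results must be >= 1")
        if self.load_max_docs < 1:
            raise ValueError("load_max_docs must be >= 1")

    def retriever_kwargs(self) -> dict:
        # Keep only the search parameters; 'name' is configuration bookkeeping.
        params = asdict(self)
        params.pop("name")
        return params


config = ArxivConfigSketch(name="arxiv_retriever", top_k_results=5, load_max_docs=50)
print(config.retriever_kwargs())
# {'top_k_results': 5, 'load_max_docs': 50, 'load_all_available_meta': False}
```

The forwarded keys share their names with fields of LangChain's ArxivRetriever (inherited from ArxivAPIWrapper), so a dictionary like this could plausibly be unpacked into that constructor; whether instantiate() does exactly this is not shown in the source.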

Examples

>>> from haive.core.engine.retriever import ArxivRetrieverConfig
>>>
>>> # Create the arxiv retriever config
>>> config = ArxivRetrieverConfig(
...     name="arxiv_retriever",
...     top_k_results=5,
...     load_max_docs=50
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> docs = retriever.invoke("machine learning transformers")

get_input_fields()

Return the input field definitions for the Arxiv retriever.

Return type:

dict[str, tuple[type, Any]]

get_output_fields()

Return the output field definitions for the Arxiv retriever.

Return type:

dict[str, tuple[type, Any]]
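Both methods return `dict[str, tuple[type, Any]]`, which appears to follow the `(type, default)` convention of `pydantic.create_model`, where `...` marks a required field. The sketch below assumes that reading; the field names `query` and `documents` and the helper `apply_fields` are hypothetical, not taken from the Haive source.

```python
from typing import Any

# Hypothetical field maps in the dict[str, tuple[type, Any]] shape:
# each value is (expected type, default), with ... (Ellipsis) meaning "required".
INPUT_FIELDS: dict[str, tuple[type, Any]] = {
    "query": (str, ...),
}
# Plain `list` is used because isinstance() cannot check list[Document].
OUTPUT_FIELDS: dict[str, tuple[type, Any]] = {
    "documents": (list, ...),
}


def apply_fields(fields: dict[str, tuple[type, Any]],
                 data: dict[str, Any]) -> dict[str, Any]:
    """Check data against field definitions, filling in defaults."""
    result: dict[str, Any] = {}
    for name, (expected_type, default) in fields.items():
        if name in data:
            value = data[name]
        elif default is ...:
            raise KeyError(f"missing required field: {name}")
        else:
            value = default
        if not isinstance(value, expected_type):
            raise TypeError(
                f"{name} should be {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        result[name] = value
    return result


print(apply_fields(INPUT_FIELDS, {"query": "diffusion models"}))
# {'query': 'diffusion models'}
```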

instantiate()

Create an Arxiv retriever from this configuration.

Returns:

Instantiated retriever ready for document retrieval.

Return type:

ArxivRetriever

Raises:

ImportError – If required packages are not available.
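A common way to produce the documented ImportError is to check for optional dependencies before constructing the retriever. This is a minimal sketch under that assumption: `require_packages` is a hypothetical helper, and the package names in the usage note (`arxiv`, which LangChain's ArxivAPIWrapper relies on, and `langchain_community`) are a guess at what Haive actually checks.

```python
import importlib.util


def require_packages(*modules: str) -> None:
    """Raise ImportError naming every top-level module that cannot be found."""
    # find_spec returns None for a missing top-level module without importing it.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if missing:
        raise ImportError(
            "Missing package(s) required by the Arxiv retriever: "
            + ", ".join(missing)
            + ". Install them before calling instantiate()."
        )


# Standard-library modules are always present, so this passes silently.
require_packages("json", "os")
```

Calling `require_packages("arxiv", "langchain_community")` at the top of an `instantiate()`-style method would fail fast with one message listing everything missing. Pass only top-level module names: `find_spec` raises ModuleNotFoundError for a dotted name whose parent package is absent.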