haive.core.engine.vectorstore.VectorStoreConfig

class haive.core.engine.vectorstore.VectorStoreConfig(*, id=<factory>, name=<factory>, engine_type=EngineType.VECTOR_STORE, description=None, input_schema=None, output_schema=None, version='1.0.0', metadata=<factory>, embedding_model=<factory>, vector_store_provider=VectorStoreProvider.FAISS, documents=<factory>, vector_store_path='vector_store', docstore_path='docstore', k=4, score_threshold=None, search_type='similarity', vector_store_kwargs=<factory>)

Configuration model for a vector store engine.

VectorStoreConfig provides a consistent interface for creating and using vector stores with embeddings. It encapsulates all the configuration needed to create and interact with various vector store backends, abstracting away provider-specific implementation details.

This class enables:

1. Creating vector stores with various providers (FAISS, Chroma, Pinecone, etc.)
2. Managing documents and embeddings for vector storage
3. Performing similarity searches with configurable parameters
4. Creating retrievers that can be used in retrieval chains

Parameters:

id (str)
name (str)
engine_type (EngineType)
description (str | None)
input_schema (type[BaseModel] | None)
output_schema (type[BaseModel] | None)
version (str)
metadata (dict[str, Any])
embedding_model (BaseEmbeddingConfig)
vector_store_provider (VectorStoreProvider)
documents (list[Document])
vector_store_path (str)
docstore_path (str)
k (int)
score_threshold (float | None)
search_type (str)
vector_store_kwargs (dict[str, Any])

engine_type

The type of engine (always VECTOR_STORE).

Type: EngineType

embedding_model

Configuration for the embedding model.

Type: BaseEmbeddingConfig

vector_store_provider

The vector store provider to use.

Type: VectorStoreProvider

documents

Documents to store in the vector store.

Type: list[Document]

vector_store_path

Path for storing vector indices on disk.

Type: str

docstore_path

Path for storing document data.

Type: str

k

Default number of documents to retrieve in searches.

Type: int

score_threshold

Minimum similarity score for results.

Type: float | None

search_type

Search algorithm to use (e.g., “similarity”, “mmr”).

Type: str

vector_store_kwargs

Additional provider-specific parameters.

Type: dict[str, Any]
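
To make the roles of k and score_threshold concrete, here is a minimal pure-Python sketch of a top-k search with an optional score cutoff. It is not the library's implementation: the helper names (cosine, top_k) and the toy vectors are hypothetical, and it assumes that a higher score means a closer match, which may not hold for every provider.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=4, score_threshold=None):
    """Return up to k (doc_id, score) pairs, best first.

    Hypothetical helper: assumes higher scores are better and drops
    results scoring below score_threshold when one is given.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    if score_threshold is not None:
        scored = [(doc_id, s) for doc_id, s in scored if s >= score_threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" keyed by document id.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

hits = top_k([1.0, 0.0], docs, k=2)                          # k limits the result count
strict = top_k([1.0, 0.0], docs, k=4, score_threshold=0.9)   # threshold filters weak matches
```

In this sketch k caps how many results come back, while score_threshold can shrink the result list below k when few documents are close enough to the query.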

Examples

>>> from haive.core.engine.vectorstore import VectorStoreConfig, VectorStoreProvider
>>> from haive.core.models.embeddings.base import HuggingFaceEmbeddingConfig
>>> from langchain_core.documents import Document
>>>
>>> # Create configuration
>>> config = VectorStoreConfig(
...     name="product_search",
...     documents=[Document(page_content="iPhone 13: The latest smartphone from Apple")],
...     vector_store_provider=VectorStoreProvider.FAISS,
...     embedding_model=HuggingFaceEmbeddingConfig(
...         model="sentence-transformers/all-MiniLM-L6-v2"
...     ),
...     k=5
... )
>>>
>>> # Create vector store
>>> vectorstore = config.create_vectorstore()
>>>
>>> # Perform similarity search
>>> results = config.similarity_search("smartphone", k=3)
>>>
>>> # Create a retriever
>>> retriever = config.create_retriever(search_type="mmr")

classmethod create_vs_config_from_documents(documents, embedding_model=None, **kwargs)

Create a VectorStoreConfig from a list of documents.

Parameters:

documents (list[Document]) – List of documents to include
embedding_model (BaseEmbeddingConfig | None) – Optional embedding model configuration
**kwargs – Additional parameters for the config

Returns:

Configured VectorStoreConfig

Return type:

VectorStoreConfig

classmethod create_vs_from_documents(documents, embedding_model=None, **kwargs)

Create a VectorStore from a list of documents.

Parameters:

documents (list[Document]) – List of documents to include
embedding_model (BaseEmbeddingConfig | None) – Optional embedding model configuration
**kwargs – Additional parameters for the config

Returns:

Instantiated VectorStore

Return type:

VectorStore

classmethod validate_engine_type(v)

Validate the engine type.

Parameters: v – [TODO: Add description]
Returns: [TODO: Add return description]

add_document(document)

Add a single document to the vector store config.

Parameters: document (Document) – Document to add
Return type: None

add_documents(documents)

Add multiple documents to the vector store config.

Parameters: documents (list[Document]) – List of documents to add
Return type: None

create_retriever(search_type=None, search_kwargs=None, **kwargs)

Create a retriever from the vector store.

Parameters:

search_type (str | None) – Search type (similarity, mmr, etc.)
search_kwargs (dict[str, Any] | None) – Search parameters
**kwargs – Additional parameters for the retriever

Returns:

Configured retriever

Return type:

BaseRetriever
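
The "mmr" search type refers to maximal marginal relevance, which trades off relevance to the query against redundancy among the results already picked. The following is a generic, pure-Python sketch of that selection rule, not the code this class or any provider runs; the function names, the toy vectors, and the lambda_mult weighting parameter are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=4, lambda_mult=0.5):
    """Greedy maximal marginal relevance selection (illustrative only).

    lambda_mult=1.0 ranks purely by relevance; lower values penalize
    candidates that resemble documents already selected.
    """
    selected = []
    candidates = list(doc_vecs)
    while candidates and len(selected) < k:

        def score(doc_id):
            relevance = cosine(query_vec, doc_vecs[doc_id])
            redundancy = max(
                (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# "a" and "a_near" are near-duplicates; "b" points in a different direction.
docs = {"a": [1.0, 0.0], "a_near": [1.0, 0.05], "b": [0.6, 0.8]}
query = [1.0, 0.3]

diverse = mmr(query, docs, k=2)                       # balances relevance and diversity
by_relevance = mmr(query, docs, k=2, lambda_mult=1.0)  # plain relevance ranking
```

With these toy vectors the diversity-aware call skips the near-duplicate and returns "a_near" and "b", whereas the relevance-only call returns the two near-duplicates.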

create_runnable(runnable_config=None)

Create a vector store instance with configuration applied.

Parameters: runnable_config (RunnableConfig | None) – Optional runtime configuration
Returns: Instantiated vector store
Return type: VectorStore

create_vectorstore(async_mode=False)

Create a vector store instance from this configuration.

Instantiates a vector store of the configured provider type, using the documents and embedding model specified in the configuration. This method handles the details of creating the appropriate vector store class, initializing it with the correct parameters, and populating it with documents.

The method supports both synchronous and asynchronous initialization paths, and includes special handling for empty document collections.

Parameters:

async_mode (bool) – Whether to use async methods for vector store creation. Default is False. If True, the method will use asynchronous variants of the vector store creation methods if available.

Returns:

An instantiated vector store of the configured provider type, populated with the configured documents and using the specified embedding model.

Return type:

VectorStore

Raises:

ValueError – If an empty vector store cannot be created with the specified provider.

Examples

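>>> from haive.core.engine.vectorstore import VectorStoreConfig, VectorStoreProvider
>>> from langchain_core.documents import Document
>>>
>>> config = VectorStoreConfig(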
...     name="product_catalog",
...     vector_store_provider=VectorStoreProvider.FAISS,
...     documents=[Document(page_content="Product description...")]
... )
>>> vectorstore = config.create_vectorstore()
>>>
>>> # With async mode
>>> async def create_async():
...     return await config.create_vectorstore(async_mode=True)

extract_params()

Extract parameters from this engine for serialization.

Returns: Dictionary of engine parameters
Return type: dict[str, Any]

get_input_fields()

Return input field definitions as field_name -> (type, default) pairs.

Returns: Dictionary mapping field names to (type, default) tuples
Return type: dict[str, tuple[type, Any]]

get_output_fields()

Return output field definitions as field_name -> (type, default) pairs.

Returns: Dictionary mapping field names to (type, default) tuples
Return type: dict[str, tuple[type, Any]]
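
As an illustration of the field_name -> (type, default) shape that get_input_fields() and get_output_fields() describe, the sketch below builds a small record type from such a mapping using only the standard library. The field names ("query", "k") and the SearchInput class are hypothetical; the actual fields returned by these methods are not listed here.

```python
from dataclasses import field, make_dataclass
from typing import Any

# Hypothetical mapping in the documented shape: name -> (type, default).
fields: dict[str, tuple[type, Any]] = {
    "query": (str, ""),
    "k": (int, 4),
}

# Turn each (type, default) pair into a dataclass field with that default.
SearchInput = make_dataclass(
    "SearchInput",
    [(name, tp, field(default=default)) for name, (tp, default) in fields.items()],
)

# Unspecified fields fall back to the defaults from the mapping.
req = SearchInput(query="smartphone")
```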

get_vectorstore(embedding=None, async_mode=False)

Get the vector store with optional embedding override.

Parameters:

embedding – Optional embedding model override
async_mode (bool) – Whether to use async methods

Returns:

Instantiated vector store

Return type:

VectorStore

invoke(input_data, runnable_config=None)

Invoke the vector store with input data.

Parameters:

input_data (str | dict[str, Any]) – Query string or dictionary with search parameters
runnable_config (RunnableConfig | None) – Optional runtime configuration

Returns:

List of retrieved documents

Return type:

list[Document]
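
Because input_data may be either a plain query string or a dictionary of search parameters, a caller-side normalization step can look roughly like the sketch below. This is only an assumption about how such inputs might be unified: the dictionary keys ("query", "k") and the helper name normalize_search_input are hypothetical and are not confirmed by this reference.

```python
from typing import Any


def normalize_search_input(input_data: "str | dict[str, Any]", default_k: int = 4) -> dict:
    """Coerce a query string or a parameter dict into one dict shape.

    Hypothetical helper: the "query" and "k" keys are assumed for
    illustration, not taken from the library.
    """
    if isinstance(input_data, str):
        return {"query": input_data, "k": default_k}
    if isinstance(input_data, dict):
        if "query" not in input_data:
            raise ValueError("search parameters must include a 'query' entry")
        params = {"k": default_k}
        params.update(input_data)  # explicit values override the defaults
        return params
    raise TypeError(f"unsupported input type: {type(input_data).__name__}")


from_text = normalize_search_input("smartphone")
from_dict = normalize_search_input({"query": "smartphone", "k": 2})
```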

similarity_search(query, k=None, score_threshold=None, filter=None, search_type=None, runnable_config=None)

Perform similarity search with configurable parameters.

Parameters:

query (str) – Query string
k (int | None) – Number of documents to retrieve (overrides default)
score_threshold (float | None) – Score threshold for filtering results
filter (dict[str, Any] | None) – Optional filter for the search
search_type (str | None) – Search type (similarity, mmr, etc.)
runnable_config (RunnableConfig | None) – Optional runtime configuration

Returns:

List of retrieved documents

Return type:

list[Document]

docstore_path: str

documents: list[Document]

embedding_model: BaseEmbeddingConfig

engine_type: EngineType

k: int

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

score_threshold: float | None

search_type: str

vector_store_kwargs: dict[str, Any]

vector_store_path: str

vector_store_provider: VectorStoreProvider