haive.core.engine.vectorstore.VectorStoreConfig

class haive.core.engine.vectorstore.VectorStoreConfig(*, id=<factory>, name=<factory>, engine_type=EngineType.VECTOR_STORE, description=None, input_schema=None, output_schema=None, version='1.0.0', metadata=<factory>, embedding_model=<factory>, vector_store_provider=VectorStoreProvider.FAISS, documents=<factory>, vector_store_path='vector_store', docstore_path='docstore', k=4, score_threshold=None, search_type='similarity', vector_store_kwargs=<factory>)[source]

Configuration model for a vector store engine.

VectorStoreConfig provides a consistent interface for creating and using vector stores with embeddings. It encapsulates all the configuration needed to create and interact with various vector store backends, abstracting away provider-specific implementation details.

This class enables:

  1. Creating vector stores with various providers (FAISS, Chroma, Pinecone, etc.)

  2. Managing documents and embeddings for vector storage

  3. Performing similarity searches with configurable parameters

  4. Creating retrievers that can be used in retrieval chains

Parameters:
  • engine_type (EngineType) – The type of engine (always VECTOR_STORE).

  • embedding_model (BaseEmbeddingConfig) – Configuration for the embedding model.

  • vector_store_provider (VectorStoreProvider) – The vector store provider to use.

  • documents (List[Document]) – Documents to store in the vector store.

  • vector_store_path (str) – Path for storing vector indices on disk.

  • docstore_path (str) – Path for storing document data.

  • k (int) – Default number of documents to retrieve in searches.

  • score_threshold (Optional[float]) – Minimum similarity score for results.

  • search_type (str) – Search algorithm to use (e.g., “similarity”, “mmr”).

  • vector_store_kwargs (Dict[str, Any]) – Additional provider-specific parameters.
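How k and score_threshold might interact during a search can be sketched without any vector store library. The following is a minimal pure-Python illustration, not haive's implementation: the cosine and top_k helpers and the toy two-dimensional vectors are hypothetical, and the sketch assumes that a higher score means a closer match, which does not hold for providers that report distances.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=4, score_threshold=None):
    """Rank documents by similarity, drop low scores, keep at most k.

    Hypothetical helper: assumes higher score = more similar.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if score_threshold is not None:
        scored = [pair for pair in scored if pair[1] >= score_threshold]
    return scored[:k]


# Toy embeddings (made up for illustration).
doc_vecs = {
    "phone": [1.0, 0.0],
    "tablet": [0.8, 0.6],
    "toaster": [0.0, 1.0],
}

hits = top_k([1.0, 0.0], doc_vecs, k=2)
strict_hits = top_k([1.0, 0.0], doc_vecs, k=2, score_threshold=0.9)
```

In this sketch k caps the number of results, while score_threshold can shrink the result list below k when few documents are similar enough.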

Examples

>>> from haive.core.engine.vectorstore import VectorStoreConfig, VectorStoreProvider
>>> from haive.core.models.embeddings.base import HuggingFaceEmbeddingConfig
>>> from langchain_core.documents import Document
>>>
>>> # Create configuration
>>> config = VectorStoreConfig(
...     name="product_search",
...     documents=[Document(page_content="iPhone 13: The latest smartphone from Apple")],
...     vector_store_provider=VectorStoreProvider.FAISS,
...     embedding_model=HuggingFaceEmbeddingConfig(
...         model="sentence-transformers/all-MiniLM-L6-v2"
...     ),
...     k=5
... )
>>>
>>> # Create vector store
>>> vectorstore = config.create_vectorstore()
>>>
>>> # Perform similarity search
>>> results = config.similarity_search("smartphone", k=3)
>>>
>>> # Create a retriever
>>> retriever = config.create_retriever(search_type="mmr")

classmethod create_vs_config_from_documents(documents, embedding_model=None, **kwargs)[source]

Create a VectorStoreConfig from a list of documents.

Parameters:
  • documents (list[Document]) – List of documents to include

  • embedding_model (BaseEmbeddingConfig | None) – Optional embedding model configuration

  • **kwargs – Additional parameters for the config

Returns:

Configured VectorStoreConfig

Return type:

VectorStoreConfig

classmethod create_vs_from_documents(documents, embedding_model=None, **kwargs)[source]

Create a VectorStore from a list of documents.

Parameters:
  • documents (list[Document]) – List of documents to include

  • embedding_model (BaseEmbeddingConfig | None) – Optional embedding model configuration

  • **kwargs – Additional parameters for the config

Returns:

Instantiated VectorStore

Return type:

VectorStore
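The two factory classmethods differ only in what they hand back: a configuration or an instantiated store. One plausible relationship between them, shown here purely as an assumption, is that the store factory builds a config first and then creates the store from it. SketchConfig and its methods are hypothetical stand-ins written with the standard library; they are not the haive classes.

```python
from dataclasses import dataclass, field


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a vector store config (not the haive class)."""

    documents: list = field(default_factory=list)
    embedding_model: str = "default-embedding"
    k: int = 4

    @classmethod
    def config_from_documents(cls, documents, embedding_model=None, **kwargs):
        # Only override the default embedding model when one is supplied.
        if embedding_model is not None:
            kwargs["embedding_model"] = embedding_model
        return cls(documents=list(documents), **kwargs)

    def create_store(self):
        # Stand-in "store": each document paired with the embedding model name.
        return [(doc, self.embedding_model) for doc in self.documents]

    @classmethod
    def store_from_documents(cls, documents, embedding_model=None, **kwargs):
        # Assumed relationship: build the config, then build the store from it.
        config = cls.config_from_documents(documents, embedding_model, **kwargs)
        return config.create_store()


config = SketchConfig.config_from_documents(["iPhone 13"], k=5)
store = SketchConfig.store_from_documents(["iPhone 13"], embedding_model="mini")
```

Keeping the config as a separate step is useful when the same settings need to be inspected, serialized, or reused before a store is built.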

classmethod validate_engine_type(v)[source]

Validate the engine type.

Parameters:

v – The engine_type value to validate

Returns:

[TODO: Add return description]

add_document(document)[source]

Add a single document to the vector store config.

Parameters:

document (Document) – Document to add

Return type:

None

add_documents(documents)[source]

Add multiple documents to the vector store config.

Parameters:

documents (list[Document]) – List of documents to add

Return type:

None

create_retriever(search_type=None, search_kwargs=None, **kwargs)[source]

Create a retriever from the vector store.

Parameters:
  • search_type (str | None) – Search type (similarity, mmr, etc.)

  • search_kwargs (dict[str, Any] | None) – Search parameters

  • **kwargs – Additional parameters for the retriever

Returns:

Configured retriever

Return type:

BaseRetriever
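The difference between the "similarity" and "mmr" search types can be illustrated with a small sketch of maximal marginal relevance. This is a generic pure-Python version of the technique, not the code path used by haive or by any provider; the mmr function, the toy vectors, and the lambda_mult name (borrowed from LangChain's naming convention) are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy maximal marginal relevance selection.

    lambda_mult=1.0 ranks by relevance only; lower values penalize
    candidates that resemble documents already selected.
    """
    candidates = dict(doc_vecs)
    selected = []
    while candidates and len(selected) < k:

        def mmr_score(doc_id):
            relevance = cosine(query_vec, candidates[doc_id])
            redundancy = max(
                (cosine(candidates[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        del candidates[best]
    return selected


# Toy embeddings: two near-duplicate phones and one different item.
doc_vecs = {
    "phone_a": [1.0, 0.1],
    "phone_b": [1.0, 0.0],
    "case": [0.5, 1.0],
}
query = [1.0, 0.2]

diverse = mmr(query, doc_vecs, k=2, lambda_mult=0.5)
relevance_only = mmr(query, doc_vecs, k=2, lambda_mult=1.0)
```

With lambda_mult=1.0 the two near-duplicates are returned, while lambda_mult=0.5 swaps the second one for the less similar but less redundant document.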

create_runnable(runnable_config=None)[source]

Create a vector store instance with configuration applied.

Parameters:

runnable_config (RunnableConfig | None) – Optional runtime configuration

Returns:

Instantiated vector store

Return type:

VectorStore

create_vectorstore(async_mode=False)[source]

Create a vector store instance from this configuration.

Instantiates a vector store of the configured provider type, using the documents and embedding model specified in the configuration. This method handles the details of creating the appropriate vector store class, initializing it with the correct parameters, and populating it with documents.

The method supports both synchronous and asynchronous initialization paths, and includes special handling for empty document collections.

Parameters:

async_mode (bool) – Whether to use async methods for vector store creation. Default is False. If True, the method will use asynchronous variants of the vector store creation methods if available.

Returns:

An instantiated vector store of the configured provider type, populated with the configured documents and using the specified embedding model.

Return type:

VectorStore

Raises:

ValueError – If an empty vector store cannot be created with the specified provider.

Examples

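>>> from haive.core.engine.vectorstore import VectorStoreConfig, VectorStoreProvider
>>> from langchain_core.documents import Document
>>>
>>> config = VectorStoreConfig(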
...     name="product_catalog",
...     vector_store_provider=VectorStoreProvider.FAISS,
...     documents=[Document(page_content="Product description...")]
... )
>>> vectorstore = config.create_vectorstore()
>>>
>>> # With async mode
>>> async def create_async():
...     return await config.create_vectorstore(async_mode=True)
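The sync/async split and the empty-collection error described above can be sketched as a simple dispatch. This is an assumed pattern, not haive's actual code: InMemoryStore, create_store, and the supports_empty flag are hypothetical, and the async branch is modeled as returning a coroutine that the caller must await.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in for a provider's vector store class."""

    def __init__(self, texts):
        self.texts = list(texts)

    @classmethod
    def from_texts(cls, texts):
        return cls(texts)

    @classmethod
    async def afrom_texts(cls, texts):
        await asyncio.sleep(0)  # placeholder for real async I/O
        return cls(texts)


def create_store(texts, async_mode=False, supports_empty=True):
    """Pick the sync or async constructor; reject empty input if unsupported."""
    if not texts and not supports_empty:
        raise ValueError("this provider cannot create an empty vector store")
    if async_mode:
        # Returns a coroutine; the caller is responsible for awaiting it.
        return InMemoryStore.afrom_texts(texts)
    return InMemoryStore.from_texts(texts)


sync_store = create_store(["Product description..."])
async_store = asyncio.run(create_store(["a", "b"], async_mode=True))

try:
    create_store([], supports_empty=False)
    empty_error = None
except ValueError as exc:
    empty_error = str(exc)
```

Returning a coroutine only when async_mode is set keeps a single entry point, at the cost of a return type that depends on the flag.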

extract_params()[source]

Extract parameters from this engine for serialization.

Returns:

Dictionary of engine parameters

Return type:

dict[str, Any]

get_input_fields()[source]

Return input field definitions as field_name -> (type, default) pairs.

Returns:

Dictionary mapping field names to (type, default) tuples

Return type:

dict[str, tuple[type, Any]]

get_output_fields()[source]

Return output field definitions as field_name -> (type, default) pairs.

Returns:

Dictionary mapping field names to (type, default) tuples

Return type:

dict[str, tuple[type, Any]]
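The field_name -> (type, default) mapping returned by get_input_fields() and get_output_fields() can be pictured with a small sketch. The field names below and the use of Ellipsis to mark a required field are assumptions; the Ellipsis convention is the one pydantic.create_model uses for (type, default) tuples, but whether haive follows it is not stated here.

```python
from typing import Any

# Hypothetical field definitions in the documented shape.
fields: dict[str, tuple[type, Any]] = {
    "query": (str, ...),  # Ellipsis marks "required" in this sketch
    "k": (int, 4),
}


def describe_fields(field_defs):
    """Render each (type, default) pair as a readable line."""
    lines = []
    for name, (field_type, default) in field_defs.items():
        if default is ...:
            lines.append(f"{name}: {field_type.__name__} (required)")
        else:
            lines.append(f"{name}: {field_type.__name__} = {default!r}")
    return lines


summary = describe_fields(fields)
```

A mapping in this shape is convenient for building schemas dynamically, because each entry carries both the type and the default in one place.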

get_vectorstore(embedding=None, async_mode=False)[source]

Get the vector store with optional embedding override.

Parameters:
  • embedding – Optional embedding model override

  • async_mode (bool) – Whether to use async methods

Returns:

Instantiated vector store

Return type:

VectorStore

invoke(input_data, runnable_config=None)[source]

Invoke the vector store with input data.

Parameters:
  • input_data (str | dict[str, Any]) – Query string or dictionary with search parameters

  • runnable_config (RunnableConfig | None) – Optional runtime configuration

Returns:

List of retrieved documents

Return type:

list[Document]
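Because invoke accepts either a query string or a dictionary, a caller-side way to think about it is as a normalization step followed by a search. The sketch below shows only that normalization; the normalize_input helper and the "query" and "k" keys are assumptions that mirror the similarity search parameters, not a documented input schema.

```python
def normalize_input(input_data, default_k=4):
    """Turn a str or dict input into one dict of search parameters.

    Hypothetical helper: key names are assumed, not taken from haive.
    """
    if isinstance(input_data, str):
        return {"query": input_data, "k": default_k}
    if isinstance(input_data, dict):
        if "query" not in input_data:
            raise ValueError("dictionary input must contain a 'query' key")
        # Explicit values in the input override the defaults.
        return {"k": default_k, **input_data}
    raise TypeError(f"unsupported input type: {type(input_data).__name__}")


from_string = normalize_input("smartphone")
from_dict = normalize_input({"query": "smartphone", "k": 3})
```

Normalizing up front lets the rest of the search logic work with a single dictionary shape regardless of how the caller supplied the query.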

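similarity_search(query, k=None, score_threshold=None, filter=None, search_type=None, runnable_config=None)[source]

Perform similarity search with configurable parameters.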

Parameters:
  • query (str) – Query string

  • k (int | None) – Number of documents to retrieve (overrides default)

  • score_threshold (float | None) – Score threshold for filtering results

  • filter (dict[str, Any] | None) – Optional filter for the search

  • search_type (str | None) – Search type (similarity, mmr, etc.)

  • runnable_config (RunnableConfig | None) – Optional runtime configuration

Returns:

List of retrieved documents

Return type:

list[Document]
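The filter parameter is described only as an optional dictionary, and its exact semantics depend on the provider. One common interpretation, assumed here, is an equality match against document metadata. The Doc class and apply_filter helper are hypothetical stand-ins written with the standard library.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document with metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def apply_filter(docs, metadata_filter=None):
    """Keep documents whose metadata equals every key/value in the filter."""
    if not metadata_filter:
        return list(docs)
    return [
        doc
        for doc in docs
        if all(doc.metadata.get(key) == value for key, value in metadata_filter.items())
    ]


docs = [
    Doc("iPhone 13", {"brand": "Apple", "year": 2021}),
    Doc("Pixel 6", {"brand": "Google", "year": 2021}),
    Doc("iPhone 12", {"brand": "Apple", "year": 2020}),
]

apple_2021 = apply_filter(docs, {"brand": "Apple", "year": 2021})
```

Providers that support richer operators (ranges, set membership) typically define their own filter syntax, so a plain equality dictionary is the most portable form.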

docstore_path: str
documents: list[Document]
embedding_model: BaseEmbeddingConfig
engine_type: EngineType
k: int
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

score_threshold: float | None
search_type: str
vector_store_kwargs: dict[str, Any]
vector_store_path: str
vector_store_provider: VectorStoreProvider